1
GC Advantage: Improving Program Locality
Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley,
J Eliot B Moss, Perry Cheng
2
Motivation
Memory gap: how are Java programs affected?
3
MarkSweep vs. Copying (pseudoJBB)
4
Motivation
_213_javac with perfect L1 and L2 caches
16K L1, 256K L2; Appel collector (GCTk); breadth-first traversal
[Chart: _213_javac cycles (×10^9) for the original configuration, a perfect L2 cache, and a perfect L1 cache]
5
Motivation
A copying collector can reorder objects
Goal: take advantage of copying collectors to reorder objects and improve locality
6
Exploring The Space
• Different policies for traversing roots
• Class-oblivious traversal orders
    • Which traversal order is the best?
• Class-based traversal orders
    • How do we find the “important” data structures?
7
Different Root Traversal Policies
• Two types of roots:
    • Stack and global variables
    • Remembered sets (for generational collection)
• Different traversal orders:
    • Copy all roots before traversing any children
    • Copy each root and its children (root-by-root)
    • Split roots:
        • Stack first, then its children
        • Remset first, then its children
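The sketch below shows how each root policy changes the to-space layout. It models objects as integer ids with pointer lists and assumes a Cheney-style breadth-first scan. A policy is expressed as a sequence of root groups: every root in a group is copied, then everything reachable from the group is traced before the next group starts. The class and method names (`RootPolicySketch`, `copyInGroups`) are hypothetical and are not JMTk code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not JMTk code: objects are integer ids, the heap maps
// each id to its outgoing pointers, and to-space order is the copy order.
class RootPolicySketch {
    // Copy obj to the end of to-space unless it has already been copied.
    private static void evacuate(int obj, List<Integer> toSpace, Set<Integer> copied) {
        if (copied.add(obj)) toSpace.add(obj);
    }

    // Cheney scan: scan to-space objects from index `scan` until the scan
    // pointer catches up with the allocation pointer; returns the new index.
    private static int drain(int scan, Map<Integer, List<Integer>> heap,
                             List<Integer> toSpace, Set<Integer> copied) {
        for (; scan < toSpace.size(); scan++) {
            for (int child : heap.getOrDefault(toSpace.get(scan), List.of())) {
                evacuate(child, toSpace, copied);
            }
        }
        return scan;
    }

    // Copy each group of roots, then trace everything reachable from it
    // before moving on to the next group.
    static List<Integer> copyInGroups(Map<Integer, List<Integer>> heap,
                                      List<List<Integer>> groups) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Integer> copied = new HashSet<>();
        int scan = 0;
        for (List<Integer> group : groups) {
            for (int root : group) evacuate(root, toSpace, copied);
            scan = drain(scan, heap, toSpace, copied);
        }
        return toSpace;
    }

    // Copy all roots before traversing any children: one single group.
    static List<Integer> allRootsFirst(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        return copyInGroups(heap, List.of(roots));
    }

    // Root-by-root (RxR): every root forms its own group.
    static List<Integer> rootByRoot(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int root : roots) groups.add(List.of(root));
        return copyInGroups(heap, groups);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> heap = Map.of(1, List.of(2), 2, List.of(3), 5, List.of(6));
        System.out.println(allRootsFirst(heap, List.of(1, 5))); // [1, 5, 2, 6, 3]
        System.out.println(rootByRoot(heap, List.of(1, 5)));    // [1, 2, 3, 5, 6]
    }
}
```

Under this model, one reading of the split-roots policy is `copyInGroups(heap, List.of(stackRoots, remsetRoots))`. That copies the stack roots and their children first, then the remembered-set roots and theirs.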
8
Experiment Setup
• JikesRVM, JMTk
• Generational copying collector with a bounded nursery size of 4MB
• PseudoAdaptive, 2nd iteration
9
Different Root Traversal Policies
• RxR (root-by-root) has the best mutator locality
10
Different Root Traversal Policies
•Total execution time
11
Exploring The Space
• Different policies for traversing roots
• Class-oblivious traversal orders
    • Which traversal order is the best?
• Class-based traversal orders
    • How do we find the “important” data structures?
12
Different Traversal Orders
• Breadth first: 1,2,3,4,5,6,7
• Pure depth first: 1,2,6,3,4,7,5
• Pure depth first, LIFO: 1,5,4,7,3,2,6
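[Figure: example object graph; object 1 points to 2, 3, 4 and 5, object 2 points to 6, and object 4 points to 7]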
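The three orders can be reproduced on the example graph, in which object 1 points to 2, 3, 4 and 5, object 2 points to 6, and object 4 points to 7. This graph is inferred from the listed orders. The sketch treats an object as copied at the moment the traversal visits it, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: objects are integer ids, and an object counts as
// "copied" when the traversal visits it.
class TraversalOrders {
    private static List<Integer> kids(Map<Integer, List<Integer>> g, int n) {
        return g.getOrDefault(n, List.of());
    }

    // Breadth first: a FIFO queue, as in a Cheney scan.
    static List<Integer> breadthFirst(Map<Integer, List<Integer>> g, int root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>(List.of(root));
        Deque<Integer> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            int n = queue.removeFirst();
            order.add(n);
            for (int c : kids(g, n)) {
                if (seen.add(c)) queue.addLast(c);
            }
        }
        return order;
    }

    // Pure depth first: recurse into children in field order.
    static List<Integer> depthFirst(Map<Integer, List<Integer>> g, int root) {
        List<Integer> order = new ArrayList<>();
        visit(g, root, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> g, int n,
                              Set<Integer> seen, List<Integer> order) {
        if (!seen.add(n)) return;
        order.add(n);
        for (int c : kids(g, n)) visit(g, c, seen, order);
    }

    // Pure depth first, LIFO: an explicit stack, so the last child pushed is visited first.
    static List<Integer> depthFirstLifo(Map<Integer, List<Integer>> g, int root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int n = stack.pop();
            if (!seen.add(n)) continue;
            order.add(n);
            for (int c : kids(g, n)) stack.push(c);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> tree =
            Map.of(1, List.of(2, 3, 4, 5), 2, List.of(6), 4, List.of(7));
        System.out.println(breadthFirst(tree, 1));    // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(depthFirst(tree, 1));      // [1, 2, 6, 3, 4, 7, 5]
        System.out.println(depthFirstLifo(tree, 1));  // [1, 5, 4, 7, 3, 2, 6]
    }
}
```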
13
Different Traversal Orders
• Breadth first: 1,2,3,4,5,6,7
• Pure depth first: 1,2,6,3,4,7,5
• Pure depth first, LIFO: 1,5,4,7,3,2,6
• Partial depth first, 2 children: 1,2,6,3,4,5,7
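[Figure: example object graph; object 1 points to 2, 3, 4 and 5, object 2 points to 6, and object 4 points to 7]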
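One way to obtain the partial depth-first order is sketched below. It assumes the following rule: when an object is scanned, its first k pointer fields are copied and scanned immediately (depth first), and the remaining children are only copied and left for the breadth-first scan. With k = 2 this reproduces 1,2,6,3,4,5,7 on the example graph. With k = 0 it degenerates to breadth first, and with a very large k it becomes pure depth first. The class name and the parameter k are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of partial depth-first copying (assumed rule, see above).
class PartialDepthFirst {
    static List<Integer> copy(Map<Integer, List<Integer>> heap, int root, int k) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Integer> copied = new HashSet<>();
        Set<Integer> scanned = new HashSet<>();
        copied.add(root);
        toSpace.add(root);
        // Breadth-first (Cheney) scan over to-space; the list grows as we go.
        for (int i = 0; i < toSpace.size(); i++) {
            scan(toSpace.get(i), heap, k, toSpace, copied, scanned);
        }
        return toSpace;
    }

    private static void scan(int obj, Map<Integer, List<Integer>> heap, int k,
                             List<Integer> toSpace, Set<Integer> copied, Set<Integer> scanned) {
        if (!scanned.add(obj)) return;            // already scanned depth first
        List<Integer> children = heap.getOrDefault(obj, List.of());
        for (int i = 0; i < children.size(); i++) {
            int child = children.get(i);
            if (!copied.add(child)) continue;     // already in to-space
            toSpace.add(child);
            // The first k pointer fields are followed immediately (depth first);
            // later fields wait for the breadth-first scan.
            if (i < k) scan(child, heap, k, toSpace, copied, scanned);
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> tree =
            Map.of(1, List.of(2, 3, 4, 5), 2, List.of(6), 4, List.of(7));
        System.out.println(copy(tree, 1, 2));  // [1, 2, 6, 3, 4, 5, 7]
    }
}
```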
14
Class Oblivious Type
Different traversal policies: partial DF is the best
15
Exploring The Space
• Different policies for traversing roots
• Class-oblivious traversal orders
    • Which traversal order is the best?
• Class-based traversal orders
    • How do we find the “important” data structures?
16
Class-based Traversal
• Class-oblivious traversal orders are inflexible
• Class-based object traversal:
    • Static profiling
    • Dynamic sampling
17
Static Profiling
• Profile object accesses
• Find hot pairs with strong correlation
• Example: (1,4), (4,7) and (2,6) are strongly correlated
    Order: 1,4,7,2,6,3,5
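[Figure: example object graph; object 1 points to 2, 3, 4 and 5, object 2 points to 6, and object 4 points to 7]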
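The order 1,4,7,2,6,3,5 follows from a simple rule. Whenever the collector copies an object, it immediately copies the chain of objects linked to it by hot pairs, and otherwise it scans breadth first. The sketch below assumes each object has at most one hot child, encoded as a parent-to-child map. This encoding and all names are hypothetical and are not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first copying that eagerly follows hot pairs.
class HotPairOrder {
    static List<Integer> copy(Map<Integer, List<Integer>> heap,
                              Map<Integer, Integer> hotChild, int root) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Integer> copied = new HashSet<>();
        evacuate(root, hotChild, toSpace, copied);
        // Ordinary Cheney scan for everything not covered by a hot pair.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            for (int child : heap.getOrDefault(toSpace.get(scan), List.of())) {
                evacuate(child, hotChild, toSpace, copied);
            }
        }
        return toSpace;
    }

    // Copy obj, then the chain of hot children hanging off it, so that
    // strongly correlated objects end up adjacent in to-space.
    private static void evacuate(int obj, Map<Integer, Integer> hotChild,
                                 List<Integer> toSpace, Set<Integer> copied) {
        Integer next = obj;
        while (next != null && copied.add(next)) {
            toSpace.add(next);
            next = hotChild.get(next);
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> tree =
            Map.of(1, List.of(2, 3, 4, 5), 2, List.of(6), 4, List.of(7));
        Map<Integer, Integer> hot = Map.of(1, 4, 4, 7, 2, 6); // pairs (1,4), (4,7), (2,6)
        System.out.println(copy(tree, hot, 1));               // [1, 4, 7, 2, 6, 3, 5]
    }
}
```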
18
Online Profiling
• Use the adaptive compiler's sampling:
    • Hot methods
    • Hot basic blocks
• Use field accesses to indicate hot fields
• Example (in a hot method):
    {
        A a;
        a.b = …;
        …
    }
[Figure: an object of class A whose field b points to an object of class B]
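Below is a minimal sketch of turning sampled field accesses into class-based advice. It assumes each sample taken inside a hot method records the class and the field accessed, so `a.b = …` in the example becomes the pair (A, b). It also assumes a field counts as hot once its sample count reaches a threshold. The `FieldAccess` record, the `advise` method and the threshold are hypothetical and are not the JikesRVM adaptive system's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the JikesRVM adaptive system's API.
class HotFieldAdvice {
    // One sampled field access from a hot method, e.g. a.b = … recorded as ("A", "b").
    record FieldAccess(String className, String fieldName) {}

    // Per class, the fields sampled at least `threshold` times, most frequent first.
    // A class-based collector would copy the targets of these fields first.
    static Map<String, List<String>> advise(List<FieldAccess> samples, int threshold) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (FieldAccess s : samples) {
            counts.computeIfAbsent(s.className(), c -> new HashMap<>())
                  .merge(s.fieldName(), 1, Integer::sum);
        }
        Map<String, List<String>> advice = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            Map<String, Integer> fields = e.getValue();
            List<String> hot = new ArrayList<>();
            for (Map.Entry<String, Integer> f : fields.entrySet()) {
                if (f.getValue() >= threshold) hot.add(f.getKey());
            }
            // Most frequently sampled field first; ties broken by field name.
            hot.sort((x, y) -> {
                int byCount = Integer.compare(fields.get(y), fields.get(x));
                return byCount != 0 ? byCount : x.compareTo(y);
            });
            if (!hot.isEmpty()) advice.put(e.getKey(), hot);
        }
        return advice;
    }

    public static void main(String[] args) {
        List<FieldAccess> samples = List.of(
            new FieldAccess("A", "b"), new FieldAccess("A", "b"), new FieldAccess("A", "b"),
            new FieldAccess("A", "c"),
            new FieldAccess("C", "d"), new FieldAccess("C", "d"));
        System.out.println(advise(samples, 2)); // {A=[b], C=[d]}
    }
}
```

Keeping only fields above a threshold mirrors the trade-off raised later in the slides. A low threshold yields advice for more objects, but with weaker evidence that the chosen fields are really hot.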
19
Online Profiling
Micro benchmark results
20
Online Profiling
Geometric mean
21
Reasons
• No advice for most of the copied objects
    • For jess, db and raytrace, we pick only <<1% of the objects as hot objects; 5% for javac
• The hot fields are within the first 2 pointers
    • For 90% of the advised objects in javac
22
Online Profiling
pseudoJBB mutator results
• Advice is generated for 23% of the copied objects
• 75% of the objects have advised hot fields other than the first 2 pointers
23
Questions
• Have we found all the hot objects? What about hot objects that are not connected?
• Is class-based advice good enough? For pseudoJBB, do we need instance-based advice?
• What about locality for nursery objects?
24
Future Work
• Sampling technique
    • Catch more hot object accesses
    • Lower the threshold
• Hot objects that are not connected
• Dynamically change the advice as program phases change
• Nursery locality
• Different traversal orders for cold objects
• Instance-based advice
25
Conclusion
• Reordering objects during copying collection can improve locality
• Among class-oblivious traversal orders, partial depth-first order is the best
• Online profiling with class-based traversal is more flexible: up to 50% better, with very low overhead (~0%)
• Some mysteries remain
26
Questions?
27
Answers?
• Lower the sampling threshold: sample more than just the hot methods
• For objects with only 1 or 2 pointers, it may be easier to just use depth first
• Maybe nursery locality is more important
• Instance-based advice
28
Online Profiling
Execution overhead
[Chart: execution overhead (%)]
29
Online Profiling
Micro benchmark results for mutator time
30
Different Root Traversal Policies
_227_mtrt
31
Static Profiling
Results
32
Answers?
• Most objects have only one pointer
• Percentage of objects copied by advice (are they really hot?)
    • pseudoJBB: ~50%; jess: <<1%; our microbenchmark: ~16%
• Change! Half of the pairs do not form chains longer than 2
• Maybe nursery locality is more important
33
Class Oblivious Orderings
Different traversal policies: partial DF is better (pseudoJBB)
34
Motivation
MarkSweep vs. Copying Collector
Mutator time of _213_javac
35
Motivation
Mutator L2 misses of _213_javac