1
Garbage Collection Advantage:
Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT),
J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
2
Motivation
• Memory gap problem
• OO programs become more popular
• OO programs exacerbate the memory gap problem
– Automatic memory management
– Pointer data structures
– Many small methods
Goal: improve OO program locality
3
Cache Performance Matters
[Figure: _213_javac, total cycles (in billions)]
4
Opportunity
• Generational copying garbage collector reorders objects at runtime
5
Copying of Linked Objects
[Diagram: objects 1 to 7 linked as a tree, copied in BreadthFirst order]
6
Copying of Linked Objects
[Diagram: the same objects 1 to 7, copied in BreadthFirst and DepthFirst order]
7
Copying of Linked Objects
[Diagram: the same objects 1 to 7, copied in BreadthFirst, DepthFirst and OnlineObjectReordering order]
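The difference between the two static copy orders can be sketched in a few lines. This is only an illustration: the object graph below is a hypothetical binary tree (not the exact graph drawn on the slides), and `copy_order` is an assumed helper name that simulates the order in which a copying collector would lay out reachable objects.

```python
from collections import deque

# Hypothetical object graph: object id -> ids referenced by its fields, in field order.
GRAPH = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}

def copy_order(graph, root, breadth_first):
    """Return the order in which reachable objects would be copied."""
    pending = deque([root])
    seen = set()
    copied = []
    while pending:
        # A queue gives breadth-first order, a stack gives depth-first order.
        obj = pending.popleft() if breadth_first else pending.pop()
        if obj in seen:
            continue
        seen.add(obj)
        copied.append(obj)
        fields = graph[obj]
        # For the stack, push fields reversed so the first field is followed first.
        pending.extend(fields if breadth_first else reversed(fields))
    return copied

print(copy_order(GRAPH, 1, breadth_first=True))   # [1, 2, 3, 4, 5, 6, 7]
print(copy_order(GRAPH, 1, breadth_first=False))  # [1, 2, 4, 5, 3, 6, 7]
```

Breadth-first places siblings next to each other, while depth-first places a parent next to the first child it references. Which one helps locality depends on how the program actually traverses the structure.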
8
Outline
• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion
9
Online Object Reordering
• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?
10
Where Are The Cache Misses?
• Heap structure (not to scale): VM objects, stack, older generation, nursery
11
Where Are The Cache Misses?
[Figure: _209_db, total accesses (in millions), split into L2 hits and L2 misses]
12
Where Are The Cache Misses?
• Two opportunities to reorder objects in the older generation
– Promote nursery objects
– Full heap collection
13
How to Find Hot Fields?
• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis
Key: Low overhead estimation
14
Which Classes Need Reordering?
Step 1: Compiler analysis
– Excludes cold basic blocks
– Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
– Mark as hot field accesses in hot methods
Key: Low overhead estimation
15
Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Compiler
– Hot BB: Collect access info
– Cold BB: Ignore
Access List:
1. A.b
2. ….
….
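A minimal sketch of the compiler-analysis step, assuming a method is modelled as a list of basic blocks that each carry a cold flag and the field accesses they contain. The `BasicBlock`, `Method` and `collect_access_list` names are hypothetical and are not Jikes RVM APIs; the example mirrors Foo, where the catch block is treated as cold.

```python
from dataclasses import dataclass

@dataclass
class BasicBlock:
    accesses: list       # field accesses written as "Class.field"
    cold: bool = False   # e.g. an exception handler

@dataclass
class Method:
    name: str
    blocks: list

def collect_access_list(method):
    """Collect field accesses from non-cold blocks, in first-seen order."""
    access_list = []
    for block in method.blocks:
        if block.cold:
            continue  # cold basic blocks are ignored
        for access in block.accesses:
            if access not in access_list:
                access_list.append(access)
    return access_list

# Foo reads a.b in the try body and a.c only in the (cold) catch block.
foo = Method("Foo", [BasicBlock(["A.b"]), BasicBlock(["A.c"], cold=True)])
print(collect_access_list(foo))  # ['A.b']
```

Because the list is built once at compile time, no per-read instrumentation is needed at run time.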
16
Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Adaptive Sampling: Foo is hot
Foo Accesses:
1. A.b
2. ….
….
A.b is hot
[Diagram: A’s type information, listing fields b and c, with b marked hot]
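The sampling step can be sketched as follows, assuming the sampler yields a stream of method names and that a method counts as hot once it has been sampled a threshold number of times. `HOT_THRESHOLD`, `hot_methods` and `hot_fields` are hypothetical names, and the threshold value is an arbitrary choice for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 3  # assumed: number of samples needed for a method to count as hot

def hot_methods(samples, threshold=HOT_THRESHOLD):
    """Return the methods that appear in at least `threshold` samples."""
    counts = Counter(samples)
    return {method for method, n in counts.items() if n >= threshold}

def hot_fields(samples, access_lists, threshold=HOT_THRESHOLD):
    """Mark every field in a hot method's access list as hot, grouped by class."""
    hot = {}
    for method in hot_methods(samples, threshold):
        for access in access_lists.get(method, []):
            cls, fld = access.split(".")
            hot.setdefault(cls, set()).add(fld)
    return hot

samples = ["Foo", "Bar", "Foo", "Foo", "Baz"]        # timer-driven samples
access_lists = {"Foo": ["A.b"], "Bar": ["A.c"]}      # from compiler analysis
print(hot_fields(samples, access_lists))  # {'A': {'b'}}
```

Only Foo crosses the threshold here, so A.b is recorded in A's type information as hot while A.c stays cold.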
17
Copying of Linked Objects
[Diagram: OnlineObjectReordering consults type information while copying objects 1 to 7, placing them into a hot space and a cold space]
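One way to picture a copy order guided by hot fields is the sketch below. The policy is an assumption made for illustration, not the collector's actual algorithm: referents of hot fields are copied immediately after their parent, other referents are deferred, and an object goes to the hot space if its class has hot fields or it was reached through one. `oor_copy` and the object layout are hypothetical.

```python
from collections import deque

def oor_copy(objects, root, hot_fields):
    """objects: id -> (class name, {field name: target id or None}).
    hot_fields: class name -> set of hot field names.
    Returns (hot_space, cold_space) as lists of ids in copy order."""
    hot_work = []                         # stack: hot referents are copied next
    cold_work = deque([(root, False)])    # queue: everything else waits
    hot_space, cold_space, seen = [], [], set()
    while hot_work or cold_work:
        obj, via_hot = hot_work.pop() if hot_work else cold_work.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        cls, fields = objects[obj]
        hot = hot_fields.get(cls, set())
        in_hot = via_hot or bool(hot)     # assumed placement policy
        (hot_space if in_hot else cold_space).append(obj)
        for name, target in fields.items():
            if target is None:
                continue
            if name in hot:
                hot_work.append((target, True))
            else:
                cold_work.append((target, False))
    return hot_space, cold_space

objects = {
    1: ("A", {"c": 2, "b": 4}),
    2: ("B", {}),
    4: ("B", {"x": 5}),
    5: ("B", {}),
}
print(oor_copy(objects, 1, {"A": {"b"}}))  # ([1, 4], [2, 5])
```

Object 1 and the object behind its hot field b end up adjacent in the hot space, even though c is declared first.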
18
OOR System Overview
[Diagram: OOR system overview. Legend: OOR addition, Jikes RVM component, input/output]
Boxes: Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Optimizing Compiler, Hot Methods, Access Info Database, GC: Copies Objects
Arrows: Adds Entries, Look Up, Register Hot Field Accesses, Advice
Without OOR: GC: Copies Objects, Affects Locality
With OOR: Advice, GC: Copies Objects, Improves Locality
19
Outline
• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion
20
Methodology: Virtual Machine
• Jikes RVM
– VM written in Java
– High performance
– Timer based adaptive sampling
– Dynamic optimization
• Experiment setup
– Pseudo-adaptive
– 2nd iteration [Eeckhout et al.]
21
Methodology: Memory Management
• Memory Management Toolkit (MMTk):
– Allocators and garbage collectors
– Multi-space heap
  • Boot image
  • Large object space (LOS)
  • Immortal space
• Experiment setup
– Generational copying GC with 4M bounded nursery
22
Overhead: OOR Analysis Only
Benchmark   Base Execution Time (sec)   w/ only OOR Analysis (sec)   Overhead
jess        4.39                        4.43                         0.84%
jack        5.79                        5.82                         0.57%
raytrace    4.63                        4.61                         -0.59%
mtrt        4.95                        4.99                         0.70%
javac       12.83                       12.70                        -1.05%
compress    8.56                        8.54                         0.20%
pseudojbb   13.39                       13.43                        0.36%
db          18.88                       18.88                        -0.03%
antlr       0.94                        0.91                         -2.90%
hsqldb      160.56                      158.46                       -1.30%
ipsixql     41.62                       42.43                        1.93%
jython      37.71                       37.16                        -1.44%
ps-fun      129.24                      128.04                       -1.03%
Mean                                                                 -0.19%
23
Detailed Experiments
• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
– x86, AMD, PowerPC
• x86 performance counters:
– DL1, trace cache, L2, DTLB, ITLB
24
Performance javac
25
Performance db
26
Performance jython
Any static ordering leaves you vulnerable to pathological cases.
27
Phase Changes
28
Related Work
• Evaluate static orderings [Wilson et al.]
– Large performance variation
• Static profiling [Chilimbi et al., and others]
– Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
– Too expensive
29
Conclusion
• Static traversal orders have up to 25% variation
• OOR improves or matches best static ordering
• OOR has very low overhead
• Past predicts future
30
Questions?
Thank you!
31
OOR System Overview
• Records object accesses in each method (excludes cold basic blocks)
• Finds hot methods by adaptive sampling
• Reorders objects with hot fields in older generation during GC
• Copies hot objects into separate region