Comparison of JVM Phases on Data Cache Performance
Shiwen Hu and Lizy K. John
Laboratory for Computer Architecture
The University of Texas at Austin
The First Workshop on Managed Run Time Workloads 2
Motivation
Execution of Java programs consists of distinct JVM phases: JIT compilation, garbage collection, and execution
Efficient execution of Java programs necessitates a comparative study of requirements and characteristics of JVM phases
Outline
Experimental methodology
Varying cache metrics: cache size, set associativity, block size
Decomposition by miss types
Time varying cache behavior
Conclusion
Methodology
LaTTe JVM: An open-source, state-of-the-art JVM
Memory management of LaTTe JIT compiler: a reusable initial stack (50KB); dynamic stacks allocated when necessary and recyclable
Heap management: large object area indexed by a hash table; small object area with headers indicating object sizes
Methodology (Cont.)
Experimental workloads: SPECjvm98 benchmarks, using the s10 data set
Cache simulator: a JVM-phase-aware cache simulator based on Cachesim5 from Sun's Shade V6 tool suite
Default configuration: 64KB, 32B blocks, 4-way set associative
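To make the default configuration concrete, the following is a minimal sketch of a phase-aware set-associative cache model. It is not the Cachesim5-based simulator used in the study: the class name `PhaseAwareCache`, the LRU replacement policy, and the sample trace are all assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class PhaseAwareCache:
    """Set-associative LRU data cache that attributes each access to a JVM phase.

    Hypothetical model; LRU replacement is an assumption, not taken from the slides.
    """

    def __init__(self, size_bytes=64 * 1024, block_bytes=32, ways=4):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        # One OrderedDict per set; key order tracks recency (last = most recent).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.refs = defaultdict(int)    # phase -> data references
        self.misses = defaultdict(int)  # phase -> data misses

    def access(self, address, phase):
        """Simulate one data reference; return True on a hit."""
        block = address // self.block_bytes
        lines = self.sets[block % self.num_sets]
        self.refs[phase] += 1
        if block in lines:
            lines.move_to_end(block)   # refresh LRU position
            return True
        self.misses[phase] += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used block
        lines[block] = True
        return False

    def miss_rate(self, phase):
        return self.misses[phase] / self.refs[phase] if self.refs[phase] else 0.0


cache = PhaseAwareCache()          # 64KB, 32B blocks, 4-way -> 512 sets
for addr in range(0, 4096, 4):     # made-up trace: sequential 4-byte loads
    cache.access(addr, "JIT")
print(cache.num_sets, cache.miss_rate("JIT"))  # prints: 512 0.125
```

Tagging each reference with a phase label ("JIT", "GC", or "EXEC") is what allows references and misses to be reported per phase as well as overall.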
Breakdown of JVM phases
JIT compilation and execution phases dominate in terms of instruction counts, data references, and data misses
Garbage collector has the highest miss rates: large working set (heap) and pointer chasing; but it rarely affects overall cache performance
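A phase with a high miss rate can still have little effect on the overall miss rate when it issues few data references, because the overall rate is total misses divided by total references. The sketch below illustrates that weighting; the counts are hypothetical and chosen only for the arithmetic, not taken from the measurements.

```python
def overall_miss_rate(phases):
    """phases maps phase name -> (data_references, data_misses)."""
    total_refs = sum(refs for refs, _ in phases.values())
    total_misses = sum(misses for _, misses in phases.values())
    return total_misses / total_refs


# Hypothetical counts, chosen only to illustrate the weighting effect.
phases = {
    "JIT":  (400_000, 16_000),   # 4% miss rate
    "GC":   (10_000, 1_500),     # 15% miss rate, but few references
    "EXEC": (590_000, 23_600),   # 4% miss rate
}
with_gc = overall_miss_rate(phases)
without_gc = overall_miss_rate({k: v for k, v in phases.items() if k != "GC"})
print(f"{with_gc:.4f} {without_gc:.4f}")  # prints: 0.0411 0.0400
```

In this made-up case the GC phase misses almost four times as often as the other phases, yet removing it changes the overall miss rate by only about a tenth of a percentage point.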
Breakdown of JVM phases (Cont.)
[Figure: percentage breakdown (0% to 100%) among JIT, GC, and EXEC for comp, db, jack, javac, jess, mpeg, and mtrt]
Varying cache size
Increasing cache size is more effective on JIT compilation than on garbage collection: larger working set of the garbage collector; pointer-chasing access pattern of the garbage collector; stacks of most JIT compilations can be held in a 128K cache
Varying effect on execution phase: more effective on mpegaudio than on db
Varying cache size (Cont.)
[Figure: miss rate (0% to 16%) for ALL, JIT, GC, and EXEC on comp, db, jack, javac, jess, mpeg, and mtrt, with cache sizes of 16K, 32K, 64K, and 128K]
Varying set associativity
Increasing set associativity rarely affects JIT compilation and garbage collection: negligible conflict misses due to uniform accesses to heap or stacks; dominated by capacity misses; short lives of JIT objects
Varying effectiveness on execution phase: mtrt, 52% of misses eliminated; db and javac, 13% of misses eliminated
Varying set associativity (Cont.)
[Figure: miss rate (0% to 16%) for ALL, JIT, GC, and EXEC on comp, db, jack, javac, jess, mpeg, and mtrt, with fully associative, 4-way, 2-way, and direct-mapped caches]
Varying block size
Effective on JIT compilation and garbage collection: JIT compilation has good spatial locality due to stack initialization; garbage collection has good spatial locality during the sweep phase
Varying effectiveness on execution phase: larger blocks favor db, jess, mpegaudio, and mtrt; smaller blocks favor compress, jack, and javac
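The benefit of larger blocks for sequential access patterns, such as stack initialization or a heap sweep, can be illustrated with a small model. This sketch assumes a single sequential pass over a region that is not yet cached and a fixed 4-byte access size; the function name `sweep_miss_rate` and the region size are hypothetical.

```python
def sweep_miss_rate(region_bytes, access_bytes, block_bytes):
    """Miss rate of one sequential pass over a region that is not yet cached."""
    touched = set()
    refs = misses = 0
    for addr in range(0, region_bytes, access_bytes):
        refs += 1
        block = addr // block_bytes
        if block not in touched:   # first touch of this block -> miss
            touched.add(block)
            misses += 1
    return misses / refs


# Sweep a hypothetical 64KB region with 4-byte accesses at each block size.
rates = {b: sweep_miss_rate(64 * 1024, 4, b) for b in (32, 64, 128, 256)}
for block_bytes, rate in rates.items():
    print(block_bytes, rate)
```

Under these assumptions the miss rate is simply the access size divided by the block size, so each doubling of the block size halves the misses. Access patterns with poor spatial locality do not get this benefit and can lose useful blocks to the larger ones.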
Varying block size (Cont.)
[Figure: miss rate (0% to 16%) for ALL, JIT, GC, and EXEC on comp, db, jack, javac, jess, mpeg, and mtrt, with block sizes of 256B, 128B, 64B, and 32B]
Decomposition by miss types - JIT
Capacity misses dominate
Fewer compulsory misses: reusable initial stack; overlapped dynamic stacks
Negligible conflict misses
Splitting cache rarely affects miss type composition
[Figure: JIT misses (0% to 100%) broken down into cold, conflict, and capacity misses for U and S caches (presumably unified and split) on comp, db, jack, javac, jess, mpeg, and mtrt]
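The cold (compulsory), conflict, and capacity categories used in this decomposition can be computed with the common three-way classification: a miss is cold if the block was never referenced before, capacity if a fully associative cache of the same size would also miss, and conflict otherwise. The sketch below follows that scheme under assumed LRU replacement; it is not the classifier from the study, and the function name `classify_misses` is hypothetical.

```python
from collections import OrderedDict


def classify_misses(addresses, size_bytes=64 * 1024, block_bytes=32, ways=4):
    """Return counts of cold, capacity, and conflict misses for a trace."""
    num_blocks = size_bytes // block_bytes
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # target cache, LRU per set
    full = OrderedDict()   # fully associative LRU cache of the same capacity
    seen = set()           # blocks referenced at least once
    counts = {"cold": 0, "capacity": 0, "conflict": 0}

    for address in addresses:
        block = address // block_bytes

        # Update the fully associative reference cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)
            full[block] = True

        # Update the target set-associative cache and classify any miss.
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
        else:
            if block not in seen:
                counts["cold"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            if len(lines) >= ways:
                lines.popitem(last=False)
            lines[block] = True
        seen.add(block)
    return counts


# Tiny direct-mapped cache (4 blocks): two blocks that map to the same set.
print(classify_misses([0, 128, 0, 128], size_bytes=128, block_bytes=32, ways=1))
# prints: {'cold': 2, 'capacity': 0, 'conflict': 2}
```

In the usage line, both blocks would fit in a four-block cache, so the repeated misses are conflicts caused purely by the mapping.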
Decomposition by miss types - GC
Fewest compulsory misses in unified cache: cache blocks accessed during execution phase
More compulsory misses in split cache: uniform heap sweeping
[Figure: GC misses (0% to 100%) broken down into cold, conflict, and capacity misses for U and S caches (presumably unified and split) on comp, db, jack, javac, jess, mpeg, and mtrt]
Decomposition by miss types - EXEC
Relatively more compulsory misses: heap object allocation and initialization
Variety reveals program characteristics
Splitting cache rarely affects miss type composition
[Figure: EXEC misses (0% to 100%) broken down into cold, conflict, and capacity misses for U and S caches (presumably unified and split) on comp, db, jack, javac, jess, mpeg, and mtrt]
Time varying behavior
Importance of separating JVM activities from application activities: Java programs execute on JVMs, unlike C/C++ programs; correlating performance results with JVM or application characteristics is important for designing better JVMs
Time varying behavior (Cont.)
JVM-specific operations dominate the startup and end of application execution: class loading, method compilation
Few garbage collections, corresponding to bursts of cache misses
Four passes of JIT compilation correspond to four bursts of cache misses
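Bursts like these become visible when the miss rate is computed per fixed window of executed instructions rather than over the whole run. The following is a minimal sketch of that bookkeeping; the event format, the function name `windowed_miss_rates`, and the default window of one million instructions are assumptions, not details from the study.

```python
def windowed_miss_rates(events, window=1_000_000):
    """Compute the miss rate per instruction-count window.

    events: iterable of (instruction_count, is_miss), one per data reference,
    ordered by instruction_count. Returns a list of (window_start, miss_rate).
    Windows that contain no data references are skipped.
    """
    series = []
    current = None
    refs = misses = 0
    for icount, is_miss in events:
        start = (icount // window) * window
        if current is None:
            current = start
        if start != current:
            # Close the finished window and start a new one.
            series.append((current, misses / refs))
            current, refs, misses = start, 0, 0
        refs += 1
        misses += int(is_miss)
    if refs:
        series.append((current, misses / refs))
    return series


# Made-up events with a small window so the result is easy to follow.
sample = [(0, True), (5, False), (10, True), (12, True), (25, False)]
print(windowed_miss_rates(sample, window=10))
# prints: [(0, 0.5), (10, 1.0), (20, 0.0)]
```

Recording the active phase alongside each event would allow JIT and GC activity to be plotted against the same windows.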
Time varying behavior - compress
Fewer GC and JIT activities
Cyclic behavior
Two phases during execution
[Figure: compress. Miss rate (0% to 35%) and JIT and GC instructions (10^3) over instruction count (10^6)]
Time varying behavior - mtrt
More GC and JIT activities
No cyclic behavior
[Figure: mtrt. Miss rate (0% to 60%) and JIT and GC instructions (10^3) over instruction count (10^6)]
Time varying behavior - startup
Identical behavior during startup: first 110 million instructions
Sharing of harness classes among SPECjvm98 benchmarks prolongs the duration
[Figure: first 150M instructions. Miss rate (0% to 20%) over instructions (10^6) for comp, db, jack, javac, jess, mpeg, and mtrt]
Conclusion
Comparative study of cache performance of distinct JVM phases
Deterministic characteristics of cache behavior: JIT compilation traverses intermediate data structures; garbage collection has a large working set and pointer chasing
Conclusion (Cont.)
Near-identical cache performance of JIT compilation among applications
Varying cache behavior during execution phase reveals characteristics of applications