Comparison of JVM Phases on Data Cache Performance

Shiwen Hu and Lizy K. John

Laboratory for Computer Architecture

The University of Texas at Austin

The First Workshop on Managed Run Time Workloads 2

Motivation

Execution of Java programs consists of distinct JVM phases: JIT compilation, garbage collection, and execution

Efficient execution of Java programs necessitates a comparative study of requirements and characteristics of JVM phases

Outline

Experimental methodology

Varying cache metrics: cache size, set associativity, block size

Decomposition by miss types

Time varying cache behavior

Conclusion

Methodology

LaTTe JVM: An open-source, state-of-the-art JVM

Memory management of the LaTTe JIT compiler: a reusable initial stack (50KB); dynamic stacks allocated when necessary and recycled

Heap management: the large object area is indexed by a hash table; the small object area uses heads indicating object sizes

Methodology (Cont.)

Experimental workloads: SPECjvm98 benchmarks, using the s10 data set

Cache simulator: a JVM-phase-aware cache simulator, based on Cachesim5 from Sun's Shade V6 tool suite

Default configuration: 64KB, 32B blocks, 4-way set associative
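To make the simulator setup concrete, here is a minimal sketch of a phase-aware cache model using the default configuration above. It is not Cachesim5 or the authors' tool: the `PhaseAwareCache` class, the LRU replacement policy, and the phase tags passed with each reference are assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class PhaseAwareCache:
    """Set-associative cache with LRU replacement (an assumption) that
    attributes every data reference and miss to a JVM phase tag."""

    def __init__(self, size_bytes=64 * 1024, block_bytes=32, ways=4):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        # One OrderedDict per set; keys are tags, order encodes recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.refs = defaultdict(int)
        self.misses = defaultdict(int)

    def access(self, addr, phase):
        """Simulate one data reference; return True on a hit."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        self.refs[phase] += 1
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        self.misses[phase] += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False

    def miss_rate(self, phase):
        refs = self.refs[phase]
        return self.misses[phase] / refs if refs else 0.0


cache = PhaseAwareCache()
# Hypothetical trace of (address, phase) pairs, not data from the study.
trace = [(0, "JIT"), (4, "JIT"), (32, "GC"), (0, "EXEC")]
for addr, phase in trace:
    cache.access(addr, phase)
print(cache.num_sets)          # 64KB / (32B * 4 ways) = 512 sets
print(cache.miss_rate("JIT"))  # 0.5: one miss in two JIT references
```

Keeping the counters keyed by phase while sharing one set of cache lines lets a block brought in by one phase produce a hit for another, which is the situation a unified per-phase breakdown has to account for.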

Breakdown of JVM phases

JIT compilation and execution phases dominate in terms of instruction counts, data references, and data misses

The garbage collector has the highest miss rates, owing to its large working set (the heap) and pointer chasing, but it rarely affects overall cache performance

Breakdown of JVM phases (Cont.)

[Figure: stacked bars (0% to 100%) showing the JIT, GC, and EXEC shares for comp, db, jack, javac, jess, mpeg, mtrt]

Varying cache size

Increasing cache size is more effective for JIT compilation than for garbage collection: the garbage collector has a larger working set and a pointer-chasing access pattern, whereas the stacks of most JIT compilations can be held in a 128K cache

The effect on the execution phase varies: more effective for mpegaudio than for db
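The contrast between a working set that eventually fits and one that never does can be sketched with synthetic traces. Everything here is an assumption for illustration: the `miss_rate` helper, LRU replacement, the 96KB region swept repeatedly (a stand-in for stack reuse), and the random hops over a 4MB region (a stand-in for pointer chasing). None of it is data from the study.

```python
import random
from collections import OrderedDict


def miss_rate(trace, size_bytes, block_bytes=32, ways=4):
    """Miss rate of an address trace on a set-associative LRU cache."""
    num_sets = size_bytes // (block_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # refresh recency on a hit
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict least recently used
            lines[block] = True
    return misses / len(trace)


# Synthetic stand-ins (assumptions, not measured traces):
# ten sequential passes over a 96KB region, word by word ...
stack_like = [a for _ in range(10) for a in range(0, 96 * 1024, 4)]
# ... and 200,000 random word accesses over a 4MB region.
rng = random.Random(0)
heap_like = [rng.randrange(0, 4 * 1024 * 1024, 4) for _ in range(200_000)]

results = {}
for kb in (16, 32, 64, 128):
    size = kb * 1024
    results[kb] = (miss_rate(stack_like, size), miss_rate(heap_like, size))
    print(kb, results[kb])
```

In this sketch the swept region thrashes in every cache smaller than itself and then drops to cold misses only once the cache reaches 128KB, while the random trace stays close to a 100% miss rate at every size.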

Varying cache size (Cont.)

[Figure: miss rate (0% to 16%) of ALL, JIT, GC, and EXEC for comp, db, jack, javac, jess, mpeg, mtrt at cache sizes of 16K, 32K, 64K, and 128K]

Varying set associativity

Increasing set associativity rarely affects JIT compilation and garbage collection: conflict misses are negligible because accesses to the heap or stacks are uniform, capacity misses dominate, and JIT objects are short-lived

Effectiveness on the execution phase varies: mtrt, 52% of misses eliminated; db and javac, 13% of misses eliminated

Varying set associativity (Cont.)

[Figure: miss rate (0% to 16%) of ALL, JIT, GC, and EXEC for comp, db, jack, javac, jess, mpeg, mtrt with fully associative, 4-way, 2-way, and direct-mapped caches]

Varying block size

Larger blocks are effective for JIT compilation and garbage collection. JIT compilation: good spatial locality due to stack initialization. Garbage collection: good spatial locality during the sweep phase

Effectiveness on the execution phase varies. Larger block: db, jess, mpegaudio, and mtrt. Smaller block: compress, jack, javac
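The spatial-locality argument can be worked through numerically. The sketch below counts only cold misses for a single pass over a region, so it assumes nothing about cache size or replacement; the function name `sweep_miss_rate` and the region and stride values are illustrative assumptions.

```python
def sweep_miss_rate(region_bytes, stride_bytes, block_bytes):
    """Cold-miss rate of one pass over a region with a fixed stride:
    each block is fetched once, then serves later references that fall in it."""
    addrs = range(0, region_bytes, stride_bytes)
    touched_blocks = {addr // block_bytes for addr in addrs}
    return len(touched_blocks) / len(addrs)


# Word-by-word sweep of 1MB (high spatial locality): doubling the block
# size halves the miss rate, 4/32, 4/64, 4/128, 4/256.
for block in (32, 64, 128, 256):
    print(block, sweep_miss_rate(1 << 20, 4, block))

# A 256-byte stride (no reuse within a block): every reference misses,
# so a larger block only moves more unused data.
for block in (32, 64, 128, 256):
    print(block, sweep_miss_rate(1 << 20, 256, block))
```

A sequential sweep or initialization loop behaves like the first case; an access pattern with little reuse inside a block behaves like the second, where larger blocks buy nothing.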

Varying block size (Cont.)

[Figure: miss rate (0% to 16%) of ALL, JIT, GC, and EXEC for comp, db, jack, javac, jess, mpeg, mtrt at block sizes of 256B, 128B, 64B, and 32B]

Decomposition by miss types - JIT

Capacity misses dominate

Fewer compulsory misses: reusable initial stack, overlapped dynamic stacks

Negligible conflict misses

Splitting the cache rarely affects the miss type composition

[Figure: JIT miss breakdown (cold, conflict, capacity; 0% to 100%) with U (unified) and S (split) bars for comp, db, jack, javac, jess, mpeg, mtrt]

Decomposition by miss types - GC

Fewest compulsory misses in the unified cache: cache blocks are accessed during the execution phase

More compulsory misses in the split cache: uniform heap sweeping

[Figure: GC miss breakdown (cold, conflict, capacity; 0% to 100%) with U (unified) and S (split) bars for comp, db, jack, javac, jess, mpeg, mtrt]

Decomposition by miss types - EXEC

Relatively more compulsory misses: heap object allocation and initialization

Variety reveals program characteristics

Splitting the cache rarely affects the miss type composition

[Figure: EXEC miss breakdown (cold, conflict, capacity; 0% to 100%) with U (unified) and S (split) bars for comp, db, jack, javac, jess, mpeg, mtrt]
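The cold, conflict, and capacity categories in these breakdowns can be computed with the common three-way classification: a miss is cold on the first reference to a block, a conflict miss if a fully associative cache of the same size would have hit, and a capacity miss otherwise. The slides do not say how the authors' simulator classifies misses, so this is a sketch under that assumption; `classify_misses` and the LRU policy are illustrative choices.

```python
from collections import OrderedDict


def classify_misses(trace, size_bytes=64 * 1024, block_bytes=32, ways=4):
    """Split the misses of a set-associative LRU cache into
    cold (compulsory), capacity, and conflict counts."""
    num_blocks = size_bytes // block_bytes
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    full = OrderedDict()   # fully associative LRU cache of the same size
    seen = set()           # blocks referenced at least once
    counts = {"cold": 0, "capacity": 0, "conflict": 0}
    for addr in trace:
        block = addr // block_bytes
        # Reference the fully associative model.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)
            full[block] = True
        # Reference the set-associative cache and classify a miss.
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)
            lines[block] = True
            if block not in seen:
                counts["cold"] += 1
            elif full_hit:
                counts["conflict"] += 1
            else:
                counts["capacity"] += 1
        seen.add(block)
    return counts


# Tiny direct-mapped cache (4 blocks of 32B) so the cases are easy to trace.
# Blocks 0 and 4 share a set: the second access to block 0 is a conflict miss.
print(classify_misses([0, 128, 0], size_bytes=128, block_bytes=32, ways=1))
# Five distinct blocks overflow the 4-block cache: block 0 returns as a capacity miss.
print(classify_misses([0, 32, 64, 96, 128, 0], size_bytes=128, block_bytes=32, ways=1))
```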

Time varying behavior

Separating JVM activities from application activities is important: Java programs execute on JVMs, unlike C/C++ programs, and correlating performance results with JVM or application characteristics is important for designing better JVMs

Time varying behavior (Cont.)

JVM-specific operations (class loading, method compilation) dominate the startup and end of application execution

Few garbage collections, each corresponding to a burst of cache misses

Four passes of JIT compilation correspond to four bursts of cache misses
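Bursts like these show up when the miss rate is computed per fixed interval of instructions rather than over the whole run. A minimal sketch follows, assuming each data reference is logged as an (instruction count, was it a miss) pair; the `windowed_miss_rates` name and that event format are hypothetical, not the authors' tooling.

```python
def windowed_miss_rates(events, window):
    """events: iterable of (instruction_count, was_miss), one per data
    reference. Returns {window_index: miss_rate} for windows that
    contain at least one reference."""
    refs, misses = {}, {}
    for icount, was_miss in events:
        w = icount // window
        refs[w] = refs.get(w, 0) + 1
        misses[w] = misses.get(w, 0) + int(was_miss)
    return {w: misses[w] / refs[w] for w in sorted(refs)}


# Hypothetical events with a burst of misses in the second window.
events = [(10, False), (20, True), (110, True), (120, True),
          (130, False), (250, False)]
rates = windowed_miss_rates(events, window=100)
for w, rate in rates.items():
    print(w, rate)
```

With a window of one million instructions, the result corresponds to the granularity of a plot whose x-axis counts instructions in units of 10^6.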

Time varying behavior - compress

Fewer GC and JIT activities; cyclic behavior; two phases during execution

[Figure: compress. Miss rate (0% to 35%) and JIT and GC instructions (10^3, 0 to 1000) versus instruction count (10^6, 1 to 1001)]

Time varying behavior - mtrt

More GC and JIT activities; no cyclic behavior

[Figure: mtrt. Miss rate (0% to 60%) and JIT and GC instructions (10^3, 0 to 1000) versus instruction count (10^6, 1 to 801)]

Time varying behavior - startup

Identical behavior during startup: the first 110 million instructions

Sharing of harness classes among SPECjvm98 benchmarks prolongs the duration

[Figure: First 150M instructions. Miss rate (0% to 20%) versus instructions (10^6, 1 to 141) for comp, db, jack, javac, jess, mpeg, mtrt]

Conclusion

Comparative study of cache performance of distinct JVM phases

Deterministic characteristics of cache behavior. JIT compilation: traversing intermediate data structures. Garbage collection: large working set and pointer chasing

Conclusion (Cont.)

Near-identical cache performance of JIT compilation among applications

Varying cache behavior during the execution phase reveals characteristics of applications

Thanks