the java hotspot vm - eth zpeople.inf.ethz.ch/.../slides/w15_01-hotspot-jvm-jit-compilers.pdf ·...
TRANSCRIPT
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
The Java HotSpot VM
Under the Hood
Tobias Hartmann
Compiler Group – Java HotSpot Virtual Machine
Oracle Corporation
May 2017
About me
• Software engineer in the HotSpot JVM Compiler Team at Oracle
– Based in Baden, Switzerland
• Master’s degree in Computer Science from ETH Zurich
• Worked on various compiler-related projects
– Currently working on future Value Type support for Java
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Outline
• Intro: Why virtual machines?
• Part 1: The Java HotSpot VM
– JIT compilation in HotSpot
– Tiered Compilation
• Part 2: What's new in Java 9
– Segmented Code Cache
– Compact Strings
– Ahead-of-time Compilation
A typical computing platform
Hardware
Operating system
Java Virtual Machine
User Applications
Java EE, Java SE
Application Software
System Software
Programming language implementation
Programming language: C
Language implementation (one per platform): compiler, standard libraries, debugger, memory management
Operating system and hardware:
– Windows on Intel x86
– Linux on Intel x86
– Linux on ARM
– Solaris on SPARC
(Language) virtual machine
Programming language: Java, JavaScript, Scala, Python
Virtual machine: HotSpot VM
Operating system: Windows, Mac OS X, Solaris, Linux
Hardware: Intel x86, PPC, ARM, SPARC
The JVM: An application developer’s view
Java source code
    int i = 0;
    do {
        i++;
    } while (i < f());
compile
• Ahead-of-time
• Using javac
Bytecodes
    0: iconst_0
    1: istore_1
    2: iinc
    5: iload_1
    6: invokestatic f
    9: if_icmplt 2
    12: return
• Instructions for an abstract machine
• Stack-based machine (no registers)
execute
HotSpot Java VM
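The loop from the slide can be tried out with a minimal, self-contained sketch like the one below. The class name `LoopExample`, the method `run()`, and the body of `f()` are assumptions made for illustration; the slide only shows the loop itself. After compiling with `javac LoopExample.java`, running `javap -c LoopExample` disassembles the class. The listing will resemble the slide's, although local variable slots and offsets may differ because `run()` is a static method here.

```java
public class LoopExample {
    // Counts how often f() is called (illustrative only).
    static int calls = 0;

    // Hypothetical stand-in for the slide's f(): always returns 3.
    static int f() {
        calls++;
        return 3;
    }

    // Same loop as on the slide, wrapped in a method.
    static int run() {
        int i = 0;
        do {
            i++;
        } while (i < f());
        return i;
    }

    public static void main(String[] args) {
        int result = run();
        System.out.println("i = " + result + ", f() called " + calls + " times");
    }
}
```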
The JVM: A VM engineer’s view
Bytecodes
    0: iconst_0
    1: istore_1
    2: iinc
    5: iload_1
    6: invokestatic f
    9: if_icmplt 2
    12: return
HotSpot Java VM
• Interpreter: executes the bytecodes
• Garbage collector: manages the heap
• Heap and stack: accessed by executing code
• Compilation system (C1, C2): compiles bytecodes and produces compiled methods
– Machine code
– Debug info
– Object maps
Interpretation vs. compilation in HotSpot
• Template-based interpreter
– Generated at VM startup (before program execution)
– Maps a well-defined machine code sequence to every bytecode instruction
• Compilation system
– Speedup relative to interpretation: ~100X
– Two just-in-time compilers (C1, C2)
– Aggressive optimistic optimizations
Bytecodes
    0: iconst_0
    1: istore_1
    2: iinc
    5: iload_1
    6: invokestatic f
    9: if_icmplt 2
    12: return
Machine code
    mov -0x8(%r14), %eax       # Load local variable 1
    movzbl 0x1(%r13), %ebx     # Dispatch next instruction
    inc %r13
    mov $0xff40, %r10
    jmpq *(%r10, %rbx, 8)
Ahead-of-time vs. just-in-time compilation
• AOT: Before program execution
• JIT: During program execution
– Tradeoff: Resource usage vs. performance of generated code
[Figure: performance as a function of the amount of compilation, ranging from pure interpretation to compiling everything]
– Interpretation only: bad performance due to interpretation
– Compile everything: bad performance due to compilation overhead
– In between: good performance due to good selection of compiled methods and applied optimizations
JIT compilation in HotSpot
• Resource usage vs. performance
– Getting to the “sweet spot”
1. Selecting methods to compile
2. Selecting compiler optimizations
1. Selecting methods to compile
• Hot methods (frequently executed methods)
• Profile method execution
– # of method invocations, # of backedges
• A method’s lifetime in the VM
– Interpreter: Gather profiling information
– Compiler (C1 or C2): Compile bytecode to native code, triggered when # of method invocations > THRESHOLD1 or # of backedges > THRESHOLD2
– Code cache: Store machine code
– Deoptimization (back to the interpreter): Compiler’s optimistic assumptions proven wrong
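The counter-based selection described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HotSpot's implementation: the class `HotMethodCounters`, its method names, and the choice to reset both counters on deoptimization are all hypothetical, and the two thresholds stand in for THRESHOLD1 and THRESHOLD2 from the slide.

```java
public class HotMethodCounters {
    private final int invocationThreshold; // plays the role of THRESHOLD1
    private final int backedgeThreshold;   // plays the role of THRESHOLD2
    private int invocations;
    private int backedges;
    private boolean compiled;

    public HotMethodCounters(int invocationThreshold, int backedgeThreshold) {
        this.invocationThreshold = invocationThreshold;
        this.backedgeThreshold = backedgeThreshold;
    }

    // Called by the (imaginary) interpreter on every method entry.
    public void onInvocation() {
        invocations++;
        maybeCompile();
    }

    // Called by the (imaginary) interpreter on every loop backedge.
    public void onBackedge() {
        backedges++;
        maybeCompile();
    }

    // The method becomes "compiled" once either counter exceeds its threshold.
    private void maybeCompile() {
        if (!compiled && (invocations > invocationThreshold || backedges > backedgeThreshold)) {
            compiled = true;
        }
    }

    // Assumption of this sketch: deoptimization discards the code and restarts profiling.
    public void deoptimize() {
        compiled = false;
        invocations = 0;
        backedges = 0;
    }

    public boolean isCompiled() {
        return compiled;
    }

    public static void main(String[] args) {
        HotMethodCounters method = new HotMethodCounters(2, 5);
        for (int i = 0; i < 3; i++) {
            method.onInvocation();
        }
        System.out.println("compiled after 3 invocations: " + method.isCompiled());
    }
}
```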
Example optimization: Hot path compilation
Control flow graph
    S1; S2; S3;
    if (x > 3)
        True (taken 10’000 times): S4;
        False (taken 0 times): S5; S6; S7;
    S8; S9;
Generated code
    guard(x > 3)        (Deoptimize if the guard fails)
    S1; S2; S3; S4; S8; S9;
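The idea behind hot path compilation can be mimicked in plain Java. The sketch below is an illustration only: `interpreted` models the full method, `compiled` models code that contains just the hot path behind a guard, and falling back to `interpreted` stands in for deoptimization. All names are hypothetical, and the statements S1 to S9 are represented by a trace string.

```java
public class HotPathSketch {
    // Counts how often the guard failed (illustrative only).
    static int deoptimizations = 0;

    // Full method: both branches are present.
    static String interpreted(int x) {
        StringBuilder trace = new StringBuilder("S1 S2 S3 ");
        if (x > 3) {
            trace.append("S4 ");
        } else {
            trace.append("S5 S6 S7 ");
        }
        return trace.append("S8 S9").toString();
    }

    // "Compiled" version: only the hot path, protected by a guard.
    static String compiled(int x) {
        if (!(x > 3)) {
            // Guard failed: give up on the optimized code and run the full method.
            deoptimizations++;
            return interpreted(x);
        }
        return "S1 S2 S3 S4 S8 S9";
    }

    public static void main(String[] args) {
        System.out.println(compiled(10));
        System.out.println(compiled(1));
        System.out.println("deoptimizations: " + deoptimizations);
    }
}
```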
Example optimization: Virtual call inlining
Class hierarchy
    class A {              (loaded)
        void bar() {
            S1;
        }
    }
    class B extends A {    (not loaded)
        void bar() {
            S2;
        }
    }
Method to be compiled
    void foo() {
        A a = create(); // return A or B
        a.bar();
    }
Compiler: Inline call? Yes.
Example optimization: Virtual call inlining
• Benefits of inlining
– Virtual call avoided
– Code locality
• Optimistic assumption: only A is loaded
– Note dependence on class hierarchy
– Deoptimize if hierarchy changes
Class hierarchy
    class A {              (loaded)
        void bar() {
            S1;
        }
    }
    class B extends A {    (not loaded)
        void bar() {
            S2;
        }
    }
Method to be compiled (after inlining)
    void foo() {
        A a = create(); // return A or B
        S1;
    }
Compiler: Inline call? Yes.
Example optimization: Virtual call inlining
Class hierarchy
    class A {              (loaded)
        void bar() {
            S1;
        }
    }
    class B extends A {    (loaded)
        void bar() {
            S2;
        }
    }
Method to be compiled
    void foo() {
        A a = create(); // return A or B
        a.bar();
    }
Compiler: Inline call? No.
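A runnable version of the class hierarchy from these slides might look as follows. It is a sketch for experimentation, not a demonstration of guaranteed VM behavior: the names `InliningSketch` and the `useB` parameter are assumptions, and whether HotSpot actually inlines `bar()` depends on the VM and its settings. The program first calls `foo` many times while only `A` is in use, then uses `B` for the first time, which is the situation in which an optimistic inlining decision would have to be revisited. Running it with the HotSpot flag `-XX:+PrintCompilation` may show the related compilation events.

```java
public class InliningSketch {
    static class A {
        String bar() {
            return "S1";
        }
    }

    static class B extends A {
        @Override
        String bar() {
            return "S2";
        }
    }

    // Returns an A or a B; B is only touched once useB is true.
    static A create(boolean useB) {
        return useB ? new B() : new A();
    }

    // Method whose virtual call a.bar() is a candidate for inlining.
    static String foo(boolean useB) {
        A a = create(useB);
        return a.bar();
    }

    public static void main(String[] args) {
        String last = "";
        // Warm-up phase: only A is ever instantiated.
        for (int i = 0; i < 100_000; i++) {
            last = foo(false);
        }
        System.out.println("with A only: " + last);
        // First use of B changes the class hierarchy seen so far.
        System.out.println("with B: " + foo(true));
    }
}
```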
Deoptimization
• Compiler’s optimistic assumption proven wrong
– Assumptions about class hierarchy
– Profile information does not match method behavior
• Switch execution from compiled code to interpretation
– Reconstruct state of the interpreter at runtime
– Complex implementation
• Compiled code
– Possibly thrown away
– Possibly reprofiled and recompiled
Performance effect of deoptimization
• Follow the variation of a method’s performance
[Figure: performance of a method over time. Phases: interpreted, compiled, interpreted, compiled. Events: VM startup, compilation, deoptimization, compilation]
2. Selecting compiler optimizations
• C1 compiler
– Limited set of optimizations
– Fast compilation
– Small footprint
• C2 compiler
– Aggressive optimistic optimizations
– High resource demands
– High-performance code
• Graal
– Experimental compiler
– Will be part of HotSpot for AOT in JDK 9
• Client VM: C1
• Server VM: C2
• Tiered Compilation (enabled by default since JDK 8): C1 and C2
Tiered Compilation
• Introduced in JDK 7, enabled by default in JDK 8
• Combines the benefits of
– Interpreter: Fast startup
– C1: Fast compilation
– C2: High peak performance
• Within the sweet spot
– Faster startup
– More profile information
Benefits of Tiered Compilation
[Figures: performance over time after VM startup, with the warm-up time marked]
• Client VM (C1 only): interpreted, then C1-compiled
• Server VM (C2 only): interpreted, then C2-compiled
• Tiered compilation: interpreted, then C1-compiled, then C2-compiled
Additional benefit: More accurate profiling
[Figure: profiling phases over time]
• Profiling without tiered compilation: Interpreter (100 + 200 = 300 samples), then C2 (non-profiled)
• Profiling with tiered compilation: Interpreter (100 samples), then C1 (profiled, 1000 samples), then C2 (non-profiled)
• w/o tiered compilation: 300 samples gathered
• w/ tiered compilation: 1’100 samples gathered
Tiered Compilation
• Combined benefits of interpreter, C1, and C2
• Additional benefits
– More accurate profiling information
• Drawbacks
– Complex implementation
– Careful tuning of compilation thresholds needed
– More pressure on code cache
A method’s lifetime (Tiered Compilation)
• Interpreter: Collect profiling information
• C1: Generate code quickly; continue collecting profiling information
• C2: Generate high-quality code; use profiling information
• Code cache: Holds the code generated by C1 and C2
• Deoptimization: Back to the interpreter
Performance of a method (Tiered Compilation)
[Figure: performance of a method over time. Phases: interpreted, C1 compiled, C2 compiled, then (after deoptimization) interpreted, C2 compiled. Events: VM startup, compilation, compilation, deoptimization, compilation]
Compilation levels (detailed view)
Compilation level
    0: Interpreter
    1: C1: no profiling
    2: C1: limited profiling
    3: C1: full profiling
    4: C2
[Figure: typical compilation sequence]
Associated thresholds (Tier 3): Tier3InvokeNotifyFreqLog, Tier3BackedgeNotifyFreqLog, Tier3InvocationThreshold, Tier3MinInvocationThreshold, Tier3BackEdgeThreshold, Tier3CompileThreshold
Associated thresholds (Tier 4): Tier4InvocationThreshold, Tier4MinInvocationThreshold, Tier4CompileThreshold, Tier4BackEdgeThreshold
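The threshold flags named on this slide can be inspected from a running program. The following is a minimal sketch, assuming a HotSpot-based JDK where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `TierThresholds` and the helper `read` are made up for this example. Values are printed as reported by the VM, so the output depends on the JDK version and command-line settings.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class TierThresholds {
    // Returns the current value of a VM flag, or "unavailable" if it cannot be read.
    static String read(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(flag).getValue();
        } catch (RuntimeException e) {
            // Thrown, for example, if the flag does not exist in this VM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "Tier3InvocationThreshold", "Tier3MinInvocationThreshold",
            "Tier3CompileThreshold", "Tier3BackEdgeThreshold",
            "Tier4InvocationThreshold", "Tier4MinInvocationThreshold",
            "Tier4CompileThreshold", "Tier4BackEdgeThreshold"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + read(flag));
        }
    }
}
```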
What is the code cache?
• Stores code generated by JIT compilers
• Contiguous chunk of memory
– Managed (similar to the Java heap)
– Fixed size
• Essential for performance
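The code cache is visible to Java programs as one or more non-heap memory pools. The sketch below lists them through the standard `java.lang.management` API; the class name `CodeCachePools` and the name filter on "Code" are assumptions of this example. On a JDK with a single code cache one pool is expected, while a JDK with a segmented code cache typically reports one pool per segment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCachePools {
    // Collects one description line per non-heap pool whose name mentions "Code".
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().contains("Code")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    // getMax() may be -1 if the maximum is undefined.
                    lines.add(pool.getName() + ": used=" + usage.getUsed()
                            + " bytes, max=" + usage.getMax() + " bytes");
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```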
Code cache usage: JDK 6 and 7
[Figure: code cache contents: free space, VM internals, compiled code]
Code cache usage: JDK 8 (Tiered Compilation)
[Figure: code cache contents: free space, VM internals, C1 compiled (profiled), C2 compiled (non-profiled)]
Code cache usage: JDK 9
[Figure: code cache contents: free space, VM internals, C1 compiled (profiled), C2 compiled (non-profiled)]
Challenges
• Tiered compilation increases amount of code by up to 4X
• All code is stored in a single code cache
• High fragmentation and bad locality
• But is this a problem in real life?
Code cache usage: Reality
[Figure: snapshot of a code cache in practice; legend: profiled code, non-profiled code, free space; a second view adds hotness]
Design: Types of compiled code
                          Optimization level   Size     Cost        Lifetime
Non-method code           optimized            small    cheap       immortal
Profiled code (C1)        instrumented         medium   cheap       limited
Non-profiled code (C2)    highly optimized     large    expensive   long
Design
• Without Segmented Code Cache: a single code cache
• With Segmented Code Cache: separate segments for non-profiled methods, profiled methods, and non-methods
Segmented Code Cache: Reality
[Figure: snapshot of a segmented code cache with segments for non-profiled methods and profiled methods; legend: profiled code, non-profiled code, free space; a second view adds hotness]
Evaluation: Code locality
[Figure: calls to targets[0].amount(), targets[1].amount(), and targets[2].amount(), and the placement of their profiled and non-profiled code in the code cache, shown in two layouts]
[Figure: speedup in % (axis 0 to 10) against the number of call targets (128, 256, 512, 1024, 2048, 4096); markers: L1 ITLB, L2 STLB]
Evaluation: Code locality
• Instruction Cache (ICache)
– 14% fewer ICache misses
• Instruction Translation Lookaside Buffer (ITLB)
– Caches virtual-to-physical address mappings to avoid slow page walks
– 44% fewer ITLB misses
• Overall performance
– 9% speedup with microbenchmark
Evaluation: Responsiveness
• Sweeper (GC for compiled code)
[Figure: reduction in % (axis 0 to 40) for # full sweeps, cleanup pause time, and sweep time]
Evaluation: Performance
[Figure: improvement in % (axis 0 to 14) for SPECjbb2005, SPECjbb2013, JMH-Javac, Octane (Typescript), and Octane (Gbemu)]
What we have learned
• Segmented Code Cache helps
– To reduce the sweeper overhead and improve responsiveness
– To reduce memory fragmentation
– To improve code locality
• And thus improves overall performance
• To be released with JDK 9
Java Strings
    public class HelloWorld {
        public static void main(String[] args) {
            String myString = "HELLO";
            System.out.println(myString);
        }
    }

    public final class String {
        private final char value[];
        ...
    }

char value[] (UTF-16 encoded, 2 bytes per char):
    H        E        L        L        O
    0x0048   0x0045   0x004C   0x004C   0x004F
“Perfection is achieved, not when there is nothing more to add, but when there is nothing more to take away.”
– Antoine de Saint Exupéry
There is a lot to take away here...
• UTF-16 encoded Strings always occupy two bytes per char
• Wasted memory if only Latin-1 (one-byte) characters are used:
    char value[] (2 bytes per char; the upper byte of each char is 0x00):
    H        E        L        L        O
    0x0048   0x0045   0x004C   0x004C   0x004F
• But is this a problem in real life?
Real life analysis: char[] footprint
• 950 heap dumps from a variety of applications
– char[] footprint makes up 10% - 45% of live data
– Majority of characters are single byte
• Predicted footprint reduction of 5% - 10%
Project Goals
• Memory footprint reduction by improving space efficiency of Strings
• Meet or beat performance of JDK 9
• Full compatibility with related Java and native interfaces
• Full platform support
– x86/x64, SPARC, ARM
– Linux, Solaris, Windows, Mac OS X
Design
• String class now uses a byte[] instead of a char[]
• Additional 'coder' field indicates which encoding is used
    public final class String {
        private final byte value[];
        private final byte coder;
        ...
    }

UTF-16 encoded:
    H           E           L           L           O
    byte value[] = 0x00 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F
Latin-1 encoded:
    H     E     L     L     O
    byte value[] = 0x48 0x45 0x4C 0x4C 0x4F
Design
• If all characters have a zero upper byte
→ String is compressed to Latin-1 by stripping off high order bytes
• If a character has a non-zero upper byte
→ String cannot be compressed and is stored UTF-16 encoded
UTF-16 encoded:
    byte value[] = 0x00 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F
Latin-1 encoded:
    byte value[] = 0x48 0x45 0x4C 0x4C 0x4F
Compression: UTF-16 to Latin-1
Inflation: Latin-1 to UTF-16
0x47 (label from the slide figure)
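The compression and inflation rules above can be sketched in a few lines of Java. This is an illustration of the idea, not the JDK's implementation: the class `CompactStringSketch` and all of its methods are hypothetical, the coder values 0 and 1 are chosen for this example, and the UTF-16 byte order follows the slide (upper byte first).

```java
public class CompactStringSketch {
    static final byte LATIN1 = 0; // illustrative coder value
    static final byte UTF16 = 1;  // illustrative coder value

    // A string is compressible only if every char has a zero upper byte.
    static byte coderFor(char[] chars) {
        for (char c : chars) {
            if ((c >>> 8) != 0) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Compression: strip the (zero) upper byte of every char.
    static byte[] compress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    // Uncompressed form: two bytes per char, upper byte first as on the slide.
    static byte[] toUtf16(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >>> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    // Inflation: widen every Latin-1 byte back to a char.
    static char[] inflate(byte[] latin1) {
        char[] out = new char[latin1.length];
        for (int i = 0; i < latin1.length; i++) {
            out[i] = (char) (latin1[i] & 0xFF);
        }
        return out;
    }

    // Picks the compact form when possible, UTF-16 otherwise.
    static byte[] encode(char[] chars) {
        return coderFor(chars) == LATIN1 ? compress(chars) : toUtf16(chars);
    }

    public static void main(String[] args) {
        System.out.println("HELLO needs " + encode("HELLO".toCharArray()).length + " bytes");
        System.out.println("H\u20AC needs " + encode("H\u20AC".toCharArray()).length + " bytes");
    }
}
```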
Design
• Compression / inflation needs to be fast
• Requires HotSpot support in addition to Java class library changes
– JIT compilers: Intrinsics and String concatenation optimizations
– Runtime: String object constructors, JNI, JVMTI
– GC: String deduplication
• Kill switch to enforce UTF-16 encoding (-XX:-CompactStrings)
– For applications that extensively use UTF-16 characters
Microbenchmark: LogLineBench
    public class LogLineBench {
        int size;
        String method = generateString(size);

        public String work() throws Exception {
            return "[" + System.nanoTime() + "] " +
                Thread.currentThread().getName() +
                "Calling an application method \"" + method +
                "\" without fear and prejudice.";
        }
    }
LogLineBench results
                 Performance (ns/op)        Allocated (b/op)
size             1       10      100        1       10      100
Baseline         149     153     231        888     904     1680
CS disabled      152     150     230        888     904     1680
CS enabled       142     139     169        504     512     904
• Kill switch works (no regression)
• 27% performance improvement and 46% footprint reduction
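The two headline percentages appear to correspond to the size-100 columns of the table, comparing the baseline with Compact Strings enabled. The short sketch below redoes that arithmetic; the class `LogLineBenchNumbers` and the helper `reductionPercent` are made up for this check, and the mapping to the size-100 columns is an assumption.

```java
public class LogLineBenchNumbers {
    // Relative reduction from baseline to improved, rounded to whole percent.
    static long reductionPercent(double baseline, double improved) {
        return Math.round(100.0 * (baseline - improved) / baseline);
    }

    public static void main(String[] args) {
        // Time per operation at size 100: 231 ns/op down to 169 ns/op.
        System.out.println("time: " + reductionPercent(231, 169) + "%");
        // Allocation per operation at size 100: 1680 b/op down to 904 b/op.
        System.out.println("allocation: " + reductionPercent(1680, 904) + "%");
    }
}
```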
Evaluation: Performance
• SPECjbb2005
– 21% footprint reduction
– 27% fewer GCs
– 5% throughput improvement
• SPECjbb2015
– 7% footprint reduction
– 11% critical-jOps improvement
• Weblogic (startup)
– 10% footprint reduction
– 5% startup time improvement
• To be released with JDK 9
Ahead-of-Time Compilation
• Compile Java classes to native code prior to launching the VM
• AOT compilation is done by new jaotc tool
– Uses Java based Graal compiler as backend
– Stores code and metadata in shared object file
• Improves start-up time
– Limited impact on peak performance
• Sharing of compiled code between VM instances
Revisit: Performance of a method (Tiered Compilation)
[Figure: performance of a method over time. Phases: interpreted, C1 compiled, C2 compiled, then (after deoptimization) interpreted, C2 compiled. Events: VM startup, compilation, compilation, deoptimization, compilation]
Performance of a method (Tiered AOT)
[Figure: performance of a method over time. Phases: AOT compiled, C1 compiled, C2 compiled, then (after deoptimization) interpreted, C2 compiled. Events: VM startup, compilation, compilation, deoptimization, compilation]
Ahead-of-Time Compilation
• Experimental feature
– Supported on Linux x64
– Limited to the java.base module
• Try with your own code - feedback is welcome!
• To be released with JDK 9
– More to come in future releases
Summary
• Many cool features to come with Java 9 in July 2017
– Segmented Code Cache, Compact Strings, Ahead-of-Time compilation
• Java – A vibrant platform
– Early access releases are available: https://jdk9.java.net/download/
• The future of the Java platform
– “Our SaaS products are built on top of Java and the Oracle DB – that’s the platform.” (Larry Ellison, Oracle CTO)
• Questions?