IBM Software Group, © 2011 IBM Corporation

Java Compilation From Top to Bottom

Mike Kucera, IBM Rational, March 11, 2011

Innovation for a smarter planet

Compiling Java - 10,000 Foot View

Write and debug Java code in an IDE (Eclipse)

Compile Java source into bytecode (class files)

Run the bytecode on any JVM on any platform

At runtime, JIT-compile the bytecode into native code for performance

IBM and Java

IBM has over 3000 products based on Java.

IBM sells hardware (PowerPC and System z), and these platforms must support Java applications.

IBM Java is optimized to run IBM software, especially WebSphere Application Server.

IBM and Java

IBM Java supports 12 different platforms, plus many other platforms in the embedded space. Re-use is the only way to scale.

It is challenging to do things right across all platforms.

If there is a bug… it will be found.

Java is developed across multiple development sites.

Code straddles the boundary of research and production.

IBM develops tools for Java developers based on Eclipse, such as RAD (Rational Application Developer).

Worldwide Java Development Team

Toronto: dynamic/static compilation, XML parsing

Ottawa: J9 JVM, Eclipse IDE, J2ME libraries

Hursley: J2SE libraries and CORBA, J2SE integration and delivery, customer service

Bangalore: integration testing, customer service, field release development

Shanghai: globalization, specialized testing

Phoenix: J2ME development, J2ME delivery

Austin: Java and XML security, AIX system test, PowerPC specialists

Poughkeepsie: z/OS system test, S/390 specialists

Rochester: iSeries development

First Step - Write Java code in an IDE

What is an IDE?

IDE: Integrated Development Environment. A powerful editor for writing your programs that makes writing software faster and easier.

Increased developer productivity

Understands your code: it is not just a text editor, it parses and analyzes the code.

Provides an integrated environment for all your tools: version control (SVN, CVS, Jazz, etc.), debuggers, performance engineering, documentation tools, databases, etc.

Writing Code using an IDE

Modern Java IDEs have many advanced code-editing features:

Instant feedback: detect syntax errors as you type.

Code navigation: instantly jump from a method call to the method definition.

Refactoring: rename a method, and the IDE will find every place the method is called and rename all the calls.

Code completion: start typing and the IDE finishes it for you.

Visualizations: view a type hierarchy, or view the structure outline of a class.

Quick assist: automatically fix coding errors for you.

And many more...

ECJ – Eclipse Compiler for Java

At the core of Eclipse there is a Java compiler, designed with the needs of an IDE in mind.

The compiler has three outputs: ASTs, bytecode (class files), and an on-disk index file.

ASTs can be used directly by some features, e.g. refactoring.

The index is used for fast lookup of program elements, e.g. code navigation, search, and generating the type hierarchy.

The compiler is designed to support recompilation while debugging (incremental compilation).
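The slide does not describe the format of ECJ's on-disk index. As a rough illustration of why an index makes navigation and type-hierarchy queries fast, here is a minimal in-memory sketch: a query becomes a map lookup instead of a re-parse of every file. The class `ElementIndex`, its methods, and the file names are hypothetical and are not the Eclipse JDT index API.

```java
import java.util.*;

public class ElementIndex {
    // Hypothetical index tables, filled in once by the compiler and then queried by IDE features.
    private final Map<String, String> declaringFile = new HashMap<>();  // type -> file that declares it
    private final Map<String, Set<String>> subtypes = new HashMap<>();  // supertype -> direct subtypes

    void addType(String type, String file, String supertype) {
        declaringFile.put(type, file);
        if (supertype != null) {
            subtypes.computeIfAbsent(supertype, k -> new TreeSet<>()).add(type);
        }
    }

    // Code navigation: one lookup finds the declaration.
    String findDeclaration(String type) {
        return declaringFile.get(type);
    }

    // Type hierarchy: collect all transitive subtypes of a type.
    Set<String> allSubtypes(String type) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(type));
        while (!work.isEmpty()) {
            for (String sub : subtypes.getOrDefault(work.pop(), Set.of())) {
                if (result.add(sub)) {
                    work.push(sub);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ElementIndex index = new ElementIndex();
        index.addType("Shape", "Shape.java", null);
        index.addType("Circle", "Circle.java", "Shape");
        index.addType("Square", "Square.java", "Shape");
        index.addType("UnitSquare", "UnitSquare.java", "Square");
        System.out.println(index.findDeclaration("Circle")); // Circle.java
        System.out.println(index.allSubtypes("Shape"));       // [Circle, Square, UnitSquare]
    }
}
```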

Incremental Compilation

An incremental compiler will only recompile the parts of the code that have changed.

This avoids wasteful recompilation of unchanged parts. In effect, it makes a language's translation units finer-grained.

ECJ will only recompile files that have changed. By contrast, a standard C compiler compiles all the header files included by a source file, and the standard javac compiler is not an incremental compiler.

Incremental compilation is very important for productivity. Long compilation pauses are unacceptable: the developer needs to be able to recompile code changes very quickly.
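A minimal sketch of the bookkeeping behind incremental compilation, assuming the builder remembers each file's contents from the previous build: only files whose contents changed are selected. Also rebuilding the direct dependents of a changed file is an assumption of this sketch (a change can break its callers), not a statement about ECJ's algorithm; the names `IncrementalBuild` and `filesToRecompile` are hypothetical.

```java
import java.util.*;

public class IncrementalBuild {
    // Contents of each file as of the previous build (a real builder would keep hashes or timestamps).
    private final Map<String, String> lastContents = new HashMap<>();
    // Reverse dependency edges: file -> files that reference it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void addDependency(String file, String dependsOn) {
        dependents.computeIfAbsent(dependsOn, k -> new TreeSet<>()).add(file);
    }

    // Returns the files that need recompiling and remembers the current contents.
    Set<String> filesToRecompile(Map<String, String> sources) {
        Set<String> toCompile = new TreeSet<>();
        for (Map.Entry<String, String> e : sources.entrySet()) {
            String previous = lastContents.put(e.getKey(), e.getValue());
            if (!e.getValue().equals(previous)) {
                toCompile.add(e.getKey());
                // Sketch assumption: direct dependents of a changed file are rebuilt too.
                toCompile.addAll(dependents.getOrDefault(e.getKey(), Set.of()));
            }
        }
        return toCompile;
    }

    public static void main(String[] args) {
        IncrementalBuild build = new IncrementalBuild();
        build.addDependency("Main.java", "Util.java"); // Main calls Util
        Map<String, String> sources = new HashMap<>();
        sources.put("Main.java", "class Main { }");
        sources.put("Util.java", "class Util { }");
        sources.put("Other.java", "class Other { }");
        System.out.println(build.filesToRecompile(sources)); // [Main.java, Other.java, Util.java]
        System.out.println(build.filesToRecompile(sources)); // [] (nothing changed)
        sources.put("Util.java", "class Util { int x; }");
        System.out.println(build.filesToRecompile(sources)); // [Main.java, Util.java]
    }
}
```

The sketch ignores deleted files and transitive dependents; a production builder would also need to decide whether a change affects a file's public signature before rebuilding its dependents.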

Parsing

Parse the code in the editor.

The parser supports different versions of Java. It runs whenever the user stops typing for a few seconds, and instantly reports syntax errors and warnings.

The parser is generated by an LALR parser generator. The grammar file contains grammar rules in BNF form, and most rules have actions associated with them. The actions build the AST in a bottom-up fashion:

leaf nodes are created first, and the last node to be created is the root.

Unique challenges: syntax error recovery needs to be really good; the parser must handle unsaved code in the editor; it must support content assist; and the AST can't be desugared.
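ECJ's parser is generated from an LALR grammar, which is far too large to reproduce here. The sketch below swaps in a small hand-written operator-precedence (shift-reduce) parser for `+` and `*` over single-letter identifiers. It is not ECJ's parser, but its reduce actions build the AST the same way the slide describes: leaves are created first and the root last. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BottomUpAst {
    // AST node: a leaf (identifier) or a binary operator with two children.
    record Node(String label, Node left, Node right) {
        @Override
        public String toString() {
            return left == null ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    static int precedence(char op) {
        return op == '*' ? 2 : 1; // '*' binds tighter than '+'
    }

    // Reduce action: pop an operator and two operands, build the parent node, push it back.
    static void reduce(Deque<Node> operands, Deque<Character> operators, List<String> created) {
        char op = operators.pop();
        Node right = operands.pop();
        Node left = operands.pop();
        operands.push(new Node(String.valueOf(op), left, right));
        created.add(String.valueOf(op));
    }

    // Parses input such as "a + b * c" and records the order in which nodes are created.
    static Node parse(String input, List<String> created) {
        Deque<Node> operands = new ArrayDeque<>();
        Deque<Character> operators = new ArrayDeque<>();
        for (char c : input.replace(" ", "").toCharArray()) {
            if (Character.isLetter(c)) {
                // Shift an identifier: its leaf node is created as soon as it is read.
                operands.push(new Node(String.valueOf(c), null, null));
                created.add(String.valueOf(c));
            } else {
                // Reduce while the operator on the stack binds at least as tightly.
                while (!operators.isEmpty() && precedence(operators.peek()) >= precedence(c)) {
                    reduce(operands, operators, created);
                }
                operators.push(c);
            }
        }
        while (!operators.isEmpty()) {
            reduce(operands, operators, created);
        }
        return operands.pop(); // the last node built is the root
    }

    public static void main(String[] args) {
        List<String> created = new ArrayList<>();
        Node root = parse("a + b * c", created);
        System.out.println(root);    // (a + (b * c))
        System.out.println(created); // [a, b, c, *, +]
    }
}
```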

Content Assist

The IDE will complete the code for you.

Problem: the user hasn't finished typing a full statement yet, so there is a syntax error at the insertion point.

The parser must recover from the error and compute a list of possible completions.
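A minimal sketch of the last step of content assist, assuming the text before the cursor ends in an incomplete member access such as `list.s`. Because that statement is unfinished (a syntax error), the sketch does not parse it; it scans backwards from the cursor for the receiver and the typed prefix, then filters a symbol table. The variable and member tables and the names `CompletionSketch`/`complete` are hypothetical; a real IDE gets this information from its recovered AST and index.

```java
import java.util.*;
import java.util.stream.Collectors;

public class CompletionSketch {
    // Hypothetical symbol information: variable name -> type, type -> member names.
    static final Map<String, String> variableTypes = Map.of("list", "ArrayList", "name", "String");
    static final Map<String, List<String>> members = Map.of(
        "ArrayList", List.of("add", "clear", "contains", "get", "isEmpty", "size"),
        "String", List.of("charAt", "contains", "isEmpty", "length", "substring"));

    // Proposes completions for an incomplete "receiver.prefix" ending at the cursor.
    static List<String> complete(String text, int cursor) {
        int start = cursor;
        while (start > 0 && Character.isJavaIdentifierPart(text.charAt(start - 1))) {
            start--;
        }
        String prefix = text.substring(start, cursor);
        if (start == 0 || text.charAt(start - 1) != '.') {
            return List.of(); // not a member access; out of scope for this sketch
        }
        int receiverEnd = start - 1; // position of the '.'
        int receiverStart = receiverEnd;
        while (receiverStart > 0 && Character.isJavaIdentifierPart(text.charAt(receiverStart - 1))) {
            receiverStart--;
        }
        String type = variableTypes.get(text.substring(receiverStart, receiverEnd));
        if (type == null) {
            return List.of();
        }
        return members.get(type).stream()
            .filter(m -> m.startsWith(prefix))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String code = "int n = list.s";
        System.out.println(complete(code, code.length()));   // [size]
        String code2 = "boolean b = name.c";
        System.out.println(complete(code2, code2.length())); // [charAt, contains]
    }
}
```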

Refactoring

Transforming code into a new form that behaves the same as before but is structured better.

Rename, Extract local variable, Inline expression, Inline method, Extract superclass, Extract interface, Change method signature, etc.

Refactorings are performed on the AST with the help of the index.

Rewrite rules
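A minimal sketch of the final text edit in a rename refactoring, assuming the index has already returned the source offsets of every occurrence of the method name. The edits are applied from the last offset to the first so that earlier offsets remain valid while the text changes length. `RenameSketch` and `applyRename` are hypothetical names, and the offsets in the example were counted by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RenameSketch {
    // Replaces oldName with newName at each given start offset.
    static String applyRename(String source, List<Integer> offsets, String oldName, String newName) {
        List<Integer> sorted = new ArrayList<>(offsets);
        sorted.sort(Collections.reverseOrder()); // last occurrence first
        StringBuilder sb = new StringBuilder(source);
        for (int offset : sorted) {
            // Sanity check: a stale offset would silently corrupt the code.
            if (!source.startsWith(oldName, offset)) {
                throw new IllegalArgumentException("stale offset " + offset);
            }
            sb.replace(offset, offset + oldName.length(), newName);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "void calc() {} void run() { calc(); calc(); }";
        // Offsets of "calc" as an index might report them: the declaration plus two calls.
        List<Integer> offsets = List.of(5, 28, 36);
        System.out.println(applyRename(source, offsets, "calc", "compute"));
        // void compute() {} void run() { compute(); compute(); }
    }
}
```

A plain text search-and-replace would also hit unrelated occurrences, such as a local variable or a comment that happens to contain the same word; working from offsets the compiler has resolved is what keeps the refactoring behavior-preserving.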

Desugaring

Syntactic sugar: syntax that is equivalent to some other syntax in the language but is more convenient or compact. For example, for an int variable i these statements are equivalent:

i++; i += 1; i = i + 1;

Desugaring: the parser produces the same AST fragment for different syntax. This is convenient for code generation.

The AST produced by the IDE cannot be desugared. It needs to represent exactly what is in the user's source: all source offsets must be preserved, and comments must be preserved.
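A minimal sketch of desugaring, assuming a toy AST built from Java records: the three statements from the slide start out as distinct surface nodes, and a `desugar` step maps each of them to one canonical `Assign(i, Add(i, 1))` tree. All node and method names are hypothetical. A code generator then only needs to handle the canonical form, while an IDE has to keep the surface nodes because they record what the user actually wrote.

```java
public class DesugarSketch {
    // Surface forms, as written by the user.
    interface Stmt {}
    record PostIncrement(String var) implements Stmt {}           // i++;
    record CompoundAdd(String var, Expr value) implements Stmt {} // i += value;
    record Assign(String var, Expr value) implements Stmt {}      // i = value;

    // Expressions.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Lit(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Rewrites every supported form into the canonical "var = var + value" assignment.
    static Assign desugar(Stmt s) {
        if (s instanceof PostIncrement p) {
            return new Assign(p.var(), new Add(new Var(p.var()), new Lit(1)));
        }
        if (s instanceof CompoundAdd c) {
            return new Assign(c.var(), new Add(new Var(c.var()), c.value()));
        }
        if (s instanceof Assign a) {
            return a; // already canonical
        }
        throw new IllegalArgumentException("unknown statement: " + s);
    }

    public static void main(String[] args) {
        Stmt a = new PostIncrement("i");                             // i++;
        Stmt b = new CompoundAdd("i", new Lit(1));                   // i += 1;
        Stmt c = new Assign("i", new Add(new Var("i"), new Lit(1))); // i = i + 1;
        // Records compare by value, so identical trees are equal.
        System.out.println(desugar(a).equals(desugar(b)) && desugar(b).equals(desugar(c))); // true
        // Without desugaring (as in an IDE's AST) the forms stay distinct.
        System.out.println(a.equals(c)); // false
    }
}
```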

AST

Eclipse actually has two separate ASTs for Java.

“Internal” AST

May be desugared and extended by the parser.

Used to resolve compilation problems, perform type checking and generate bytecode.

Example:

– In Java, if you do not provide a constructor, the compiler will provide a default constructor for you.

– This is implemented by adding a constructor node under the class node.

“DOM” AST

Exactly represents the user's source code, with no desugaring.

Generated from the internal AST and “cleaned up”.

Used for code completion, refactoring, and generating the index.

Example:

– The default constructor node is filtered out because it does not actually exist in the source.
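The default-constructor example can be observed in running code. In this sketch the nested class `NoCtor` (a hypothetical name) declares no constructor, yet reflection on the compiled class finds exactly one, with no parameters: the constructor the compiler added, which exists in the class file even though it never appears in the source.

```java
import java.lang.reflect.Constructor;

public class DefaultCtorDemo {
    // No constructor is written here; the compiler supplies a default one.
    static class NoCtor {
        int value;
    }

    public static void main(String[] args) {
        Constructor<?>[] ctors = NoCtor.class.getDeclaredConstructors();
        System.out.println(ctors.length);                 // 1
        System.out.println(ctors[0].getParameterCount()); // 0
    }
}
```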

Bytecode Generation

Each AST node has a generateCode() method.

Code generation is done by a depth-first traversal of the AST.

Each generateCode() method first calls generateCode() on its children then generates code for itself.

This works because the JVM is a stack machine.
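A minimal sketch of that scheme, assuming a toy AST for integer expressions: each node's `generateCode` first emits its children's code, then its own instruction, so `(a + b) * c` comes out in post-order. The mnemonics mimic JVM bytecode (`iload`, `iadd`, `imul`), but operands are written as variable names rather than local-variable slot numbers, and the one-argument `generateCode` signature is a simplification, not ECJ's.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeGenSketch {
    interface Node {
        // Appends this node's instructions to code.
        void generateCode(List<String> code);
    }

    record Local(String name) implements Node {
        public void generateCode(List<String> code) {
            code.add("iload " + name); // push the variable's value
        }
    }

    // Only '+' and '*' are supported in this sketch.
    record BinOp(char op, Node left, Node right) implements Node {
        public void generateCode(List<String> code) {
            left.generateCode(code);  // children first: both operands end up on the stack
            right.generateCode(code);
            code.add(op == '+' ? "iadd" : "imul"); // then pop both and push the result
        }
    }

    public static void main(String[] args) {
        Node expr = new BinOp('*', new BinOp('+', new Local("a"), new Local("b")), new Local("c"));
        List<String> code = new ArrayList<>();
        expr.generateCode(code);
        System.out.println(code); // [iload a, iload b, iadd, iload c, imul]
    }
}
```

No register allocation or temporary naming is needed: every intermediate value simply lives on the operand stack until its parent consumes it.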

Bytecode Interpretation

JVM is a stack machine.
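A minimal sketch of what "stack machine" means for interpretation, assuming a tiny text instruction set: `push n` stands in for the JVM's constant-loading instructions (such as `iconst_<n>` and `bipush`), and `iadd`, `isub`, `imul` pop two ints and push the result. Real bytecode is a binary format with many more instructions; this interpreter works on strings for readability.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackInterpreter {
    // Executes instructions such as "push 2" or "iadd"; returns the value left on top of the stack.
    static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "push" -> stack.push(Integer.parseInt(parts[1]));
                case "iadd" -> stack.push(stack.pop() + stack.pop());
                case "imul" -> stack.push(stack.pop() * stack.pop());
                case "isub" -> {
                    int right = stack.pop(); // the top of the stack is the second operand
                    int left = stack.pop();
                    stack.push(left - right);
                }
                default -> throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, in the post-order a depth-first code generator emits
        System.out.println(run("push 2", "push 3", "iadd", "push 4", "imul")); // 20
        System.out.println(run("push 10", "push 4", "isub"));                 // 6
    }
}
```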

Dynamic Class Loading

Statically compiled languages such as C have a linking step after compilation.

Java uses dynamic class loading: class references are resolved at runtime. The first time a class name is encountered, the JVM loads the class.

The JVM searches the “classpath” for the class file to load.

Advantages:

Reflection: load and use classes at runtime that were not known to the compiler.

Hotswap :) Make code changes as you are debugging. The incremental compiler recompiles the class file, and the old version of the class is replaced by the new one in the running JVM. You can change the behaviour of the program while it is running, without needing to restart.

Dynamic class loading creates many challenges for the JIT compiler. Some optimizations are performed based on assumptions, and a class that invalidates these assumptions may be loaded at any time, requiring the optimization to be backed out.
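A minimal sketch of the reflection advantage: the class name below is just a string, so the compiler never links against the class; the JVM looks it up on the classpath and loads it at run time. `java.util.ArrayList` is the default only because it is always available; in practice the name could come from a command-line argument or a configuration file.

```java
import java.util.List;

public class ReflectionLoad {
    // Loads a class by name at run time and creates an instance through its no-arg constructor.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className); // loads (and initializes) the class if needed
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object obj = instantiate(name);
        System.out.println(obj.getClass().getName()); // java.util.ArrayList when run without arguments
        if (obj instanceof List<?> list) {
            System.out.println(list.isEmpty()); // true
        }
    }
}
```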

JIT Compilation

Also known as dynamic compilation.

Java bytecode is compiled into native machine code while the application is running.

Results in a ~10x speed improvement over pure interpretation.

Compilation overhead is a runtime cost, so there must be a payoff: the resulting speedup must outweigh the cost of compiling the method. Only the “hottest” methods are compiled.

Granularity: in a method-based JIT the compilation unit is a method; in a tracing JIT the compilation unit is a trace, a frequently executed path that may span several basic blocks. The IBM Java JIT compiler is method based.
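The payoff rule can be written as a simple inequality: compiling is worthwhile when the time saved on future invocations exceeds the one-time compilation cost. A minimal sketch, assuming the per-call costs and the number of future calls are known (a real JIT only has estimates from profiling); the numbers are made up, using the slide's ~10x speedup.

```java
public class JitPayoff {
    // Compile only if the time saved on future calls outweighs the compile time.
    static boolean worthCompiling(double interpretedMicros, double compiledMicros,
                                  long expectedFutureCalls, double compileMicros) {
        double saved = (interpretedMicros - compiledMicros) * expectedFutureCalls;
        return saved > compileMicros;
    }

    public static void main(String[] args) {
        // Hypothetical method: 10 us interpreted, 1 us compiled (~10x), 5000 us to compile.
        System.out.println(worthCompiling(10, 1, 100, 5000));    // false: saves only 900 us
        System.out.println(worthCompiling(10, 1, 10_000, 5000)); // true: saves 90000 us
    }
}
```

This is why only hot methods are worth compiling: a rarely called method never repays its compilation cost, no matter how large the per-call speedup.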

JIT Compilation Control

A sampling thread wakes up every X milliseconds and records all the methods that are currently executing.

When a method reaches some threshold it is queued for native code compilation.

The method is initially compiled at a low optimization level, because more optimizations increase compilation overhead.

The jitted version of the method is used on subsequent calls. Note that the interpreted version of the method may still be executing somewhere.

If the jitted method is still hot, it may be queued again for compilation at a higher optimization level.

JIT compilation happens in separate threads, which works well when there are underutilized cores available.
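A minimal sketch of sample-driven compilation control. The level names and the threshold of 3 samples per level are hypothetical (the slide only says "some threshold"). Each sample credits the methods currently executing; when a method's count reaches the threshold, it is queued for compilation at the next level and its count restarts. The sampling thread and compilation threads are replaced by plain method calls to keep the example deterministic, and the method is promoted as soon as it is queued rather than when compilation finishes.

```java
import java.util.*;

public class SamplingController {
    // Hypothetical optimization levels, from interpreted up to the most aggressive.
    static final String[] LEVELS = {"interpreted", "cold", "warm", "hot"};
    static final int SAMPLES_PER_LEVEL = 3; // hypothetical threshold

    private final Map<String, Integer> samples = new HashMap<>();
    private final Map<String, Integer> level = new HashMap<>();
    final List<String> compileQueue = new ArrayList<>(); // drained by compilation threads in a real JVM

    // Called each time the sampling thread wakes up, with the methods currently executing.
    void onSample(Collection<String> executingMethods) {
        for (String m : executingMethods) {
            int count = samples.merge(m, 1, Integer::sum);
            int current = level.getOrDefault(m, 0);
            if (count >= SAMPLES_PER_LEVEL && current < LEVELS.length - 1) {
                level.put(m, current + 1);
                samples.put(m, 0); // it must stay hot to climb to the next level
                compileQueue.add(m + " @ " + LEVELS[current + 1]);
            }
        }
    }

    public static void main(String[] args) {
        SamplingController jit = new SamplingController();
        for (int tick = 0; tick < 6; tick++) {
            // "loop" is executing at every sample; "init" only at the first one.
            jit.onSample(tick == 0 ? List.of("loop", "init") : List.of("loop"));
        }
        System.out.println(jit.compileQueue); // [loop @ cold, loop @ warm]
    }
}
```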

JIT Characteristics

The JIT compiler can optimize for the target CPU and OS where the application is running.

The JIT can detect whether certain instruction sets are supported, knows the size of the data and instruction caches, and knows how many registers are available.

In contrast, a static compiler must generate code for the lowest common denominator, or generate code separately for each possible target.

The JIT compiler has access to profiling data, which it can use when performing optimizations.

It can perform aggressive optimizations based on runtime assumptions, and back out an optimization if an assumption is invalidated.

JIT Limitations

Compilation overhead is a runtime cost, so certain analyses are impractical because they are too slow:

Escape analysis is only done at the highest optimization levels. Whole-program analysis is not done at all.

Jitted code must often branch back into the interpreter, for example when throwing an exception, at garbage collection points, when resolving references (i.e. triggering class loading), and when calling an interpreted method from a jitted method.
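To make "escape analysis" concrete: it determines whether an object can be reached from outside the method (or thread) that created it. In the sketch below, the `Point` allocated in `distance` never escapes, so a JIT that runs escape analysis may replace it with two local doubles and skip the heap allocation; the `Point` in `distanceAndRemember` escapes through a static field and must live on the heap. Whether a given JIT actually performs this transformation depends on its optimization level, as the slide notes; the program's results are the same either way.

```java
public class EscapeExample {
    record Point(double x, double y) {}

    static Point last; // anything stored here is reachable by other code: it escapes

    // The Point never leaves this method: a candidate for scalar replacement.
    static double distance(double x1, double y1, double x2, double y2) {
        Point delta = new Point(x2 - x1, y2 - y1);
        return Math.sqrt(delta.x() * delta.x() + delta.y() * delta.y());
    }

    // The Point is published through a static field: it escapes.
    static double distanceAndRemember(double x1, double y1, double x2, double y2) {
        Point delta = new Point(x2 - x1, y2 - y1);
        last = delta;
        return Math.sqrt(delta.x() * delta.x() + delta.y() * delta.y());
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 0, 3, 4));            // 5.0
        System.out.println(distanceAndRemember(1, 1, 4, 5)); // 5.0
    }
}
```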

Optimization: Devirtualization

Java programs contain many virtual methods.

If a virtual method has no overrides, it may be devirtualized. This is an observation based on the current state of loaded classes. Devirtualization removes the overhead of looking up the method implementation and enables inlining.

Problem: dynamic class loading. At any time, a class may be loaded that contains a method overriding a method that was devirtualized. A table of assumptions is maintained: each assumption has a list of instructions that must be patched if the assumption is invalidated. The patched method may be queued for recompilation.
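A minimal sketch of an assumption table, under simplifying assumptions: an assumption is keyed by a method name such as `Shape.area` ("this method has no overrides"), and each entry names a compiled method and the instruction index to patch. When a newly loaded class overrides that method, the assumption is dropped, its sites are returned for patching, and their methods are queued for recompilation (this sketch always queues them; the slide says they *may* be queued). All names (`AssumptionTable`, `PatchSite`, `onClassLoad`) are hypothetical.

```java
import java.util.*;

public class AssumptionTable {
    // A location in jitted code that was compiled under a "no overrides" assumption.
    record PatchSite(String compiledMethod, int instructionIndex) {}

    // "Class.method has no overrides" -> code locations relying on it.
    private final Map<String, List<PatchSite>> assumptions = new HashMap<>();
    final Set<String> recompileQueue = new LinkedHashSet<>();

    void assumeNoOverrides(String method, PatchSite site) {
        assumptions.computeIfAbsent(method, k -> new ArrayList<>()).add(site);
    }

    // Called when a class is loaded, with the superclass methods it overrides.
    List<PatchSite> onClassLoad(Collection<String> overriddenMethods) {
        List<PatchSite> toPatch = new ArrayList<>();
        for (String method : overriddenMethods) {
            List<PatchSite> sites = assumptions.remove(method); // the assumption no longer holds
            if (sites == null) {
                continue; // nothing was compiled under this assumption
            }
            toPatch.addAll(sites);
            for (PatchSite site : sites) {
                recompileQueue.add(site.compiledMethod());
            }
        }
        return toPatch;
    }

    public static void main(String[] args) {
        AssumptionTable table = new AssumptionTable();
        // Two compiled methods devirtualized their calls to Shape.area.
        table.assumeNoOverrides("Shape.area", new PatchSite("Report.total", 0));
        table.assumeNoOverrides("Shape.area", new PatchSite("Canvas.draw", 12));
        // Loading a subclass that overrides Shape.area invalidates the assumption.
        System.out.println(table.onClassLoad(List.of("Shape.area")).size()); // 2
        System.out.println(table.recompileQueue); // [Report.total, Canvas.draw]
    }
}
```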

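Optimization: Patching when an assumption is invalidated

Compiled code while the assumption holds:

    0: no-op
    1: fast path (call method directly)
    2: more code
    3: return
    4: slow path (call virtual method)
    5: branch 2

Patch: when the assumption is invalidated, the no-op is replaced with a branch to the slow path:

    0: branch 4
    1: fast path (call method directly)
    2: more code
    3: return
    4: slow path (call virtual method)
    5: branch 2

Recompile: the method is then recompiled without the assumption:

    0: slow path (call virtual method)
    1: more code
    2: return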
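A minimal simulation of the patch step, assuming instructions are plain strings and `branch N` jumps to slot N. Before patching, execution falls through the no-op into the fast path; overwriting only slot 0 with `branch 4` reroutes every later call through the slow path without touching the rest of the method. A real JIT patches machine instructions in place; this is only a model, and `PatchDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PatchDemo {
    // The compiled method from the slide, one pseudo-instruction per slot.
    static String[] compiled() {
        return new String[] {
            "no-op",
            "fast path (call method directly)",
            "more code",
            "return",
            "slow path (call virtual method)",
            "branch 2"
        };
    }

    // Mirrors the slide's patch: only instruction 0 is overwritten.
    static void invalidateAssumption(String[] code) {
        code[0] = "branch 4";
    }

    // Executes from slot 0 until "return"; "branch N" jumps to slot N. Returns what was executed.
    static List<String> run(String[] code) {
        List<String> trace = new ArrayList<>();
        int pc = 0;
        while (true) {
            String insn = code[pc];
            if (insn.startsWith("branch ")) {
                pc = Integer.parseInt(insn.substring("branch ".length()));
                continue;
            }
            trace.add(insn);
            if (insn.equals("return")) {
                return trace;
            }
            pc++;
        }
    }

    public static void main(String[] args) {
        String[] code = compiled();
        System.out.println(run(code)); // [no-op, fast path (call method directly), more code, return]
        invalidateAssumption(code);
        System.out.println(run(code)); // [slow path (call virtual method), more code, return]
    }
}
```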