University of Delaware • Computer & Information Sciences Department
Optimizing Compilers, CISC 673
Spring 2011: Dynamic Compilation
John Cavazos, University of Delaware
High Level View of JVM
JVM Interpreter
• Reads a bytecode from a method
• “Interprets” the bytecode
  • Decodes opcode and operands
  • Based on opcodes, jumps to some C code
  • Passes operands
• Continues reading bytecodes from method until:
  • Call
  • Return
  • Exception
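The fetch, decode, and dispatch loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than the C handlers a real JVM jumps to, the opcodes (`PUSH`, `ADD`, `MUL`, `RETURN`) and the function `interpret` are invented for the example and are not JVM bytecodes, and only the "return" exit is modeled.

```python
# Toy stack-machine interpreter: read a bytecode, decode the opcode and
# operand, dispatch to a handler, and repeat until the method returns.
# Opcode names are hypothetical; they are not real JVM bytecodes.

def op_push(stack, operand):
    stack.append(operand)

def op_add(stack, _operand):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

def op_mul(stack, _operand):
    right = stack.pop()
    left = stack.pop()
    stack.append(left * right)

# Dispatch table: opcode -> handler (stands in for "jumps to some C code").
HANDLERS = {"PUSH": op_push, "ADD": op_add, "MUL": op_mul}

def interpret(method):
    """Run a list of (opcode, operand) pairs and return the top of stack."""
    stack = []
    pc = 0
    while pc < len(method):
        opcode, operand = method[pc]      # read and decode
        pc += 1
        if opcode == "RETURN":            # stop reading at a return
            return stack.pop()
        HANDLERS[opcode](stack, operand)  # dispatch, passing the operand
    raise RuntimeError("method ended without RETURN")

# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("RETURN", None)]
result = interpret(program)
```

Every bytecode pays for the table lookup and the handler call, which is one reason interpretation is much slower than native code.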
Interpretation
• Popular approach for high-level languages
  • E.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB
• Useful for memory-challenged environments
• Low startup time and space overhead, but much slower than native code execution
• MMI (Mixed Mode Interpreter) [Suganuma ’01]
  • Fast interpreter implemented in assembler
Dynamic Compilation Techniques
• Baseline compiler
  • Translates bytecodes one by one to machine code
• Quick compilation
  • Reduced set of optimizations for fast compilation
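One way to picture "one by one" translation is a template table: each bytecode expands to a fixed snippet of target code, with no analysis across bytecodes. The sketch below assumes a hypothetical stack bytecode and an invented pseudo-assembly; it does not reproduce any real baseline compiler.

```python
# Template-based "baseline" translation: each bytecode maps to a fixed
# snippet of target code. Bytecodes and pseudo-assembly are hypothetical.

TEMPLATES = {
    "PUSH":   ["mov r0, {operand}", "push r0"],
    "ADD":    ["pop r1", "pop r0", "add r0, r1", "push r0"],
    "MUL":    ["pop r1", "pop r0", "mul r0, r1", "push r0"],
    "RETURN": ["pop r0", "ret"],
}

def baseline_compile(method):
    """Translate (opcode, operand) pairs into a flat list of instructions."""
    code = []
    for opcode, operand in method:
        for line in TEMPLATES[opcode]:
            code.append(line.format(operand=operand))
    return code

compiled = baseline_compile(
    [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("RETURN", None)]
)
```

The output contains a `push r0` immediately followed by a `pop`, which an optimizing compiler would remove. Leaving such redundancy in place is the price of quick compilation.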
Dynamic Compilation Techniques
• Full compilation
  • Full optimizations only for selected hot methods
• Classic just-in-time compilation
  • Compile methods to native code on first invocation
  • E.g., ParcPlace Smalltalk-80, Self-91
  • Initial high (time & space) overhead for each compilation
  • Precludes use of sophisticated optimizations (e.g., SSA)
  • Responsible for many of today’s myths
Interpretation vs JIT
[Figure: two bar charts comparing Interpreter and Compiler, each bar split into Initial Overhead and Execution. One panel is for an execution of 20 time units, the other for an execution of 2000 time units.]
Selective Optimization
• Hypothesis: most execution is spent in a small percentage of methods (90/10 rule)
• Idea: use two execution strategies
  1. Interpreter or non-optimizing compiler
  2. Full-fledged optimizing compiler
• Strategy:
  • Use option 1 for initial execution of all methods
  • Profile to find “hot” subset of methods
  • Use option 2 on this subset
Selective Optimization
[Figure: two bar charts comparing Interpreter, Compiler, and Selective, each bar split into Initial Overhead and Execution. One panel is for an execution of 20 time units, the other for an execution of 2000 time units.]
• Selective opt: compiles 10%-20% of methods, representing 90-99% of execution time
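The trade-off between the three strategies can be made concrete with a back-of-the-envelope model. Every constant below is an assumption chosen for illustration, not a value read from the charts: a 10x interpreter slowdown, a compile cost of 100 time units for the whole program, and a hot set of 10% of the methods covering 99% of execution time. The model also ignores the time hot methods spend interpreted before they are detected.

```python
# Back-of-the-envelope model of the three strategies.
# All constants are hypothetical and chosen only to show the trade-off.

INTERP_SLOWDOWN = 10.0  # assumed: interpreted code runs 10x slower
COMPILE_COST = 100.0    # assumed: time units to compile every method
HOT_FRACTION = 0.1      # assumed: selective compiles 10% of the methods ...
HOT_COVERAGE = 0.99     # ... which account for 99% of execution time

def interpreter_time(native_work):
    # No initial overhead, but all work runs at interpreted speed.
    return native_work * INTERP_SLOWDOWN

def compiler_time(native_work):
    # Pay the full compile cost up front, then run at native speed.
    return COMPILE_COST + native_work

def selective_time(native_work):
    # Compile only the hot methods; the cold remainder stays interpreted.
    compile_part = COMPILE_COST * HOT_FRACTION
    hot = native_work * HOT_COVERAGE
    cold = native_work * (1 - HOT_COVERAGE) * INTERP_SLOWDOWN
    return compile_part + hot + cold
```

Under these assumptions a short run (`native_work = 2`) favors the interpreter over the full compiler, a long run (`native_work = 200`) favors the compiler over the interpreter, and the selective strategy is cheapest in both cases.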
Designing a Selective Optimizer
• AKA: Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?
Basic Structure of a Dynamic Compiler
[Figure: compiler pipeline from Program to Machine code. Stages shown: Structural (inlining, unrolling, loop perm); Scalar (cse, constants, expressions); Memory (scalar repl, ptrs); Reg. Alloc; Scheduling; peephole.]
• Still needs good core compiler - but more
Basic Structure of a Dynamic Compiler
[Figure: architecture diagram. Components shown: Program; Interpreter or Simple Translation; Executing Program; Instrumented code; Raw Profile Data; Profile Processor; Processed Profile; History (prior decisions, compile time); Controller; Compilation decisions; Compiler subsystem; Optimizations.]
Method Profiling
• Counters
• Call Stack Sampling
• Combinations
Method Profiling: Counters
• Insert method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: overhead for incrementing counter can be significant
  • Not present in optimized code
Method Profiling: Counters
foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
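A runnable version of the counter idea, written as a sketch in Python. The names `THRESHOLD`, `tick`, `recompile`, and `sum_to` are hypothetical, and `recompile` only records the request instead of invoking a compiler. The counter is bumped on method entry and on each loop back edge, so a method with a long-running loop gets hot even if it is called once.

```python
# Counter-based method profiling: count entries and loop back edges and
# request recompilation once a threshold is crossed.
# THRESHOLD, recompile() and sum_to() are hypothetical.

THRESHOLD = 1000
counters = {}
recompiled = []

def recompile(name):
    # Stand-in for handing the method to the optimizing compiler.
    recompiled.append(name)

def tick(name):
    """Instrumentation inserted at method entry and on loop back edges."""
    counters[name] = counters.get(name, 0) + 1
    if counters[name] > THRESHOLD and name not in recompiled:
        recompile(name)

def sum_to(n):
    tick("sum_to")          # method entry
    total = 0
    for i in range(n):
        total += i
        tick("sum_to")      # loop back edge
    return total
```

Calling `sum_to(10)` leaves the counter at 11 and requests nothing; a later `sum_to(2000)` pushes the counter past the threshold and triggers a single recompilation request.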
Method Profiling: Call Stack Sampling
• Periodically record which method(s) are on the call stack
• Approximates amount of time spent in each method
• Can be compiled into the code
  • Jikes RVM, JRockit
• or use hardware sampling
• Issues: timer-based sampling is not deterministic
Method Profiling: Call Stack Sampling
[Figure: call stack contents over time (ABC, AB, A, AB, ABC, ABC, ...), with one point marked “Sample”.]
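A small sketch of how sampled stacks turn into a profile. It assumes each string is one sampled stack written outermost frame first, so "ABC" means A called B called C; that reading of the figure is an assumption, as are the function names.

```python
from collections import Counter

# Each sample is the call stack at the moment the sample was taken,
# outermost frame first (assumed reading of the figure).
samples = ["ABC", "AB", "A", "AB", "ABC", "ABC"]

def top_of_stack_profile(stacks):
    """Attribute each sample to the method executing at that moment."""
    return Counter(stack[-1] for stack in stacks)

def on_stack_profile(stacks):
    """Count, per method, the samples in which it appears on the stack."""
    counts = Counter()
    for stack in stacks:
        counts.update(set(stack))
    return counts

top = top_of_stack_profile(samples)
anywhere = on_stack_profile(samples)
```

With these six samples the top-of-stack profile credits C with three samples, B with two, and A with one, while A appears somewhere on the stack in all six.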
Method Profiling: Mixed Combinations
• Use counters initially and sampling later on
  • IBM DK for Java
foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
[Figure: a sampled call stack (ABC).]
Recompilation Policies
• Problem: given optimization candidates, which should be optimized?
• Counters: optimize methods that surpass a threshold
  • Simple, but hard to tune; doesn’t consider context
• Sampling: optimize the method on top of the call stack
  • Addresses context issue
Recompilation Policies
• Problem: given optimization candidates, which should be optimized?
• Call Stack Sampling: optimize all methods that are sampled
  • Simple to implement
• Use cost/benefit model
  • Seemingly complicated, but easy to engineer
  • Maintenance free
  • Naturally supports multiple optimization levels
Jikes RVM: Recompilation Policy – Cost/Benefit Model
• Define
  • cur, current opt level for method m
  • Exe(j), expected future execution time at level j
  • Comp(j), compilation cost at opt level j
• Choose j > cur that minimizes Exe(j) + Comp(j)
• If Exe(j) + Comp(j) < Exe(cur), recompile at level j
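The decision rule above fits in one small function. This is a sketch, not Jikes RVM code: the function name `choose_level`, the dictionary representation of Exe and Comp, and the numbers in the example are all assumptions.

```python
def choose_level(cur, exe, comp):
    """Return the level to recompile at, or None to keep the current code.

    exe[j]  : expected future execution time if the method runs at level j
    comp[j] : cost of compiling the method at level j
    """
    candidates = [j for j in exe if j > cur]
    if not candidates:
        return None
    # Choose j > cur that minimizes Exe(j) + Comp(j).
    best = min(candidates, key=lambda j: exe[j] + comp[j])
    # Recompile only if the total beats staying at the current level.
    if exe[best] + comp[best] < exe[cur]:
        return best
    return None

# Hypothetical numbers: level 0 is the current, unoptimized code.
comp_cost = {1: 5.0, 2: 20.0, 3: 60.0}
hot_method = {0: 100.0, 1: 40.0, 2: 30.0, 3: 25.0}
cold_method = {0: 4.0, 1: 1.6, 2: 1.2, 3: 1.0}

hot_choice = choose_level(0, hot_method, comp_cost)
cold_choice = choose_level(0, cold_method, comp_cost)
```

For the hot method the totals are 45, 50, and 85, so level 1 wins and beats the 100 units of staying put. For the cold method even the cheapest total (6.6) exceeds the 4 units of future execution, so no recompilation happens.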
Jikes RVM: Recompilation Policy – Cost/Benefit Model
• Assumptions
  • Sample data determines how long a method has executed
  • Method will execute as much in the future as it has in the past
  • Compilation cost and speedup are offline averages
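The three assumptions suggest how Exe(j) and Comp(j) might be estimated. The sketch below is one plausible reading, not the Jikes RVM implementation: the sample interval, the speedup table, the compile rates, and the use of bytecode size as the cost measure are all hypothetical placeholders.

```python
# Estimating Exe(j) and Comp(j) from samples and offline averages.
# All constants are hypothetical placeholders, not measured values.

SAMPLE_INTERVAL_MS = 10.0            # assumed time between samples
SPEEDUP = {0: 1.0, 1: 4.0, 2: 5.5}   # assumed speed relative to level 0
COMPILE_RATE = {1: 10.0, 2: 4.0}     # assumed bytecodes compiled per ms

def past_time(num_samples):
    # Assumption 1: samples determine how long the method has executed.
    return num_samples * SAMPLE_INTERVAL_MS

def exe(cur, j, num_samples):
    # Assumption 2: the method will run as long in the future as it has in
    # the past, scaled by the speed of level j relative to the current level.
    return past_time(num_samples) * SPEEDUP[cur] / SPEEDUP[j]

def comp(j, bytecode_size):
    # Assumption 3: compile cost follows from an offline average rate.
    return bytecode_size / COMPILE_RATE[j]
```

For a method at level 0 with 20 samples, `exe(0, 0, 20)` is 200 ms and `exe(0, 1, 20)` is 50 ms; these are the quantities the cost/benefit comparison weighs against `comp(1, size)`.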
Optimization Levels
[Table: optimization levels (Opt Level O0, Opt Level O1, Opt Level O2) and the optimizations each controls. Optimizations listed, in slide order: Branch Opts Low; Constant Prop / Local CSE; Reorder Code; Copy Prop / Tail Recursion; Static Splitting / Branch Opt Med; Simple Opts Low; While into Untils / Loop Unroll; Branch Opt High / Redundant BR; Simple Opts Med / Load Elim; Expression Fold / Coalesce; Global Copy Prop / Global CSE; SSA.]
Short Running Programs
No FDO, Mar’04, AIX/PPC
Steady State
No FDO, Mar’04, AIX/PPC
Profiling for What to Do
Myth: Sophisticated profiling is too expensive to perform online
Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead
Suggested Reading: Dynamic Compilation
Adaptive optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.
Spare Slides
Method Profiling: Timer Based
class Thread
  scheduler (...) {
    ...
    flag = 1;
  }
  void handler(...) {
    // sample stack, perform GC, swap threads, etc.
    ....
    flag = 0;
  }

foo ( … ) {
  // on method entry, exit, & all loop backedges
  if (flag) {
    handler( … );
  }
  . . .
}
• Useful for more than profiling
  • Jikes RVM
    • Schedule garbage collection
    • Thread scheduling policies, etc.
[Figure: a sampled call stack (ABC) and code with “if (flag) handler();” checks at three points.]
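The flag-polling scheme can be simulated deterministically. In this sketch the timer is replaced by a hand-placed call to a hypothetical `timer_tick()` so the run is reproducible; a real VM would set the flag from a timer interrupt. The names `yieldpoint`, `inner`, and `outer` are invented for the example.

```python
# Flag-polling sampling, simulated deterministically.
# A real VM would set the flag from a timer interrupt; here timer_tick()
# is called by hand at a fixed point so the run is reproducible.

flag = False
call_stack = []
samples = []

def timer_tick():
    global flag
    flag = True                        # request service at the next check

def handler():
    global flag
    samples.append(tuple(call_stack))  # sample the stack
    flag = False

def yieldpoint():
    # Check inserted on method entry, exit, and loop back edges.
    if flag:
        handler()

def inner(n):
    call_stack.append("inner")
    yieldpoint()                       # method entry
    for i in range(n):
        if i == 2:
            timer_tick()               # pretend the timer fires here
        yieldpoint()                   # loop back edge
    yieldpoint()                       # method exit
    call_stack.pop()

def outer():
    call_stack.append("outer")
    yieldpoint()                       # method entry
    inner(5)
    yieldpoint()                       # method exit
    call_stack.pop()

outer()
```

The sample is taken at the first check after the flag is set, not at the instant the timer fires, so the recorded stack is `("outer", "inner")`. The same check can serve garbage collection and thread switching, since the handler runs at a known point in the code.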
Arnold-Ryder [PLDI 01]: Full Duplication Profiling
[Figure: Full-Duplication Framework. Labels: Checking Code, Duplicated Code, Method Entry, Checks. Check Placement: Entry, Backedges.]
• Generate two copies of a method
• Execute “fast path” most of the time
• Execute “slow path” with detailed profiling occasionally
• Adapted by J9 due to proven accuracy and low overhead
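A heavily simplified sketch of the two-copy idea. It assumes a counter-based trigger with a hypothetical `SAMPLE_INTERVAL`, places the check only on the loop back edge, and runs the instrumented copy for a single iteration per sample; the framework in the paper is more general, and none of the names below come from it.

```python
# Two copies of the loop body: a fast copy with no instrumentation and a
# duplicated copy that records profile data. A cheap check decides which
# copy runs. SAMPLE_INTERVAL and all names are hypothetical.

SAMPLE_INTERVAL = 100
checks_until_sample = SAMPLE_INTERVAL
profile = {"iterations_seen": 0}

def check():
    """Checking code: returns True when it is time to take a sample."""
    global checks_until_sample
    checks_until_sample -= 1
    if checks_until_sample <= 0:
        checks_until_sample = SAMPLE_INTERVAL
        return True
    return False

def body_fast(i, total):
    return total + i                   # fast path: no profiling work

def body_instrumented(i, total):
    profile["iterations_seen"] += 1    # slow path: detailed profiling
    return total + i

def sum_to(n):
    total = 0
    for i in range(n):
        if check():                    # check on the loop back edge
            total = body_instrumented(i, total)
        else:
            total = body_fast(i, total)
    return total

answer = sum_to(1000)
```

Over 1000 iterations the instrumented copy runs 10 times, so the profiling cost is paid on about 1% of the iterations while the computed result is unchanged.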