Parallel Processing Institute · Fudan University

Upload: leon-nash

Post on 11-Jan-2016

TRANSCRIPT

Page 1

Parallel Processing Institute · Fudan University

Page 2

Outline

- Motivation
- Design & Implementation
- Evaluation
- Future work

Page 3

The popularity of Java

[Figure: popularity chart; Java is labeled 20.299%]

Page 4

Java!

- Architecture neutral
- Simplified memory management
- Security and productivity
- ...

Write Once, Run Anywhere

How to further improve Java runtime performance?

Page 5

Our Research

- Leverage the synergy between static and dynamic optimizations
- Keep a dynamic environment while leveraging the benefits of static compilation
- Find performance opportunities before runtime
- Use static annotations to help runtime optimization

Page 6

Opencj

- It is the first milestone of the whole project
- Developed on top of Open64
- Takes Java source files or class files as input
- Outputs executable code for Linux on IA-32 and x86-64
- The compilation process is similar to compiling C/C++ applications

Page 7

Outline

- Motivation
- Design & Implementation
- Evaluation
- Future work

Page 8

Design Overview of Opencj

- Migrate the gcj front end into Open64

Page 9

Java exception handling

- Similar to C++ exception handling, but with some differences:
  - Runtime exceptions, such as a/0 and NullPointerException
  - No "catch-all" handler as used in C++
  - The "finally" mechanism, which makes Java exception handling more complex than in C++
- The key point of Java exception handling is to record the relationship among try/catch/finally blocks.
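
To make the try/catch/finally relationship above concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from Opencj. It only shows the source-level behavior a static compiler has to preserve: a runtime exception raised implicitly by `a / b`, and a `finally` block that runs on both the normal and the exceptional path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Opencj.
final class ExceptionDemo {
    // Records the order in which the try, catch and finally blocks execute.
    static List<String> divide(int a, int b) {
        List<String> trace = new ArrayList<>();
        try {
            trace.add("try");
            int q = a / b;              // throws ArithmeticException when b == 0
            trace.add("result=" + q);
        } catch (ArithmeticException e) {
            trace.add("catch");
        } finally {
            trace.add("finally");       // runs on the normal and on the exceptional path
        }
        return trace;
    }

    // Dereferencing null raises NullPointerException without any explicit throw.
    static boolean nullDereferenceThrows() {
        String s = null;
        try {
            return s.length() < 0;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(6, 3)); // prints "[try, result=2, finally]"
        System.out.println(divide(1, 0)); // prints "[try, catch, finally]"
    }
}
```

Because `finally` must run on every exit from the `try` region, the compiler needs to know which handler and which cleanup block cover each instruction, which is the relationship the slide refers to.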

Page 10

Devirtualization

- Virtual calls make code easy for programmers to reuse but hard for the compiler to analyze
- Resolve Java virtual function calls to promote indirect calls into direct calls
- Uses class hierarchy analysis and rapid type analysis
- Devirtualization is implemented in the IPA phase
- Many optimizations can benefit from this transformation
- In the SciMark 2.0 Java benchmark, it resolves all 21 user-defined virtual function calls.
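
The idea behind class hierarchy analysis can be sketched in a few lines of Java. This is an illustration under stated assumptions, not Opencj's IPA code: the `ChaSketch` class, its string-based class table and all method names are hypothetical, abstract classes and interfaces are not modeled, and rapid type analysis (which would additionally ignore classes that are never instantiated) is left out. A call is a candidate for devirtualization when every possible receiver class resolves to the same implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class table for illustration; not Opencj's IPA data structures.
final class ChaSketch {
    private final Map<String, String> parent = new HashMap<>();        // class -> superclass (null for roots)
    private final Map<String, Set<String>> declared = new HashMap<>(); // class -> methods it defines or overrides

    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // The class whose implementation of `method` an object of class `cls` would run.
    String resolve(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.getOrDefault(c, Collections.emptySet()).contains(method)) {
                return c;
            }
        }
        return null;
    }

    private boolean isSubtype(String cls, String ancestor) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // CHA: every implementation a receiver of static type `staticType` could dispatch to.
    Set<String> targets(String staticType, String method) {
        Set<String> out = new TreeSet<>();
        for (String c : parent.keySet()) {
            if (isSubtype(c, staticType)) {
                String impl = resolve(c, method);
                if (impl != null) {
                    out.add(impl + "." + method);
                }
            }
        }
        return out;
    }

    // A single possible target means the indirect call can become a direct call.
    boolean canDevirtualize(String staticType, String method) {
        return targets(staticType, method).size() == 1;
    }
}
```

With a hierarchy `Shape <- Circle`, `Shape <- Square <- UnitSquare` where `UnitSquare` does not override `area`, a call through a `Square` reference has exactly one target and can be made direct, while a call through a `Shape` reference cannot.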

Page 11

Synchronization elimination

- Based on escape analysis
- Flow-insensitive, interprocedural analysis
- Connection graph: captures the connectivity relationship among objects and object references
- Makes it easy to determine whether an object is local to a thread
- If a synchronized object is local to a thread, the synchronization operation can be removed
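
As a small illustration of the last point, the Java sketch below contrasts a thread-local synchronized object with a shared one. All names are hypothetical. The second method uses `StringBuilder` only to mimic, at source level, what the program behaves like once the lock operations are removed; the optimization itself drops the lock operations in generated code rather than swapping classes.

```java
// Hypothetical demo; the class and method names are not from Opencj.
final class LocalLockDemo {
    // Reachable from a static field, so other threads may see it: its locks must stay.
    static final StringBuffer SHARED = new StringBuffer();

    // sb is never stored in a field, handed to another thread, or returned,
    // so no other thread can ever contend for its monitor.
    static String joinLocal(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (String p : parts) {
            sb.append(p);          // synchronized method: acquires and releases sb's monitor
        }
        return sb.toString();      // only the immutable String leaves the method
    }

    // Behaviorally equivalent code with no locking at all.
    static String joinWithoutLocks(String[] parts) {
        StringBuilder sb = new StringBuilder();   // same API, unsynchronized
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"Open", "cj"};
        System.out.println(joinLocal(parts).equals(joinWithoutLocks(parts))); // prints "true"
    }
}
```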

Page 12

Building the connection graph

Only five kinds of statements need to be handled:

1. p = new P()

2. p = return_new_P()

3. p = q

4. p = q.f

5. p.f = q
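
The five statement kinds can be turned into graph edges roughly as follows. This is a simplified sketch under several assumptions, not Opencj's implementation: nodes are plain strings, a call that returns a fresh object (kind 2) is treated like an allocation, field nodes are named `object.field`, and only one question is answered, namely whether an object is reachable from a global (static) reference. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, flow-insensitive connection graph; all names are hypothetical.
final class ConnectionGraphSketch {
    private final Map<String, Set<String>> edges = new HashMap<>();
    private final Set<String> objects = new HashSet<>();
    private final List<String[]> loads = new ArrayList<>();   // {p, q, f} for p = q.f
    private final List<String[]> stores = new ArrayList<>();  // {p, f, q} for p.f = q

    private boolean edge(String from, String to) {
        return edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    void newObject(String p, String obj) { objects.add(obj); edge(p, obj); }          // kinds 1 and 2
    void copy(String p, String q) { edge(p, q); }                                     // kind 3
    void load(String p, String q, String f) { loads.add(new String[] {p, q, f}); }    // kind 4
    void store(String p, String f, String q) { stores.add(new String[] {p, f, q}); }  // kind 5

    // Nodes reachable from start; object nodes are expanded only if throughObjects is true.
    private Set<String> reachable(String start, boolean throughObjects) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!seen.add(n)) continue;
            if (objects.contains(n) && !throughObjects) continue;
            for (String m : edges.getOrDefault(n, Collections.emptySet())) work.push(m);
        }
        return seen;
    }

    // Objects a reference may point to.
    Set<String> pointsTo(String ref) {
        Set<String> result = new HashSet<>(reachable(ref, false));
        result.retainAll(objects);
        return result;
    }

    // Field loads and stores depend on points-to sets, so iterate until no edge is added.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : loads) {
                for (String o : pointsTo(s[1])) {
                    changed |= edge(o, o + "." + s[2]);
                    changed |= edge(s[0], o + "." + s[2]);
                }
            }
            for (String[] s : stores) {
                for (String o : pointsTo(s[0])) {
                    changed |= edge(o, o + "." + s[1]);
                    changed |= edge(o + "." + s[1], s[2]);
                }
            }
        }
    }

    // In this sketch an object escapes globally if any global reference can reach it.
    boolean escapesGlobally(String obj, String... globals) {
        for (String g : globals) {
            if (reachable(g, true).contains(obj)) return true;
        }
        return false;
    }
}
```

For example, after `a = new P()`, `b = new P()`, `c = a`, `c.f = b` and a store of `c` into a static reference `G`, both objects are reachable from `G`, while an object referenced only by a local variable is not.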

Page 13

Analysis process

- Intra-procedural analysis
  - Check every call graph node to find out whether there is a synchronized call in a PU (program unit)
  - Set the initial escape state of each reference node
- Inter-procedural analysis
  - Start from the main function and traverse the call graph in depth-first order
  - Pass escape states between caller and callee
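
The inter-procedural step can be sketched as a depth-first walk over the call graph. The Java code below is an assumption-laden illustration, not the Opencj algorithm: it models only three escape states, it propagates state in one direction (a formal parameter that escapes globally in the callee makes the caller's actual argument escape globally), and it cuts recursion with a visited set. All class, method and key names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of passing escape states along call edges.
final class EscapePropagationSketch {
    enum Escape { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    // One call edge: the caller passes its reference `actual` as the callee's parameter `formal`.
    static final class Call {
        final String callee, actual, formal;
        Call(String callee, String actual, String formal) {
            this.callee = callee;
            this.actual = actual;
            this.formal = formal;
        }
    }

    private final Map<String, List<Call>> calls = new HashMap<>(); // call graph
    private final Map<String, Escape> state = new HashMap<>();     // "proc:ref" -> escape state

    // Result of the intra-procedural pass: the initial state of a reference.
    void init(String proc, String ref, Escape e) {
        state.put(proc + ":" + ref, e);
    }

    void addCall(String caller, String actual, String callee, String formal) {
        calls.computeIfAbsent(caller, k -> new ArrayList<>()).add(new Call(callee, actual, formal));
    }

    Escape get(String proc, String ref) {
        return state.getOrDefault(proc + ":" + ref, Escape.NO_ESCAPE);
    }

    // Depth-first traversal starting at `proc`; callees are finished before their callers.
    void analyze(String proc, Set<String> visited) {
        if (!visited.add(proc)) {
            return; // already processed; this also cuts recursive cycles (a simplification)
        }
        for (Call c : calls.getOrDefault(proc, Collections.emptyList())) {
            analyze(c.callee, visited);
            if (get(c.callee, c.formal) == Escape.GLOBAL_ESCAPE) {
                state.put(proc + ":" + c.actual, Escape.GLOBAL_ESCAPE);
            }
        }
    }

    void analyzeFrom(String entry) {
        analyze(entry, new HashSet<>());
    }
}
```

If `main` passes `a` to `foo`, `foo` passes it on to `bar`, and `bar` stores its parameter in a static field, the walk marks `a` in `main` as escaping globally, while an argument passed only to a callee that keeps it local stays `NO_ESCAPE`.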

Page 14

Example 1

[Figure: connection graph; node labels shown: GlobalEscape, OutEscape, GlobalEscape, NoEscape, OutEscape]

Page 15

Example 1

[Figure: connection graph; node labels shown: GlobalEscape, NoEscape, GlobalEscape, NoEscape]

Page 16

Example 2

[Figure: connection graph; node labels shown: GlobalEscape, ArgEscape, ArgEscape, NoEscape, GlobalEscape]

Page 17

Example 2

[Figure: connection graph; node labels shown: NoEscape, GlobalEscape, GlobalEscape, NoEscape]

Page 18

Array bounds check elimination

- Array bounds checks guarantee type-safe execution of Java programs
- They prevent many useful code optimizations, since an array bounds check may raise an exception
- Full elimination: remove the check if it can never fail
- Partial elimination: whenever possible, move bounds checks out of loops
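
To show what is being eliminated, the following Java sketch writes the implicit check out by hand. The names are hypothetical. In `sum`, the loop bounds already imply `0 <= i < a.length`, so the check on `a[i]` can never fail and is a candidate for full elimination. In `checkedLoad`, the index comes from the caller, so the check has to stay.

```java
// Hypothetical demo of the check that every Java array access implies.
final class BoundsCheckDemo {
    // The test the language semantics require for each a[i], written explicitly.
    static int checkedLoad(int[] a, int i) {
        if (i < 0 || i >= a.length) {
            throw new ArrayIndexOutOfBoundsException(i);
        }
        return a[i];
    }

    // The loop condition guarantees 0 <= i < a.length, so the check on a[i] is fully redundant.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // The index is unknown at compile time: the check cannot be removed.
    static boolean throwsOnIndex(int[] a, int i) {
        try {
            checkedLoad(a, i);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sum(data));              // prints "6"
        System.out.println(throwsOnIndex(data, 3)); // prints "true"
    }
}
```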

Page 19

Example of ABCE

Page 20

Fully redundant check elimination

Example

[Figure: example code; annotations shown: 0 <= i1 < 100, jc1]

Page 21

Fully redundant check elimination

Example

Page 22

Partial elimination

- Adopts the loop versioning technique to preserve Java's exception semantics
- Sets trigger conditions before and after the optimized loop
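
Loop versioning can be illustrated at source level as follows. This is a sketch with hypothetical names, and it assumes the "trigger condition" is a guard evaluated before the loop. If the guard proves that every index is in range, the fast version runs, and a compiler could emit it without per-iteration checks. Otherwise the original loop runs, so an exception is raised at exactly the same iteration and after the same writes as in the unoptimized program. Java source cannot omit bounds checks, so the two versions differ here only in how the check is written.

```java
// Hypothetical source-level picture of loop versioning for bounds checks.
final class LoopVersioningDemo {
    static void fillVersioned(int[] a, int from, int to, int value) {
        // Guard hoisted out of the loop: true when no access in the loop can fail.
        boolean allInRange = from >= to || (from >= 0 && to <= a.length);
        if (allInRange) {
            // Fast version: a compiler could drop the per-iteration checks here.
            for (int i = from; i < to; i++) {
                a[i] = value;
            }
        } else {
            // Safe version: the original loop, checks kept (written out explicitly).
            for (int i = from; i < to; i++) {
                if (i < 0 || i >= a.length) {
                    throw new ArrayIndexOutOfBoundsException(i);
                }
                a[i] = value;
            }
        }
    }
}
```

Keeping the original loop as the fallback is what preserves the exception semantics: elements written before the failing iteration stay written, exactly as they would without the optimization.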

Page 23

Partial redundant check elimination

Example

Page 24

Checks eliminated by ABCE

- Total: the total number of checks in the test case
- PRCE: the number of checks removed by Partial Redundant Check Elimination
- FRCE: the number of checks removed by Fully Redundant Check Elimination
- ABCE: FRCE + PRCE
- 28.4% speedup in the SciMark 2 test, lower than we expected

Page 25

Outline

- Motivation
- Design & Implementation
- Evaluation
- Future work

Page 26

Performance gap between Java & C

[Figure: performance chart for the configurations below; higher is better]

- opencj -O3 -IPA -fno-bounds-check
- opencc -O3 -IPA
- gcj -O3 -fno-bounds-check -funroll-loops
- gcc -O3 -funroll-loops

Page 27

Static compilation vs JIT

[Figure: performance chart; higher is better]

Comparing two Java running modes:

- Running in a JVM
- Running the executable file directly

Page 28

Static compilation vs JIT

[Figure: performance chart; lower is better]

- JDK 1.6 performs best on every benchmark except mpegaudio
- More analysis work is needed

Page 29

Outline

- Motivation
- Design & Implementation
- Evaluation
- Future work

Page 30

Future Trends for Java

Where is Java headed with its dynamic optimization framework?

- Exploring opportunities to achieve performance parity with native code
- Online profiling mechanisms and feedback-directed optimizations are becoming mainstream
- ...

Page 31

Java advantages

Several studies show that Java could potentially be faster than C/C++, for several reasons:

- C/C++ pointers make optimization difficult
- Memory management is easier in Java than in C/C++, because Java allocates memory only through object instantiation, so Java garbage collectors can achieve better cache locality
- Dynamic compilation of Java can use additional information available at run time to optimize code more effectively

Page 32

Future of Opencj

Opencj will achieve better runtime performance by using a JVM as the execution environment:

- Static annotation with an annotation-aware JIT
  - Runtime IPA
- Using a just-in-time compiler
  - Apply more effective optimizations by profiling run-time information
- Using garbage collection
  - Better performance due to cache locality

There are three steps in our schedule.

Page 33

Framework: step 1

[Figure: framework diagram. Inputs: C/C++/F, .java, .class. Components: FE (two front ends), IPL, IPA, BE (LNO, WOPT), CG, IR Writer, WHIRL Reader, Whirl_to_LIR, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, runtime library (C/C++). Other labels: x86, IA, LWHIRL. Legend: Existing Module, New Module.]

Page 34

Framework: step 2

[Figure: framework diagram. Inputs: C/C++/F, .java, .class. Components: FE (two front ends), IPL, RIPA, BE (LNO, WOPT), CG, IR Writer, WHIRL Reader, Whirl_to_LIR, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, runtime library (C/C++). Other labels: x86, IA, LWHIRL, RIPA IR. Legend: Existing Module, New Module.]

Page 35

Framework: final

[Figure: framework diagram. Inputs: C/C++/F, .java, .class. Components: FE (two front ends), IPL, RIPA, BE (LNO, WOPT), CG, IR Writer, WHIRL Reader, W to LIR, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, runtime library (C/C++), Runtime OPT., Feedback. Other labels: x86, IA, LWHIRL, HWHIRL, RIPA IR. Legend: Existing Module, New Module.]

Page 36

Discussion

- Shin is the leader of this project
- Q&A

Page 37

Parallel Processing Institute · Fudan University