efficient program compilation through machine learning techniques gennady pekhimenko ibm canada...

Post on 29-Mar-2015

217 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Efficient Program Compilation through Machine Learning Techniques

Gennady PekhimenkoIBM Canada

Angela Demke BrownUniversity of Toronto

MotivationMy cool program

Compiler-O2

DCE

Peephole

Unroll

InlineExecutable

But what to do if executable is slow?

Replace –O2 with –O5

UnrollUnroll

UnrollUnroll

UnrollUnroll

Optimization 100New

FastExecutable

1-10 minutes

few seconds

Unroll

Inline

Peephole

DCE

Motivation (2)

Compiler-O2

Our cool OperatingSystem

1 hourExecutable

Too slow!

Compiler-O5

20 hours

NewExecutable

We do not have that much time

Why did it happen?

Basic Idea

UnrollUnroll

UnrollOptimization 100

Do we need all these optimizations for every function?

Probably not.

Compiler writers can typically solve this problem, but how ?

1. Description of every function2. Classification based on the description3. Only certain optimizations for every class

Machine Learning is good for solving this kind of problems

Overview

Motivation System Overview Experiments and Results Related Work Conclusions Future Work

Initial Experiment

3X difference on average

Initial Experiment (2)

0100200300400500

bzip

2cr

afty

eon

gap

gzip

mcf

vort

ex vpr

amm

pap

plu

art

equa

kefa

cere

cfm

a3d

galg

ellu

cas

mes

am

grid

sixtr

ack

swim

wup

wise

Time, secs

Benchmarks

SPEC2000 execution time at –O3 and –qhot –O3

"-O3""-qhot -O3"

Classification parameters

Our SystemPrepare

• extract features• modify heuristic

values• choose

transformations • find hot methods

Gather Training Data

Compile Measure run time

LearnLogistic Regression Classifier

Best feature settingsOffline

Deploy

TPO/XL Compilerset heuristic values

Online

Data Preparation

Three key elements: Feature extraction Heuristic values modification Target set of transformations

• Total # of insts• Loop nest level• # and % of Loads, Stores,

Branches• Loop characteristics• Float and Integer # and %

• Existing XL compiler is missing functionality

• Extension was made to the existing Heuristic Context approach

• Unroll • Wandwaving• If-conversion• Unswitching• CSE• Index Splitting ….

Gather Training Data Try to “cut” transformation backwards (from

last to first)

If run time not worse than before, transformation can be skipped

Otherwise we keep it We do this for every hot function of every testThe main benefit is linear complexity.

Late Inlining Unroll Wandwaving

Learn with Logistic Regression

Function Descriptions

Best Heuristic Values

Input Classifier

• Logistic Regression• Neural Networks• Genetic

Programming

Output.hpredictfiles

Compiler +Heuristic Values

Deployment

Online phase, for every function: Calculate the feature vector Compute the prediction Use this prediction as heuristic context

Overhead is negligible

Overview

Motivation System Overview Experiments and Results Related Work Conclusions Future Work

Experiments

Benchmarks: SPEC2000 Others from IBM customers

Platform: IBM server, 4 x Power51.9 GHz, 32GB RAMRunning AIX 5.3

Results: compilation time

00.10.20.30.40.50.60.70.80.9

1bz

ip2

craft

yeo

nga

pgz

ipm

cfvo

rtex vp

ram

mp

appl

uar

teq

uake

face

rec

fma3

dga

lgel

luca

sm

esa

mgr

idsi

xtra

cksw

imw

upw

ise

Geo

Mea

n

Normalized Time

Benchmarks

Oracle

Classifer

2x average speedup

Results: execution time

0

50

100

150

200

250

300

350bz

ip2

craft

yeo

nga

pgz

ipm

cfvo

rtex vp

ram

mp

appl

uar

teq

uake

face

rec

fma3

dga

lgel

luca

sm

esa

mgr

idsi

xtra

cksw

imw

upw

ise

Time, secs

Benchmarks

Baseline

Oracle

Classifer

New benchmarks: compilation time

0

0.2

0.4

0.6

0.8

1

Normalized Time

Benchmarks

Classifier

New benchmarks: execution time

0

50

100

150

200

250

300

350

apsi parser twolf dmo argonne

Time, secs

Benchmarks

Baseline

Classifer

4% speedup

Overview

Motivation System Overview Experiments and Results Related Work Conclusions Future Work

Related Work Iterative Compilation

Pan and Eigenmann Agakov, et al.

Single Heuristic Tuning Calder, et al. Stephenson, et al.

Multiple Heuristic Tuning Cavazos, et al. MILEPOST GCC

Conclusions and Future Work 2x average compile time decrease Future work

Execution time improvement -O5 level Performance Counters for better method

description Other benefits

Heuristic Context Infrastructure Bug Finding

Thank you

Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski

Questions?

top related