2d-profiling detecting input-dependent branches with a single input data set hyesoon kim m. aater...

Post on 22-Dec-2015

216 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

2D-ProfilingDetecting Input-Dependent Brancheswith a Single Input Data Set

Hyesoon KimM. Aater SulemanOnur MutluYale N. Patt

HPS Research Group The University of Texas at Austin

2

Motivation

Profile-guided code optimization has become essential for achieving good performance. Run-time behavior profile-time behavior: Good! Run-time behavior ≠ profile-time behavior: Bad!

3

Motivation

Profiling with one input set is not enough! Because a program can show different behavior with

different input data sets

Example: Performance of predicated execution is

highly dependent on the input data set

Because some branches behave differently with

different input sets

4

Input-dependent Branches

Definition A branch is input-dependent if its misprediction

rate differs by more than some Δ over different input data sets.

Inp. 1 Inp. 2 Inp.1 – Inp. 2

Misprediction rate of Br. X

30% 29% 1%

Misprediction rate of Br. Y

30% 5% 25%

Input-dependent branch

Input-dependent br. hard-to-predict br.

5

An Example Input-dependent Branch

Example from Gap (SPEC2K): Type checking branch

TypHandle Sum (A,B)

If ((type(A) == INT) && (type(B) == INT)) { Result = A + B;Return Result;

}Return SUM(A, B);

Train input set: A&B are integers 90% of the time misprediction rate: 10%

Reference input set: A&B are integers 42% of the time misprediction rate: 30% (30%-10%)>Δ

//input-dependent br

6

Predicated Execution

Eliminate hard-to-predict branchesbut fetch blocks B and C all the time

(normal branch code)

C B

D

AT N

p1 = (cond) branch p1, TARGET

mov b, 1 jmp JOIN

TARGET: mov b, 0

A

B

C

B

C

D

A

(predicated code)

A

B

C

if (cond) { b = 0;}else { b = 1;} p1 = (cond)

(!p1) mov b, 1

(p1) mov b, 0

7

Predicated Code Performance vs. Branch Misprediction Rate

Normal branch code performs better

Predicated code performs better

run-time (input B) profile-time (input A)

Converting a branch to predicated code could hurt performance if run-time misprediction rate is lower than profile-time misprediction rate

X

2

3

4

5

6

7

8

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Branch misprediction rate (%)

Execu

tio

n t

ime (

cycle

s)

predicated code

normal branch code

8

0

0.2

0.4

0.6

0.8

1

1.2

gzip vpr gcc mcf crafty parserperlbmk gap vortex bzip2 twolfEx

ec

. tim

e n

orm

aliz

ed

to

no

n-p

red

ica

ted

co

de

run-time: input-Arun-time: input-Brun-time: input-C

Predicated Code Performance vs. Input Set

9%-4%

-16%1%

Measured on an Itanium-II machine

non-predicated

Predicated execution loses performance because of input-dependent branches

9

If We Know a Branch is Input-Dependent

May not convert it to predicated code.

May convert it to a wish branch.

[Kim et al. Micro’05]

May not perform other compiler optimizations

or may perform them less aggressively.

Hot-path/trace/superblock-based optimizations

[Fisher’81, Pettis’90, Hwu’93, Merten’99]

10

Our Goal

Identify input-dependent branches by using a single input set for profiling

11

Talk Outline

Motivation 2D-profiling Mechanism Experimental Results Conclusion

12

0

0.2

0.4

0.6

0.8

1

0 500 1000 1500Time (in terms of number of executed instructions x 100M)

Bra

nc

h P

red

icti

on

Ac

cu

rac

y

0

0.2

0.4

0.6

0.8

1

0 500 1000 1500Time (in terms of number of executed instructions x 100M)

Bra

nc

h P

red

icti

on

Ac

cu

rac

y

Key Insight of 2D-profiling

Phase behavior in prediction accuracyis a good indicator of input dependence

input-dependent

input-independent

phase 1

phase 2

phase 3

13

Traditional Profiling

brA

time

brB

time

MEAN pr.Acc(brA) MEAN pr.Acc(brB)

behavior of brA behavior of brB

MEAN pr.Acc(brA)

MEAN pr.Acc(brB)

pr.

Acc

pr.

Acc

14

2D-profiling

brA

time

brB

timeMEAN pr.Acc(brA) MEAN pr.Acc(brB)

STD pr.Acc(brA) ≠ STD pr.Acc(brB)

behavior of brA ≠ behavior of brB

A: input-dependent br, B: input-independent br

MEAN pr.Acc(brA)

STD pr.Acc(brA)

MEAN pr.Acc(brB)

STD pr.Acc(brB)pr.

Acc

pr.

Acc

15

Calculate MEAN (brA, brB, …), Standard deviation (brA, brB, …),

PAM:Points Above Mean (brA, brB, …)

2D-profiling Mechanism

Slice 1 Slice 2 Slice N …

mean Pr.Acc(brA,s1)

mean Pr.Acc(brB,s1)

mean Pr.Acc(brA,s2)

mean Pr.Acc(brB,s2)

mean Pr.Acc(brA,sN)

mean Pr.Acc(brB,sN)...

...

The profiler collects branch prediction accuracy information for every static branch over time

mean brA

time

PAM:50%

PAM:0%

brA

brBmean brB

... ......

slice size = M instructions

16

Input-dependence Tests STD&PAM-test: Identify branches that have large variations

in accuracy over time (phase behavior) STD-test (STD > threshold): Identify branches that have large

variations in the prediction accuracy over time PAM-test (PAM > threshold): Filter out branches that pass STD-

test due to a few outlier samples

MEAN&PAM-test: Identify branches that have low prediction accuracy and some time-variation in accuracy MEAN-test (MEAN < threshold): Identify branches that have low

prediction accuracy PAM-test (PAM > threshold): Identify branches that have some

variation in the prediction accuracy over time

A branch is classified as input-dependent if it passes either

STD&PAM-test or MEAN&PAM-test

17

Talk Outline

Motivation 2D-profiling Mechanism Experimental Results Conclusion

18

Experimental Methodology Profiler: PIN-binary instrumentation tool Benchmarks: SPEC 2K INT Input sets

Profiler: Train input set Input-dependent Branches: Reference input set and train/other

extra input sets

Input-dependent branch: misprediction rate of the branch

changes more than Δ = 5% when input data set changes Different Δ are examined in our TechReport [reference 11].

Branch predictors Profiler: 4KB Gshare, Machine: 4KB Gshare Profiler: 4KB Gshare, Machine: 16KB Perceptron (in paper)

19

Evaluation Metrics Coverage and Accuracy for input-dependent

branches

dependent-Input Actual

PredictedCorrectly

A

BACOV

A B

dependent-InputPredicted

PredictedCorrectly

B

BACC

A

Actual Input-dependent br.

Predicted Input-dependent br. (2D-profiler)

Correctly Predicted Input-dependent br.

20

Input-dependent Branches

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

bzip2 gzip twolf gap crafty gcc% o

f b

ran

ch

es

th

at

are

inp

ut-

de

pe

nd

en

t

2 input sets3 input sets4 input sets5 input sets6 input sets7 input sets8 input sets

21

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

coverage accuracy coverage accuracy

Fra

ctio

n

2 input sets

3 input sets

4 input sets

5 input sets

6 input sets

7 input sets

8 input sets

2D-profiling Results

input-dependent branches input-independent branches

Phase behavior and input-dependence are strongly correlated!

63%

88%93%

76%

22

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

bzip2 gzip twolf gap crafty gcc AVG

No

rmal

ized

exe

cuti

on

tim

e

Edge

Gshare

2D+Gshare

The Cost of 2D-profiling?

2D-profiling adds only 1% overhead over modeling the branch predictor in software Using a H/W branch predictor [Conte’96]

54%1%

23

Conclusion 2D-profiling is a new profiling technique to find input-

dependent characteristics by using a single input data set for profiling

2D-profiling uses time-varying information instead of just average data

Phase behavior in prediction accuracy in a profile run input-dependent

2D-profiling accurately identifies input-dependent branches with very little overhead (1% more than modeling the branch predictor in the profiler)

Applications of 2D-profiling are an open research topic Better predicated code/wish branch generation algorithms Detecting other input-dependent program characteristics

Questions?

top related