llvm performance workshop cgo 2017 · llvm performance workshop cgo 2017 efficient clustering of...

16
LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian Pop Samsung Austin R&D Center {e.menezes, aditya.k7, s.pop}@samsung.com February 4, 2017 Austin, TX

Upload: trinhnhi

Post on 12-Aug-2019

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

LLVM Performance WorkshopCGO 2017

Efficient clustering of case statements for indirect branch prediction

Evandro Menezes, Aditya Kumar, Sebastian PopSamsung Austin R&D Center

{e.menezes, aditya.k7, s.pop}@samsung.com

February 4, 2017Austin, TX

Page 2: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

Image: "Generic 4 stage pipeline". Wikimedia Commons 2

Branch Prediction In super­pipelined processors, there is a significant lag between 

the beginning of the execution of  an instruction and when its result becomes available.

Page 3: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  3

Branch Prediction

This trait also applies to conditional direct branches However, waiting for the result of the instructions affecting 

a conditional branch is wasteful. Therefore, the processor makes an educated guess as to 

where the conditional branch will lead to. If the guess is wrong, it all goes to waste.

Conditional direct branch prediction techniques aim at increasing the frequency when the processor makes the right guesses.

Page 4: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  4

Indirect Branch Prediction Like conditional direct branches, indirect branches may lead to 

more than one target. Unlike conditional direct branches, which may lead to just two 

targets, indirect branches may lead to multiple targets. Therefore, indirect branch prediction techniques  have traditionally 

been less efficient than for conditional direct branch. Moreover, because of multiple possible targets, predicting indirect 

branches needs more resources than predicting conditional direct branches, leading to more implementation compromises and limitations, depending on the transistor budget.

Page 5: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  5

Jump Tables

In the presence of a large number of conditional cases, it is more efficient to use a table of targets together with indirect branches. Advantages: simple execution, compact code size. Disadvantages: reliance on efficient indirect branch 

prediction. Moreover, with the increased popularity of interpreted 

languages and object oriented languages, many programs rely heavily on jump tables.

Page 6: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  6

Jump Tables in LLVM The previous algorithm used to generate jump tables in 

LLVM, SelectionDAGBuilder::findJumpTables(), by Hans Wennborg, was based on the work by Kannan & Proebsting. Minimizes the number of partitions of clusters of cases by 

maximizing their sizes. O(n2)

However, the resulting jump tables have virtually unlimited size, which may overwhelm the processor resources dedicated to indirect branch prediction.

Page 7: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  7

Jump Tables in LLVM The algorithm in SelectionDAGBuilder::findJumpTables() was 

modified to optionally limit the size of partitions of clusters.– Suiting them to the limitations of the indirect branch predictor in the target.

– The default, unlimited size, yields the same results as before. Moreover, since conditional direct prediction tends to be more accurate 

than indirect branch prediction, it favors conditional direct branches over sparse clusters with few cases.

– Maximizes the number of partitions of clusters by limiting their sizes.

– Maximizes the density of clusters of cases.• Fewer sparse partitions, falling back to conditional direct branches.

– O(n log n)

Page 8: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  8

Jump Tables in LLVMdeclare void @g(i32)

define void @test(i32 %x) {

entry:

switch i32 %x, label %return [

i32 1, label %bb1

i32 2, label %bb2

i32 3, label %bb3

i32 4, label %bb4

i32 14, label %bb14

i32 15, label %bb15

]

return: ret void

bb1: tail call void @g(i32 1) br label %return

bb2: tail call void @g(i32 2) br label %return

bb3: tail call void @g(i32 3) br label %return

bb4: tail call void @g(i32 4) br label %return

bb14: tail call void @g(i32 5) br label %return

bb15: tail call void @g(i32 6) br label %return

}

Page 9: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  9

Jump Tables in LLVMdeclare void @g(i32)

define void @test(i32 %x) {

entry:

switch i32 %x, label %return [

i32 1, label %bb1

i32 2, label %bb2

i32 3, label %bb3

i32 4, label %bb4

i32 14, label %bb14

i32 15, label %bb15

]

return: ret void

bb1: tail call void @g(i32 1) br label %return

bb2: tail call void @g(i32 2) br label %return

bb3: tail call void @g(i32 3) br label %return

bb4: tail call void @g(i32 4) br label %return

bb14: tail call void @g(i32 5) br label %return

bb15: tail call void @g(i32 6) br label %return

}

Page 10: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  10

Results: Samsung Exynos M1

SPECint CP U2000

164.gzip175.vpr

176.gcc181.mcf

186.crafty197.parser

252.eon253.perlbmk

254.gap255.vortex

256.bzip2300.twolf

SPECint2000

0.94

0.96

0.98

1

1.02

1.04

1.06

1.08

1.1

1.12

SPECint2000

sub­title

8

Page 11: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  11

Results: Samsung Exynos M1

SPECint CP U2000

LZMA

JPEG

Lua

0.97 0.98 0.99 1 1.01 1.02 1.03 1.04 1.05 1.06

Other Benchmarks

8

Page 12: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

Minus 252.eon 12

Results: ARM Cortex A57

164.gzip175.vpr

176.gcc181.mcf

186.crafty197.parser

253.perlbmk254.gap

255.vortex256.bzip2

300.twolfSPECint2000

94%

96%

98%

100%

102%

104%

106%

SPECint2000

sub­title

8

16

32

64

Page 13: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  13

Results: ARM Cortex A57

LZMA

JPEG

Lua

92% 94% 96% 98% 100% 102% 104% 106%

Other Benchmarks

sub­title

8

16

32

64

Page 14: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

Minus 175.vpr and 252.eon 14

Results: Intel i7 Skylake

164.gzip 176.gcc 181.mcf 186.crafty 197.parser 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf SPECint200090%

92%

94%

96%

98%

100%

102%

104%

106%

SPECint2000

sub­title

8

16

32

64

Page 15: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  15

ConclusionThe results depend largely on the implementation of the indirect branch predictor in the processor and, of course, on the workload.

In particular, though a jump table may have a large number of entries, only a few of them may be used in a given workload, possibly within the limitations of a particular indirect branch predictor.

Give it a try!-mllvm -min-jump-table-entries=<entries>

-mllvm -max-jump-table-size=<entries>

Page 16: LLVM Performance Workshop CGO 2017 · LLVM Performance Workshop CGO 2017 Efficient clustering of case statements for indirect branch prediction Evandro Menezes, Aditya Kumar, Sebastian

  16

Referencesllvm::SelectionDAGBuilder

https://reviews.llvm.org/D21940

https://reviews.llvm.org/D25212

Kannan; Proebsting. “Correction to ‘producing good code for the case statement’”, 1994.

Lee. “ECE 570 High Performance Computer Architecture: Dynamic Branch Prediction”, Oregon State.

Kim; João; Mutlu; Lee; Patt; Cohn. “VPC Prediction: Reducing the Cost of Indirect Branches via Hardware­Based Dynamic Devirtualization”, 2007.