quda query- d a - github pages · 2020. 7. 15. · query-directed adaptive heap cloning for...

1
Query-Directed Adaptive Heap Cloning for Optimising Compilers Heap Cloning buffer1=getmem() buffer2=getmem() return malloc return malloc buffer1 buffer2 heap int main(){ int* buffer1 = getMem(); int* buffer2 = getMem(); } int* getMem(){ return malloc(size); } buffer1 buffer2 heap1 heap heap2 main getMem main getMem main getMem getMem C program Code Program execution Static pointer analysis without heap cloning Static pointer analysis with heap cloning Compiler Optimisations Front End Static Analysis Source Code Compiler Optimisations Code Generation Executable Program Heap Cloning A Typical Compiler Workflow A total of 496304 alias queries generated from optimisations in Open64 compiler for analysing CPU2000 C benchmarks Motivation V ery costly to reason about full heap cloning on large size program with high-density call graph 50% 55% 60% 65% 70% 75% 80% 0 >3 k-Callsite Sensitivity Must-not AliasQueries (%) k-Callsite Sensitivity 175.vpr 29% 30% 30% 31% 31% 32% 32% 33% 33% 34% 0 1 2 3 >3 Must-not AliasQueries (%) k-Callsite Sensitivity 176.gcc 20% 21% 22% 23% 24% 25% 26% 27% 28% 0 1 2 3 >3 Must-not AliasQueries (%) k-Callsite Sensitivity 255.vortex 0% 10% 20% 30% 40% 50% 60% 70% 80% 0 1 2 3 >3 Must-not AliasQueries (%) k-Callsite Sensitivity k-Callsite Sensitivity 300.twolf Compiler optimisations of diverse programs benefit in different precision levels of heap cloning Different programs require different levels of precision ! Full heap cloning is overkill ! and Approach Alias Queries Cloning Level k k++ Heap-Aware Pointer Solver Selecting Candidate Heap Objects Adaptive Update Enable Iterative heap cloning only where it is necessary according to queries ! QUDA: QUery- Directed Adaptive heap cloning QUDA performs heap cloning on parts of the program according to clients' need in a demand-driven fashion An Example Alias Query is not answered positively procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query may-alias procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query p p q q may-alias procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query may-alias procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query must-not-alias procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query p p q q procedure heap object k=0 k=1 k=2 k=3 <*p,*q> Alias Query All done Cannot be cloned further Alias Query is not answered positively Alias Query is not answered positively Alias Query is answered positively Alias Query is answered positively Alias Query is answered positively p q must-not-alias QUDA heap cloning (1st Iteration) QUDA performs k-callsite-sensitive heap cloning iteratively, starting with k = 0 (without heap cloning), so that an abstract heap object is cloned at iteration k = i + 1 only if some alias queries that are not answered positively at iteration k = i may now be answered more precisely. QUDA heap cloning (2nd Iteration) QUDA heap cloning (3rd Iteration) Results 0 2 4 6 8 10 12 14 1 2 3 4 5 6 gcc hmmer jpeg mesa perlbmk rasta sendmail vortex % of Analysis Time k-Callsite-Sensitivity Heap Cloning 0 20 40 60 80 100 ammp crafty gap gcc hmmer icecast jpeg mesa parser perlbmk rasta sendmail twolf vortex vpr average Reduction (%) 0 2 4 6 8 10 12 14 ammp crafty gap gcc hmmer icecast jpeg mesa parser perlbmk rasta sendmail twolf vortex vpr average Fulcra QUDA Analysis Overhead 16x 22x 0 10 20 30 40 50 60 70 80 90 100 1 2 3 4 5 6 gcc hmmer jpeg mesa perlbmk rasta sendmail vortex k-Callsite-Sensitivity Heap Cloning % of Queries to Be Answered Analysis overhead compared to Fulcra (state-of-the-art) Heap objects reduced by QUDA over Fulcra QUDA analysis time per iteration over the total QUDA alias queries to be answered at each iteration Tool Framework QUDA Framework in the Open64 Compiler Front End Heap-Aware Pointer Analysis WOPT Code Generation Constraint Graphs Candidate Query Selection Top-Down Bottom-up Adaptive Update ELF files (*.G, *.I) write into read Alias Tag Cloning Level K++ Source Code Executable Program QUDA is implemented in industry-strength compiler Open64 scales up to 200K lines of code. It has the same precision as the state-of-the-art, but is significantly more scalable. For 10 SPEC2000 C benchmarks and 5 C applications (totalling 840 KLOC) evaluated, QUDA takes only 4+ minutes but exhaustive heap cloning takes 42+ minutes to complete.

Upload: others

Post on 21-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: QUDA QUery- D A - GitHub Pages · 2020. 7. 15. · Query-Directed Adaptive Heap Cloning for Optimising Compilers Heap Cloning Very costly to reason about full heap cloning on large

Query-Directed Adaptive Heap Cloning for Optimising Compilers

Heap Cloning

buffer1=getmem()

buffer2=getmem()

return malloc

return malloc

buffer1

buffer2

heap

int main(){ int* buffer1 = getMem(); int* buffer2 = getMem();}

int* getMem(){ return malloc(size);}

buffer1

buffer2

heap1

heap

heap2

main getMem main getMem

main getMem

getMem

C program Code Program execution

Static pointer analysis without heap cloning Static pointer analysis with heap cloning

Compiler Optimisations

Front End

Static Analysis

Source Code

Compiler Optimisations

CodeGeneration

Executable Program

Heap Cloning

A Typical Compiler Workflow

A total of 496304 alias queries generated from optimisations in Open64 compiler for analysing CPU2000 C benchmarks

Motivation

Very costly to reason about full heap cloning on large size program with high-density call graph

50%

55%

60%

65%

70%

75%

80%

0 1 2 3 >3 Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity

Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity

175.vpr

29%

30%

30%

31%

31%

32%

32%

33%

33%

34%

0 1 2 3 >3

Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity

Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity

176.gcc

20%

21%

22%

23%

24%

25%

26%

27%

28%

0 1 2 3 >3

Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity

255.vortex

0%

10%

20%

30%

40%

50%

60%

70%

80%

0 1 2 3 >3

Mus

t-no

t Alia

sQue

ries

(%)

k-Callsite Sensitivity k-Callsite Sensitivity

300.twolf

Compiler optimisations of diverse programs benefit in different precision levels of heap cloning

Different programsrequire different

levels of precision !

Full heap cloning is overkill !

and

Approach

Alias Queries

Cloning Level k

k++

Heap-Aware Pointer Solver

Selecting Candidate Heap Objects

AdaptiveUpdate

Enable Iterative heap cloning only where it is necessaryaccording toqueries !

QUDA: QUery-Directed Adaptive heap cloning

QUDA performs heap cloning on parts of the program according to clients' need

in a demand-driven fashion

An Example

Alias Query is not answered positively

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Query

may-alias

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Queryp

p

q

q may-alias

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Query

may-alias

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Query

must-not-alias

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Queryp

p

q

q

procedureheap object

k=0

k=1

k=2

k=3

<*p,*q>Alias Query

All doneCannot be

cloned further

Alias Query is not answered positively Alias Query is not answered positively

Alias Query is answered positively Alias Query is answered positively Alias Query is answered positively

pq

must-not-alias

QUDA heap cloning (1st Iteration)QUDA performs k-callsite-sensitive heap cloning iteratively, starting with k = 0 (without heap cloning), so that an abstract heap object is cloned at iteration k = i + 1 only if some alias queries that are not answered positively at iteration k = i may now be answered more precisely.

QUDA heap cloning (2nd Iteration) QUDA heap cloning (3rd Iteration)

Results

0

2

4

6

8

10

12

14

1 2 3 4 5 6

gcc hmmer jpeg mesa perlbmk rasta sendmail vortex %

of A

naly

sis

Tim

e

k-Callsite-Sensitivity Heap Cloning

0

20

40

60

80

100

ammp

craft

y gap

gcc

hmmer

iceca

st jpeg

mesa

parse

r

perlbmk

rasta

sendmail

twolf

vorte

x vp

r

aver

age

Red

uctio

n (%

)

0 2 4 6 8

10 12 14

ammp

crafty

gap

gcc

hmmer

iceca

st jpeg

mesa

parser

perlbmk

rasta

sendmail

tw

olf

vorte

x vp

r

avera

ge

Fulcra QUDA

Ana

lysi

s O

verh

ead

16x22x

0 10 20 30 40 50 60 70 80 90

100

1 2 3 4 5 6

gcc

hmmer

jpeg

mesa

perlbmk

rasta

sendmail

vortex

k-Callsite-Sensitivity Heap Cloning

% o

f Que

ries

to B

e An

swer

ed

Analysis overhead compared to Fulcra (state-of-the-art) Heap objects reduced by QUDA over Fulcra

QUDA analysis time per iteration over the total QUDA alias queries to be answered at each iteration

Tool Framework

QUDA Framework in the Open64 Compiler

Front End

Heap-Aware Pointer Analysis

WOPT CodeGeneration

ConstraintGraphs

Candidate Query Selection

Top-Down Bottom-up

Adaptive Update

ELF files(*.G, *.I)

write into

read

Alias Tag

Cloning Level K++

SourceCode

Executable Program

QUDA is implemented in industry-strength compiler Open64 scales up to 200K lines of code. It has the same precision as the state-of-the-art, but is significantly more scalable.

For 10 SPEC2000 C benchmarks and 5 C applications (totalling 840 KLOC) evaluated, QUDA takes only 4+ minutes but exhaustive heap cloning takes 42+ minutes to complete.