CR18: Advanced Compilers L01 Introduction Tomofumi Yuki

Myself

Tomofumi Yuki, researcher at Inria

Ph.D. from Colorado State University (CSU) in 2012; schooling in Japan up to high school, then CSU for all of bachelor’s, master’s, and Ph.D.

Member of Compsys @ LIP: compilers/languages, automatic parallelization

This Course

Part I: High-level (loop-level) transformations: parallelism, data locality

Part II: High-Level Synthesis: C to hardware

Compiler Optimizations

Low-level Optimizations: register allocation, instruction scheduling, constant propagation, ...

High-level Optimizations (our focus): loop transformations, coarse-grained parallelism, ...

High-Level Optimizations

Goals: Parallelism and Data Locality

Why Parallelism?

Why Data Locality?

Why High-Level?

Why Loop Transformations?

The 90/10 Rule: “90% of the execution time is spent in less than 10% of the source code”

Loop nests: the hotspot of almost all programs; a few lines of change => huge impact; a natural source of parallelism

Why Loop Transformations?

Which is faster?

for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    for (k=0; k<N; k++)
      C[i][j] += A[i][k] * B[k][j];

for (i=0; i<N; i++)
  for (k=0; k<N; k++)
    for (j=0; j<N; j++)
      C[i][j] += A[i][k] * B[k][j];

Why is it Faster?

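Hardware Prefetching

for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    for (k=0; k<N; k++)
      C[i][j] += A[i][k] * B[k][j];

Innermost loop over k: C[i][j] unchanged, A[i][k] next column, B[k][j] next row.

for (i=0; i<N; i++)
  for (k=0; k<N; k++)
    for (j=0; j<N; j++)
      C[i][j] += A[i][k] * B[k][j];

Innermost loop over j: C[i][j] next column, A[i][k] unchanged, B[k][j] next column.

C arrays are stored row-major. Moving to the next column touches the adjacent memory location, which produces a stride-1 stream that the hardware prefetcher handles well. Moving to the next row jumps N elements ahead.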
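Below is a minimal C99 sketch of the two loop orders. It assumes square n×n `double` matrices passed as variable-length array parameters, and the names `matmul_ijk`, `matmul_ikj` and `same_matrix` are hypothetical. After the interchange, each C[i][j] still accumulates its k terms in the same order, so both functions give bit-identical results. Only the access pattern of the innermost loop changes.

```c
#include <assert.h>

/* i-j-k order: the innermost loop walks k.
 * C[i][j] stays fixed, A[i][k] moves to the next column (stride 1),
 * B[k][j] moves to the next row (stride n in row-major layout). */
void matmul_ijk(int n, double C[n][n], double A[n][n], double B[n][n])
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* i-k-j order (k and j interchanged): the innermost loop walks j.
 * A[i][k] stays fixed; C[i][j] and B[k][j] both move to the next
 * column, so both are stride-1 streams. */
void matmul_ikj(int n, double C[n][n], double A[n][n], double B[n][n])
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Element-wise comparison of two n x n matrices. */
int same_matrix(int n, double X[n][n], double Y[n][n])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (X[i][j] != Y[i][j]) return 0;
    return 1;
}
```

To check the speedup on a particular machine, time both functions on a large n, for example with `clock()` from `<time.h>`. The size of the gap depends on the cache hierarchy and the prefetcher.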

How to Automate?

The most challenging part! The same optimization doesn’t work for:

for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    for (k=0; k<N; k++) {
      C1[i][j] += A1[i][k] * B1[k][j];
      C2[i][j] += A2[i][k] * B2[k][j];
      C3[i][j] += A3[i][k] * B3[k][j];
      C4[i][j] += A4[i][k] * B4[k][j];
    }

Why?

It’s Not Just Transformations

Many, many reasoning steps: What to apply? How to apply? When to apply? What is its impact?

Quality of the analysis: How long does it take? Can it potentially degrade performance? Provable properties (completeness, etc.)

Compiler research is all about coming up with techniques/abstractions/representations to allow the compiler to perform deep analysis.

Today’s Agenda

The Big Picture: programming language compilers

Basic Concepts: iteration spaces and loop nests, polyhedral domains and functions, parametric integer programming

Short history of the polyhedral model

Compiler Advances

Old compiler vs. recent compiler: modern architecture, different versions of gcc

How much speedup from the compiler alone after 20 years of research?

Compiler Advances

Old compiler vs. recent compiler: modern architecture, different versions of gcc; 2x difference after 20 years (anecdotal)

Not so much?

“The most remarkable accomplishment by far of the compiler field is the widespread use of high-level languages.”

by Mary Hall, David Padua, and Keshav Pingali [Compiler Research: The Next 50 Years, 2009]

Placement of Compiler Research: Part of Programming Languages

[Figure: research areas within programming languages: compiler, runtime systems, program verification, type theory, program synthesis, program analysis, program transformation]

Earlier Accomplishments

Getting efficient assembly: register allocation, instruction scheduling, ...

High-level language features: object orientation, dynamic types, automated memory management, ...

New twists

New machines: SIMD, IBM Cell, GPGPU, Xeon Phi

New language features: even Java has lambda functions now; parallelism-oriented features

New types of apps: smartphones, tablets

New goals: energy and security

Recent research topics

Parallelism: multi-cores, GPUs, ...; language features for parallelism

Security/Reliability: verification, certified compilers

Power/Energy: data movement, voltage scaling

Goals of the Compiler

Higher abstraction: no more writing assembly! Enables language features: loops, functions, classes, aspects, ...

Performance while increasing productivity: speed, space, energy, ...; compiler optimizations

Personal view: the compiler is there to allow lazy programming

Job Market

Where do they work? IBM, MathWorks, Amazon, Apple, start-ups

Many opportunities in France: MathWorks @ Grenoble, many start-ups

Program IR

Abstract Syntax Tree (AST): the basic representation within compilers

How to inspect the AST to determine if a loop is parallel?

for (i in 1..N) A[i] = B[i] + 1;

NodeFor (iterator=i, LB=1, UB=N)
  NodeAssignment
    A[i]
    NodeBinOp (op=+)
      B[i]
      1

Not really suitable for high-level analysis
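To make the question concrete, here is a hypothetical C encoding of the tree above. It comes with the kind of purely syntactic parallelism check that one can write directly on an AST. The node layout and the function `loop_is_trivially_parallel` are illustrative assumptions, not any real compiler's IR. The check is sound only for this one tiny shape and answers "don't know" for anything else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node kinds mirroring the AST in the figure. */
typedef enum { N_FOR, N_ASSIGN, N_BINOP, N_ARRAY, N_CONST } Kind;

typedef struct Node {
    Kind kind;
    const char *name;    /* N_FOR: iterator; N_ARRAY: array name          */
    const char *index;   /* N_ARRAY: index, restricted here to a variable */
    int lb;              /* N_FOR: lower bound; N_CONST: value            */
    const char *ub;      /* N_FOR: symbolic upper bound such as "N"       */
    char op;             /* N_BINOP: operator                             */
    struct Node *a, *b;  /* N_FOR: body in a; N_ASSIGN: lhs, rhs;
                            N_BINOP: left and right operands              */
} Node;

/* Does expression e read array arr anywhere? */
static int reads_array(const Node *e, const char *arr)
{
    if (e == NULL) return 0;
    if (e->kind == N_ARRAY) return strcmp(e->name, arr) == 0;
    if (e->kind == N_BINOP) return reads_array(e->a, arr) || reads_array(e->b, arr);
    return 0;
}

/* Conservative syntactic test: the loop body must be a single assignment
 * whose left-hand side is indexed exactly by the loop iterator (so every
 * iteration writes a distinct element) and whose right-hand side never
 * reads the written array. Returns 1 for "parallel", 0 for "don't know". */
int loop_is_trivially_parallel(const Node *loop)
{
    assert(loop != NULL && loop->kind == N_FOR);
    const Node *s = loop->a;
    if (s == NULL || s->kind != N_ASSIGN) return 0;
    if (s->a == NULL || s->a->kind != N_ARRAY) return 0;
    if (strcmp(s->a->index, loop->name) != 0) return 0;
    return !reads_array(s->b, s->a->name);
}
```

This check reports `for (i in 1..N) A[i] = B[i] + 1;` as parallel. It answers "don't know" for `A[i] = A[i] + 1;`, which is just as parallel. Deciding that case means reasoning about which elements each iteration reads and writes, and tree shapes alone cannot tell you that.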

Extended Graphs

Completely unroll the loops

for (i=0; i<5; i++)
  for (j=1; j<4; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

A[0][1] = A[0][0] + B[0][1];
A[0][2] = A[0][1] + B[0][2];
A[0][3] = A[0][2] + B[0][3];
A[1][1] = A[1][0] + B[1][1];
A[1][2] = A[1][1] + B[1][2];
A[1][3] = A[1][2] + B[1][3];
...
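The unrolled program forms a DAG with one node per statement instance. The sketch below builds that DAG for concrete bounds using a hypothetical helper, `unroll`, with arrays supplied by the caller. Each edge links the instance that writes A[i][j-1] to the instance that reads it.

```c
#include <assert.h>

/* One node of the unrolled graph: the statement instance S<i,j>. */
typedef struct { int i, j; } Instance;

/* Hypothetical helper that fully unrolls
 *   for (i=0; i<n; i++) for (j=1; j<m; j++) A[i][j] = A[i][j-1] + B[i][j];
 * into explicit nodes and producer->consumer edges. nodes must hold
 * n*(m-1) entries and edges n*(m-2). Returns the node count; *n_edges
 * receives the edge count. */
int unroll(int n, int m, Instance nodes[], int edges[][2], int *n_edges)
{
    assert(n >= 0 && m >= 2);
    int count = 0;
    *n_edges = 0;
    for (int i = 0; i < n; i++)
        for (int j = 1; j < m; j++) {
            nodes[count] = (Instance){ i, j };
            /* A[i][j-1] is produced by S<i,j-1>, the previous node, when
             * j-1 >= 1; A[i][0] is an input that no instance writes. */
            if (j >= 2) {
                edges[*n_edges][0] = count - 1;  /* producer S<i,j-1> */
                edges[*n_edges][1] = count;      /* consumer S<i,j>   */
                (*n_edges)++;
            }
            count++;
        }
    return count;
}
```

For the 5×3 instances of the example this produces 15 nodes and 10 edges. In general there are n·(m-1) nodes. The graph therefore grows with the bounds and cannot be built at all when the bounds are unknown parameters.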

Extended Graphs

Completely unroll the loops

The difficulty: program parameters. It’s “easy” with a DAG representation, but: scalability issues; what if the parameters are not known?

for (i=0; i<N; i++)
  for (j=1; j<M; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

Iteration Spaces

Need an abstraction for statement instances

for (i=0; i<N; i++)
  for (j=1; j<M; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

instance = integer vector [i,j]
space = integer set 0≤i<N and 1≤j<M
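The set view can be written down directly as a membership predicate over (i, j), leaving the parameters N and M symbolic. This sketch (the function names are hypothetical) also gives the closed-form size N·(M-1) of the set, which is zero when the set is empty. It then checks that formula against brute-force enumeration.

```c
#include <assert.h>

/* Membership in the iteration space {[i,j] : 0 <= i < N and 1 <= j < M}. */
int in_space(int i, int j, int N, int M)
{
    return 0 <= i && i < N && 1 <= j && j < M;
}

/* Closed-form number of points as a function of the parameters N and M;
 * the set is empty when N <= 0 or M <= 1. */
long space_size(int N, int M)
{
    if (N <= 0 || M <= 1) return 0;
    return (long)N * (M - 1);
}

/* Brute-force count over a box that contains the set, used as a check. */
long count_by_enumeration(int N, int M)
{
    long c = 0;
    for (int i = -1; i <= N; i++)
        for (int j = -1; j <= M; j++)
            c += in_space(i, j, N, M);
    return c;
}
```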

Lexicographic Order

Dictionary order applied to loop nests: a < aaa < aaaa < aab < aba < b

Compare instances: (i,j) is before (i’,j’) iff i<i’, or i=i’ and j<j’

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
    S0;
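The comparison rule fits in a short comparator over instance vectors of any depth. This sketch uses hypothetical names. It also checks that the loop nest above enumerates S0<i,j> in strictly increasing lexicographic order.

```c
#include <assert.h>

/* Returns 1 if instance a is strictly before instance b in lexicographic
 * order, comparing the first dim coordinates (outermost loop first). */
int lex_before(const int *a, const int *b, int dim)
{
    assert(dim >= 0);
    for (int d = 0; d < dim; d++) {
        if (a[d] < b[d]) return 1;   /* first differing coordinate decides */
        if (a[d] > b[d]) return 0;
    }
    return 0;                        /* equal vectors: not strictly before */
}

/* Checks that executing
 *   for (i=1; i<N; i++) for (j=1; j<M; j++) S0;
 * visits S0<i,j> in strictly increasing lexicographic order. */
int loop_order_is_lexicographic(int N, int M)
{
    int prev[2] = {0, 0};
    int have_prev = 0;
    for (int i = 1; i < N; i++)
        for (int j = 1; j < M; j++) {
            int cur[2] = {i, j};
            if (have_prev && !lex_before(prev, cur, 2)) return 0;
            prev[0] = i;
            prev[1] = j;
            have_prev = 1;
        }
    return 1;
}
```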

What is the Polyhedral Model? It Depends (on who you ask)

If you ask me... a compiler Intermediate Representation (IR): linear-algebra based; a compact representation; takes advantage of regularities

Polyhedral Representation

High-level abstraction of the program: iteration space: integer polyhedra; dependences: affine functions

Usual optimization flow:
1. extract the polyhedral representation
2. reason about/transform the model
3. generate code in the end

Polyhedral Domains

Statement instances as integer polyhedra

Example: about N²/2 instances of S0 (exactly N(N-1)/2), denoted as S0<i,j>

Represented as the polyhedron {i,j | 1≤i<N, 1≤j≤i}; geometric view:

for (i=1; i<N; i++)
  for (j=1; j<=i; j++)
    S0;

[Figure: the triangle of integer points in the (i,j) plane bounded by 1≤i, i<N, 1≤j, and j≤i]
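One common way to store such a domain, assumed here for illustration, is as a list of affine inequalities of the form a·(i, j, N, 1) ≥ 0. The sketch encodes the four constraints of the triangle. It then counts the integer points by scanning a bounding box, which confirms the exact count N(N-1)/2. The type and function names are hypothetical.

```c
#include <assert.h>

/* One affine constraint  ci*i + cj*j + cN*N + c0 >= 0. */
typedef struct { int ci, cj, cN, c0; } Constraint;

/* The domain {i,j | 1<=i<N, 1<=j<=i} written as four inequalities. */
static const Constraint TRIANGLE[] = {
    {  1,  0, 0, -1 },   /* i - 1     >= 0   (1 <= i) */
    { -1,  0, 1, -1 },   /* N - i - 1 >= 0   (i < N)  */
    {  0,  1, 0, -1 },   /* j - 1     >= 0   (1 <= j) */
    {  1, -1, 0,  0 },   /* i - j     >= 0   (j <= i) */
};

/* A point belongs to the polyhedron iff it satisfies every constraint. */
int in_polyhedron(const Constraint *c, int n, int i, int j, int N)
{
    for (int k = 0; k < n; k++)
        if (c[k].ci * i + c[k].cj * j + c[k].cN * N + c[k].c0 < 0) return 0;
    return 1;
}

/* Counts integer points by scanning the box [0,N] x [0,N], which
 * contains the whole triangle. */
long count_points(int N)
{
    assert(N >= 0);
    long count = 0;
    int n = (int)(sizeof TRIANGLE / sizeof TRIANGLE[0]);
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            count += in_polyhedron(TRIANGLE, n, i, j, N);
    return count;
}
```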

Examples (Domains)

What are the domains of these statements?

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j++) {
    S1;
  }
  S2;
}

for (i=0; i<=N; i++) {
  for (j=M; j>=0; j--) {
    S1;
  }
}

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j+=2) {
    S1;
  }
}

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j++) {
    if (j>i) S1;
  }
}

Z-Polyhedron

Polyhedron with holes: intersection with lattices; image of a domain by an affine function

Just a polyhedron in a higher-dimensional space (here with an extra dimension j):

0≤i≤N and i%2=0
0≤i≤N and i=2j

[Figure: the points (i,j) with i=2j in the (i,j) plane]
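The two descriptions of the example can be compared point by point. This sketch uses hypothetical function names. `in_z_poly` applies the modulo constraint, and `in_projection` searches for a witness j in the higher-dimensional polyhedron 0≤i≤N ∧ i=2j.

```c
#include <assert.h>

/* Z-polyhedron view: {i | 0 <= i <= N and i % 2 == 0}, i.e. a polyhedron
 * intersected with the lattice of even integers. */
int in_z_poly(int i, int N)
{
    return 0 <= i && i <= N && i % 2 == 0;
}

/* Higher-dimensional view: {i | exists j : 0 <= i <= N and i == 2j}.
 * The existential quantifier becomes an explicit search for the witness j;
 * since i >= 0, any witness lies in [0, i]. */
int in_projection(int i, int N)
{
    if (i < 0 || i > N) return 0;
    for (int j = 0; j <= i; j++)
        if (i == 2 * j) return 1;
    return 0;
}
```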

Dependence Functions

Affine functions over statement instances

Dataflow: (i,j→i,j+1)
Dependence: (i,j→i,j-1)

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
    S0: A[i][j] = A[i][j-1];
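Here is a sketch of the two functions as C code over instance vectors, using hypothetical names. `check_functions` walks the domain and verifies that every producer returned by the dependence function runs earlier in lexicographic order. It also verifies that the dataflow function maps each producer back to its consumer.

```c
#include <assert.h>

typedef struct { int i, j; } Point;

/* Domain of S0: 1 <= i < N, 1 <= j < M. */
int in_domain(Point p, int N, int M)
{
    return 1 <= p.i && p.i < N && 1 <= p.j && p.j < M;
}

/* Dependence function: S0<i,j> reads A[i][j-1], produced by S0<i,j-1>. */
Point dependence(Point p) { return (Point){ p.i, p.j - 1 }; }

/* Dataflow function: the value written by S0<i,j> is used by S0<i,j+1>. */
Point dataflow(Point p) { return (Point){ p.i, p.j + 1 }; }

/* Over the whole domain: producers inside the domain execute earlier in
 * lexicographic order, and dataflow undoes the dependence. */
int check_functions(int N, int M)
{
    for (int i = 1; i < N; i++)
        for (int j = 1; j < M; j++) {
            Point p = { i, j };
            Point src = dependence(p);
            if (in_domain(src, N, M)) {
                int earlier = src.i < p.i || (src.i == p.i && src.j < p.j);
                if (!earlier) return 0;
            }
            Point back = dataflow(src);
            if (back.i != p.i || back.j != p.j) return 0;
        }
    return 1;
}
```

When j=1, the dependence points to (i,0), which lies outside the domain: A[i][0] is read but never written by S0. Boundary cases like this one are exactly what domain-qualified functions are designed to express.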

Dependence Functions

Dependences can be domain-qualified

Dataflow:
(i,j→i+1,1) if j=M-1
(i,j→i,j+1) otherwise

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
    S0: v++;
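The piecewise function becomes a two-case C function, and the case split on j=M-1 is the domain qualification. In this sketch, where the names are hypothetical, following the chain from S0<1,1> visits each of the (N-1)(M-1) instances exactly once, in loop order.

```c
#include <assert.h>

typedef struct { int i, j; } Point;

/* Domain-qualified dataflow for S0: v++ inside
 *   for (i=1; i<N; i++) for (j=1; j<M; j++)
 * The value of v written by S0<i,j> is next read by
 *   S0<i+1,1>  if j == M-1 (last iteration of the inner loop),
 *   S0<i,j+1>  otherwise.
 * From S0<N-1,M-1> the result lies outside the domain: no reader. */
Point next_reader(Point p, int M)
{
    if (p.j == M - 1) return (Point){ p.i + 1, 1 };
    return (Point){ p.i, p.j + 1 };
}

/* Follows the dataflow chain from S0<1,1> and counts the visited
 * instances until the chain leaves the domain (i reaches N). */
long chain_length(int N, int M)
{
    assert(N >= 1 && M >= 2);   /* M >= 2 keeps the walk inside 1 <= j < M */
    long visited = 0;
    Point p = { 1, 1 };
    while (p.i < N) {
        visited++;
        p = next_reader(p, M);
    }
    return visited;
}
```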

Composing Transformations

Key strength of the framework

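[Figure: in the loop world, T1 (interchange) turns "for i, for j ..." into "for j, for i ...", and T2 turns that into "for j, for i’ ..., for i’’ ..."; in the abstraction, the same transformations take poly to poly’]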
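The following illustrates why composition is cheap in the abstraction. Suppose, as an illustrative assumption, that each transformation is a 2×2 integer matrix applied to (i, j). Then applying T1 and then T2 is the single matrix T2·T1. The sketch uses interchange (the slide's T1) and a skewing matrix as a hypothetical second step. The slide's T2, which splits the i loop, needs a richer schedule representation than a square matrix and is not modeled here. The sketch does not check legality with respect to dependences either.

```c
#include <assert.h>

/* A 2x2 integer matrix acting on iteration vectors (i, j). */
typedef struct { int m[2][2]; } Mat2;

/* Applies T to the iteration vector v, writing the result to out. */
void apply(Mat2 T, const int v[2], int out[2])
{
    out[0] = T.m[0][0] * v[0] + T.m[0][1] * v[1];
    out[1] = T.m[1][0] * v[0] + T.m[1][1] * v[1];
}

/* Composition "first T1, then T2" is the matrix product T2 * T1. */
Mat2 compose(Mat2 T2, Mat2 T1)
{
    Mat2 R;
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 2; c++)
            R.m[r][c] = T2.m[r][0] * T1.m[0][c] + T2.m[r][1] * T1.m[1][c];
    return R;
}

/* Unimodular (|det| = 1) matrices map the integer lattice onto itself,
 * so no iteration is lost or duplicated. */
int is_unimodular(Mat2 T)
{
    int det = T.m[0][0] * T.m[1][1] - T.m[0][1] * T.m[1][0];
    return det == 1 || det == -1;
}
```

With interchange [[0,1],[1,0]] and skew [[1,0],[1,1]], the composed matrix is [[0,1],[1,1]]. It sends (2,5) to (5,7), which is the same point you get by applying the two steps one after the other.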

Parametric Analysis

Real-world code is filled with parameters: code for an N×M matrix, not 100×200

If the code is not parametric, and compilation time is not a big deal, it is an “easy” problem

Dealing with (potentially) infinitely many different executions of a program

What is the last iteration?

Key analysis

What is the instance that last wrote to A[k]?

Can be formulated as an ILP: 0<i<N, 0<j≤i, i+j=k; find the lexicographically maximum (i,j). Many analysis questions become ILPs for regular programs.

for (i=1; i<N; i++)
  for (j=1; j<=i; j++)
    S0: A[i+j] = ...;
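For a concrete N, you can answer the question by scanning the loop. The ILP formulation aims instead at an answer expressed as a function of N and k. The closed form below was derived by hand for this particular loop: maximize i first, after which i+j=k fixes j. It shows the kind of case split a parametric solver would return. Both function names are hypothetical.

```c
#include <assert.h>

/* Last writer of A[k] in
 *   for (i=1; i<N; i++) for (j=1; j<=i; j++) S0: A[i+j] = ...;
 * i.e. the lexicographic maximum (i,j) with 1<=i<N, 1<=j<=i, i+j==k.
 * Both functions return 1 and fill *wi, *wj if some instance writes
 * A[k], and return 0 otherwise. */

/* Brute force: scan in execution order and keep the last match.
 * Only possible for a concrete N. */
int last_writer_scan(int N, int k, int *wi, int *wj)
{
    int found = 0;
    for (int i = 1; i < N; i++)
        for (int j = 1; j <= i; j++)
            if (i + j == k) { *wi = i; *wj = j; found = 1; }
    return found;
}

/* Parametric answer derived by hand: i = min(N-1, k-1), j = k - i;
 * a writer exists iff 2 <= k <= 2N-2.
 *   2 <= k <= N     : (k-1, 1)
 *   N <  k <= 2N-2  : (N-1, k-N+1) */
int last_writer_formula(int N, int k, int *wi, int *wj)
{
    if (k < 2 || k > 2 * N - 2) return 0;
    *wi = (k <= N) ? k - 1 : N - 1;
    *wj = k - *wi;
    return 1;
}
```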

Parametric Integer Programming

Constraints: j≤10, i+j≤10, j-i≤N, i,j≥0, N>0
Objective: maximize j
Parametric solution:
(0,N) if N≥10  [N-j+i≥0]
(N,N) if N<10  [N-j+i<0]

[Figure: the feasible region bounded by j≤10, j-i≤N, and i+j≤10, with the objective "maximize j"]

1. Look at the sign of constraints
2. Create branches for each case

History of the Polyhedral Model (also the layout for Part I of the class)

Keep in mind: history is not objective

Origins of the Polyhedral Model: Two Starting Points

Loop program analysis and systems of recurrence equations

Loop view: Is this loop parallel? What are the dependences?

Equational view: Is this system of equations executable? How to find legal schedules?

Polyhedral Timeline

[Timeline figure, 1970 to 2000 and beyond: recurrence equations and systolic arrays; loop dependence analysis and loop transformation; Parametric Integer Programming (1988); Array Dataflow Analysis (1991); scheduling, code generation, and memory allocation; multi-core, GPGPU, and distributed memory]

Polyhedral Model: Short Story

[Figure, from a (very) subjective point of view … (originally by Steven Derrien):
Tools: Polylib, PIP (early 90s); CLooG (2003); Pluto (2008)
Targets: VLSI, FPGA, MPSoC, multi-core, GPU
Topics: massively parallel processor arrays; automatic parallelization for shared and distributed memory machines; multi-dimensional process networks for system-level design; memory optimization for embedded multimedia; loop transformations for HLS; the multi-core era]

Polyhedral Equational Model

Idea: map computations, specified as equations, to code/hardware

Example: Matrix Multiply

for i in 0 .. P
  for j in 0 .. Q
    for k in 0 .. R
      C[i][j] += A[i][k] * B[k][j];

$$C[i,j,k] = \begin{cases} A[i,k] \cdot B[k,j] & \text{if } k = 0 \\ A[i,k] \cdot B[k,j] + C[i,j,k-1] & \text{if } k > 0 \end{cases}$$

$$C[i,j] = \sum_{k=0}^{R} A[i,k] \cdot B[k,j]$$
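Both forms compute the same values. The equational form differs in that it gives every intermediate C[i,j,k] its own name. The sketch below uses small hypothetical bounds P=2, Q=3, R=4. It evaluates the loop and the recurrence and lets you compare the results.

```c
#include <assert.h>

/* Hypothetical fixed bounds; as on the slide, the loops run over 0..P,
 * 0..Q and 0..R inclusive. */
enum { P = 2, Q = 3, R = 4 };

/* Loop version; the slide's C[i][j] += ... assumes C starts at zero. */
void matmul_loops(double A[P + 1][R + 1], double B[R + 1][Q + 1],
                  double C[P + 1][Q + 1])
{
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k <= R; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Equational version: each value C[i,j,k] of the recurrence gets its own
 * cell (single assignment); the result C[i,j] is C[i,j,R]. */
void matmul_equations(double A[P + 1][R + 1], double B[R + 1][Q + 1],
                      double C[P + 1][Q + 1])
{
    double Ck[P + 1][Q + 1][R + 1];
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++)
            for (int k = 0; k <= R; k++)
                Ck[i][j][k] = (k == 0)
                    ? A[i][k] * B[k][j]                      /* k = 0 */
                    : A[i][k] * B[k][j] + Ck[i][j][k - 1];   /* k > 0 */
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++)
            C[i][j] = Ck[i][j][R];
}
```

Each C[i,j,k] is assigned exactly once, so the equations constrain only the data dependences and leave the execution order open. Finding a legal order (a schedule) is the question the equational view raises.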

The Connection

Array Dataflow Analysis [Feautrier 1991]

Converts loops to equations; limited to affine loops

domain: {[i,j,k] : 0≤i≤P ∧ 0≤j≤Q ∧ 0≤k≤R}

dependences: S0<i,j,k> → S0<i,j,k-1>
dataflow: (i,j,k→i,j,k+1)

for i in 0 .. P
  for j in 0 .. Q
    for k in 0 .. R
      S0: C[i][j] += A[i][k] * B[k][j];

Next Time

Dependence analysis, Array Dataflow Analysis, legality of transformations