Communication Overhead Estimation on Multicores

S. M. Farhad, The University of Sydney. Joint work with Yousun Ko, Bernd Burgstaller, and Bernhard Scholz.


Post on 01-Feb-2016


Page 1: Communication Overhead Estimation on Multicores

Communication Overhead Estimation on Multicores

S. M. Farhad

The University of Sydney

Joint work with

Yousun Ko

Bernd Burgstaller

Bernhard Scholz

Page 2: Communication Overhead Estimation on Multicores

Outline

Motivation
Multicore trend
Stream programming
Profiling communication overhead
Related works

Page 3: Communication Overhead Estimation on Multicores

Motivation

[Figure: number of cores per chip (1 to 512) plotted against year (1975 to 2010), with processors grouped as unicore, homogeneous multicore, and heterogeneous multicore. Processors shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Athlon, Itanium, Itanium2, Power4, PA8800, Opteron, CoreDuo, Power6, Xbox 360, BCM 1480, Opteron 4P, Xeon, Niagara, Cell, RAW, RAZA XLR, Cavium, CISCO CSR1, Larrabee, PicoChip, AMBRIC, AMD Fusion, NVIDIA G80, Core, Core2Duo, Core2Quad. Courtesy: Scott '08.]

[Figure labels: C/C++/Java, CUDA, X10, Peakstream, Fortress, Accelerator, Ct, CTM, Rstream, Rapidmind, Stream Programming]

Page 4: Communication Overhead Estimation on Multicores

Stream Programming Paradigm

Programs expressed as stream graphs
Streams: infinite sequence of data elements
Actors: functions applied to streams

[Figure: an actor with an incoming stream and an outgoing stream]
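The stream and actor definitions above can be illustrated with a minimal Python sketch. It is only an analogy using generators, not StreamIt code: the helper `actor` and the two actors `double` and `increment` are hypothetical names introduced for this example.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator


def actor(fn: Callable[[int], int]) -> Callable[[Iterable[int]], Iterator[int]]:
    """Wrap a per-element function as an actor, i.e. a function applied to a stream."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for item in stream:
            yield fn(item)
    return run


# An unbounded source stream: 0, 1, 2, ...
source = count()

# Two hypothetical actors connected by a stream (producer -> consumer).
double = actor(lambda x: 2 * x)
increment = actor(lambda x: x + 1)

# Only the first five elements of the (conceptually infinite) output are consumed.
output = increment(double(source))
first_five = list(islice(output, 5))
print(first_five)
```

Because generators are lazy, the infinite source is never materialised; each element flows through both actors on demand.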

Page 5: Communication Overhead Estimation on Multicores

Properties of Stream Programs

Regular and repeating computation
Independent actors with explicit communication
Producer/consumer dependencies

[Figure: stream graph of an FM radio with actors AtoD, FMDemod, Splitter, LPF1, LPF2, LPF3, HPF1, HPF2, HPF3, Joiner, Adder, and Speaker]

Page 6: Communication Overhead Estimation on Multicores

StreamIt Language

An implementation of stream programming
Hierarchical structure
Each construct has a single input/output stream

[Figure: the StreamIt constructs filter, pipeline, splitjoin (splitter, parallel computation, joiner), and feedback loop (joiner, splitter). Each component may be any StreamIt language construct.]
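The hierarchical composition described above can be sketched in Python. This is not StreamIt syntax: `filter_`, `pipeline`, and `splitjoin` are hypothetical helpers, the sketch works on finite batches rather than infinite streams, it assumes a duplicating splitter with a round-robin joiner, and it omits the feedback loop.

```python
from typing import Callable, List

# Every construct maps one input stream (here a finite batch) to one output stream.
Stage = Callable[[List[float]], List[float]]


def filter_(fn: Callable[[float], float]) -> Stage:
    """Leaf construct: apply fn to every element."""
    return lambda items: [fn(x) for x in items]


def pipeline(*stages: Stage) -> Stage:
    """Sequential composition: the output of one stage feeds the next."""
    def run(items: List[float]) -> List[float]:
        for stage in stages:
            items = stage(items)
        return items
    return run


def splitjoin(*branches: Stage) -> Stage:
    """Duplicate the input to every branch, then join the outputs round-robin."""
    def run(items: List[float]) -> List[float]:
        outputs = [branch(list(items)) for branch in branches]
        return [x for group in zip(*outputs) for x in group]
    return run


# Constructs nest freely because each one is itself a Stage.
program = pipeline(
    filter_(lambda x: x + 1),
    splitjoin(filter_(lambda x: 2 * x), filter_(lambda x: 10 * x)),
)
result = program([1, 2])
print(result)
```

Since every construct has the same single-input, single-output shape, a `splitjoin` can sit inside a `pipeline` and vice versa, which is the point of the hierarchy.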

Page 7: Communication Overhead Estimation on Multicores

How to Estimate the Communication Overhead?


Page 8: Communication Overhead Estimation on Multicores

Problems in Measuring Communication Overhead

Reasons:
  Multicores are non-communication-exposed architectures
  Complex cache hierarchy
  Cache coherence protocols

Consequence:
  Cannot directly measure the communication cost
  Estimate the communication cost by measuring the execution time of actors

Page 9: Communication Overhead Estimation on Multicores

Measuring the Communication Overhead of an Edge

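[Figure: actors i and k joined by an edge. No communication cost: both actors run on Processor 1, with execution times t_i and t_k. With communication cost: i runs on Processor 1 and k on Processor 2, with execution times t'_i and t'_k.]

$$C(i,k) = t'_i - t_i + t'_k - t_k$$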
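As a worked illustration, the sketch below assumes the cost of an edge is the increase in the execution times of its two actors when they are moved from one shared processor to two different processors. The function name and the timing values are hypothetical and not taken from the slides.

```python
def communication_cost(t_i_same: float, t_k_same: float,
                       t_i_split: float, t_k_split: float) -> float:
    """Estimate the cost of edge (i, k) from four measured actor execution times.

    t_*_same  : times with both actors on one processor (no communication cost)
    t_*_split : times with the actors on different processors
    """
    return (t_i_split - t_i_same) + (t_k_split - t_k_same)


# Hypothetical measurements, in arbitrary time units.
cost = communication_cost(t_i_same=10.0, t_k_same=20.0,
                          t_i_split=12.5, t_k_split=23.0)
print(cost)
```

With these made-up numbers, actor i slows down by 2.5 units and actor k by 3.0 units, so the estimated edge cost is their sum.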

Page 10: Communication Overhead Estimation on Multicores

How to Minimize the Required Number of Experiments

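[Figure: a three-actor pipeline A, B, C whose edges 1 and 2 are colored by graph coloring; this requires 2 + 1 experiments. Two further panels map a longer pipeline onto Processor 1 and Processor 2: one with the even-numbered edges across the partition, one with the odd-numbered edges across the partition.]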
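The even/odd partitioning of a pipeline can be sketched as follows. The sketch reads "2 + 1 experiments" as one baseline run with every actor on a single processor plus one run per edge color; that reading, and the function names, are assumptions made for illustration.

```python
from typing import Dict, List


def pipeline_experiments(num_actors: int) -> Dict[str, List[int]]:
    """Return one processor assignment (0 or 1 per actor) for each experiment.

    Edge j (1-based) connects actor j-1 to actor j (0-based positions).
    """
    experiments = {"baseline": [0] * num_actors}
    for name, parity in (("odd", 1), ("even", 0)):
        placement = [0]
        for edge in range(1, num_actors):
            crosses = edge % 2 == parity
            previous = placement[-1]
            # Switch processor exactly when this edge must cross the partition.
            placement.append(1 - previous if crosses else previous)
        experiments[name] = placement
    return experiments


def crossing_edges(placement: List[int]) -> List[int]:
    """List the 1-based edges whose two actors sit on different processors."""
    return [e for e in range(1, len(placement)) if placement[e - 1] != placement[e]]


experiments = pipeline_experiments(6)  # pipeline A..F with edges 1..5
for name, placement in experiments.items():
    print(name, placement, crossing_edges(placement))
```

However long the pipeline is, two colors suffice, so the number of profiling runs stays at three under this reading.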

Page 11: Communication Overhead Estimation on Multicores

Obs. 1: There is no loop of three actors in a stream graph

[Figure: three actors i, k, and l placed on Processor 1 and Processor 2]

Page 12: Communication Overhead Estimation on Multicores

Obs. 2: There is no interference of adjacent nodes between edges

[Figure: stream graph with actors A, B, C, D, E, and F mapped onto processors P-1, P-2, P-3, and P-4 for the blue-colored edges]

Page 13: Communication Overhead Estimation on Multicores

Remove Interference

Convert to a line graph
Add interference edges
Use vertex coloring algorithm

[Figure: the stream graph with actors A, B, C, D, E, and F, and its line graph with vertices AB, BC, BD, CE, DE, and EF (shown twice)]
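The three steps above can be sketched in Python. The transcript does not define which pairs of edges interfere, so the sketch takes the interference pairs as an explicit input; the pairs used in the example are hypothetical. The greedy coloring is a stand-in, since the slides do not say which vertex coloring algorithm is used.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def color_stream_edges(edges: List[Edge],
                       interference: Iterable[Tuple[Edge, Edge]] = ()) -> Dict[Edge, int]:
    """Color stream edges so that conflicting edges never share a color."""
    conflicts: Dict[Edge, Set[Edge]] = {e: set() for e in edges}

    # Step 1: line graph. Two stream edges are adjacent if they share an actor.
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):
            conflicts[e1].add(e2)
            conflicts[e2].add(e1)

    # Step 2: extra interference edges, supplied by the caller (assumption).
    for e1, e2 in interference:
        conflicts[e1].add(e2)
        conflicts[e2].add(e1)

    # Step 3: greedy vertex coloring of the resulting graph.
    colors: Dict[Edge, int] = {}
    for e in edges:
        used = {colors[n] for n in conflicts[e] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[e] = color
    return colors


edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
# Hypothetical interference pairs, chosen only for illustration.
hypothetical_interference = [(("A", "B"), ("C", "E")), (("A", "B"), ("D", "E"))]
colors = color_stream_edges(edges, hypothetical_interference)
print(colors)
```

Each color class is a set of edges that can be profiled in the same experiment, so the number of colors bounds the number of profiling steps.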

Page 14: Communication Overhead Estimation on Multicores

Processor Leveling Graph

[Figure: the stream graph with actors A, B, C, D, E, and F and, for the blue-colored edges, its processor leveling graph with nodes A; B, C, D, E; and F]

Page 15: Communication Overhead Estimation on Multicores

Coloring the Processor Leveling Graph

[Figure: the processor leveling graph with nodes A; B, C, D, E; and F, colored so that its nodes are assigned to Processor 1 and Processor 2]
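One way to read the construction is sketched below: actors joined by edges outside the selected color are merged into one node, and the merged graph is then two-colored so that only the selected edges cross processors. This is an interpretation of the figures rather than the authors' stated algorithm, and `place_for_color` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def place_for_color(actors: List[str], edges: List[Edge],
                    selected: Set[Edge]) -> Dict[str, int]:
    """Assign each actor to processor 0 or 1 so that only selected edges cross."""
    parent = {a: a for a in actors}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Merge the endpoints of every edge that is NOT being measured.
    for u, v in edges:
        if (u, v) not in selected:
            parent[find(u)] = find(v)

    # Merged nodes are linked by the selected edges.
    adjacent: Dict[str, Set[str]] = {find(a): set() for a in actors}
    for u, v in selected:
        gu, gv = find(u), find(v)
        if gu == gv:
            raise ValueError(f"edge {(u, v)} cannot be isolated")
        adjacent[gu].add(gv)
        adjacent[gv].add(gu)

    # Two-color the merged graph with a breadth-first search.
    side: Dict[str, int] = {}
    for start in adjacent:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            group = queue.popleft()
            for other in adjacent[group]:
                if other not in side:
                    side[other] = 1 - side[group]
                    queue.append(other)
                elif side[other] == side[group]:
                    raise ValueError("merged graph is not two-colorable")
    return {a: side[find(a)] for a in actors}


actors = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
placement = place_for_color(actors, edges, selected={("A", "B"), ("E", "F")})
print(placement)
```

For the selected edges AB and EF this yields the three merged nodes A; B, C, D, E; and F, with A and F on one processor and the middle group on the other.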

Page 16: Communication Overhead Estimation on Multicores

Measuring the Communication Cost

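[Figure: the stream graph with actors A to F and its processor leveling graph (A; B, C, D, E; F) mapped onto Processor 1 and Processor 2 for the blue-colored edges, annotated with the execution times of A, B, E, and F. Below, t_X is the execution time of actor X with no communication cost and t'_X its execution time with communication cost.]

$$C(A,B) = (t'_A - t_A) + (t'_B - t_B)$$

$$C(E,F) = (t'_E - t_E) + (t'_F - t_F)$$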

Page 17: Communication Overhead Estimation on Multicores

Profiling Performance

Benchmark     Total Edges   Prof. Steps   Steps/Edge (%)   Err (%)
SAR           44            3             7                10
MatrixMult    88            21            24               17
MergeSort     37            4             11               31
FMRadio       21            3             14               24
DCT           28            9             32               14
RadixSort     12            2             17               5
FFT           26            3             12               27
MPEG          56            17            30               15
Channel       22            6             27               11
BeamFormer    39            5             13               13
GM                                        17               15
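The summary row can be cross-checked with a short Python sketch. It assumes "GM" denotes the geometric mean and that Steps/Edge is the number of profiling steps divided by the total number of edges, rounded to a whole percent; both assumptions reproduce the values in the table.

```python
import math
from typing import List

# (benchmark, total edges, profiling steps, error %) copied from the table.
rows = [
    ("SAR", 44, 3, 10), ("MatrixMult", 88, 21, 17), ("MergeSort", 37, 4, 31),
    ("FMRadio", 21, 3, 24), ("DCT", 28, 9, 14), ("RadixSort", 12, 2, 5),
    ("FFT", 26, 3, 27), ("MPEG", 56, 17, 15), ("Channel", 22, 6, 11),
    ("BeamFormer", 39, 5, 13),
]


def geometric_mean(values: List[float]) -> float:
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Steps/Edge as a whole-number percentage, as in the table.
steps_per_edge = [round(100 * steps / edges) for _, edges, steps, _ in rows]
errors = [err for _, _, _, err in rows]

gm_steps = geometric_mean(steps_per_edge)
gm_err = geometric_mean(errors)
print(steps_per_edge)
print(round(gm_steps), round(gm_err))
```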


Page 18: Communication Overhead Estimation on Multicores

Related Works

[1] Static Scheduling of SDF Programs for DSP [Lee '87]
[2] StreamIt: A language for streaming applications [Thies '02]
[3] Phased Scheduling of Stream Programs [Thies '03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies '06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott '08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa '09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa '09]
[8] Orchestration by approximation [Farhad '11]

Page 19: Communication Overhead Estimation on Multicores

Questions?