Spatial Computation: Computing without General-Purpose Processors. Mihai Budiu, mihaib@cs.cmu.edu, Carnegie Mellon University, July 8, 2004


Page 1: Spatial Computation Computing without General-Purpose Processors Mihai Budiu mihaib@cs.cmu.edu Carnegie Mellon University July 8, 2004

Spatial Computation

Computing without General-Purpose Processors

Mihai Budiu

mihaib@cs.cmu.edu

Carnegie Mellon University

July 8, 2004

Page 2:

Spatial Computation

A computation model based on:

• application-specific hardware

• no interpretation

• minimal resource sharing

Page 3:

The Engine Behind This Talk

main()
{
    signal(SIGINT, welcome);
    while (slides() && time()) {
        talk();
    }
}

Page 4:

Research Scope

Object: future architectures

Tool: compilers

Evaluation: simulators

Page 5:

Research Methodology

Constraint Space

state-of-the-art

X (e.g., power)

Y (e.g., cost)

“reasonable limits”

incremental evolution

new solutions

Page 6:

Outline

• Introduction: problems of current architectures

• Compiling Application-Specific Hardware

• Pipelining

• ASH Evaluation

• Conclusions

[Figure: performance (log scale, 1 to 1000) by year, 1980 to 2000]

Page 7:

Resources

• We do not worry about not having hardware resources
• We worry about being able to use hardware resources

[Intel]

Page 8:

Design Complexity

[Figure: transistors (log scale, 10^4 to 10^10) by year, 1981 to 2009; curves labeled "Chip size" and "Designer productivity"]

Page 9:

Communication vs. Computation

gate: 5ps

wire: 20ps

Power consumption on wires is also dominant

Page 10:

Power Consumption

Toasted CPU: about 2 sec after removing cooler.

(Tom’s Hardware Guide)

Page 11:

Energy Efficiency

ALUs

Pentium 4

Page 12:

Clock Speed

Cannot rely on global signals (clock is a global signal)

3GHz

6GHz

10GHz

Page 13:

Instruction-Set Architecture

Software

Hardware

ISA

VERY rigid to changes (e.g., x86 vs Itanium)

Page 14:

Our Proposal

• ASH addresses these problems
• ASH is not a panacea
• ASH is “complementary” to the CPU

[Diagram: the CPU runs low-ILP computation + OS + VM; ASH runs high-ILP computation; both connect to Memory; a block labeled $]

Page 15:

Outline

• Problems of current architectures

• CASH: Compiling ASH
  – program representation
  – compiling C programs

• Pipelining

• ASH Evaluation

• Conclusions

Page 16:

Application-Specific Hardware

C program

Compiler

Dataflow IR

Reconfigurable/custom hw

SW

HW

ISA

HW backend

Page 17:

Application-Specific Hardware

C program

Compiler

Dataflow IR

CPU [predication]

SW backend

Soft

Page 18:

Key: Intermediate Representation

Traditionally
• CFG
• def-use
• may-dep.
• ...

Our IR
• SSA + predication + speculation
• Uniform for scalars and memory
• Explicitly encodes may-depend
• Executable
• Precise semantics
• Dataflow IR
• Close to asynchronous target

Page 19:

Computation = Dataflow

• Operations ⇒ functional units
• Variables ⇒ wires
• No interpretation

x = a & 7;
...
y = x >> 2;

Programs

&

a 7

>>

2

x

Circuits
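As a rough illustration of the mapping on this slide, the two statements can be written in C so that each operation is its own "functional unit" and each variable is just a value passed directly from one unit to the next. The names `and_unit`, `shr_unit`, and `circuit` are hypothetical and not from the talk; CASH generates hardware, not C, so this is only a software analogy.

```c
#include <stdint.h>

/* Hypothetical "functional units": one per operation in the program. */
uint32_t and_unit(uint32_t a, uint32_t b) { return a & b; }
uint32_t shr_unit(uint32_t a, uint32_t b) { return a >> b; }

/* The "circuit": x is only the wire from the & unit to the >> unit.
 * There is no instruction stream and nothing is decoded at run time. */
uint32_t circuit(uint32_t a)
{
    uint32_t x = and_unit(a, 7);   /* x = a & 7;  */
    uint32_t y = shr_unit(x, 2);   /* y = x >> 2; */
    return y;
}
```

Calling `circuit(a)` plays the role of presenting `a` on the input wire and reading `y` from the output wire.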

Page 20:

Basic Computation

+

data

valid

ack

latch
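The data/valid/ack signals and the output latch can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit from the talk: the `Channel` type and `adder_fire` function are hypothetical, and "ack" is modeled by clearing the producer's `valid` flag rather than by a separate wire.

```c
#include <stdbool.h>

/* One bundle of wires: the data plus a valid flag.
 * Assumption: the consumer acknowledges by clearing valid. */
typedef struct {
    int  data;
    bool valid;
} Channel;

/* An adder node. It fires only when both inputs are valid and its
 * output latch is empty. Returns true if it fired. */
bool adder_fire(Channel *a, Channel *b, Channel *out)
{
    if (!a->valid || !b->valid || out->valid)
        return false;               /* must wait */
    out->data  = a->data + b->data; /* latch the result */
    out->valid = true;              /* signal valid to the consumer */
    a->valid   = false;             /* ack both producers */
    b->valid   = false;
    return true;
}
```

A second call to `adder_fire` with the same channels does nothing until the producers supply new data and the consumer has emptied the output latch.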

Page 21:

Asynchronous Computation

[Animation: adder nodes with data, valid, and ack signals and an output latch, shown in eight numbered steps]

Page 22:

Distributed Control Logic

+ -

ack

rdy

global

FSM

asynchronous control

short, local wires

Page 23:

Outline

• Problems of current architectures

• CASH: Compiling ASH
  – program representation
  – compiling C programs

• Pipelining

• ASH Evaluation

• Conclusions

Page 24:

MUX: Forward Branches

if (x > 0)
    y = -x;
else
    y = b*x;

*

x

b 0

y

!

- >

Conditionals ⇒ Speculation

critical path

SSA = no arbitration
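A minimal C sketch of what "conditionals ⇒ speculation" means for this example: both sides of the branch are computed unconditionally, and a multiplexor picks one result using the predicate. The function name `select_y` is hypothetical; the structure follows the slide's MUX diagram only in outline.

```c
/* Both arms are evaluated speculatively; the predicate drives a mux.
 * Because each value has a single definition (SSA), the mux is the only
 * place where the two candidate values for y meet. */
int select_y(int x, int b)
{
    int neg = -x;           /* "then" arm, computed regardless */
    int mul = b * x;        /* "else" arm, computed regardless */
    int p   = (x > 0);      /* predicate */
    return p ? neg : mul;   /* the MUX */
}
```

Speculating is safe here because neither arm has side effects; operations with side effects are handled by predication, as on the later slides.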

Page 25:

Control Flow ⇒ Data Flow

data

predicate

Merge (label)

Gateway

data

data

Split (branch)

p

!

Page 26:

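Loops

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Diagram: dataflow circuit for the loop, with one cycle for i (initial value 0, +1, < 100) and one cycle for sum (initial value 0, *, +), feeding a ret node]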
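The loop circuit can be mimicked in C by treating `i` and `sum` as two loop-carried values and the comparison as a predicate that steers them either around the back edges or out to the return. This is a sequential sketch only; the function name `run_loop_circuit` is hypothetical, and the real circuit evaluates the two cycles concurrently.

```c
/* Software analogy of the loop circuit: two loop-carried values and a
 * predicate that decides between the back edges and the exit. */
int run_loop_circuit(void)
{
    int i = 0;     /* initial value entering i's cycle */
    int sum = 0;   /* initial value entering sum's cycle */

    for (;;) {
        int p = (i < 100);   /* predicate computed in i's cycle */
        if (!p)
            return sum;      /* predicate false: steer sum to ret */
        sum = sum + i * i;   /* sum's cycle: multiply, then add */
        i = i + 1;           /* i's cycle: increment */
    }
}
```

The result is the sum of squares from 0 to 99, which is 328350.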

Page 27:

no speculation

sequencing of side-effects

Predication and Side-Effects

Load

addr

data

pred

token

token

to memory
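One way to read the Load node on this slide: the predicate decides whether the access happens at all (no speculation for side effects), and the token input/output pair orders it after earlier memory operations. The sketch below makes that concrete under assumptions of my own: the `LoadOut` type, the `load_node` function, and the choice to return 0 when the predicate is false are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  data;       /* loaded value (0 if the load did not run) */
    bool token_out;  /* released to dependent memory operations */
    bool accessed;   /* whether memory was actually touched */
} LoadOut;

/* A predicated load: waits for its input token, touches memory only
 * when pred is true, and releases a token either way so dependents
 * are not blocked. */
LoadOut load_node(const int *mem, size_t addr, bool pred, bool token_in)
{
    LoadOut r = { 0, false, false };
    if (!token_in)
        return r;              /* earlier side effects not done: wait */
    if (pred) {
        r.data = mem[addr];    /* the only place memory is read */
        r.accessed = true;
    }
    r.token_out = true;        /* sequencing continues in both cases */
    return r;
}
```

With `pred` false the address is never dereferenced, so an invalid address on a not-taken path is harmless.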

Page 28:

Memory Access

LD

ST

LD

Monolithic Memory

local communication, global structures

pipelined arbitrated network

Future work: fragment this! (links: related work, complexity)

Page 29:

CASH Optimizations

• SSA-based optimizations
  – unreachable/dead code, gcse, strength reduction, loop-invariant code motion, software pipelining, reassociation, algebraic simplifications, induction variable optimizations, loop unrolling, inlining
• Memory optimizations
  – dependence & alias analysis, register promotion, redundant load/store elimination, memory access pipelining, loop decoupling
• Boolean optimizations
  – Espresso CAD tool, bitwidth analysis

Page 30:

Outline

• Problems of current architectures

• Compiling ASH

• Pipelining

• Evaluation: CASH vs. clocked designs

• Conclusions

Page 31:

Pipelining

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Diagram, step 1: dataflow circuit for the loop with nodes i, +, 1, <=, 100, *, +, sum; the multiplier is pipelined (8 stages)]

Page 32:

Pipelining

[Diagram, step 2: the loop circuit with nodes i, +, 1, <=, 100, *, +, sum]

Page 33:

Pipelining

[Diagram, step 3: the loop circuit with nodes i, +, 1, <=, 100, *, +, sum]

Page 34:

Pipelining

[Diagram, step 4: the loop circuit with nodes i, +, 1, <=, 100, *, +, sum]

Page 35:

Pipelining

[Diagram, step 5: the loop circuit, with values labeled i=0 and i=1 in flight]

Page 36:

Pipelining

[Diagram, step 6: the loop circuit, with values labeled i=0 and i=1 in flight at the multiplier]

Page 37:

Pipelining

[Diagram, step 7: the loop circuit with nodes i, +, 1, <=, 100, *, +, sum; labels mark i’s loop, sum’s loop, the long latency pipe, and the predicate edge]

Page 38:

Pipelining

Predicate ack edge is on the critical path.

[Diagram: the loop circuit with nodes i, +, 1, <=, 100, *, +, sum; labels mark the critical path, i’s loop, and sum’s loop]

Page 39:

Pipeline balancing

[Diagram, step 7: the loop circuit with a decoupling FIFO added; labels mark i’s loop and sum’s loop]

Page 40:

Pipeline balancing

[Diagram: the loop circuit with the decoupling FIFO; labels mark the critical path, i’s loop, and sum’s loop]
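The effect of the decoupling FIFO can be illustrated with a toy timing model. Everything here is an assumption made for illustration, not a model from the talk: time is counted in abstract cycles (the real circuit is asynchronous), i’s loop can issue at most one iteration per cycle, each product takes `latency` cycles, and the predicate for iteration k occupies one FIFO slot from issue until sum’s loop consumes it together with product k. The function name `finish_cycle` is hypothetical.

```c
#define MAX_ITERS 1024

/* Returns the cycle at which the last iteration is consumed by sum's
 * loop, or -1 for arguments outside the model's range. */
long finish_cycle(int iters, int latency, int fifo_depth)
{
    long consume[MAX_ITERS];  /* cycle at which iteration k is consumed */
    long prev_issue = -1;

    if (iters < 1 || iters > MAX_ITERS || latency < 1 || fifo_depth < 1)
        return -1;

    for (int k = 0; k < iters; k++) {
        long issue = prev_issue + 1;          /* one issue per cycle at most */
        if (k >= fifo_depth && consume[k - fifo_depth] > issue)
            issue = consume[k - fifo_depth];  /* wait for a free FIFO slot */
        consume[k] = issue + latency;         /* product leaves the pipe */
        prev_issue = issue;
    }
    return consume[iters - 1];
}
```

In this model, 100 iterations through an 8-stage pipe finish at cycle 800 with a single predicate slot and at cycle 107 with an 8-entry FIFO, because i’s loop no longer waits for each acknowledgment before running ahead.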

Page 41:

Outline

• Problems of current architectures

• Compiling ASH

• Pipelining

• Evaluation: CASH vs. clocked designs

• Conclusions

Page 42:

Evaluating ASH

Tool flow: C → CASH core → Verilog back-end → Synopsys, Cadence P/R → ASIC

• Input: Mediabench kernels (1 hot function/benchmark)
• Target: 180nm std. cell library, 2V (~1999 technology)
• ModelSim (Verilog simulation) → performance numbers
• The diagram also shows a Mem block

Page 43:

ASH Area

[Figure: bar chart of area in square mm (axis 0 to 8) per benchmark: adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e; each bar is split into Mem access and Datapath. Annotations: "P4: 217", "normalized area", "minimal RISC core"]

Page 44:

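ASH vs 600MHz CPU [.18 μm]

[Figure: bar chart, y axis "Times slower" (0 to 4), one bar per benchmark (adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e) plus avg. Bar labels in extraction order, which may not match benchmark order: 1.08, 1.61, 0.45, 0.45, 2.19, 1.17, 1.73, 1.62, 1.91, 1.65, 3.76, 3.51, 1.48]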

Page 45:

Bottleneck: Memory Protocol

LD

ST Memory

• Token release to dependents: requires round-trip to memory.
• Limit study: round trip zero time ⇒ up to 6x speed-up.

LSQ

• Exploring protocol for in-order data delivery & fast token release.

Page 46:

Power

[Figure: bar chart of power in mW (axis 0 to 30) per benchmark: adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e. Reference annotations: DSP 110; μP 4000; Xeon [+cache] 67000]

Page 47:

Energy Efficiency

[Figure: energy efficiency in operations/nJ on a log axis from 0.01 to 1000. Categories shown: microprocessors, asynchronous μP, general-purpose DSP, FPGAs, ASH media kernels, dedicated hardware. Annotation: "1000x"]

Page 48:

Outline

Problems of current architectures

+ Compiling ASH

+ Pipelining

+ ASH Evaluation

= Future/related work & conclusions

Page 49:

Related Work

• Nanotechnology
• Dataflow machines
• High-level synthesis
• Reconfigurable computing
• Computer architecture
• Embedded systems
• Asynchronous circuits
• Compilation

Page 50:

Future Work

• Optimizations for area/speed/power

• Memory partitioning

• Concurrency

• Compiler-guided layout

• Explore extensible ISAs

• Hybridization with superscalar mechanisms

• Reconfigurable hardware support for ASH

• Formal verification

Page 51:

How far can you go?

Grand Vision: Certified Circuit Generation

• Translation validation: input ≡ output
• Preserve input properties
  – e.g., C programs cannot deadlock
  – e.g., type-safe programs cannot crash
• Debug, test, verify only at source-level

HLL → IR → IRopt → Verilog → gates → layout

formally validated

Page 52:

Conclusions

Spatial computation strengths

Feature                  Advantages
No interpretation        Energy efficiency, speed
Spatial layout           Short wires, no contention
Asynchronous             Low power, scalable
Distributed              No global signals
Automatic compilation    Design productivity, no ISA

Page 53:

Backup Slides

• Reconfigurable hardware
• Critical paths
• Control logic
• ASH vs ...
• ASH weaknesses
• Exceptions
• Normalized area
• Why C?
• Splitting memory
• More performance
• Recursive calls

Page 54:

Reconfigurable Hardware

Universal gates

and/or

storage elements

Interconnection network

Programmable switches

Page 55:

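Main RH Ingredient: RAM Cell

• Universal gate = RAM
• Switch controlled by a 1-bit RAM cell

[Diagram: a RAM with contents 0001, addressed by inputs a0 and a1, with its data output labeled "a1 & a2" on the slide; a switch that passes "data in" under a control bit stored in a RAM cell (shown holding 0)]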
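The "universal gate = RAM" idea can be sketched as a 2-input lookup table: four 1-bit cells, addressed by the two inputs, whose contents determine which Boolean function the gate computes. The type `Lut2`, the function `lut2_eval`, and the bit ordering (bit k of `cells` is the output for address k, with a1 as the high address bit) are assumptions made for this sketch.

```c
#include <stdint.h>

/* A 2-input lookup table: 4 one-bit RAM cells packed into a byte.
 * Bit k of cells is the output for address k. */
typedef struct {
    uint8_t cells;
} Lut2;

/* Reading the RAM at address (a1, a0) evaluates the gate. */
int lut2_eval(Lut2 lut, int a1, int a0)
{
    unsigned addr = ((unsigned)(a1 & 1) << 1) | (unsigned)(a0 & 1);
    return (lut.cells >> addr) & 1;
}

/* Contents 0,0,0,1 for addresses 0..3 give AND (the 0001 on the slide).
 * Contents 0,1,1,0 give XOR: same hardware, different configuration. */
const Lut2 LUT_AND = { 0x8 };
const Lut2 LUT_XOR = { 0x6 };
```

Reprogramming the four cells changes the gate without changing any wiring, which is what makes the fabric reconfigurable.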

Page 56:

Critical Paths

if (x > 0)
    y = -x;
else
    y = b*x;

[Diagram: dataflow circuit with inputs x, b, and 0; operators *, -, >, and !; output y]

Page 57:

Lenient Operations

if (x > 0)
    y = -x;
else
    y = b*x;

[Diagram: dataflow circuit with inputs x, b, and 0; operators *, -, >, and !; output y]

Solves the problem of unbalanced paths
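A lenient operation can produce its output as soon as the inputs that determine the result have arrived, without waiting for the rest. The sketch below contrasts a strict and a lenient multiplexor for the example above; the `Signal` type and both function names are hypothetical, and "arrival" is modeled with a simple `ready` flag.

```c
#include <stdbool.h>

/* A value that may not have arrived yet. */
typedef struct {
    bool ready;
    int  value;
} Signal;

/* Strict mux: waits for every input before producing an output. */
Signal strict_mux(Signal p, Signal if_true, Signal if_false)
{
    Signal out = { false, 0 };
    if (p.ready && if_true.ready && if_false.ready) {
        out.ready = true;
        out.value = p.value ? if_true.value : if_false.value;
    }
    return out;
}

/* Lenient mux: needs only the predicate and the selected input. */
Signal lenient_mux(Signal p, Signal if_true, Signal if_false)
{
    Signal out = { false, 0 };
    if (!p.ready)
        return out;
    Signal chosen = p.value ? if_true : if_false;
    if (chosen.ready)
        out = chosen;       /* the other input is irrelevant */
    return out;
}
```

When x > 0, the fast negation is selected, so the lenient mux can emit y while the slow multiply is still in flight; the strict mux would sit on the longer path.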

Page 58:

Asynchronous Control

[Diagram: handshake control for one stage, with signals rdyin, ackout, rdyout, ackin, datain, and dataout, a register (Reg), a block labeled C, and a block labeled =]

Page 59:

HLL to HW

High-level Synthesis

Behavioral HDL

Synchronous Hardware

Reconfigurable Computing

C [subsets]

Hardware configuration

(spatial computation)

Asynchronous circuits

Concurrent Language

Asynchronous Hardware

Prior work

This research

Page 60:

CASH vs High-Level Synthesis

• CASH: the only existing tool to translate complete ANSI C to hardware

• CASH generates asynchronous circuits

• CASH does not treat C as an HDL
  – no annotations required
  – no reactivity model
  – does not handle non-C, e.g., concurrency

Page 61:

ASH Weaknesses

• Low efficiency for low-ILP code

• Does not adapt at runtime

• Monolithic memory

• Resource waste

• Not flexible

• No support for exceptions

Page 62:

ASH Weaknesses (2)

• Both branch and join not free
• Static dataflow (no re-issue of same instr)
• Memory is “far”
• Fully static
  – No branch prediction
  – No dynamic unrolling
  – No register renaming
• Calls/returns not lenient

Page 63:

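Branch Prediction

for (i=0; i < N; i++) {
    ...
    if (exception) break;
}

• Loop branch: predicted taken.
• if (exception) break: predicted not taken. Effectively a noop for CPU!
• On the CPU, the result is available before the inputs.

[Diagram: loop circuit with nodes i, +, 1, <, &, !, and exception; the ASH crit path and the CPU crit path are marked]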

Page 64:

Exceptions

• Strictly speaking, C has no exceptions
• In practice hard to accommodate exceptions in hardware implementations
• An advantage of software flexibility: PC is single point of execution control

[Diagram: the CPU runs low-ILP computation + OS + VM + exceptions; ASH runs high-ILP computation; both connect to Memory; a block labeled $$$]

Page 65:

Why C

• Huge installed base

• Embedded specifications written in C

• Small and simple language
  – Can leverage existing tools
  – Simpler compiler
• Techniques generally applicable
• Not a toy language

Page 66:

Performance

[Figure: bar chart of megaoperations per second (axis 0 to 5000) per benchmark: adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e, plus avg; series: MOPSall, MOPSspec, MOPS]

Page 67:

Parallelism Profile

[Figure: bar chart (axis 0 to 25) per benchmark: adpcm_d, adpcm_e, epic_d, epic_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mesa, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e, rasta; series: CPU, ASH; a label "4"]

Page 68:

Normalized Area

[Figure: chart per benchmark (adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e) plus avg, with two axes (0 to 120 and 0 to 2.5); series: Lines/sq mm, sq mm/kbyte]

Page 69:

Memory Partitioning

• MIT RAW project: Babb FCCM ‘99, Barua HiPC ‘00, Lee ASPLOS ‘00
• Stanford SpC: Semeria DAC ‘01, TVLSI ‘02
• Berkeley CCured: Necula POPL ‘02
• Illinois FlexRAM: Fraguela PPoPP ‘03
• Hand-annotations #pragma

Page 70:

Memory Complexity

[Diagram: an LSQ and a RAM, with addr and data lines]

Page 71:

Recursion

recursive call

save live values

restore live values

stack
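The save/restore pattern on this slide can be illustrated in software by making the recursion stack explicit. This is only an analogy for the hardware mechanism, with factorial chosen as a stand-in example; the function name `fact_explicit_stack` and the `MAX_DEPTH` bound are hypothetical.

```c
#define MAX_DEPTH 16

/* Factorial with the recursion made explicit.
 * Returns 0 if n is too large for this sketch. */
unsigned long fact_explicit_stack(unsigned n)
{
    unsigned saved[MAX_DEPTH];  /* the "stack" holding live values */
    int sp = 0;
    unsigned long result = 1;

    if (n > 12)
        return 0;  /* 13! overflows 32 bits; this also bounds the depth */

    /* Descent: before each "recursive call", save the live value n. */
    while (n > 1) {
        saved[sp++] = n;
        n--;
    }
    /* Return path: restore each live value and finish its multiply. */
    while (sp > 0)
        result *= saved[--sp];
    return result;
}
```

Each call site saves only the values that are still needed after the call returns, which here is just `n`.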

Page 72:

Me?