Stream-based Arrays: Converging Design Flows for both, Reconfigurable and Hardwired ....

VLSI-SoC 2001 IFIP - LIRMM, December 2-4, 2001, Montpellier, France

Reiner Hartenstein, University of Kaiserslautern


Page 1: Stream-based Arrays:  Converging Design Flows  for both,

VLSI-SoC 2001 IFIP - LIRMM

Stream-based Arrays: Converging Design Flows for both, Reconfigurable and Hardwired ....

Reiner Hartenstein

University of Kaiserslautern

December 2-4, 2001, Montpellier, France

Page 2: Stream-based Arrays:  Converging Design Flows  for both,

© 2001, [email protected] http://www.fpl.uni-kl.de

University of Kaiserslautern

Xputer Lab

>> Stream-based Computing

• Stream-based Computing

• Stream-based Compilation Techniques

• Use in Co-Design

• Now it’s up to You !

http://www.uni-kl.de

Page 3: Stream-based Arrays:  Converging Design Flows  for both,

commercial rDPAs: rDPA (coarse grain) becoming important

• XPU family (IP cores): PACT Corp., Munich (http://pactcorp.com)

• XPU128

• flexible array: MorphICs

• CALISTO: Silicon Spice

• CS2000 family: Chameleon Systems

• MECA family: Malleable

• FIPSOC: SIDSA

• ACM: Quicksilver Tech

• CHESS array: Elixent

• MorphoSys: Morpho Tech

**) bought (two entries carry this mark on the slide)

Page 4: Stream-based Arrays:  Converging Design Flows  for both,

Example: KressArray Family

SNN filter

array size: 10 x 16 = 160 rDPUs

[Figure: SNN filter mapped onto the KressArray, as displayed by KressArray Xplorer]

Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect

KressArray Xplorer: http://kressarray.de

You may use it on your Netscape

Page 5: Stream-based Arrays:  Converging Design Flows  for both,

Rapidly toward the Break-through

• replace Concurrent Processes by more efficient parallelism:

stream-based DPAs 1)

stream-based rDPAs 2)

____

1) systolic array* [1980]

chip-on-a-day* [2000] [Brodersen], Bee Project

2) KressArray** [1995] and others [later]

*) hardwired

**) reconfigurable

terms:

DPU: datapath unit

DPA: data path array

rDPU: reconfigurable DPU

rDPA: reconfigurable DPA

Generalization of the Systolic Array

Kress: a generalization of systolic array synthesis: super systolic synthesis

Page 6: Stream-based Arrays:  Converging Design Flows  for both,

compare Concurrent Computing

[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or switch box]

extremely inefficient

massive bottleneck phenomena at run time

• control flow overhead

• instruction fetch / interpretation overhead

• address computation overhead - may be massive

Page 7: Stream-based Arrays:  Converging Design Flows  for both,

... with Stream-based Computing:

[Figure: (r)DPA, an array of 3 x 3 DPUs]

for both,

• reconfigurable, and

• hardwired [Brodersen]

• transport-triggered execution, driven by data stream from / to memory or from / to peripheral interface

• no instruction sequencer inside !

• „instruction fetch“: at compile time

avoids run time overhead and bottleneck phenomena

rDPA: drastically reduced reconfigurability overhead
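To make the contrast with the instruction-sequencer model concrete, here is a minimal sketch of transport-triggered, stream-driven execution. It is an illustration only, not the Xputer Lab's implementation: the names `configure` and `Dpu` are hypothetical, and each DPU is modelled as a fixed operator that fires whenever a data item arrives.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a DPU is modelled as a fixed, stateless operator on one data item.
Dpu = Callable[[float], float]


def configure(ops: List[Dpu]) -> Callable[[Iterable[float]], Iterator[float]]:
    """'Compile time': fix the operator of every DPU once, before data arrives."""

    def run(stream: Iterable[float]) -> Iterator[float]:
        # 'Run time': no instruction fetch, no program counter.
        # Each arriving data item triggers the chain of pre-configured DPUs.
        for item in stream:
            for op in ops:
                item = op(item)
            yield item

    return run


# Two DPUs configured as "multiply by 3" and "add 1".
pipeline = configure([lambda v: v * 3.0, lambda v: v + 1.0])
result = list(pipeline([1.0, 2.0, 3.0]))
print(result)  # [4.0, 7.0, 10.0]
```

The only thing that happens per data item is the data transport itself; which operation each DPU performs was decided once in `configure`, which is the sense of "instruction fetch at compile time" on the slide.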

Page 8: Stream-based Arrays:  Converging Design Flows  for both,

Soft rDPA ?

[Figure: Memory, soft CPU, miscellaneous, and soft DPU arrays; HLL Compiler]

• 50 million system gates soon

• even large rDPAs as soft IPs become feasible

• by >2005: don’t care about area efficiency ?

Page 9: Stream-based Arrays:  Converging Design Flows  for both,

>> Stream-based Compilation Techniques

• Stream-based Computing

• Stream-based Compilation Techniques

• Use in Co-Design

• Now it’s up to You !

http://www.uni-kl.de

Page 10: Stream-based Arrays:  Converging Design Flows  for both,

Systolic Stream-based Computing System

Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array)

The Mathematician’s Synthesis Method: linear pipelines and uniform arrays only

[Figure: equations are turned into a placement by linear projection or algebraic mapping, no routing! DPU architecture: multiply-add with inputs x and a, output y. Coefficients a11 ... a33 arrive skewed in time; data streams x1, x2, x3 and y10, y20, y30 enter, y1, y2, y3 leave]

computing in space: placement

computing in time: systolic arrays etc.

migration by re-timing and other transformations

this dichotomy is completely ignored by our CS curricula
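The labels on this slide (coefficients a11 ... a33, streams x1..x3, initial values y10..y30, results y1..y3, a multiply-add DPU) suggest a matrix-vector product. The following is a small simulation sketch under that assumption; the particular scheme used here (one accumulator per DPU, x values shifted through the array, coefficients fed in skewed) is one common textbook arrangement and not necessarily the one drawn on the slide. The function name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(a, x, y0):
    """Simulate a linear systolic array computing y = y0 + A x.

    DPU i keeps the accumulator for y[i]. The x stream enters at DPU 0 and
    moves one DPU to the right per clock tick, so DPU i sees x[j] at tick
    t = i + j and must receive coefficient a[i][j] at that same tick.
    """
    n, m = len(a), len(x)
    acc = list(y0)          # one accumulator register per DPU
    x_reg = [None] * n      # x pipeline register of each DPU

    for t in range(n + m - 1):
        # Shift the x stream by one DPU and feed the next x value (if any).
        x_reg = [x[t] if t < m else None] + x_reg[:-1]
        for i in range(n):
            if x_reg[i] is not None:
                j = t - i   # index of the x value currently held by DPU i
                acc[i] += a[i][j] * x_reg[i]   # the multiply-add DPU fires
    return acc


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(systolic_matvec(a, [1, 0, -1], [10, 20, 30]))  # [8, 18, 28]
```

Note that no DPU ever computes an address or fetches an instruction: the "program" is entirely in where the operators are placed and when the streams arrive.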

Page 11: Stream-based Arrays:  Converging Design Flows  for both,

General Stream-based Computing System

heterogeneous DPA or rDPA

[Figure: flow with steps numbered 1 to 4. Expression tree and DPU architectures (multiply-add with inputs x and a, output y) feed the Mapper (simulated annealing, simultaneous placement & routing), which yields a free form pipe network of +, *, sh, xf and - operators; the Scheduler provides the data streams]
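The slide names simulated annealing as the mapper's method. Below is a deliberately reduced sketch of that idea: it places the operators of a small pipe network on a grid of rDPUs and minimises total Manhattan wire length. Everything here is an assumption made for illustration (the function names, the cost function, the cooling schedule, the example network); in particular, routing is not modelled at all, whereas the slide's mapper performs placement and routing simultaneously.

```python
import math
import random


def positions(grid, cols):
    """Map each placed operator to its (row, column) cell."""
    return {op: divmod(k, cols) for k, op in enumerate(grid) if op is not None}


def wire_length(placement, nets):
    """Total Manhattan distance over all operator-to-operator connections."""
    return sum(abs(placement[s][0] - placement[d][0]) +
               abs(placement[s][1] - placement[d][1]) for s, d in nets)


def anneal_placement(ops, nets, rows, cols, steps=2000, t0=2.0, seed=0):
    """Place ops on a rows x cols array; returns (placement, initial, best cost)."""
    rng = random.Random(seed)                       # seeded for repeatability
    grid = list(ops) + [None] * (rows * cols - len(ops))   # None = unused rDPU
    rng.shuffle(grid)
    cost = initial_cost = wire_length(positions(grid, cols), nets)
    best, best_cost = list(grid), cost

    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9     # linear cooling schedule
        i, j = rng.sample(range(len(grid)), 2)
        grid[i], grid[j] = grid[j], grid[i]         # propose: swap two cells
        new_cost = wire_length(positions(grid, cols), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                         # accept the move
            if cost < best_cost:
                best, best_cost = list(grid), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]     # reject: undo the swap

    return positions(best, cols), initial_cost, best_cost


# Hypothetical five-operator pipe network on a 3 x 3 array.
ops = ["x_in", "mul", "add", "sh", "y_out"]
nets = [("x_in", "mul"), ("mul", "add"), ("add", "sh"), ("sh", "y_out")]
placement, initial_cost, best_cost = anneal_placement(ops, nets, 3, 3)
print(initial_cost, best_cost, placement)
```

Accepting some cost-increasing swaps while the temperature is high is what lets the search leave poor local arrangements; as the temperature falls, it behaves more and more like plain greedy improvement.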

Page 12: Stream-based Arrays:  Converging Design Flows  for both,

Memory Communication Architecture …

• hot research topic in embedded systems

• storage context transformations [Catthoor, Herz, Kougia, Soudris]

• Synthesizable Memory Communication Architecture

• startups provide memory IPs or generators

• an example by Nageldinger’s KressArray Xplorer

[Figure: KressArray mapping with Optimized Parallel Memory Controller. Legend: application; not used; sequencers; memory ports]

GAG generic sequencer methodology available

Herz

Page 13: Stream-based Arrays:  Converging Design Flows  for both,

>> Use in Co-Design

• Stream-based Computing

• Stream-based Compilation Techniques

• Use in Co-Design

• Now it’s up to You !

http://www.uni-kl.de

Page 14: Stream-based Arrays:  Converging Design Flows  for both,

Computer: the wrong Machine Paradigm (“von Neumann”)

[Figure: Compiler, Memory, hardwired Datapath, Sequencer with program counter (state register)]

• tightly coupled by compact instruction code

• “von Neumann” does not support soft data paths

Xputer: The Soft Machine Paradigm

[Figure: Scheduler, Compiler, Memory, (multiple) sequencer with data counter(s), reconfigurable Datapath Array]

• loosely coupled by decision data bits only

• reconfigurable

• also for hardwired [Brodersen]

• enabling technology published 10 years ago

• now a hot topic area

• full day course last week at Tampere, Finland

Page 15: Stream-based Arrays:  Converging Design Flows  for both,

Co-Compilation

Hardware / Software Co-Design turns to Configware / Software Co-Design

Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]

[Figure: partitioning compiler. Labels: high level programming language source (X-C); Analyzer / Profiler; Partitioner; Resource Parameters, supporting different platforms; GNU C compiler; X-C compiler; DPSS; KressArray; interface]

Computer Machine Paradigm: Software running on Processor

Xputer “Soft” Machine Paradigm: Configware running on Reconfigurable Accelerators

Page 16: Stream-based Arrays:  Converging Design Flows  for both,

Loop Transformation Examples

resource parameter driven Co-Compilation

sequential processes:

loop 1-16 body endloop

strip mining (fork / join):

loop 1-8 body endloop

loop 9-16 body endloop

loop unrolling:

loop 1-8 body body endloop

host: / reconf. array: (trigger loops shown on the slide)

loop 1-8 trigger endloop

loop 1-4 trigger endloop

loop 1-2 trigger endloop
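The two transformations named on this slide can be checked on a toy loop. The sketch below uses a stand-in loop body (squaring the index, an assumption for illustration) and shows that the 16-iteration loop, its strip-mined form (1-8 and 9-16) and its form unrolled by two (8 iterations with the body twice) produce the same results. The strips run one after the other here; on the slide they sit between fork and join, i.e. they are meant to run in parallel.

```python
def body(i, out):
    """Stand-in loop body (hypothetical): record the square of the index."""
    out.append(i * i)


def original():
    # loop 1-16 body endloop
    out = []
    for i in range(1, 17):
        body(i, out)
    return out


def strip_mined():
    # fork: loop 1-8 body endloop | loop 9-16 body endloop, then join
    first, second = [], []
    for i in range(1, 9):
        body(i, first)
    for i in range(9, 17):
        body(i, second)
    return first + second   # join: concatenate the two strips in order


def unrolled():
    # loop 1-8 body body endloop
    out = []
    for k in range(1, 9):
        body(2 * k - 1, out)
        body(2 * k, out)
    return out


print(original() == strip_mined() == unrolled())  # True
```

Which form a co-compiler picks would depend on the resource parameters: strip mining needs enough DPUs for two strips side by side, unrolling needs room for two copies of the body in one pipeline.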

Page 17: Stream-based Arrays:  Converging Design Flows  for both,

>> Now it’s up to You !

• Stream-based Computing

• Stream-based Compilation Techniques

• Use in Co-Design

• Now it’s up to You !

http://www.uni-kl.de

Page 18: Stream-based Arrays:  Converging Design Flows  for both,

However, current CS Education …. is based on the Submarine Model

[Figure: Submarine Model. Software: Algorithm, procedural high level Programming Language, Assembly Language. Hardware invisible: under the surface]

Brain usage: procedural-only

Software Faculty Colleagues shy away from the Paradigm Shift: their Brain hurts? - can’t be: this Half has been amputated

This model disables ...

Page 19: Stream-based Arrays:  Converging Design Flows  for both,

... this model disables Hardware and Software as Alternatives

[Figure: partitioning of an Algorithm into Software only, Software & Hardw/Configw, or Hardw/Configw only. Software: procedural. Hardware, Configware: structural]

Brain Usage: both Hemispheres

Page 20: Stream-based Arrays:  Converging Design Flows  for both,

The Dominance of the Submarine Model ...

[Figure: Submarine Model, Hardware]

... indicates that our CS education system produces zillions of mentally disabled Persons

(procedural) structurally disabled

… completely disabled to cope with solutions other than software only

It’s time to attack the software faculty dictatorship. Get involved!

Page 21: Stream-based Arrays:  Converging Design Flows  for both,

thank you for listening

It’s up to You !

Page 22: Stream-based Arrays:  Converging Design Flows  for both,

END

Page 23: Stream-based Arrays:  Converging Design Flows  for both,

The Impact of Reconfigurable Logic

• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.

• A rapidly growing large user base of HDL-savvy designers with FPGA experience.

• Flexibility promises spin-around times down to minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades

• A New Business Model (in-field debugging and upgrading ... )

• A Fundamental Paradigm Shift in Silicon Application

[Figure [T. Kean]: Revenue / month over Time / months (axis 1 to 30) for an ASIC Product and a reconfigurable Product with download (Product, Update 1, Update 2)]

Page 24: Stream-based Arrays:  Converging Design Flows  for both,

The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years”

“The Programmable System-on-a-Chip is the next wave“

[Figure: Makimoto’s Wave, alternating between standard and custom over 1957, 1967, 1977, 1987, 1997, 2007. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable; ??]

Published in 1989

Page 25: Stream-based Arrays:  Converging Design Flows  for both,

How’s next Wave ?

[Figure: Makimoto’s Wave between standard and custom over 1957, 1967, 1977, 1987, 1997, 2007, with Hartenstein’s Curve]

Tredennick’s Paradigm Shifts:

• hardwired: algorithm: fixed; resources: fixed

• procedural programming: algorithm: variable; resources: fixed

• structural programming: algorithm: variable; resources: variable

2007: FPGAs, Coarse grain RAs

no further wave !

Page 26: Stream-based Arrays:  Converging Design Flows  for both,

The Impact of Makimoto’s Paradigm Shifts

[Figure: Makimoto’s Wave between standard and custom over 1957, 1967, 1977, 1987, 1997, 2007. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable]

Procedural personalization via RAM-based Machine Paradigm

Software Industry’s Secret of Success

structural personalization: RAM-based before run time

Configware Success Story by new Machine Paradigm

Dr. Makimoto: FPL 2000 keynote

Page 27: Stream-based Arrays:  Converging Design Flows  for both,

The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years”

[Figure: Makimoto’s Wave, alternating between standard and custom over 1957, 1967, 1977, 1987, 1997, 2007. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain]

Page 28: Stream-based Arrays:  Converging Design Flows  for both,

KressArray Family generic Fabrics: a few examples

• Select Function Repertory

• select Nearest Neighbour (NN) Interconnect: an example

• Select mode, number, width of NNports

• more NNports: rich Rout Resources

• rout-through and function

• rout-through only

• Examples of 2nd Level Interconnect: laid out over rDPU cell - no separate routing areas !

• Wired by Abutment

[Figure: rDPU cells with NNports; numbers shown on the slide: 16, 32, 8, 24, 4, 2]