Faculty of Sciences and Technology University of Algarve, Faro João M. P. Cardoso April 30, 2001 IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units Portugal


Page 1

Faculty of Sciences and Technology
University of Algarve, Faro, Portugal

João M. P. Cardoso

April 30, 2001

IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA

A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units

Page 2

Index

Introduction

Temporal Partitioning

Problem Definition

New vs Previous Approach

Algorithm Working Through an Example

Experimental Results

Related Work

Conclusions

Future Work

Page 3

Introduction

“Virtual Hardware”:
  Reuse of devices
  Save silicon area
  View of “unlimited resources”
  Enabled by dynamically reconfigurable FPGAs

Two concepts:
  Context switching among functionalities
  Allowing a large “function” to be executed

FPGA devices allowing virtualization:
  off-chip configurations
  on-chip configurations

Several research efforts…

Page 4

Introduction

Answers:
  Temporal Partitioning
  Sharing of Functional Units

Goal: combining the two...

[Figure: example dataflow graph over the variables x, y, u and dx, producing x_1, y_1 and u_1 with +, -, x and << 1 operations]

Size larger than the available reconfigware area?

Page 5

Temporal Partitioning

[Figure: first temporal partition of the example graph along a time axis; it computes x_1, y_1 and the intermediate value aux1]

Page 6

Temporal Partitioning

[Figure: second temporal partition along a time axis; it consumes aux1 and computes u_1]

Page 7

Temporal Partitioning

[Figure: both temporal partitions placed one after the other along the time axis]

Page 8

Temporal Partitioning

Create temporal partitions to be executed by time-sharing the device

Netlist level (structural):
  Difficulties when dealing with feedback
  Loss of information
  Flat structure
  Hard to exploit sharing of functional units

Behavioral level (functional):
  Loops can be explicitly represented
  Better design decisions
  “A must” for compilers for reconfigurable computing

Page 9

Problem Definition

But what if we decrease the required area by sharing functional units?

Perform Temporal Partitioning and sharing of Functional Units simultaneously

THE PROBLEM:

Given a dataflow graph (representing a behavioral description), a library of components,...

Map the dataflow graph onto the available resources of the FPGA device:
  Considering sharing of Functional Units
  Considering Temporal Partitioning
  Decreasing the overall execution latency

Page 10

New vs Previous Approach

[Figure: two design flows, both taking a DFG/CDFG, constraints and a component library as inputs]

Previous:
  Temporal Partitioning
  High-Level Synthesis
  Circuit generation, Logic Synthesis

New:
  Simultaneous Temporal Partitioning and High-Level Synthesis
  Circuit generation, Logic Synthesis

Page 11

Algorithm Working Through an Example

Suppose the following dataflow graph

Consider:
  Area(+) = 1 cell
  Area(x) = 2 cells
  Delay(+) = 1 control step (cs)
  Delay(x) = 2 cs

Total area of the DFG: 8 cells

Available Area: 3 cells

[Figure: dataflow graph with six nodes, numbered 0 to 5]
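The area figures on this slide can be checked with a short sketch. The slide gives only the totals, so the assignment of operation types to nodes below is an assumption; with six nodes, four additions and two multiplications is the only mix that adds up to 8 cells. The names `NODE_OP`, `flat_area` and `shared_area` are illustrative.

```python
# Area model from the slide: cells per operation type.
AREA = {"+": 1, "x": 2}
AVAILABLE_AREA = 3  # cells available on the device

# Hypothetical assignment of operation types to nodes 0..5.
# The slide does not list them; only the 8-cell total is given.
NODE_OP = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}


def flat_area(node_op, area):
    """Area when every node gets its own functional unit (no sharing)."""
    return sum(area[op] for op in node_op.values())


def shared_area(node_op, area):
    """Smallest possible area: one shared unit per operation type."""
    return sum(area[op] for op in set(node_op.values()))


total = flat_area(NODE_OP, AREA)
minimum = shared_area(NODE_OP, AREA)
print(total, total > AVAILABLE_AREA)        # 8 True
print(minimum, minimum <= AVAILABLE_AREA)   # 3 True
```

Without sharing the graph does not fit in 3 cells, so it must be split in time; with one adder and one multiplier it can fit, which is the trade-off the rest of the example explores.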

Page 12

Algorithm Working Through an Example

Calculate ASAP and ALAP values

Node   0  1  2  3  4  5
ASAP   0  0  1  0  2  3
ALAP   1  1  2  0  2  3

[Figure: the same dataflow graph, nodes 0 to 5]
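A minimal sketch of how such a table can be computed. The graph drawing did not survive in this transcript, so the operation types and edges below are assumptions, chosen because they reproduce the ASAP and ALAP values on the slide; the real graph may differ. ASAP is a forward pass over predecessors, ALAP a backward pass from the critical-path length.

```python
# Delays from the example: control steps per operation type.
DELAY = {"+": 1, "x": 2}

# Hypothetical graph: operation types and edges are assumptions chosen
# to reproduce the slide's table, not taken from the original figure.
NODE_OP = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}
PREDS = {0: [], 1: [], 2: [0, 1], 3: [], 4: [3], 5: [4]}


def asap_alap(node_op, preds, delay):
    """Return (asap, alap) start steps; nodes must be numbered topologically."""
    order = sorted(node_op)
    d = {n: delay[node_op[n]] for n in order}
    asap = {}
    for n in order:
        # Earliest start: after the latest-finishing predecessor.
        asap[n] = max((asap[p] + d[p] for p in preds[n]), default=0)
    length = max(asap[n] + d[n] for n in order)  # critical-path length
    succs = {n: [s for s in order if n in preds[s]] for n in order}
    alap = {}
    for n in reversed(order):
        # Latest start: finish before the earliest-starting successor.
        alap[n] = min((alap[s] for s in succs[n]), default=length) - d[n]
    return asap, alap


asap, alap = asap_alap(NODE_OP, PREDS, DELAY)
print([asap[n] for n in sorted(asap)])  # [0, 0, 1, 0, 2, 3]
print([alap[n] for n in sorted(alap)])  # [1, 1, 2, 0, 2, 3]
```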

Page 13

Algorithm Working Through an Example

Identify the critical path

Node   0  1  2  3  4  5
ASAP   0  0  1  0  2  3
ALAP   1  1  2  0  2  3

[Figure: the same dataflow graph with the critical path highlighted]
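The slide does not spell out the criterion, so the sketch below assumes the usual one: a node is on the critical path when its mobility (ALAP minus ASAP) is zero. It uses only the values from the table; the function names are illustrative.

```python
# ASAP and ALAP start steps copied from the table on the slide.
ASAP = {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 3}
ALAP = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}


def mobility(asap, alap):
    """Slack of each node: how far its start can slip without adding latency."""
    return {n: alap[n] - asap[n] for n in asap}


def critical_nodes(asap, alap):
    """Nodes with zero mobility, ordered by their ASAP step."""
    mob = mobility(asap, alap)
    return sorted((n for n in mob if mob[n] == 0), key=lambda n: asap[n])


print(mobility(ASAP, ALAP))        # {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
print(critical_nodes(ASAP, ALAP))  # [3, 4, 5]
```

Nodes 3, 4 and 5 are the ones the following slides place first, one per temporal partition.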

Page 14

Algorithm Working Through an Example

Create an initial number of TPs: suppose 3

[Figure: the dataflow graph beside a chart of three empty temporal partitions (1, 2, 3), with axes labelled MAXCS and Area]

Page 15

Algorithm Working Through an Example

Map each node of the critical path on each temporal partition

[Figure: critical-path nodes 3, 4 and 5 placed in temporal partitions 1, 2 and 3; the slots shown are 2 cs, 1 cs and 1 cs]
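A rough sketch of this seeding step, read off the figure residue rather than from a stated rule: each critical-path node opens its own temporal partition, and the partition's initial MAXCS is taken to be that node's delay. The operation types of nodes 3, 4 and 5 and the dictionary layout are assumptions.

```python
# Delays from the example: control steps per operation type.
DELAY = {"+": 1, "x": 2}

# Critical-path nodes with assumed operation types (not listed on the slide).
CRITICAL_OPS = {3: "x", 4: "+", 5: "+"}


def seed_partitions(critical_ops, delay):
    """Open one temporal partition per critical-path node.

    Node numbers happen to follow path order here, so sorting by node
    number keeps the partitions in execution order.
    """
    partitions = []
    for index, node in enumerate(sorted(critical_ops), start=1):
        partitions.append(
            {"tp": index, "nodes": [node], "maxcs": delay[critical_ops[node]]}
        )
    return partitions


partitions = seed_partitions(CRITICAL_OPS, DELAY)
print([p["maxcs"] for p in partitions])     # [2, 1, 1]
print(sum(p["maxcs"] for p in partitions))  # 4
```

The slot lengths of 2 cs, 1 cs and 1 cs match the labels on the slide, and their sum equals the critical-path length of 4 cs.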

Page 16

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: dataflow graph (nodes 0 to 5) and partition chart with axes MAXCS and Area; partitions 1, 2 and 3 hold nodes 3, 4 and 5 with slots of 2 cs, 1 cs and 1 cs]

Page 17

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: next animation frame; node 0 appears in the partition chart]

Page 18

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: next animation frame; nodes 0 and 1 appear in the partition chart]

Page 19

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: next animation frame of the partition chart]

Page 20

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: next animation frame of the partition chart]

Page 21

Algorithm Working Through an Example

Try to map nodes in each temporal partition (3)

[Figure: next animation frame of the partition chart]

Page 22

Algorithm Working Through an Example

Relax: add 1 clock step to MAXCS

[Figure: dataflow graph and partition chart before the relaxation, with slots of 2 cs, 1 cs and 1 cs]

Page 23

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: next animation frame of the partition chart]

Page 24

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: next animation frame of the partition chart]

Page 25

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: next animation frame of the partition chart]

Page 26

Algorithm Working Through an Example

Merge Operation (1)

[Figure: partitions 1, 2 and 3 with slots of 2 cs, 2 cs and 1 cs]

Page 27

Algorithm Working Through an Example

Merge Operation (1)

[Figure: partitions 1 and 2 merged into partition "1,2" (4 cs); partition 3 remains (1 cs)]

Page 28

Algorithm Working Through an Example

Merge Operation (2)

[Figure: partition "1,2" (4 cs) and partition 3 (1 cs) before the second merge]

Page 29

Algorithm Working Through an Example

Merge Operation (2)

[Figure: all partitions merged into partition "1,2,3" (4 cs)]
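The 4 cs result can be cross-checked with a plain resource-constrained list scheduler. This is not the algorithm presented in the talk, only a generic greedy scheduler, and the graph (operation types and edges) is the same hypothetical one that reproduces the ASAP/ALAP table. With one shared adder and one shared multiplier (3 cells), it fits all six nodes into a single partition of 4 control steps.

```python
# Delays from the example: control steps per operation type.
DELAY = {"+": 1, "x": 2}

# Hypothetical graph (assumed operation types and edges, see lead-in).
NODE_OP = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}
PREDS = {0: [], 1: [], 2: [0, 1], 3: [], 4: [3], 5: [4]}

# One shared adder (1 cell) and one shared multiplier (2 cells): 3 cells.
UNITS = {"+": 1, "x": 1}


def list_schedule(node_op, preds, delay, units):
    """Greedy resource-constrained list scheduling; returns start step per node."""
    start, finish = {}, {}
    busy_until = {op: [0] * count for op, count in units.items()}
    step = 0
    while len(start) < len(node_op):
        for n in sorted(node_op):  # fixed priority: lower node number first
            if n in start:
                continue
            # A node is ready once all predecessors have finished.
            if any(p not in finish or finish[p] > step for p in preds[n]):
                continue
            op = node_op[n]
            for i, free_at in enumerate(busy_until[op]):
                if free_at <= step:  # a unit of this type is free
                    start[n] = step
                    finish[n] = step + delay[op]
                    busy_until[op][i] = finish[n]
                    break
        step += 1
    return start


start = list_schedule(NODE_OP, PREDS, DELAY, UNITS)
latency = max(start[n] + DELAY[NODE_OP[n]] for n in start)
print(start)    # {0: 0, 3: 0, 1: 1, 2: 2, 4: 2, 5: 3}
print(latency)  # 4
```

The adder is busy in every one of the four steps and the multiplier runs nodes 3 and 2 back to back, so under these assumptions no schedule on 3 cells can be shorter.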

Page 30

Experimental Results

Near-optimal w/o sharing vs sharing

[Figure: chart for the benchmarks EX1, SEHWA, HAL and EWF. Left axis: #TPs (0 to 18). Right axis: Perf. Improv. (-30% to 30%). Legend as extracted: #p(SA), #p(Our*), #p(Our*), %(#cs-Our*), %(#cs-Our**)]

Page 31

Experimental Results

Near-optimal w/o sharing vs sharing

[Figure: chart for the benchmarks FIR and MAT4x4. Left axis: #TPs (0 to 28). Right axis: Perf. Improv. (-16% to 32%). Legend as extracted: #p(SA), #p(Our*), #p(Our*), %(#cs-Our*), %(#cs-Our**). Data labels 72 and 37 appear in the chart]

Page 32

Experimental Results

Performance vs No. of Temporal Partitions

Mult4x4, RMAX=10 (no sharing of adders)

[Figure: horizontal axis: Initial Number of TPs (1 to 25). Left axis: Final #TPs (0 to 30). Right axis: Exec. (#cs) (64 to 72). Series: TPs, Exec.]

Page 33

Experimental Results

Is the algorithm good for scheduling?

[Figure: bar chart of #cs (0 to 35) for the benchmarks EWF and SEHWA. Series: known scheduling results, Our]

Comparison to some optimum results

Page 34

Related Work

List-Scheduling considering dynamic reconfiguration [Vasilko et al., FPL’96]

ASAP [GajjalaPurna et al., IEEE Trans. on Comp., 1999]

Minimize latency taking into account communication costs [Cardoso et al. VLSI’99]:
  Enhanced Static-List Scheduling
  Iterative approach (Simulated Annealing)

ILP formulation [SPARCs, DATE’98; RAW’98]

Enhanced Force-Directed List Scheduling [Pandey et al., SPIE’99]

And others [see the Related Work section of the paper]

Page 35

Conclusions

Novel algorithm that performs temporal partitioning and sharing of functional units simultaneously:
  Low complexity
  Heuristic approach
  Based on gradually enlarging the time slots

Allows exploiting the duality between the number of temporal partitions and resource sharing

Close-to-optimum results on some examples

Results indicate that the algorithm also holds up well when used for scheduling

Page 36

Future Work

Enhancements to the algorithm:
  consider functional units with pipelining
  consider pipelining between execution and reconfiguration

Study the possibility of taking communication and reconfiguration costs into account

Test results with a reconfigurable computing system (commercial board)

Page 37

Contact Author

João M. P. Cardoso

[email protected]

http://w3.ualg.pt/~jmcardo

THANK YOU!