FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams

Christophe HURIAUX (1), Olivier SENTIEYS (1, 2), Russell TESSIER (3)
(1) University of Rennes 1, France
(2) Inria, France
(3) University of Massachusetts, USA

24th International Conference on Field Programmable Logic and Applications, September 3rd, 2014



Outline

§ Introduction
    § Overview of the FlexTiles project
    § Architecture Overview
    § Advantages of 3-D Stacking
§ Principles
    § Task Migration in an FPGA
    § Task Migration in FlexTiles
    § Heterogeneous case
§ Approach
    § Coping with Heterogeneity
    § Design Constraints
§ Results
    § Implementation in VPR
§ Conclusion

FP7 FlexTiles Project

§ FlexTiles: Self adaptive heterogeneous manycore based on Flexible Tiles
§ Provide a heterogeneous many-core architecture offering
    § Large flexibility
    § High performance, energy efficiency
    § Raised programming efficiency
    § Self-adaptation through virtualization

Architecture Overview

§ 3D-stacked heterogeneous manycore
    § General Purpose Processors (GPP)
        § for flexibility and programming homogeneity
    § Network on Chip
    § Dedicated hardware accelerators mapped at run-time on a reconfigurable layer
§ Reconfigurable layer with seamless task migration capabilities
§ Virtualization layer to provide an abstraction of the manycore and self-adaptive services
§ Tool-chain for parallelization and compilation

Architecture Overview

[Figure: reconfigurable layer, with labels: 3D interface to the NoC, DSP blocks, Memory blocks]

Task migration

§ Classical problem in dynamic reconfiguration [1]
§ Enhance resource usage

[Figure label: 4x4?]

[1] K. Compton, Z. Li, J. Cooley, S. Knol, and S. Hauck, "Configuration relocation and defragmentation for run-time reconfigurable computing," IEEE Transactions on VLSI Systems, vol. 10, no. 3, pp. 209-220, 2002.
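The fragmentation problem behind the "4x4?" label can be illustrated with a small sketch. The 8x8 grid, the task positions, and the function names are illustrative assumptions, not taken from the slides or from [1]. The example shows a fabric with plenty of free cells that still cannot host a 4x4 task until the running tasks are moved.

```python
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # True = cell occupied by a running task


def free_cells(grid: Grid) -> int:
    """Count the unoccupied cells of the fabric."""
    return sum(not cell for row in grid for cell in row)


def find_free_region(grid: Grid, h: int, w: int) -> Optional[Tuple[int, int]]:
    """Return the top-left (row, col) of the first free h x w region, or None."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not grid[r + dr][c + dc]
                   for dr in range(h) for dc in range(w)):
                return (r, c)
    return None


# Hypothetical fragmented fabric: four single-cell tasks scattered over the grid.
fragmented: Grid = [[False] * 8 for _ in range(8)]
for r, c in [(3, 3), (3, 7), (7, 3), (7, 7)]:
    fragmented[r][c] = True

# The same four tasks after being migrated next to each other.
compacted: Grid = [[False] * 8 for _ in range(8)]
for c in range(4):
    compacted[0][c] = True

print(free_cells(fragmented), find_free_region(fragmented, 4, 4))
print(free_cells(compacted), find_free_region(compacted, 4, 4))
```

Both grids have the same number of free cells; only the compacted one admits the 4x4 task, which is the case task migration is meant to enable.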

3D Stacking

§ 3D-stacked reconfigurable accelerators
    § Improved resource usage
    § Improved bandwidth/latency
    § Improved performance and energy efficiency

[Figure: a reconfigurable layer and a multicore layer (grid of cores)]

Task Migration in an FPGA

§ Predefined reconfigurable regions
§ Bit-stream depends on task location

[Figure: FPGA fabric surrounded by I/O blocks; HW Accelerator #1 appears twice, once with BS #1 and once with BS #2]

Task Migration in FlexTiles

§ A task is synthesized, placed and routed into a Virtual Bit-Stream (VBS)
    § Independent of the task's physical location in the fabric
    § No predefined configuration domains
§ Easier resource sharing and distribution, simplified task migration
§ Reconfiguration controller generates the final BS at run-time

[Figure: tasks 1, 2 and 3 shown at several different positions in the fabric]
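The run-time step can be pictured with a toy model. The `VirtualBitstream` layout (configuration words keyed by coordinates relative to the task origin) and the function name are assumptions made for illustration; the slides do not describe the actual VBS format or the controller's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Coord = Tuple[int, int]


@dataclass(frozen=True)
class VirtualBitstream:
    """Hypothetical VBS: configuration words addressed relative to the task origin."""
    width: int
    height: int
    cells: Dict[Coord, int]  # (dx, dy) -> configuration word


def generate_final_bitstream(vbs: VirtualBitstream, origin: Coord,
                             fabric_size: Coord) -> Dict[Coord, int]:
    """Translate a location-independent VBS into absolute fabric coordinates."""
    ox, oy = origin
    fabric_w, fabric_h = fabric_size
    if ox < 0 or oy < 0 or ox + vbs.width > fabric_w or oy + vbs.height > fabric_h:
        raise ValueError("task does not fit in the fabric at this origin")
    return {(ox + dx, oy + dy): word for (dx, dy), word in vbs.cells.items()}


# One VBS, two placements: only the target coordinates differ.
vbs = VirtualBitstream(2, 2, {(0, 0): 0xA, (1, 0): 0xB, (0, 1): 0xC, (1, 1): 0xD})
print(generate_final_bitstream(vbs, (0, 0), (8, 8)))
print(generate_final_bitstream(vbs, (5, 3), (8, 8)))
```

The point of the sketch is only that a single stored description serves every location, in contrast to the one-bitstream-per-location case of a conventional FPGA flow.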

Task Migration in FlexTiles

[Figure: fabric with 3D NI, RAM and DSP blocks, hosting HW Accelerator #1 (VBS #1) and HW Accelerator #2 (VBS #2)]

Heterogeneity

§ Homogeneous case
    § No constraint on task placement
    § Regular routing architecture
§ Cope with heterogeneity
    § RAM, DSP, 3D I/Os
    § Migration is limited
        § vertically, within the same column
        § to the next column containing the same complex blocks

[Figure legend: Task, Configured LE, Logic Element (LE)]
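The column restriction can be made concrete with a short sketch. It assumes a column-based fabric in which every column holds a single block type; the column layout and the function name are hypothetical. A task may only move horizontally to positions whose sequence of column types matches the one it was placed and routed for.

```python
from typing import List, Sequence


def valid_column_offsets(columns: Sequence[str], start: int, width: int) -> List[int]:
    """Start columns where a task spanning `width` columns sees the same block types."""
    signature = tuple(columns[start:start + width])
    return [s for s in range(len(columns) - width + 1)
            if tuple(columns[s:s + width]) == signature]


# Hypothetical fabric: one block type per column.
columns = ["LE", "LE", "RAM", "LE", "DSP", "LE", "LE", "RAM", "LE", "DSP"]

# A task using a RAM column (LE, RAM, LE) can only move to the next matching pattern.
print(valid_column_offsets(columns, start=1, width=3))

# A task spanning two LE columns can only land where two LE columns are adjacent.
print(valid_column_offsets(columns, start=0, width=2))
```

Vertical moves within the same columns leave the signature unchanged, which is why they are always allowed in this simplified model.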

Proposed architecture

§ Routing of heterogeneous blocks is abstracted from logic routing
§ Long lines allow a trade-off between placement flexibility and routing complexity
§ A two-level routing is performed at run-time:
    § Logic routing (as in the homogeneous case)
    § Heterogeneous block routing through long lines
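As a rough illustration of the two levels, the sketch below first separates nets that touch a heterogeneous block from pure logic nets, then gives each heterogeneous net its own long line. The `Net` type, the block-type names and the one-net-per-long-line policy are simplifying assumptions; the slides do not specify how the run-time router allocates long lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Assumed names for the heterogeneous block types mentioned in the slides.
HETEROGENEOUS_TYPES = frozenset({"RAM", "DSP", "3D_NI"})


@dataclass(frozen=True)
class Net:
    name: str
    endpoint_types: Tuple[str, ...]  # block type at each endpoint of the net


def touches_heterogeneous_block(net: Net) -> bool:
    return any(t in HETEROGENEOUS_TYPES for t in net.endpoint_types)


def split_nets(nets: Sequence[Net]) -> Tuple[List[Net], List[Net]]:
    """Level split: (logic nets, heterogeneous nets)."""
    logic = [n for n in nets if not touches_heterogeneous_block(n)]
    hetero = [n for n in nets if touches_heterogeneous_block(n)]
    return logic, hetero


def assign_long_lines(hetero_nets: Sequence[Net], num_long_lines: int) -> Dict[str, int]:
    """Toy second level: reserve one long line per heterogeneous net."""
    if len(hetero_nets) > num_long_lines:
        raise ValueError("not enough long lines for the heterogeneous nets")
    return {net.name: line for line, net in enumerate(hetero_nets)}


nets = [Net("a", ("LE", "LE")), Net("b", ("LE", "RAM")), Net("c", ("DSP", "LE"))]
logic_nets, hetero_nets = split_nets(nets)
print([n.name for n in logic_nets], assign_long_lines(hetero_nets, num_long_lines=4))
```

The number of long lines is the knob behind the trade-off named above: more long lines give more placement freedom at the price of extra routing resources.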

Design Constraints

§ I/Os are made through 3D Network Interfaces, spread over the reconfigurable fabric

[Figure: floorplan of the reconfigurable fabric with Reconfiguration RAM, Reconfiguration CTRL, columns of DSP and MEM blocks, and 3D NI and AI blocks]

Implementation in VPR

§ Versatile Place and Route (VPR), an open-source CAD tool for placement and routing
    § Part of the Verilog To Routing (VTR) framework
§ Source code modified to implement our techniques and deal with our constraints
    § Horizontal long lines spread over partitions
    § Separate homogeneous and heterogeneous routing

VPR and VTR: https://code.google.com/p/vtr-verilog-to-routing/

Implementation in VPR

VPR Original Routing Model

§ Logic grid
§ Block placement
    § X: simple block
    § Y: 2 blocks tall
§ Mesh routing lines
§ Switch boxes
§ Interconnect

[Figure: logic grid with X and Y blocks; connection examples labelled Fc=0.5 and Fc=1]
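In VPR, a fractional Fc gives the share of the tracks in an adjacent channel that a block pin can connect to. The arithmetic for the two values on the figure can be sketched as follows; the channel width of 8 and the round-down rule are assumptions chosen for illustration.

```python
import math


def tracks_per_pin(fc: float, channel_width: int) -> int:
    """Number of channel tracks a pin can reach for a fractional Fc (rounded down here)."""
    if not 0.0 <= fc <= 1.0:
        raise ValueError("fractional Fc must lie in [0, 1]")
    return math.floor(fc * channel_width)


# Hypothetical channel of 8 tracks.
print(tracks_per_pin(0.5, 8))  # half of the tracks
print(tracks_per_pin(1.0, 8))  # every track
```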

Implementation in VPR

Enhanced Routing Model

§ Logic grid
§ Block placement
§ Block typing
    § X: homogeneous
    § Y: heterogeneous
§ Mesh routing lines
§ Long lines
§ Switch boxes
§ Interconnect
    § Homogeneous
    § Heterogeneous

Results

§ Architecture based on a simplified Stratix IV with:
    § Dual-port 144k memories
    § Fracturable 36x36 multipliers
§ Evaluation on two criteria
    § Delay of the critical path
    § Minimum channel width
        § Number of tracks in the homogeneous routing channels
        § Minimum channel width determined by VPR
        § Not directly related to silicon area
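The idea of a minimum channel width can be sketched as a search for the smallest number of tracks per channel at which a circuit still routes. The `routes_with` predicate stands in for a full routing attempt and is hypothetical; the sketch also assumes that routability is monotonic in the channel width, and it is not a description of VPR's actual search procedure.

```python
from typing import Callable


def minimum_channel_width(routes_with: Callable[[int], bool], upper: int) -> int:
    """Smallest W in [1, upper] with routes_with(W) true, assuming monotonic routability."""
    if not routes_with(upper):
        raise ValueError("circuit does not route even at the upper bound")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if routes_with(mid):
            hi = mid        # mid tracks suffice; try fewer
        else:
            lo = mid + 1    # mid tracks are not enough
    return lo


# Stand-in for a router: pretend the circuit routes from 37 tracks upward.
print(minimum_channel_width(lambda w: w >= 37, upper=256))
```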

Results

§ Benchmark set: VTR framework circuits [2]

Circuit            # Mem  # Mult   # LB
bgm                    0      11  2,174
boundtop               1       0  2,977
ch_intrinsics          1       0    272
diffeq1                0       5     41
diffeq2                0       5     43
LU8PEEng              45       8     30
mkDelayWorker32B      41       0    497
mkPktMerge            15       0     17
mkSMAdapter4B          5       0    181
or1200                 2       1    273
raygentop              1       7    192
stereovision1          0      38    990

[2] J. Rose, J. Luu, C. W. Yu, et al., "The VTR project: architecture and CAD for FPGAs from Verilog to routing," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, ACM, 2012, pp. 77-86.

Results: Delay

§ Estimation of the worst-case delay
    § Impossible to predict where connections to long lines will be made
    § Some channels crossing fixed-function blocks are longer

Results: Delay

§ Only 2% delay increase (on average)

[Chart: critical path delay in ns for the classic and enhanced architectures, with the proposed/classic ratio]

Results: Min. Channel Width

§ 1.8X channel width increase on average
§ Need for specific routing algorithms to deal with the heterogeneous interconnection network

[Chart: minimum channel width (min W) in number of tracks for the classic and enhanced architectures, with the proposed/classic ratio]

Conclusion

§ FPGA embedded in a 3D architecture
§ More flexibility for task placement and/or relocation
§ Low impact on delay, but a cost in routing resources
§ Need to find a trade-off between flexibility and the area increase due to additional connections


Thank you for your attention

More info on FlexTiles: http://www.flextiles.eu


Virtual Bit-Stream: Example

Jan. 2014, CAIRN project-team

§ Hiding routing details
    § Full BS is 129 bits
    § Could be reduced by giving fewer details

[Figure: a CLB with inputs CLBIN[0] to CLBIN[3] and output CLBOUT, with connection points numbered 0 to 20]

Virtual Bit-Stream: Example

§ Hiding routing details
    § List of I/O and connections
        § 20 → 8
        § 1 → 9
        § 5 → 18

[Figure: the same block showing only the connection points numbered 0 to 20]
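To see why a connection list can be smaller than the full bitstream, consider a back-of-the-envelope encoding. The fixed-width encoding (one index per endpoint, wide enough to address the 21 numbered points) is an assumption for illustration only; the slides do not give the actual VBS encoding. Only the three connections and the 129-bit figure come from the example.

```python
from typing import List, Tuple

FULL_BS_BITS = 129          # size of the full bitstream quoted in the example
NUM_POINTS = 21             # connection points numbered 0 to 20 in the figure
connections: List[Tuple[int, int]] = [(20, 8), (1, 9), (5, 18)]


def bits_per_endpoint(num_points: int) -> int:
    """Bits needed to index one of num_points points (assumed fixed-width encoding)."""
    return (num_points - 1).bit_length()


def connection_list_bits(conns: List[Tuple[int, int]], num_points: int) -> int:
    """Size of a plain (source, destination) list under the assumed encoding."""
    return len(conns) * 2 * bits_per_endpoint(num_points)


vbs_bits = connection_list_bits(connections, NUM_POINTS)
print(bits_per_endpoint(NUM_POINTS), vbs_bits, FULL_BS_BITS)
```

Under this assumed encoding the list is much shorter than the full bitstream, at the cost of the controller having to recompute the hidden routing details at run-time.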

Results: BS Sizes on MCNC Benchmarks

[Chart: bitstream size in kilobits (axis 0 to 1600), split into routing and logic, for bars labelled tseng, tseng, diffeq, diffeq, apex4, des, ex5p, misex3]

Results: VBS Sizes on MCNC Benchmarks

[Chart: BS size and VBS size in kilobits (axis 0 to 1600) and compression ratio (axis 0% to 100%) for bars labelled tseng, tseng, diffeq, diffeq, apex4, des, ex5p, misex3. Compression ratio labels, in extraction order: 44.4%, 49.2%, 47.2%, 55.2%, 49.7%, 29.5%, 27.4%, 26.6%]

Introduction: Architecture Overview

[Figure label: 3D Access Point to the NoC]

Introduction: Architecture Overview

[Figure: General Architecture Overview]