overlays: a soluon paradigm for fpga high-level design?

31
Overlays: a solu-on paradigm for FPGA high-level design? Tarek S. Abdelrahman The Edward S. Rogers Department of Electrical and Computer Engineering University of Toronto [email protected]

Upload: others

Post on 21-Oct-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Overlays: a soluon paradigm for FPGA high-level design?

Overlays:asolu-onparadigmforFPGAhigh-leveldesign?

TarekS.Abdelrahman

TheEdwardS.RogersDepartmentofElectricalandComputerEngineering

UniversityofToronto

[email protected]

Page 2: Overlays: a soluon paradigm for FPGA high-level design?

ReconfigurableSystemsontheRise•  FPGAsareincreasinglyintegratedincompuCngsystems

–  Massiveparallelismcanleadtohighperformance–  Lowerpower–  Customizability

•  NewergeneraConofhigh-performancesystemsintegrateFPGAswithmulCcores,targeCngdatacenters–  ExamplesystemsfromIntel,IBMandXilinx–  UsedmainlybysoOwaredevelopers

16-07-11 2

Page 3: Overlays: a soluon paradigm for FPGA high-level design?

ReconfigurableSystemsontheRise•  FPGAsareincreasinglyintegratedincompuCngsystems

–  Massiveparallelismcanleadtohighperformance–  Lowerpower–  Customizability

•  NewergeneraConofhigh-performancesystemsintegrateFPGAswithmulCcores,targeCngdatacenters–  ExamplesystemsfromIntel,IBMandXilinx–  UsedmainlybysoOwaredevelopers

16-07-11 3

Page 4: Overlays: a soluon paradigm for FPGA high-level design?

FPGAProgrammabilityBurdens•  FPGAsareprogrammedusingahardwaredesignabstracCon,

whichisforeigntothebulkofsoOwaredevelopers–  HDL,Timing,fiYng,seedsweeps,etc.

•  FPGAdevelopmenttoolsleadtoextremelylongdevelopmentcyclescomparedtotheirsoOwarecounterparts–  Alargecircuitcantakedaystocompile(synthesis,place,route,Cme,

etc.)andmayneedseveralcompiles

•  ThereisapressingneedtoalleviatetheseburdensandmakeFPGAdesignaccessibletosoOwaredevelopers

16-07-11 4

Page 5: Overlays: a soluon paradigm for FPGA high-level design?

TacklingtheBurden•  High-LevelSynthesis(HLS)

–  GeneratedhardwareincreasinglycompeCCvewithHDLdesign

•  High-levelprogrammingmodels–  DataflowmodelfromMaxeler

•  Nonetheless:–  Developerremainsexposedtovariousaspectsofhardwaredesign–  UseofFPGAdesigntoolsissCllrequired!⇒longdevelopmentcycles

16-07-11 5

Page 6: Overlays: a soluon paradigm for FPGA high-level design?

Overlays•  Pre-compiledFPGAcircuitsthatareinthemselves

configurable/programmable,i.e.,run-Cmeconfigurable–  Examples:soOprocessors,GPU-on-FPGA,mesh-of-FUs,etc.

16-07-11 6

SoFProcessor

Source:Andrycetal:FlexGrip:ASoFGPGPUforFPGAs,FPT13

PE PE PE

PE PE PE

PE PE PE

Page 7: Overlays: a soluon paradigm for FPGA high-level design?

FPGAvs.OverlayDesignFlows

16-07-11 7

Pre-compiledoverlay

FPGAFPGA

FPGADesignTools

ConfiguraConStreamFPGA

bitstream

Applica-on(HDL)

Applica-on-to-OverlayTools

Applica-on(C,CUDA,DFG,etc.)

seconds

hours/days

µseconds

harder simpler

Page 8: Overlays: a soluon paradigm for FPGA high-level design?

Mesh-of-FUsOverlays[FPL2013]

16-07-11 8

ADD ADD

EXP SHF

ADD SUB SUB

MUL

DIV

FuncConUnit

RouCnglogic

4-NNconnectedarrayofcells

DataFlowGraph

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

Page 9: Overlays: a soluon paradigm for FPGA high-level design?

MappingDFGstoOverlay–Place

16-07-11 9

ADD ADD

EXP SHF

ADD SUB SUB

MUL

DIV

I1

I2

I3 I4 I5

I6

O1 O2

A B C

D

E

DataFlowGraph

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

Page 10: Overlays: a soluon paradigm for FPGA high-level design?

16-07-11 10

ADD SUB SUB

ADD MUL ADD

DIV EXP SHF

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

I1

I2

I3 I4 I5

I6

O1 O2

A B C

D

E

pipelineregister/FIFO

DataFlowGraph

MappingDFGstoOverlay–Route

Page 11: Overlays: a soluon paradigm for FPGA high-level design?

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

PipelinedExecu-on

16-07-11 11

ADD SUB SUB

ADD MUL ADD

DIV EXP SHF

O1

O2

I1 I2 I3 I4 I5 I6

ADD SUB SUB

MUL

DIV

C

E

D

A B

I1

I2

I3 I4 I5

I6

O1 O2

A B C

D

E

pipelineregister/FIFO

DataFlowGraph

Page 12: Overlays: a soluon paradigm for FPGA high-level design?

Mesh-of-FUsTools•  ApplicaCon-to-overlaytoolchainthat:

–  ExtractsDFGofbodiesofparallelloopsinCcode

–  PlacesandroutestheDFGnodesontotheoverlay•  ConfigurestheswitchestoestablishDFGconnecCvity•  GeneratestheconfiguraConstreamoftheoverlay

16-07-11 12

Page 13: Overlays: a soluon paradigm for FPGA high-level design?

HighPerformancewithnoHardwareDesign

16-07-11 13

DFG Size(nodes) GFLOPS CompileTime(sec)

n-Body 125 18.72 0.44BlackSholes 131 21.22 1.33MatMul 96 19.66 1.05MatMulAdd 114 22.46 3.80

•  Examplemesh-of-FUsoverlayonaStraCxIV[FPL2013]–  SingleprecisionfloaCngpointoperaCons–  288cellsimplementedasan18x16mesh–  fMAXof312MHzand32.4GFLOPSpeak(integerat415MHz)

•  Othersalsoreporthighperformanceresults

GFLOPS CompileTime(sec)

21.52 272422.10 250825.21 204528.79 919

HDLOverlay

Page 14: Overlays: a soluon paradigm for FPGA high-level design?

SoFware-FriendlyTarget•  OverlaysraisethelevelofabstracConofusingFPGAstoone

thatismorefamiliartosoOwaredesigners–  CprogrammingforasoOprocessor–  CUDA/OpenCLforGPUoverlays–  Dataflowgraphsformesh-of-FUs

•  ThisopensupopportuniCesfor“standard”soOwaretoolstotargetFPGAs

16-07-11 14

Page 15: Overlays: a soluon paradigm for FPGA high-level design?

JITCompila-ontoHardware

•  Profilecode

16-07-11 15

:ADD R9,R7,R10BEQZ end

L1: ADD R1,R3,R7MULT R11,R12,R13ADD R8,R1,R11SUB R9,R8,#8SLT R8,R9,R7BNZ R8,L1ADD R7,R6,R1:

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FPGAOverlay

CPU

Page 16: Overlays: a soluon paradigm for FPGA high-level design?

JITCompila-ontoHardware

•  IdenCfyhotsegmentsofcode

16-07-11 16

:ADD R9,R7,R10BEQZ end

L1: ADD R1,R3,R7MULT R11,R12,R13ADD R8,R1,R11SUB R9,R8,#8SLT R8,R9,R7BNZ R8,L1ADD R7,R6,R1:

FPGAOverlay

CPU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

Page 17: Overlays: a soluon paradigm for FPGA high-level design?

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

JITCompila-ontoHardware

•  ExtractDFGandconfiguretheoverlay

16-07-11 17

:ADD R9,R7,R10BEQZ end

L1: ADD R1,R3,R7MULT R11,R12,R13ADD R8,R1,R11SUB R9,R8,#8SLT R8,R9,R7BNZ R8,L1ADD R7,R6,R1:

ADD

MULT ADD

SLT SUB

FPGAOverlay

CPU

ADD

MULT ADD

SLT SUB

Page 18: Overlays: a soluon paradigm for FPGA high-level design?

JITCompila-ontoHardware

•  Re-writethecode

16-07-11 18

:ADD R9,R7,R10BEQZ end

L1: ADD R7,R6,R1:

FPGAOverlay

CPU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

ADD

MULT ADD

SLT SUB

ADD

MULT ADD

SLT SUB

Page 19: Overlays: a soluon paradigm for FPGA high-level design?

JITCompila-ontoHardware

•  TransferexecuContotheoverlay

16-07-11 19

:ADD R9,R7,R10BEQZ end

L1: ADDR7,R6,R1:

FPGAOverlay

CPU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

FU FU FU

ADD

MULT ADD

SLT SUB

ADD

MULT ADD

SLT SUB

User-TransparentDynamicProgramAccelera-on

Page 20: Overlays: a soluon paradigm for FPGA high-level design?

APrototypeJITCompiler•  Target:IntelQuickAssistPlamorm•  Thecompilerprototype:

–  BuiltaroundLLVM,targetsinnermostloopsofscienCficcode–  MiCgatesmuchoftherun-CmeoverheadtocompileCme

•  Overlaycurrentlybeingintegratedintothetargetplamorm16-07-11 20

CPU CPU

SystemMemory

QPICoherentInterconnect

StraCxFPGA

QPIIPAFU

XeonMulCcoreProcessor

FigureaOerIntelliterature

Page 21: Overlays: a soluon paradigm for FPGA high-level design?

Accelera-onPoten-al

16-07-11 21

0

1

2

3

4

5

6

7

Speedu

p

Aplplica-on

FPGAsimulaConresultsbasedonmeasuredsystemparameters

Page 22: Overlays: a soluon paradigm for FPGA high-level design?

Customizability•  OneofthekeyadvantagesofFPGAsisthattheycanbe

customizedforapplicaCons

•  Overlayscanalsobe“user”customizable–  WithminimalusageofFPGAdesigntools

•  Inthecontextofourmesh-of-FUs,wecanvarythechoiceoftheFUateachlocaConofthemesh,i.e.,thefuncConallayout,totheoverlaymoreefficientforanapplicaCon

16-07-11 22

Page 23: Overlays: a soluon paradigm for FPGA high-level design?

ALibrary-BasedApproach

16-07-11 23

A M D D

A M S S

A M A M

A M A M

A M D D

A M S S

A M A M

A M A M

DesiredOverlay LibraryofPre-PlacedandPre-routedOverlays SCtchedOverlay

M M

A AD D

S SA M

A M

•  Bopom-Upflowallows(restricted)relocaConofpre-placedandpre-routedgroupsofcells[FPL2014] sCtch

•  Example12x15overlay:35minutesvs.15hours

Page 24: Overlays: a soluon paradigm for FPGA high-level design?

D A S S

A

A

A

M

M

M

M

ProgramAnalysisforCustomiza-on

16-07-11 24

A

M A M

SS

M

D

A

D

MAProgramAnalysis

A M D D

A M S S

A M A M

A M A M

CandidateOverlaysProgramDFGWork-to-be-done

Page 25: Overlays: a soluon paradigm for FPGA high-level design?

SystemIntegra-on

•  MustbeabletovirtualizetheFPGA–  Takesnapshots–  Migrate–  Shareandmanageasaresource

16-07-11 25

CPUs GPUs FPGAs

VM VM

CPUs GPUs FPGAs

VM VM

CPUs GPUs FPGAs

VM VM

Spark Hadoop GraphLab TensorFlow

ApplicaCon ApplicaCon ApplicaCon

Page 26: Overlays: a soluon paradigm for FPGA high-level design?

OverlaysFacilitateVirtualiza-on•  FPGAvirtualizaCononlynowbeingexplored

–  Requiresspecializedhardware–  Averylarge“state”

•  Overlaysnaturallyhaveamuchsmallerstate,facilitaCngsnapshotsandcontextswitching

–  Wewouldliketoexplorethissupportinourmesh-of-FUsoverlay

16-07-11 26

Page 27: Overlays: a soluon paradigm for FPGA high-level design?

ChallengestoOverlays

•  Resourceoverhead–  Thatis,theFPGAresourcesusedbytheoverlaycomparedtoa

dedicatedcircuit(HDL)thatimplementsthesameapplicaCon–  ~4XforourFPoverlayandcanbehigher–  DifficulttoquanCfydesigneffort

–  FPGAsareareincreasinginsize–  HardfloaCngpointunits–  Hardeningtheoverlayoncedesignisover?

16-07-11 27

Page 28: Overlays: a soluon paradigm for FPGA high-level design?

ChallengestoOverlays–Cont’d•  OverlayarchitecturesneedmoreexploraCon:which

architectureforagivenapplicaCondomain–  Howtoensurescalability?–  TakingintoaccounttheunderlyingFPGAdeviceconstraints–  Howtoimplementwell(e.g.,data-drivenexecuCon,FIFOs,etc.)?–  FixedfuncConvs.mulC-funcConFUs?–  Howtoreducingresourceoverhead?–  TimemulCplexed?–  MulCpledevices?

16-07-11 28

Page 29: Overlays: a soluon paradigm for FPGA high-level design?

ChallengestoOverlays–Cont’d•  EvolvingtheFPGAdesigntools

–  Modulararchitecturesdonotleadtomodularcircuits•  Thetoolsdonotunderstandthemodularity•  Atpresentwemust“fightwiththem”[FPL2014]

–  Thetoolsmustevolvetoallowdeveloperstoexpressandtorecognizethemodularityofthearchitecture•  Scalablecircuitsfromscalablearchitectures

16-07-11 29

Page 30: Overlays: a soluon paradigm for FPGA high-level design?

ConcludingRemarks•  Acaseforoverlays

–  Performance,soOware-friendliness,customizabilityandsystemintegraCon

•  Theycanserveas“middleground”betweenhardwaredesignandsoOwareprogramming–  EitherforproducConorfordebuggingandprototyping

•  Challengestoarchitecture,programmingmodels,implementaConandresourceoverhead

16-07-11 30

Page 31: Overlays: a soluon paradigm for FPGA high-level design?

Ques-ons?

16-07-11 31