high level synthesis - universitetet i oslo · high level synthesis i generate hardware from c or...

34
INF5430: High level synthesis

Upload: others

Post on 25-Feb-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

INF5430: High level synthesis

Page 2: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Agenda

I High level Synthesis (HLS)

I Designing hardware with C

I Non-Optimal code for Synthesis

I Compiler transformations

I References

April 11, 2011 2

Page 3: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

High level Synthesis

I Generate hardware from C or another high level language.

I Faster time to market.

I Higher abstraction level (behavior).

I Several hardware implementation alternatives can begenerated from one C implementation.

I A C model can be used to generate hardware which meetdifferent performance requirements and resource constraints.

April 11, 2011 3

Page 4: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

High level Synthesis

I Open source tools and commercial tools are available:RoCCC, Catapult-C, MathWorks HDLCoder

April 11, 2011 4

Page 5: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

High level Synthesis

I HLS tools exploit parallelism in the code.

I Well suited to accelerate compute intensive kernels. This isoften smaller functions with few external data dependencies.

1 vo id f i r f i l t e r ( a c i n t<8> ∗ i nput , a c i n t<8> c o e f f s [NUM TAPS] , a c i n t<8> ∗output ) {2 s t a t i c a c i n t<8> r e g s [NUM TAPS ] ;3 i n t temp = 0 ;4 i n t i ;5 SHIFT : f o r ( i = NUM TAPS−1; i>=0; i−−) {6 i f ( i == 0 )7 r e g s [ i ] = ∗ i n pu t ;8 e l s e9 r e g s [ i ] = r e g s [ i −1];

10 }11 MAC: f o r ( i = NUM TAPS−1; i>=0; i−−) {12 temp += c o e f f s [ i ]∗ r e g s [ i ] ;13 }14 ∗output = temp>>7;15 }

April 11, 2011 5

Page 6: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Non-Optimal code for Synthesis

I Need to keep in mind that the code ends up as hardware.

I Not all algorithms are suited for hardware implementations.

I Sequential code, control logic, large loops with function callstypically will not benefit much.

April 11, 2011 6

Page 7: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Unbounded Loops

I Loops without finite bounds.

I When the start value, stop value and the increment isconstant and defined, the loop is bounded.

I A loop is not bounded when start, stop or increment is passedas a function parameter!

April 11, 2011 7

Page 8: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Unbounded Loops

I Example of unbounded loop by parameter passing.

1 vo id unbounded loop ( i n t l oop ) {2 i n t i ;3 f o r ( i = loop ; i>=0; i−−)4 s ta tement ;5 }

April 11, 2011 8

Page 9: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

System Calls and Packages

I stdio.h functions such as printf or cout.

I math.h sin, cos and most math functions are notsynthesisable.

I Assembly code is not synthesisable.

I System calls are generally not synthesisable.

April 11, 2011 9

Page 10: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Recursive functions

I Recursion are functions which calls itself.

I Used to write compact and efficient C code.

I Recursive functions executed on a CPU makes heavy use ofthe return stack, and are often unbounded.

I Typically a complete rewrite is required in order to get aiterative implementation which can be synthesised.

April 11, 2011 10

Page 11: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Function pointers

I Function pointers are usually not supported.

I Needs to be changed to explicit function calls.

1 i n t main ( ) {2 vo id (∗ f p ) ( i n t ) ;3 fp = func ;4 fp ( 2 ) ; // f u n c t i o n p o i n t e r c a l l5 func ( 2 ) ; // e x p l i c i t f u n c t i o n c a l l6 }78 vo id f unc ( i n t arg ) {9 p r i n t f ( ”%d\n” , a rg ) ;

10 }

April 11, 2011 11

Page 12: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Dynamic Memory Allocation

I Dynamic memory allocation is typically done with malloc

I Dynamic memory allocation is not synthesisable, and we needto use static memory allocations.

1 i n t s ta t i c mem [ 3 2 ] ;2 i n t ∗dyn mem = ma l l oc (32 , s i z e o f ( i n t ) ) ;

April 11, 2011 12

Page 13: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Other restrictions

I Global variables for sharing data between functions are notsupported.

I Problematic to pass pointers when using a CPU with MMU.

1 i n t g l o b v a r =5430;23 vo id f un c 0 ( ) {4 s ta tement g l o b v a r ;5 }67 vo id f un c 1 ( ) {8 s ta tement g l o b v a r ;9 }

April 11, 2011 13

Page 14: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

HLS Restrictions

I Restrictions in the HLS flow often requires rewriting Cfunctions.

I The C programming code targeting HLS is therefore lessportable.

I Different HLS tools synthesis C code differently furtherdecreasing portability.

I Programming code often require HLS tool specificadaptations.

April 11, 2011 14

Page 15: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Designing hardware with C

I We need to know how the tools transform C code intohardware in order to get efficient implementations.

I We need to know what type of code to avoid.

I HLS tools perform code transformations on the programmingcode when generating hardware implementations.

I Code transformations on different levels:I Bit-levelI Instruction-levelI Loop-levelI Data-oriented

April 11, 2011 15

Page 16: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Bit-level transformations: Bit reversing

I Software (a) implementation of bit reversing compared to ahardware implementation (b).

April 11, 2011 16

Page 17: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Bit-level transformations: Bit-width Narrowing

I Variables are often defined with a greater dynamic range thanneeded.

Consider the example

1 f o r ( i n t i =0; i <4; i++)2 s t a t emen t s

I On a CPU we use predetermined register widths.

I When implemented in hardware we allocate physical resourcesto the i variable.

I Bit-width narrowing can be determined by static analysis orprofiling.

April 11, 2011 17

Page 18: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Bit-level transformations: Bit-width Narrowing

Consider the following example where bit-width narrowing is usedto optimize the counter:

1 vo id accumulate4 ( i n t d in [ 4 ] , i n t &dout ){2 i n t acc=0;3 f o r ( i n t i =0; i <4; i++)4 acc += d in [ i ] ;5 dout = acc ;6 }

Which results in the following hardware:

April 11, 2011 18

Page 19: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Loop-level trans.: Partial Loop Unrolling

In order to increase throughput consider the example:

1 vo id accumulate ( i n t d in [ 4 ] , i n t &dout ){2 i n t acc=0;3 f o r ( i n t i =0; i <4; i +=2){4 acc += d in [ i ] ;5 acc += d in [ i +1] ;6 }7 dout = acc ;8 }

Which results in the following hardware implementation:

April 11, 2011 19

Page 20: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Loop-level trans.: Fully Unrolled Loop

Further increasing the throughput, consider the example:

1 vo id accumulate ( i n t d in [ 4 ] , i n t &dout ){2 i n t acc=0;3 acc += d in [ 0 ] ;4 acc += d in [ 1 ] ;5 acc += d in [ 2 ] ;6 acc += d in [ 3 ] ;7 dout = acc ;8 }

Results in a balanced adder tree:

April 11, 2011 20

Page 21: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Loop-level transformations

I HLS tools generate non-unrolled loops, partial unrolled loops,or fully unrolled loops out of a single implementation.

1 vo id accumulate4 ( i n t d in [ 4 ] , i n t &dout ){2 i n t acc=0;3 f o r ( i n t i =0; i <4; i++) acc += d in [ i ] ;4 dout = acc ;5 }

April 11, 2011 21

Page 22: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Instruction-level trans: Tree Height Reduction

I Reduce the height of the a tree of operations by reorderingthem without changing the functionality.

I THR is applied to (b) which results in the tree shown in (c).

I (d) and (c) shows the scheduling when the data is stored inmemory

I HLS tools will try to build a balanced tree structure out ofrelated additions that can be scheduled in parallel.

April 11, 2011 22

Page 23: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Operation Strength Reduction (OSR)

I Replace an operation by a computationally less expensive one

I or a sequence of operations Example:

1 2<<1 == 2∗2

I The multiplication is replaced with a simple shift. The shiftonly requires changes to the interconnections.

April 11, 2011 23

Page 24: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Data-Oriented Transformations

I Data-oriented transformations makes changes to theorganization of data structures.

I Common transformations include:I Data distributionI Data replication

April 11, 2011 24

Page 25: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Data distribution transformation

I Partitions the data into many distinct internal memory unitsor modules.

I Increases throughput.

I Concurrent access.

April 11, 2011 25

Page 26: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Data duplication transformation

I Increases throughput.

I Concurrent access.

I Consistency issues when modifying data.

April 11, 2011 26

Page 27: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Other transformations

I Function inlining

I Dedicated for each function invocation.

1 vo id accumulate ( ) {2 accumulate4 ( din , dout ) ;3 accumulate4 ( din2 , dout2 ) ;4 }56 vo id accumulate4 ( i n t d in [ 4 ] , i n t &dout ){7 i n t acc=0;8 f o r ( i n t i =0; i <4; i++) acc += d in [ i ] ;9 dout = acc ;

10 }

April 11, 2011 27

Page 28: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Other transformations

I Function outliningI Increases resource sharing.I Reducing parallelism.

April 11, 2011 28

Page 29: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Data Flow Graph Analysis

I High level synthesis starts by analyzing data dependencies inthe code.

I This leads to a Data Flow Graph (DFG)

I Parts of code without dependencies can be executed in parallel

1 vo id accumulate ( i n t a , i n t b , i n t c , i n t d , i n t &dout ){2 i n t t1 , t2 ;3 t1 = a + b ;4 t2 = t1 + c ;5 dout = t2 + d ;6 }

April 11, 2011 29

Page 30: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Resource Allocation

I Based on the assembled DFG, each operation is mapped ontoa hardware resource.

I This process is called resource allocation.

I The implementation is annotated with both timing and areainformation. This is used during scheduling.

April 11, 2011 30

Page 31: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Scheduling

I HLS adds time to the design during the scheduling.

I Scheduling takes the DFG operations, allocates resources andthen decides in which clock cycle they are performed.

I Registers are added based on the target clock frequency.

April 11, 2011 31

Page 32: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Scheduling

I A data path state machine (DPFSM) is created to control thescheduled design.

I In this design, four states are required to execute the schedule.

I These states are referred to as control steps or c-steps.

April 11, 2011 32

Page 33: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

Scheduling

I The resulting hardware generated from the schedule will varydepending on the design constraints.

I Design constraints include resource allocation and th amountof loop pipelining.

April 11, 2011 33

Page 34: High level synthesis - Universitetet i oslo · High level Synthesis I Generate hardware from C or another high level language. I Faster time to market. I Higher abstraction level

References

I Mentor Graphics The High Level Synthesis Blue Book

I Compiling for Reconfigurable Computing: A survey, Cardoso,Diniz, Weinhardt. DOI 10.1145/1749603.1749604

April 11, 2011 34