Extracting Finite State Machine with Datapath
models from the synthesized behavior in High Level Synthesis
A thesis submitted in partial fulfillment of the requirements
for the degree of
Master of Technology in
Computer and Information Technology
by Satish Bonagiri
(06cs6006)
Under the guidance of Dr. Dipankar Sarkar
and Dr. Chittaranjan Mandal
Dept. of Computer Science and Engineering Indian Institute of Technology
Kharagpur
May 2008
Department of Computer Science and Engineering
Indian Institute of Technology
Kharagpur, India
Certificate
This is to certify that the thesis titled "Extracting Finite State Machine with Datapath models from the synthesized behavior in High Level Synthesis", submitted by Satish Bonagiri to the Department of Computer Science and Engineering in partial fulfillment of the requirements for the award of the degree of Master of Technology, is a bona fide record of work carried out by him under our supervision and guidance. The thesis has fulfilled all the requirements as per the regulations of this Institute and, in our opinion, has reached the standard needed for submission.
Dr. Dipankar Sarkar
Dept. of Computer Science and Engg.
Indian Institute of Technology
Kharagpur 721302, INDIA
May 2008

Dr. Chittaranjan Mandal
Dept. of Computer Science and Engg.
Indian Institute of Technology
Kharagpur 721302, INDIA
May 2008
Acknowledgements
This thesis is the result of research performed under the guidance of Dr. Dipankar Sarkar and Dr. Chittaranjan Mandal at the Department of Computer Science and Engineering of the Indian Institute of Technology, Kharagpur.
I sincerely thank my research advisors for giving me the opportunity to work as part of their research group and for the huge amount of time and effort they spent guiding me through several difficulties on the way. Without the help, encouragement and patient support I received from my advisors, this thesis would never have materialized.
In particular, I would like to thank Chandan Karfa for his encouragement and support throughout this period.
Satish Bonagiri
Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
May 2008
Abstract
High-Level Synthesis (HLS) translates a behavioral
specification into its corresponding Register Transfer Level (RTL)
specification of the system. The Structured Architecture Synthesis Tool
(SAST) takes the behavioral description of an input design and
outputs synthesizable RTL Verilog code. This work
enhances SAST by adding interfaces for the verifier.
The tool has a phase-wise verification utility. The verification
mechanism is essentially equivalence checking of the input and
output of each phase of the synthesis process. For this purpose, the input and
output of every phase of the synthesis process are represented as
Finite State Machines with Datapath (FSMDs). Accordingly, we added
interfaces for the FSMD model. The method is generic and can be
used with other schedulers as well.
We also generate an FSMD after the register allocation and binding
phase for the verifier. The FSMD formed at this phase is used to verify
correct register sharing among the variables in the CDFG.
Key words: High-Level synthesis, FSMD, RTL, Verification
Contents
1 Introduction
  1.1 Introduction
  1.2 Contributions of the Present Work
  1.3 Organization of thesis
2 Structured Architecture Synthesis Tool (SAST)
  2.1 Target Architecture
  2.2 Features of SAST
3 Interfaces for Verification
  3.1 Finite State Machine with Data-path (FSMD)
  3.2 FSMD formation from behavioral CDFG
    3.2.1 Methodology
  3.3 FSMD formation after scheduling
    3.3.1 Methodology
  3.4 FSMD formation after Register allocation and binding
    3.4.1 Methodology
  3.5 Implementation
    3.5.1 Data structures
    3.5.2 Functional modules
4 Experimentation and Results
  4.1 Results
  4.2 Conclusions
  4.3 Future Work
Bibliography
List of Figures
1.1 The Y-chart
1.2 High level Synthesis (HLS) steps
1.3 SAST: A HLS tool with phase-wise verification utility
2.1 Architecture Block
3.1 Extracted CDFG of GCD
3.2 FSMD formed from extracted CDFG
3.3 FSMD formed after scheduling Phase
3.4 FSMD formed after allocation and binding phase
4.1 Control data flow graph (CDFG) of the DIFFEQ example
Chapter 1
Introduction
1.1 Introduction
Synthesis is the process of translating a behavioral description into a
structural description. Synthesis is also defined as the process of
interconnecting primitive components at a certain level of abstraction (target
level) to realize a specification at a higher level of abstraction (source level). The
transformation of the design at source level is carried out to achieve some
predefined performance goals or constraints. The source and target levels
categorize the various synthesis systems. The different types of synthesis process are
described using the Y-chart (shown in Figure 1.1), a tripartite representation of
design.
The axes in the Y-chart represent three different domains of description:
behavioral, structural and physical. Along each axis are different levels of the
domain of description. As we move farther away from the center of the Y, the
level of description becomes more abstract. Each concentric circle intersects the Y
axis at a particular level of representation within a domain. The circle represents
all the information known about the design at some point of time. The outer
circle is the system level, the next is register-transfer level (RTL), followed by the
logic and circuit levels.

Figure 1.1 The Y-chart
(Behavioral domain: flowcharts and algorithms, register transfers, Boolean expressions, transistor functions. Structural domain: processors, memories and buses; registers, ALUs and muxes; gates and flip-flops; transistors. Physical domain: boards and MCMs, chips, cells, transistor layouts. Transitions: system synthesis, RT synthesis, logic synthesis, circuit synthesis.)

The behavioral domain concentrates on
what a design does, not on how it is built. It treats the design as one or more black
boxes with a specified set of inputs and outputs and a set of functions describing
the behavior of each output in terms of the inputs over time. A structural
domain bridges the behavioral and physical representation. It is a one-to-many
mapping of a behavioral representation onto a set of components and
connections under constraints such as cost, area, and delay. The physical domain
ignores, as much as possible, what the design is supposed to do and binds its
structure in space or silicon.
Several levels of the synthesis process are system synthesis, high level
synthesis, logic synthesis and layout synthesis. System Synthesis takes the
system specification with processor, memories, etc., as input and outputs the
equivalent functional specification of the input. High level synthesis (HLS) takes
algorithmic or high level behavior as input and outputs the register transfer level
(RTL) description consisting of functional units, storage and interconnecting
units. Logic synthesis system takes Boolean equations as input and generates a
gate level design after performing logic optimizations of the input. Layout
synthesis takes the gate level specification and outputs the physical layout
implementing the gate level specification. Individual synthesis systems cater to
different constraint goal sets. Typical user constraints are area, clock speed and
power. Today's VLSI design flow involves various levels of abstraction, allowing
us to construct large, complex systems with millions of transistors on a single chip.
The increased complexity of circuits at lower levels of abstraction has led to
usage of design tools at higher levels of abstraction.
Thus, we can formulate the high level synthesis problem as follows: Given
a functional specification in the form of an algorithm and a set of constraints,
synthesize an RTL equivalent of the algorithm, comprising a controller and data
path composed of modules obtained from a module library. High level synthesis
is divided into the steps depicted in Figure 1.2; the output of each step moves
the design towards an RTL structural description containing functional
units, storage and interconnections. The entities and phases shown in Figure
1.2 are described briefly as follows.
• Compilation: Compilation involves translation of the design description
into an intermediate representation that is most suitable for high-level
synthesis.
• Scheduling: This phase assigns one control step to each operation in the
input design.
• Allocation: This phase computes the minimum number of functional units
and registers required to synthesize the design based on scheduling
information of the operations. This task is accomplished by using lifetime
analysis over the variables.
• Binding: This phase maps derived RTL entities (variables, operations,
transfers) to corresponding physical entities (memory, functional units,
buses).
• Control generation: This is the final step and involves the derivation of
the controller that sequences the design and controls the functional and
the storage units in the datapath.
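The lifetime analysis mentioned under Allocation can be illustrated with a small C sketch. It assumes that each variable's lifetime is a single interval of control steps inside one basic block; the `lifetime` struct and the `min_registers` function are hypothetical names chosen for illustration and are not taken from SAST. Under that assumption, the minimum number of registers equals the largest number of variables that are live in the same control step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifetime record: the variable is live from control step
 * `birth` (inclusive) up to control step `death` (exclusive). */
struct lifetime {
    int birth;
    int death;
};

/* Return the largest number of variables live in any single control step
 * in the range 0 .. num_steps-1. For interval lifetimes this count is the
 * minimum number of registers needed. */
int min_registers(const struct lifetime *lt, size_t n, int num_steps)
{
    int best = 0;

    assert(num_steps >= 0);
    for (int step = 0; step < num_steps; step++) {
        int live = 0;
        for (size_t i = 0; i < n; i++) {
            if (lt[i].birth <= step && step < lt[i].death)
                live++;
        }
        if (live > best)
            best = live;
    }
    return best;
}
```

The scan is quadratic in the worst case, which is acceptable for the handful of variables in a basic block; a sweep over sorted interval end points would be the usual choice for larger inputs.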
The inputs to a high level synthesis are the behavioral specification, a module
library and the user constraints. The behavioral specification can be written in a
high level general purpose language like C or in a hardware description
language like VHDL or Verilog. The first step in high level synthesis consists of
translating a behavioral specification into its corresponding Register Transfer
Language (RTL) description. Behavioral specifications are composed by writing
code in a hardware description language such as VHDL.
The specification to HLS is given at a very high-level of abstraction, compared to
that of the output for the process. The vast difference in the abstraction levels of
source and target specifications makes it difficult to verify whether the
transformations performed by the HLS process are behavior-preserving.
Figure 1.2 High Level Synthesis (HLS) steps
(Flow diagram. Inputs: HDL descriptions (functional specification), user constraints, module library. Phases: Compilation, Scheduling, Register allocation, Binding, Controller generation. Intermediate results: intermediate representation, scheduled design, variables to registers mapping. Output: RTL structural description.)
Therefore, phase-wise HLS verification is very useful and reliable as the design is
verified after each phase of transformations, proving the correctness of the
design at each phase. While a CDFG is better suited to scheduling
algorithms, an FSMD is a more appropriate model for verification. An FSMD is a finite
state machine together with the datapath transfers and updates performed during each state
transition. It is a formal way of describing a hardware design that has a controller
represented by an FSM and a datapath. The verification methodology we use
for HLS is FSMD equivalence. The verification mechanism is based on a formal
method for checking the equivalence between two designs represented as
FSMDs. Moreover, since every phase of the high-level synthesis process is to be
verified, an FSMD interface is needed for this purpose. This interface extracts
FSMDs from the intermediate synthesis results formed after the
transformations applied in each phase.
1.2 Contributions of the Present Work
The present work contributes to the development of SAST, a high level synthesis
system with a phase-wise verification utility, which takes the behavioral
description as input and produces synthesizable RTL Verilog as output. It
takes a resource library and user architectural constraints as additional inputs.
The features that are already there in the existing system are as follows.
• A GA based scheduling algorithm, which takes the behavioral description
in the form of a CDFG. Each operation in the CDFG is represented in three
address code. An operation library and the architectural constraints are
taken as additional inputs to the scheduling algorithm.
• To reduce the number of control steps and the resource requirement, a
method of handling the variables has been devised which is different from
that of handling operations.
• Construction of the data path and the controller for the scheduled input
design and generating the synthesizable RTL Verilog for both, the
datapath and the controller.
• A system to verify the results generated at each step of synthesis process
is implemented. It takes the two FSMDs from two steps of synthesis
process as input and finds the equivalence between them.
Figure 1.3 SAST: A HLS tool with phase-wise verification utility
To make it a full-fledged high level synthesis system with a phase-wise verification
utility, as shown in Figure 1.3, the following enhancements and extensions have been carried
out over the existing system in the present work.
Interfaces are added at various stages of the HLS process to extract
the FSMDs required for verification. The existing SAST had no FSMD
construction module after register allocation and binding; the FSMD formed at this
phase is used to verify correct register sharing among the variables in
the CDFG.
1.3 Organization of thesis
The thesis is organized as follows.
• Chapter 1 is the introduction; it discusses HLS, the phases involved in
HLS, and the need for a phase-wise verification utility.
• Chapter 2 describes the Structured Architecture Synthesis Tool.
• Chapter 3 presents the FSMD representation used and the various FSMD
extraction methodologies used at different phases of HLS.
• Chapter 4 gives the results of experimentation on different HLS benchmarks
in detail, along with conclusions and future work.
Chapter 2
Structured Architecture Synthesis Tool (SAST)
A number of systems, such as Emerald, HAL, STAR, SPARK and GABIND,
are now available for high-level synthesis (HLS) of digital systems. All of these
systems try to produce an optimal or near-optimal design using different
algorithms for scheduling, register allocation and optimization.
Programmable devices tend to have limited wiring resources, so it is
desirable that designs implemented on such devices have a simpler modular
layout that avoids long-distance interconnects. The aim of the system is to produce
designs with a simple and predictable layout structure, thereby conserving on-chip
wiring resources upon implementation. A structured architecture (SA) has
been proposed, and an HLS tool called the Structured Architecture Synthesis
Tool (SAST), comprising primarily a scheduler, has been built. The promise of a
simplified layout structure makes the architecture attractive for the high-level
synthesis of designs that are intended to be implemented on reconfigurable
architectures and programmable structures such as FPGAs. In the present work,
the existing SAST is enhanced by incorporating some more phases and
interfaces for verification.
2.1 Target Architecture
Structured Architecture Synthesis Tool (SAST) essentially takes the behavioral
description of an input design in the form of three address code, and outputs the
synthesizable RTL Verilog code. The generated data path is organized as
architectural blocks (A-block). Each A-block has a local functional unit (FU), local
storage and local buses (also called as access links). All the A-blocks in a design
are interconnected by a number of global buses. Other than the local memories in
all A-blocks, SAST also permits the use of global memories as architectural
components. These memories are similar to an A-block, except that they do not
contain any functional unit. These memories can be accessed globally by all
the A-blocks. These external memories are connected to all A-blocks by global
buses. The global memory units in the structured architecture play an important
role as a convenient interface for the system. While it may be difficult to initialize
a specific storage location within an A-block, it is considerably easier to store
initial operands and retrieve final results from the global memory units. Global
memories help improve the availability of operands and relieve the storage
requirement in individual A-blocks. All the data path components are of the
same width. That is, the local buses, storage units, functional units in the A-
blocks and the global buses have same width.
There are input/output ports that are connected to global buses, so that all
the A-blocks can access any of the ports. Each A-block has a local memory in the form of a
register bank, which is connected to the global buses through internal buses (access
links). Each A-block also has one functional unit (FU), which takes its inputs
either from the local memory or from the internal buses. The output of the functional unit
is sent back either to the register bank or to the internal buses. Switches in
the design enable or disable the connection between any two components in the
A-block. The group of switches that connects the internal buses and the
output of the FU to the input ports of the registers is called the in-switches. The group of
switches that connects the outputs of the registers and the output of the FU to the internal
buses is called the out-switches. The global buses are connected to the input ports of the FU
through the internal buses and in-switches. The output port of the FU is connected to the
global buses through the internal buses and out-switches. The schematic diagram of
an A-block is shown in Figure 2.1.
Figure 2.1 Architectural Block (legend: hard connection, switch)
The structure of the data path is characterized by a set of architectural constraints
like the number of A-blocks, the number of global memories, the number of
global buses interconnecting the A-blocks, the number of access links or access
width connecting an A-block to the global buses and the maximum number of
writes per time step to storage locations in an A-block. The architectural
parameters that are internal to an A-block (e.g. the number of access links and
the number of write ports to internal memory) are the same for every A-block. These
structured data paths avoid random interconnects between data path elements.
Each A-block has a simple implementation. This makes the generated design
easy to implement on programmable devices such as FPGAs.
2.2 Features of SAST
Reduction in Interconnection Cost: There are many high level synthesis tools
currently available in the market. However, the present tools try to produce
optimal RTL with random interconnections among the data path components
(e.g. muxes, ALUs), which raises the interconnection cost when the design
is fabricated. Field programmable gate arrays are naturally attractive for
prototyping the designs generated by high level synthesis. Programmable
devices tend to have limited wiring resources between the data path elements, so
designs implemented on such devices need to avoid long-distance
interconnections. We use a structured architecture for HLS, which produces
predictable interconnections among the data path components. This results in a low
interconnection cost in the design.
Scheduling: SAST uses a genetic algorithm based scheduler for scheduling the input
design. SAST supports both time constrained and resource constrained
scheduling. Resource information and the maximum number of control steps the
scheduler may use for each basic block are provided to the scheduler
as input constraints. It also handles multi-cycle and pipelined functional units.
Register allocation and binding: After scheduling is completed, the next step is live-variable
analysis and register allocation for each A-block. SAST uses the minimum
number of registers to store the intermediate values in the design.
Data path Generation: The data path for each A-block consists of a functional unit, a
register bank and access links, which connect the A-block to the global buses. SAST
uses the structured architecture (SA) in the datapath, which reduces the
interconnection length between the data path components. SAST uses the minimum
number of buses to schedule the input design.
RTL Generation: The final output from SAST is the RTL description in Verilog. It
generates synthesizable Verilog code for both the data path and the control
path.
Verification: Our objective is to verify the synthesis results at each step of HLS.
The results at each step of HLS in SAST are reported in a finite state machine with
data path (FSMD) model. Reports are generated after the scheduling, allocation and
binding steps for verification of the synthesis results.
Summary
In this chapter we have explained the final target architecture used by the SAST.
We have described the importance of global memories used in the design. We
have also explained the architecture of an architectural block with functional
units, storage units and interconnection between them. Finally, we presented the
features of SAST.
Chapter 3
Interfaces for verification
HLS can be seen as a stepwise transformation of a behavioral specification into a
structural implementation. Our objective is to verify the synthesis results at each
step of HLS. The intermediate result produced after each step is modeled by an
automaton. Each transformed automaton can be represented as an FSMD. Reports
are generated at each phase of the synthesis process for verification. The FSMDs
are of four types: the behavior based FSMD, the FSMD of the scheduled CDFG, the FSMD after register
allocation and binding, and the FSMD after the controller generation phase. Section
3.1 explains the modeling of the reports generated to verify the synthesis results.
Section 3.2 describes FSMD generation from the behavioral CDFG extracted
from the VHDL specification. Section 3.3 describes the scheduling results and
their FSMD generation using the model in 3.2. Section 3.4 discusses FSMD
formation after the register allocation and binding phase.
3.1 Finite State Machine with Data path (FSMD)
An FSMD (finite state machine with data-path) is a universal specification model,
proposed by Gajski, that can represent all hardware designs. An FSM (finite state
machine) model works well for up to several hundred states. Beyond that, the
model becomes incomprehensible to human designers. To adapt the FSM model
to more complex designs, the FSMD was introduced by Gajski. Each storage
element, such as a register, is replaced by a variable in the FSMD, so each variable
replaces thousands of different states. For example, a 16-bit register represents
$2^{16}$ different states in an FSM; thus, introducing a 16-bit variable reduces the
number of states in the FSM model by a factor of $2^{16}$. The use of variables leads to the
concept of an FSM with a datapath (FSMD).
The model is used in the present work with the addition of a reset state, for
encoding the specification and implementation of the circuit to be verified. This
reset state is also called the start state of the FSMD.
Definition: The FSMD is defined as an ordered tuple $\langle Q, q_0, I, V, O, f, h \rangle$, where
1. $Q = \{q_0, q_1, q_2, \ldots, q_n\}$ is the finite set of control states,
2. $q_0 \in Q$ is the reset state,
3. $I$ is the set of primary input signals and $\Sigma_I$ is the input alphabet,
4. $V$ is the set of storage variables and $\Sigma$ is the set of all data storage states,
or simply, data states,
5. $O$ is the set of primary output signals and $\Sigma_O$ is the output alphabet,
6. $f : Q \times 2^S \to Q$ is the state transition function, and
7. $h : Q \times 2^S \to U$ is the update function of the output and the storage
variables, where $U$ and $S$ are as defined below.
(a) $U = \{x \Leftarrow e \mid x \in O \cup V \text{ and } e \in E\}$ represents a set of storage or output
assignments, where $E = \{g(x, y, z, \ldots) \mid x, y, z, \ldots \in I \cup V\}$ represents a set of
arithmetic expressions over the set $I \cup V$ of input and storage variables.
(b) $S = \{R(e) \mid e \in E \text{ and } R \text{ is any arithmetic relation}\}$ represents a set of
status expressions over $I \cup V$, with $R \in \{= 0, \neq 0, > 0, \geq 0, < 0, \leq 0\}$.
Thus, the next (control and data) state and output depend not only on the
present state and the input signals but also on conjunctions of (internal) status
expressions that indicate whether a predicate holds on the data state of the
storage and the input variables. Since state transitions and updates have been
represented as functions, an FSMD model is inherently deterministic.
3.2 FSMD formation from behavioral CDFG
This is the initial FSMD, formed from the CDFG extracted from the
behavioral VHDL code. This FSMD has a one-to-one mapping with the extracted
CDFG: it reproduces the data and control flow of the
CDFG extracted from the behavioral specification. The steps
involved in forming this initial FSMD are listed below as pseudo-code.
3.2.1 Methodology:
formFSMDbeh( CDFGbeh )
while ( CDFGbeh )
do
    if the currBLK is a BASIC block
    do
        construct a DFG of the instructions within the block
        create new states in FSMD corresponding to nodes of the DFG
        update the transition data of the FSMD representing data-flow
    end do
    if the currBLK is a CONTROL block
    do
        create a new state and the outward links
        update the transition data of the FSMD representing control flow
    end do
    DFS( CDFGbeh )
end do
In the pseudo-code, CDFGbeh refers to the CDFG extracted from the VHDL code.
Whenever a basic block is encountered in the CDFG, a data-flow graph (DFG) is
formed from the three-address instructions of the basic block. The edges here represent
purely control flow. Figure 3.1 shows the CDFG extracted from the VHDL code for the
greatest common divisor (GCD) benchmark, and Figure 3.2 shows the FSMD formed
from this extracted CDFG. The transition edges are labeled Rα/rα,
where Rα is the control information and rα is the data updates taking place in
the transition.
Figure 3.1 Extracted CDFG of GCD
(Node labels: read(p0,y1), read(p1,y2); y1 == y2; y1 = y1 - y2.)
Figure 3.2 FSMD formed from extracted CDFG
(States: q00, q01, q02, q03, q04, q05, q0e. Transition labels: - / y1 = p0, y2 = p1; ! y1 == y2 / -; y1 == y2 / -; ! y1 > y2; y1 > y2.)
3.3 FSMD formation after scheduling
The scheduling phase of HLS may move operations into
different control steps according to the heuristic followed. Thus, scheduling groups
operations into sets, each of which can be represented by a state in the FSMD of the
scheduled CDFG. The FSMD formed after scheduling in SAST depends on two
things. The first is the control-flow information between the blocks, which is
inherited from the initial CDFG extracted from VHDL. The second is the control
steps assigned to the operations within each block after scheduling. The FSMD after
scheduling is a direct mapping of these two pieces of information.
Suppose a basic block i in the scheduled CDFG is scheduled in steps control steps.
Each operation j in the basic block i, denoted $O_{ij}$, is assigned control steps by
$f_{step} : O_{ij} \to [n_1, n_2]$, with $n_1 \le n_2$, $1 \le n_1 \le steps$, $1 \le n_2 \le steps$.
$f_{step}$ is a function that maps each operation $O_{ij}$ to a range of control steps $[n_1, n_2]$, where
- $n_1$ is the start time of operation $O_{ij}$, and
- $n_2$ is the end time of operation $O_{ij}$.
$n_1$ and $n_2$ are control steps relative to the beginning of the basic block i.
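The constraints on the control-step range can be written down directly as a check. The following C sketch is illustrative only; `step_range`, `step_range_valid` and `op_active_in` are hypothetical names, not SAST functions. It also shows how a multi-cycle operation, for which the start step is strictly less than the end step, occupies every step of its range.

```c
#include <assert.h>
#include <stdbool.h>

/* Control-step range [n1, n2] of one operation, relative to the start
 * of its basic block (1-based, as in the definition of f_step). */
struct step_range {
    int n1; /* start step */
    int n2; /* end step   */
};

/* Check the constraints n1 <= n2, 1 <= n1 <= steps, 1 <= n2 <= steps. */
bool step_range_valid(struct step_range r, int steps)
{
    assert(steps >= 1);
    return 1 <= r.n1 && r.n1 <= r.n2 && r.n2 <= steps;
}

/* True if the operation occupies control step `step`. A multi-cycle
 * operation has n1 < n2 and is active in every step in between. */
bool op_active_in(struct step_range r, int step)
{
    return r.n1 <= step && step <= r.n2;
}
```

For a block scheduled in three control steps, the ranges [1, 1] and [2, 3] are valid, while [2, 4] (runs past the block) and [3, 2] (ends before it starts) are rejected.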
3.3.1 Methodology:
The steps involved in forming the FSMD after scheduling are given below as pseudo-code.
formFSMDschd( CFGfrmCDFG, Control-Steps )
do
    for each block visited in CFGfrmCDFG
    do
        assign (hence create) a new state for each Control-Step assigned to the block
        update the transition information depicting the control and data flow
    end do
end do
CFGfrmCDFG is the control information of the blocks extracted from the initial
CDFG, which remains preserved, and Control-Steps is the scheduling information
for the design, kept separately for each block, i.e. which
operations of the block are scheduled in each control step. As each block
is traversed, the FSMD states are formed from the block's Control-Steps
information. The FSMD formed after scheduling for GCD is shown in Figure 3.3.
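The state-creation step of the pseudo-code can be sketched in C as follows. The sketch assumes that FSMD states are numbered consecutively in block order, which the thesis does not state, and the names `assign_start_states` and `state_of` are hypothetical. Given the number of control steps of each block, it records the first state of every block and maps a block-relative control step to a state number.

```c
#include <assert.h>
#include <stddef.h>

/* Give every block one state per control step. start_state[b] receives
 * the number of the first state of block b; the return value is the
 * total number of states created. Consecutive numbering in block order
 * is an assumption of this sketch. */
int assign_start_states(const int *steps_per_block, size_t nblocks,
                        int *start_state)
{
    int next = 0;

    for (size_t b = 0; b < nblocks; b++) {
        assert(steps_per_block[b] >= 1);
        start_state[b] = next;
        next += steps_per_block[b];
    }
    return next;
}

/* State number of control step n (1-based, relative to the block). */
int state_of(const int *start_state, size_t block, int n)
{
    assert(n >= 1);
    return start_state[block] + (n - 1);
}
```

With four blocks scheduled in 1, 2, 1 and 1 control steps, the blocks start at states 0, 1, 3 and 4, and five states are created in total. The transitions between the last state of a block and the first state of its successors would then be taken from the control-flow information.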
Figure 3.3 FSMD formed after scheduling Phase
(States: q0, q1, q2, q3, q4, qe. Transition labels: - / y1 = p0, y2 = p1; y1 == y2 / -; y1 == y2 / z = y1; ! y1 == y2 && y1 > y2 / y1 = y1 - y2; ! y1 == y2 && ! y1 > y2 / y2 = y2 - y1; ! y1 == y2 && y1 > y2 / -; - / p0 = z.)
3.4 FSMD formation after Register Allocation and binding
We form an FSMD from the register allocation and binding
results. In a typical HLS process, the data path of the
input specification is formed after allocation and binding. The data path consists of three kinds of register transfer level
(RTL) components: functional units, storage units and interconnections. Each input variable
or signal is mapped to a storage unit in the data path, namely a register.
Functional units perform the operations assigned to them in each control step.
Data transfers between the functional and interconnection units in
the data path come from or go to storage units.
Moreover, the same register can be used to hold different variables,
provided they are live in non-overlapping intervals. The FSMD formed after register
allocation and binding maps the variable and signal
names to register names. The FSMD formed at this phase is used to verify correct register
sharing among the variables in the CDFG. FSMD formation at this stage
amounts to a function whose inputs are the scheduled CDFG and the register
liveness information (the intervals in which variables are bound to particular registers), i.e. the register
sharing information.
3.4.1 Methodology:
In SAST, two types of variables are involved in the data path, namely
permanent and temporary. Permanent variables are those that are bound
permanently to a register in an Alu-block for their entire lifetime. Temporary
variables, on the other hand, can be bound to more than one register in different
Alu-blocks during their lifetime, as they are required as operands of some functional
operation scheduled in another Alu-block. So we first extract the lifetimes of the
variables, with the registers they span, along with the control step in which they bind
to the register on a per block basis. An example of such extracted register
lifetime information for GCD is:
NV  q-1  q-1  R1
y1  q0   q1   R0
y1  q-1  q2   R0
y1  q0   q3   R0
y1  q1   q3   R0
y1  q3   q4   R0
y1  q2   q5   R0
y1  q3   q5   R0
y2  q0   q1   R1
y2  q-1  q2   R1
y2  q0   q3   R1
y2  q1   q4   R1
y2  q2   q4   R1
y2  q4   q5   R1
z   q5   q6   R1
After extracting the variable-register binding information, we traverse the
already formed scheduled FSMD and replace the variables with their register
counterparts according to the control state. This yields the FSMD with
variables mapped to registers; the two FSMDs, the scheduled one and this one,
are used for verifying correct register sharing. The FSMD formed after
scheduling (GCD) is shown in Figure 3.1, and Figure 3.2 shows the FSMD formed
after allocation and binding for GCD.
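The lookup at the heart of this traversal might look like the sketch below. It is a simplification under stated assumptions: the `Binding` record and `lookup_register` are hypothetical names, control steps are plain integers, and each record is read as "variable `var` resides in register `reg` from step `from` to step `to`, inclusive".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical binding record, loosely modeled on one row of the
   extracted lifetime information (variable, start, end, register). */
typedef struct {
    const char *var;
    int from;   /* first control step of the binding */
    int to;     /* last control step of the binding  */
    int reg;    /* register index                    */
} Binding;

/* Return the register holding `var` in control step `step`,
   or -1 if no record covers that step. */
int lookup_register(const Binding *b, int n, const char *var, int step) {
    assert(b != NULL && var != NULL);
    for (int i = 0; i < n; i++) {
        if (strcmp(b[i].var, var) == 0 &&
            b[i].from <= step && step <= b[i].to)
            return b[i].reg;
    }
    return -1;
}
```

While walking the scheduled FSMD, each variable name in a transition's condition or action would be replaced by the register returned for the state that the transition leaves.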
[Figure: FSMD state diagram with states q0, q1, q2, q3, q4 and qe. Transition labels (condition / action):]
- / r0 = p0, r1 = p1
!r0 == r1 && !r0 > r1 / r1 = r1 - r0
!r0 == r1 && r0 > r1 / r0 = r0 - r1
!r0 == r1 && r0 > r1 / -
r0 == r1 / -
r0 == r1 / z = r0
- / p0 = r1
Figure 3.4 FSMD formed after allocation and binding phase
3.5 Implementation
This section describes the data structures used to represent the FSMD and the
modules involved in the process.
3.5.1 Data Structures:
To model the FSMD in a data structure, we need a list of states and, for each
state, the number of outgoing transitions together with the condition, the data
transfers and the next state on each outgoing transition. The following data
structures model the FSMD as just described.
struct fsmdSTATE {
    int st;
    int out;
    struct LINK *tran;
    struct fsmdSTATE *next;
};
Structure fsmdSTATE contains the information about each state in the FSMD.
The fields of fsmdSTATE are explained below.
st: state number in the FSMD,
out: number of outgoing transitions of the state,
tran: pointer to the list of outgoing transitions from this state, and
next: pointer to the next state in the list of states.
struct LINK {
    char cond[100];
    char act[100];
    int st;
    struct LINK *next;
};
Structure LINK contains the list of outgoing transitions of a state.
The fields in LINK are explained below.
cond: condition under which the outgoing transition executes,
act: data transfer operations on the transition,
st: next state, and
next: pointer to the next outgoing transition in the list.
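As an illustration of how these two structures fit together, the sketch below builds a small FSMD and prints it in the textual form used for the listings in Chapter 4. The helpers `new_state`, `add_transition` and `print_fsmd` are hypothetical and are not the thesis implementation; the sketch also assumes that the `next` field of a state points to another `struct fsmdSTATE`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct LINK {
    char cond[100];
    char act[100];
    int st;
    struct LINK *next;
};

struct fsmdSTATE {
    int st;
    int out;
    struct LINK *tran;
    struct fsmdSTATE *next;   /* assumed to link states of the same type */
};

/* Hypothetical helper: allocate state `st` and put it in front of `rest`. */
struct fsmdSTATE *new_state(int st, struct fsmdSTATE *rest) {
    struct fsmdSTATE *s = malloc(sizeof *s);
    if (s == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    s->st = st;
    s->out = 0;
    s->tran = NULL;
    s->next = rest;
    return s;
}

/* Hypothetical helper: append a transition to state `s`.
   Over-long strings are truncated to fit the fixed-size fields. */
void add_transition(struct fsmdSTATE *s, const char *cond,
                    const char *act, int target) {
    assert(s != NULL && cond != NULL && act != NULL);
    struct LINK *l = malloc(sizeof *l);
    if (l == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    snprintf(l->cond, sizeof l->cond, "%s", cond);
    snprintf(l->act, sizeof l->act, "%s", act);
    l->st = target;
    l->next = NULL;
    struct LINK **tail = &s->tran;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = l;
    s->out++;
}

/* Print one line per state: name, transition count, then
   "condition / action target" for every outgoing transition. */
void print_fsmd(const struct fsmdSTATE *s, FILE *fp) {
    for (; s != NULL; s = s->next) {
        fprintf(fp, "q%d %d", s->st, s->out);
        for (const struct LINK *l = s->tran; l != NULL; l = l->next)
            fprintf(fp, " %s / %s q%d", l->cond, l->act, l->st);
        fputc('\n', fp);
    }
}
```

Transitions are appended at the tail so that they are printed in the order in which they were added; the `out` counter is kept in step with the length of the `tran` list.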
3.5.2 Functional Modules:
The top-level functional modules that construct the FSMD model are as follows.
- Find_state: computes the starting state of each basic block in the scheduled
CDFG from the scheduling information produced by SAST. It also attaches the
condition of execution to each basic block when control passes to its
logical successor blocks.
- BehReport: takes the CDFG extracted from the VHDL source as input and
returns a pointer to the FSMD representing the behavioral specification.
- Schedule Report: takes the scheduled CDFG as input and returns a pointer to
the FSMD containing the scheduling results.
- RegAllocReport: takes the scheduled CDFG and the register timespans as input
and returns the FSMD with variables mapped to registers.
Chapter 4 Experimentation and Results:
4.1 Results
The generation of the FSMD after the register allocation and binding phase was
implemented and tested successfully on several HLS benchmarks, both
control-intensive and data-intensive. The following benchmarks were used for
experimentation. The inputs for generating this FSMD are the scheduled CDFG and
the register timespans.
DIFFEQ: differential equation solver used by Gajski as well as Paulin et al.
to illustrate their synthesis algorithms. It includes control constructs
involving conditional and looping statements.
GCD: computes the greatest common divisor of two given numbers X and Y.
It includes control constructs involving conditional and looping statements.
DCT: the discrete cosine transform is a data-intensive benchmark.
It involves many data operations.
The detailed results for DIFFEQ are as follows.
Differential equation solver
The differential equation solver (DIFFEQ) is used to solve a system of
first-order differential equations. It is a benchmark problem in high-level
synthesis. Its algorithmic description consists of conditional expressions and
loops. The CDFG derived from the algorithmic description is depicted in the
following figure.
[Figure: CDFG with nodes I, B1, C1 and B2; branch edges are labeled T and F.]
Figure 4.1: Control data flow graph (CDFG) of the DIFFEQ example
VHDL behavioral specification of DIFFEQ:
entity diffeq is
port (dx, u, y, x, a: in bit;
x1, u1, y1: out bit);
end diffeq;
architecture behav of diffeq is
begin
process(x, y, u)
variable v0, v1, v2, v3, v4, v5, v6 : int ;
begin
while( x < a )
loop
v0 := u*dx ;
v1 := 3*x ;
x := x+dx ;
v2 := v0*v1 ;
v3 := 3*y ;
v4 := u-v2 ;
v5 := dx*v3 ;
v6 := u*dx ;
u := v4-v5 ;
y := y+v6 ;
end loop;
x1 <= x;
u1 <= u;
y1 <= y;
end process;
end behav;
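For readers who want to check the later FSMD listings against the intended behavior, the loop above can be transliterated into C as a reference model. This is a sketch under assumptions: all quantities are treated as C `int` values, statements execute sequentially in the order written, and the names `diffeq_ref` and `DiffeqOut` are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for the three outputs x1, u1 and y1. */
typedef struct {
    int x1;
    int u1;
    int y1;
} DiffeqOut;

/* Sequential reference model of the DIFFEQ loop body.
   Assumes plain int arithmetic and dx > 0 so that the loop terminates. */
DiffeqOut diffeq_ref(int dx, int u, int y, int x, int a) {
    assert(dx > 0);
    while (x < a) {
        int v0 = u * dx;
        int v1 = 3 * x;      /* uses x before it is advanced */
        x = x + dx;
        int v2 = v0 * v1;
        int v3 = 3 * y;
        int v4 = u - v2;
        int v5 = dx * v3;
        int v6 = u * dx;
        u = v4 - v5;
        y = y + v6;
    }
    DiffeqOut out = { x, u, y };
    return out;
}
```

Because every FSMD derived from this specification should compute the same outputs for the same inputs, a model of this kind can serve as a quick sanity check alongside formal verification.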
The CDFG extracted by the translation scheme is as follows. It has 4 basic blocks.
4
B0 5
read (p0 , dx )
read (p1 , u )
read (p2 , y )
read (p3 , x )
read (p4 , a )
C0 1
x < a
B1 10
v0 = u * dx
v1 = 3 * x
x = x + dx
v2 = v0 * v1
v3 = 3 * y
v4 = u - v2
v5 = dx * v3
v6 = u * dx
u = v4 - v5
y = y + v6
B2 6
x1 = x
write (p2 , x1 )
u1 = u
write (p1 , u1 )
y1 = y
write (p0 , y1 )
4
B0 1 C0
C0 2 0 B1 1 B2
B1 1 C0
B2 0
FSMD formed from CDFG: The FSMD at this stage is a direct mapping of the
extracted CDFG to an FSMD.
diffeq
q00 1 - / dx = p0, u = p1, y = p2, x = p3, a = p4 q01
q01 2 !x < a / - q02 x < a / - q03
q02 1 - / v0 = u * dx , v1 = 3 * x , x = x + dx , v3 = 3 * y , v6 = u * dx , q04
q04 1 - / v2 = v0 * v1 , v5 = dx * v3 , y = y + v6 , q05
q05 1 - / v4 = u - v2 , q06
q06 1 - / u = v4 - v5 , q01
q03 1 - / x1 = x , u1 = u , y1 = y , q07
q07 1 - / p2 = x1 , p1 = u1 , p0 = y1 , q08
q08 0
FSMD formed after scheduling: DIFFEQ is scheduled over 3 A-blocks with 2
global buses and 1 access link (access width) per A-block. The types of
operations present in DIFFEQ are assignment, addition, condition, loop and
multiplication. Assignment and addition operations each take a single control
step to execute. The scheduled FSMD output for DIFFEQ is shown below.
q0 1 - / dx = p0, y = p2 q1
q1 1 - / x = p3, a = p4 q2
q2 1 - / u = p1 q3
q3 2 !x < a / v0 = u * dx,v1 = 3 * x,v3 = 3 * y q4 x < a / - q9
q4 1 - / - q5
q5 1 - / v2 = v0 * v1,v5 = dx * v3,v6 = u * dx q6
q6 1 - / - q7
q7 1 - / v4 = u - v2,y = y + v6 q8
q8 1 - / x = x + dx,u = v4 - v5 q3
q9 1 x < a / x1 = x,y1 = y q10
q10 1 - / p2 = x1,u1 = u q11
q11 1 - / p1 = u1, p0 = y1 q12
q12 0
DIFFEQ is scheduled in 13 control steps.
FSMD formed after register allocation and binding: The variables in DIFFEQ are
a mixture of temporary variables and program variables. The FSMD extracted
after register allocation and binding is:
q0 1 - / r0 = p0, r3 = p2 q1
q1 1 - / r10 = p3, r4 = p4 q2
q2 1 - / r5 = p1 q3
q3 2 !r10 < r4 / r1 = r1 * r8,r2 = 3 * r10,r7 = 3 * r12 q4 r10 < r4 / - q9
q4 1 - / - q5
q5 1 - / r9 = r1 * r2,r7 = r8 * r7,r12 = r12 * r8 q6
q6 1 - / - q7
q7 1 - / r9 = r5 - r9,r3 = r12 + r12 q8
q8 1 - / r10 = r10 + r13,r5 = r9 - r7 q3
q9 1 r10 < r4 / r1 = r10 ,r9 = r3 q10
q10 1 - / p2 = r1,r12 = r5 q11
q11 1 - / p1 = r12, p0 = r9 q12
q12 0
Variable mapping to registers in all A-blocks is shown below.
-------- A-Block 0 --------
Number of registers 3
r_dx
r_u_v0_x1
r_v1
-------- A-Block 1 --------
Number of registers 7
r_y
r_a
r_u
r_3
r_v3_v5
r_dx
r_v2_v4_y1
-------- A-Block 2 --------
Number of registers 4
r_x
r_3
r_u_v6_y_u1
r_dx
4.2 Conclusions:
This work is concerned with the development of an HLS tool that synthesizes
structured architectures with a simple and predictable layout structure and
generates synthesizable RTL code from a VHDL behavioral specification.
In this work we presented FSMD formation methodologies for phase-wise
verification of results, which translate the CDFG information used by HLS into
an FSMD for verification. These methodologies work on different phases of HLS
and interface the HLS tool with a verification tool developed in parallel.
We completed the generation of the FSMD after register allocation and binding.
4.3 Future work
SAST takes the VHDL behavioral description and produces an RTL description
in synthesizable Verilog. To make it more effective and efficient, the
following enhancements can be made.
• Register interconnection optimization: reduce the number of interconnection
switches by using an optimization method to map variables to registers. This
also reduces the control signal count.
• Compiler optimizations such as common subexpression elimination, constant
propagation and tree balancing can be included in the translation
methodology or in a preprocessor.
Bibliography
[1] Daniel D. Gajski, Nikil D. Dutt, Allen C-H Wu, and Steve Y-L Lin, High level synthesis: Introduction to chip and System Design, Kluwer Academic Publishers, 1992.
[2] C.R. Mandal, P.P. Chakrabarti, and S. Ghose, Gabind: a ga approach to allocation and binding for the high-level synthesis of data paths, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 6, pp. 747-750, 2000.
[3] C.R. Mandal, R.M. Zimmer, A Genetic Algorithm for Synthesis of Structured Data Paths, Proceedings of the 13th International Conference on VLSI Design, 2006.
[4] C.R. Mandal, P.P. Chakrabarti, and S. Ghose, Allocation and binding for data path synthesis using a genetic approach, in proceedings of VLSI design ’96, pp. 122-125, 1996.
[5] Ramachandan, N. Gajski, D.D Chaiyakul, An algorithm for array variable clustering, in proceedings of EUROASIC, The European Event in ASIC Design on European Design and Test Conference, 1994.
[6] M. Rahmouni and A. A. Jerraya, “Formulation and evaluation of scheduling techniques for control flow graphs”, in Proceedings of EuroDAC'95, (Brighton), pp. 386-391, 18-22 September 1995.
[7] C. Tseng and D.P. Siewiorek, FACET: A procedure for the Automated Synthesis of Digital Systems, 20th Design Automation Conference, 1983.
[8] Holmes, N.D. Gajski, D.D, Architectural exploration for data paths with memory hierarchy, in Proceedings of ED & TC on European Design and Test Conference, 1995.
[9] Herman Schmit, Donald E. Thomas, Synthesis of application-specific memory designs, in proceedings of IEEE Transactions on VLSI Systems, 1997.
[10] Peeter Ellervee, Ahmed Hemani, Bengt Sventesson, High level Synthesis of Control and Memory Intensive Applications, in Proceedings of IEEE International Conference 1995.
[11] D. Gajski and L. Ramachandran, “Introduction to high-level synthesis,” IEEE transactions on Design and Test of Computers, pp. 44–54, 1994.
[12] N.-S. Woo, “A global, dynamic register allocation and binding for data path synthesis system,” in Procs. of 27th DAC, pp. 505–510, 1990.
[13] C. Blank, “Formal verification of register binding,” in Procs. of Workshop on Advances in Verification (WAVE) 2000, 2000.
[14] C. Karfa, C. Mandal, D. Sarkar, S. Pentakota, and C. Reade, “A formal verification method of scheduling in high-level synthesis,” in Proc. ISQED ’06, pp. 71–78, March 2006.