Extracting Finite State Machine with Datapath

models from the synthesized behavior in High Level Synthesis

A thesis submitted in partial fulfillment of the requirements

for the degree of

Master of Technology in

Computer and Information Technology

by Satish Bonagiri

(06cs6006)

Under the guidance of Dr. Dipankar Sarkar

and Dr. Chittaranjan Mandal

Dept. of Computer Science and Engineering Indian Institute of Technology

Kharagpur

May 2008

Department of Computer Science and Engineering

Indian Institute of Technology

Kharagpur, India

Certificate

This is to certify that the thesis titled "Extracting Finite State Machine with Datapath models from the synthesized behavior in High Level Synthesis", submitted by Satish Bonagiri to the Department of Computer Science and Engineering in partial fulfillment for the award of the degree of Master of Technology, is a bonafide record of work carried out by him under our supervision and guidance. The thesis has fulfilled all the requirements as per the regulations of this Institute and, in our opinion, has reached the standard needed for submission.

Dr. Dipankar Sarkar
Dept. of Computer Science and Engg.
Indian Institute of Technology
Kharagpur 721302, INDIA
May 2008

Dr. Chittaranjan Mandal
Dept. of Computer Science and Engg.
Indian Institute of Technology
Kharagpur 721302, INDIA
May 2008

Acknowledgements

This thesis is the result of research performed under the guidance of Dr. Dipankar Sarkar and Dr. Chittaranjan Mandal at the Department of Computer Science and Engineering of the Indian Institute of Technology, Kharagpur.

I sincerely thank my research advisors for giving me the opportunity to work as part of their research group, and for the huge amount of time and effort they spent guiding me through several difficulties on the way. Without the help, encouragement and patient support I received from my advisors, this thesis would never have materialized.

In particular, I would like to thank Chandan Karfa for his encouragement and support throughout this period.

Satish Bonagiri

Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

May 2008

Abstract

High-Level Synthesis (HLS) translates a behavioral specification of a system into its corresponding Register Transfer Level (RTL) specification. The Structured Architecture Synthesis Tool (SAST) takes the behavioral description of an input design and outputs synthesizable RTL Verilog code. This work enhances SAST by adding interfaces for the verifier.

The tool has a phase-wise verification utility. The verification mechanism is essentially equivalence checking of the input and output of each phase of the synthesis process. For this purpose, the input and output of every phase are represented as Finite State Machines with Data-path (FSMDs). Accordingly, we added interfaces for the FSMD model. The method is generic and can also be used with other schedulers.

We generate an FSMD after the register allocation and binding phase for the verifier. The FSMD formed at this phase is used to verify correct register sharing among the variables in the CDFG.

Key words: High-Level synthesis, FSMD, RTL, Verification

Contents

1 Introduction
1.1 Introduction
1.2 Contributions of the Present Work
1.3 Organization of thesis

2 Structured Architecture Synthesis Tool (SAST)
2.1 Target Architecture
2.2 Features of SAST

3 Interfaces for Verification
3.1 Finite State Machine with Data-path (FSMD)
3.2 FSMD formation from behavioral CDFG
3.2.1 Methodology
3.3 FSMD formation after scheduling
3.3.1 Methodology
3.4 FSMD formation after Register allocation and binding
3.4.1 Methodology
3.5 Implementation
3.5.1 Data structures
3.5.2 Functional modules

4 Experimentation and Results
4.1 Results
4.2 Conclusions
4.3 Future Work

Bibliography

List of Figures

1.1 The Y-chart
1.2 High level Synthesis (HLS) steps
1.3 SAST: A HLS tool with phase-wise verification utility
2.1 Architectural Block
3.1 Extracted CDFG of GCD
3.2 FSMD formed from extracted CDFG
3.3 FSMD formed after scheduling Phase
3.4 FSMD formed after allocation and binding phase
4.1 Control data flow graph (CDFG) of the DIFFEQ example

Chapter 1

Introduction

1.1 Introduction

Synthesis is the process of translating a behavioral description into a structural description. It can also be defined as the process of interconnecting primitive components at a certain level of abstraction (the target level) to realize a specification at a higher level of abstraction (the source level). The transformation of the design at the source level is carried out to achieve some predefined performance goals or constraints. The source and target levels categorize the various synthesis systems. The different types of synthesis processes are described using the Y-chart (shown in Figure 1.1), a tripartite representation of design.

The axes in the Y-chart represent three different domains of description: behavioral, structural and physical. Along each axis are different levels of the domain of description. As we move farther away from the center of the Y, the level of description becomes more abstract. Each concentric circle intersects the Y axes at a particular level of representation within a domain. The circle represents all the information known about the design at some point of time. The outer circle is the system level, the next is the register-transfer level (RTL), followed by the logic and circuit levels. In the behavioral domain, the focus is on what a design does, not on how it is built. It treats the design as one or more black boxes with a specified set of inputs and outputs and a set of functions describing the behavior of each output in terms of the inputs over time. The structural domain bridges the behavioral and physical representations. It is a one-to-many mapping of a behavioral representation onto a set of components and connections under constraints such as cost, area, and delay. The physical domain ignores, as much as possible, what the design is supposed to do and binds its structure in space or silicon.

Figure 1.1 The Y-chart (behavioral domain: flowcharts and algorithms, register transfers, Boolean expressions, transistor functions; structural domain: processors, memories and buses, registers, ALUs and muxes, gates and flip-flops, transistors; physical domain: boards and MCMs, chips, cells, transistor layouts; synthesis levels: system, RT, logic and circuit synthesis)

The several levels of the synthesis process are system synthesis, high-level synthesis, logic synthesis and layout synthesis. System synthesis takes the system specification with processors, memories, etc., as input and outputs the equivalent functional specification. High-level synthesis (HLS) takes an algorithmic or high-level behavior as input and outputs the register transfer level (RTL) description consisting of functional units, storage and interconnecting units. Logic synthesis takes Boolean equations as input and generates a gate-level design after performing logic optimizations. Layout synthesis takes the gate-level specification and outputs the physical layout implementing it. Individual synthesis systems cater to different sets of constraints and goals. Typical user constraints are area, clock speed and power. Today's VLSI design flow involves various levels of abstraction, allowing us to construct large, complex systems with millions of transistors on a single chip. The increased complexity of circuits at lower levels of abstraction has led to the use of design tools at higher levels of abstraction.

Thus, we can formulate the high-level synthesis problem as follows: given a functional specification in the form of an algorithm and a set of constraints, synthesize an RTL equivalent of the algorithm, comprising a controller and a data path composed of modules obtained from a module library. High-level synthesis is divided into the steps depicted in Figure 1.2, and the output of each step moves the design towards an RTL structural description containing functional units, storage and interconnections. The entities and phases shown in Figure 1.2 are described briefly as follows.

• Compilation: Compilation involves translation of the design description

into an intermediate representation that is most suitable for high-level

synthesis.

• Scheduling: This phase assigns one control step to each operation in the

input design.

• Allocation: This phase computes the minimum number of functional units

and registers required to synthesize the design based on scheduling

information of the operations. This task is accomplished by using lifetime

analysis over the variables.

• Binding: This phase maps derived RTL entities (variables, operations,

transfers) to corresponding physical entities (memory, functional units,

buses).

• Control generation: This is the final step and involves the derivation of

the controller that sequences the design and controls the functional and

the storage units in the datapath.
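As an illustration of the lifetime analysis mentioned under Allocation, the following is a minimal sketch and not SAST's actual allocator. It assumes each variable's lifetime within a basic block is a single interval of control steps, and it counts the largest number of variables that are live in the same step; under that assumption this count is the number of registers needed. The names `lifetime` and `min_registers` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical lifetime record: the variable is live in control steps
   start..end (inclusive), numbered from 1 within a basic block. */
struct lifetime {
    int start;
    int end;
};

/* Returns the largest number of variables live in any single control
   step. With one interval per variable, variables whose intervals do not
   overlap can share a register, so this is the register count needed. */
int min_registers(const struct lifetime *lt, size_t n, int num_steps)
{
    int max_live = 0;
    for (int step = 1; step <= num_steps; step++) {
        int live = 0;
        for (size_t i = 0; i < n; i++) {
            if (lt[i].start <= step && step <= lt[i].end)
                live++;
        }
        if (live > max_live)
            max_live = live;
    }
    return max_live;
}
```

For example, four variables with lifetimes [1,2], [2,3], [3,4] and [1,4] need three registers, because three of them are live in step 2.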

The inputs to high-level synthesis are the behavioral specification, a module library and the user constraints. The behavioral specification can be written in a high-level general-purpose language like C or in a hardware description language like VHDL or Verilog. The first step in high-level synthesis consists of translating the behavioral specification into its corresponding Register Transfer Language (RTL) description. Behavioral specifications are composed by writing code in a hardware description language such as VHDL.

The specification given to HLS is at a very high level of abstraction compared to that of the output of the process. The vast difference in the abstraction levels of the source and target specifications makes it difficult to verify whether the transformations performed by the HLS process are behavior-preserving.

Figure 1.2 High Level Synthesis (HLS) steps (compilation, scheduling, register allocation, binding and controller generation; inputs: functional specification as HDL descriptions, user constraints and the module library; intermediate results: intermediate representation, scheduled design and variables-to-registers mapping; output: RTL structural description)

Therefore, phase-wise HLS verification is very useful and reliable, as the design is verified after each phase of transformations, proving the correctness of the design at each phase. While a control data flow graph (CDFG) is better suited to the scheduling algorithms, an FSMD is a more appropriate model for verification. An FSMD is a finite state machine along with the datapath transfers and updates made during state transitions. It is a formal way of defining a hardware design having a controller, represented by an FSM, and a datapath. The verification methodology which we use for HLS is FSMD equivalence checking. The verification mechanism is based on a formal method for checking the equivalence between two designs represented as FSMDs. Moreover, as every phase of the high-level synthesis process is to be verified, an FSMD interface is needed for this purpose. This interface extracts FSMDs from the intermediate synthesis results formed after a number of transformations have been applied.

1.2 Contributions of the Present Work

The present work concerns the development of SAST, a high-level synthesis system with a phase-wise verification utility, which takes the behavioral description as input and produces synthesizable RTL Verilog as output. It takes a resource library and user architectural constraints as additional inputs.

The features already present in the existing system are as follows.

• A GA based scheduling algorithm, which takes the behavioral description

in the form of a CDFG. Each operation in the CDFG is represented in three

address code. An operation library and the architectural constraints are

taken as additional inputs to the scheduling algorithm.

• To reduce the number of control steps and the resource requirement, a

method of handling the variables has been devised which is different from

that of handling operations.

• Construction of the data path and the controller for the scheduled input

design and generating the synthesizable RTL Verilog for both, the

datapath and the controller.

• A system to verify the results generated at each step of synthesis process

is implemented. It takes the two FSMDs from two steps of synthesis

process as input and finds the equivalence between them.

Figure 1.3 SAST: A HLS tool with phase-wise verification utility

To make it a full-fledged high-level synthesis system with a phase-wise verification utility, as shown in Figure 1.3, enhancements and extensions have been carried out over the existing system through the present work. These are as follows.

Interfaces are added at various stages of the HLS process to extract the FSMDs required for verification. The existing SAST has no FSMD construction module after register allocation and binding, so one is added. The FSMD formed at this phase is used to verify correct register sharing among the variables in the CDFG.

1.3 Organization of thesis

The thesis is organized as follows.

• Chapter 1 is the introduction, discussing HLS, the phases involved in HLS, and the need for a phase-wise verification utility.

• Chapter 2 describes the Structured Architecture Synthesis Tool.

• Chapter 3 presents the FSMD representation used and the FSMD extraction methodologies used at different phases of HLS.

• Chapter 4 gives the results of experimentation on different HLS benchmarks in detail, along with the conclusion and future work.

Chapter 2

Structured Architecture Synthesis Tool (SAST)

A number of systems, such as Emerald, HAL, STAR, SPARK and GABIND, are now available for high-level synthesis (HLS) of digital systems. All of these systems try to produce an optimal or near-optimal design using different algorithms for scheduling, register allocation and optimization. Programmable devices tend to have limited wiring resources, so it is desirable that designs implemented on such devices have a simpler modular layout that avoids long-distance interconnects. The aim of the system is to produce designs with a simple and predictable layout structure, thereby conserving on-chip wiring resources upon implementation. A structured architecture (SA) has been proposed, and an HLS tool called Structured Architecture Synthesis Tool (SAST), comprising primarily a scheduler, has been built. The promise of a simplified layout structure makes the architecture attractive for the high-level synthesis of designs which are intended to be implemented on reconfigurable architectures and programmable structures such as FPGAs. In the present work, the existing SAST is enhanced by incorporating some more phases and interfaces for verification.

2.1 Target Architecture

The Structured Architecture Synthesis Tool (SAST) takes the behavioral description of an input design in the form of three-address code and outputs synthesizable RTL Verilog code. The generated data path is organized as architectural blocks (A-blocks). Each A-block has a local functional unit (FU), local storage and local buses (also called access links). All the A-blocks in a design are interconnected by a number of global buses. Other than the local memories in the A-blocks, SAST also permits the use of global memories as architectural components. These memories are similar to A-blocks, except that they do not contain any functional unit. They can be accessed globally by all the A-blocks, to which they are connected by global buses. The global memory units in the structured architecture play an important role as a convenient interface for the system. While it may be difficult to initialize a specific storage location within an A-block, it is considerably easier to store initial operands in, and retrieve final results from, the global memory units. Global memories help improve the availability of operands and relieve the storage requirement in individual A-blocks. All the data path components are of the same width. That is, the local buses, storage units and functional units in the A-blocks and the global buses have the same width.

There are input/output ports that are connected to the global buses, so that all the A-blocks can access any of the ports. Each A-block has a local memory in the form of a register bank, which is connected to the global buses through internal buses (access links). Each A-block also has one functional unit (FU), which takes its input either from the local memory or from the internal buses. The output of the functional unit is sent back either to the register bank or to the internal buses. Switches in the design enable or disable the connection between any two components in the A-block. The group of switches which connects the internal buses and the output of the FU to the input ports of the registers is called the in-switches. The group of switches which connects the output of the registers and the output of the FU to the internal buses is called the out-switches. The global buses are connected to the input ports of the FU through the internal buses and in-switches. The output port of the FU is connected to the global buses through the internal buses and out-switches. The schematic diagram of an A-block is shown in Figure 2.1.

Figure 2.1 Architectural Block (legend: hard connection, switch)

The structure of the data path is characterized by a set of architectural constraints, such as the number of A-blocks, the number of global memories, the number of global buses interconnecting the A-blocks, the number of access links (the access width) connecting an A-block to the global buses, and the maximum number of writes per time step to storage locations in an A-block. The architectural parameters which are internal to an A-block (e.g. the number of access links and the number of write ports to internal memory) are the same for each A-block. These structured data paths avoid random interconnects between data path elements. Each A-block has a simple implementation. This makes the generated design easy to implement on programmable devices such as FPGAs.

2.2 Features of SAST

Reduction in Interconnection Cost: Many high-level synthesis tools are currently available. However, these tools try to produce optimal RTL with random interconnections among the data path components (e.g. muxes, ALUs, etc.), which raises the interconnection cost when fabricating the design. Field programmable gate arrays (FPGAs) are naturally attractive for prototyping the designs generated by high-level synthesis. Programmable devices tend to have limited wiring resources between the data path elements, so designs implemented on such devices need to avoid long-distance interconnections. We use a structured architecture for HLS, which produces predictable interconnections among the data path components. This results in a low interconnection cost in the design.

Scheduling: SAST uses a genetic-algorithm-based scheduler for scheduling the input design. SAST supports both time-constrained and resource-constrained scheduling. The resource information and the maximum number of control steps the scheduler can take to schedule each basic block are provided to the scheduler as input constraints. It also handles multi-cycle and pipelined functional units.

Register allocation and binding: After scheduling is completed, the next step is live-variable analysis and register allocation for each A-block. SAST uses the minimum number of registers to store the intermediate values in the design.

Data path Generation: The data path for each A-block consists of a functional unit, a register bank and access links, which connect the A-block to the global buses. SAST uses the structured architecture (SA) in the data path, which reduces the interconnection length between the data path components. SAST uses the minimum number of buses to schedule the input design.

RTL Generation: The final output from SAST is the RTL description in Verilog. It generates synthesizable Verilog code for both the data path and the control path.

Verification: Our objective is to verify the synthesis results at each step of HLS. The results at each step of HLS for SAST are reported in a finite state machine with data path (FSMD) model. Reports are generated after the scheduling, allocation and binding steps for verification of the synthesis results.

Summary

In this chapter we have explained the final target architecture used by SAST. We have described the importance of the global memories used in the design. We have also explained the structure of an architectural block, with its functional unit, storage units and the interconnections between them. Finally, we presented the features of SAST.

Chapter 3

Interfaces for verification

HLS can be seen as a stepwise transformation of a behavioral specification into a structural implementation. Our objective is to verify the synthesis results at each step of HLS. The intermediate results produced after each step are modeled by an automaton. Each transformed automaton can be represented as an FSMD. Reports are generated at each phase of the synthesis process for verification. The FSMDs can be of four types: one based on the behavior, one for the scheduled CDFG, one formed after register allocation and binding, and one formed after the controller generation phase. Section 3.1 explains the modeling of the reports generated to verify the synthesis results. Section 3.2 describes FSMD generation from the behavioral CDFG extracted from the VHDL specification. Section 3.3 describes the scheduling results and their FSMD generation using the model of Section 3.2. Section 3.4 discusses FSMD formation after the register allocation and binding phase.

3.1 Finite State Machine with Data path (FSMD)

An FSMD (finite state machine with data-path) is a universal specification model, proposed by Gajski, that can represent all hardware designs. An FSM (finite state machine) model works well for up to several hundred states. Beyond that, the model becomes incomprehensible to human designers. To adapt the FSM model for more complex designs, the FSMD was introduced by Gajski. Each storage element, such as a register, is replaced by a variable in the FSMD, so each variable replaces thousands of different states. For example, a 16-bit register represents $2^{16}$ different states in an FSM; thus, the introduction of a 16-bit variable reduces the number of states in the FSM model by a factor of $2^{16}$. The use of variables leads to the concept of an FSM with a datapath (FSMD).

The model is used in the present work, with the addition of a reset state, for encoding the specification and implementation of the circuit to be verified. This reset state is also called the start state of the FSMD.

Definition: The FSMD is defined as an ordered tuple $\langle Q, q_0, I, V, O, f, h \rangle$, where

1. $Q = \{q_0, q_1, q_2, \ldots, q_n\}$ is the finite set of control states,

2. $q_0 \in Q$ is the reset state,

3. $I$ is the set of primary input signals and $\Sigma_I$ is the input alphabet,

4. $V$ is the set of storage variables and $\Sigma$ is the set of all data storage states or simply, data states,

5. $O$ is the set of primary output signals and $\Sigma_O$ is the output alphabet,

6. $f : Q \times 2^S \rightarrow Q$ is the state transition function, and

7. $h : Q \times 2^S \rightarrow U$ is the update function of the output and the storage variables, where $U$ and $S$ are as defined below.

(a) $U = \{x \Leftarrow e \mid x \in O \cup V \text{ and } e \in E\}$ represents a set of storage or output assignments, where $E = \{g(x, y, z, \ldots) \mid x, y, z, \ldots \in I \cup V\}$ represents a set of arithmetic expressions over the set $I \cup V$ of input and storage variables.

(b) $S = \{R(e) \mid e \in E \text{ and } R \text{ is any arithmetic relation}\}$ represents a set of status expressions over $I \cup V$, with $R \in \{= 0, \neq 0, > 0, \geq 0, < 0, \leq 0\}$.

Thus, the next (control and data) state and output depend not only on the present state and the input signals but also on the conjunctions of (internal) status expressions that indicate whether a predicate holds on the data state of the storage and the input variables. Since state transitions and updates have been represented as functions, an FSMD model is inherently deterministic.
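To make the roles of $f$ and $h$ concrete, here is a minimal sketch of a small FSMD for GCD written as a C step function. It is an illustration only, not the FSMD that SAST extracts: the state names `Q_RESET`, `Q_LOOP`, `Q_END` and the functions `fsmd_step` and `run_gcd` are hypothetical, and the inputs are assumed to be positive integers (otherwise the loop does not terminate).

```c
/* Control states Q; Q_RESET plays the role of the reset state q0. */
enum state { Q_RESET, Q_LOOP, Q_END };

/* Data state: the storage variables V and one primary output O. */
struct data {
    int y1, y2;   /* storage variables */
    int out;      /* primary output */
};

/* One FSMD step: evaluates the status expressions (y1 == y2, y1 > y2),
   applies the update function h to *d, and returns the next control
   state given by the transition function f. */
enum state fsmd_step(enum state q, struct data *d, int p0, int p1)
{
    switch (q) {
    case Q_RESET:               /* - / y1 = p0, y2 = p1 */
        d->y1 = p0;
        d->y2 = p1;
        return Q_LOOP;
    case Q_LOOP:
        if (d->y1 == d->y2) {   /* y1 == y2 / out = y1 */
            d->out = d->y1;
            return Q_END;
        }
        if (d->y1 > d->y2)      /* !(y1 == y2) && y1 > y2 / y1 = y1 - y2 */
            d->y1 -= d->y2;
        else                    /* !(y1 == y2) && !(y1 > y2) / y2 = y2 - y1 */
            d->y2 -= d->y1;
        return Q_LOOP;
    default:                    /* Q_END has no outgoing updates */
        return Q_END;
    }
}

/* Runs the FSMD from the reset state until it reaches Q_END.
   Assumes p0 > 0 and p1 > 0. */
int run_gcd(int p0, int p1)
{
    struct data d = {0, 0, 0};
    enum state q = Q_RESET;
    while (q != Q_END)
        q = fsmd_step(q, &d, p0, p1);
    return d.out;
}
```

Because each (state, status) combination selects exactly one branch, the step function is deterministic, which mirrors the remark that representing transitions and updates as functions makes the model deterministic.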

3.2 FSMD formation from behavioral CDFG

This is the initial FSMD, formed from the CDFG extracted from the behavioral VHDL code. This FSMD has a one-to-one mapping with the extracted CDFG: its data and control flow correspond to those of the CDFG extracted from the behavioral specification. The steps involved in forming this initial FSMD are listed below in the form of pseudo-code.

3.2.1 Methodology:

formFSMDbeh( CDFGbeh )
    while ( CDFGbeh )
    do
        if the currBLK is a BASIC block
        do
            construct a DFG of the instructions within the block
            create new states in FSMD corresponding to nodes of the DFG
            update the transition data of the FSMD representing data-flow
        end do
        if the currBLK is a CONTROL block
        do
            create a new state and the outward links
            update the transition data of the FSMD representing control flow
        end do
        DFS( CDFGbeh )
    end do

In the pseudo-code, CDFGbeh refers to the CDFG extracted from the VHDL code. Whenever a basic block is encountered in the CDFG, a data-flow graph (DFG) is formed from the three-address instructions of the basic block. The edges here represent purely control flow. Figure 3.2 shows the FSMD formed from the extracted CDFG for the greatest common divisor (GCD) benchmark. The transition edges are labeled Rα/rα, where Rα is the control information and rα is the set of data updates taking place in the transition.

Figure 3.1 shows the CDFG extracted from the VHDL code for the GCD benchmark.

Figure 3.1 Extracted CDFG of GCD (node labels: read(p0,y1), read(p1,y2); y1 == y2; y1 = y1 - y2)

Figure 3.2 FSMD formed from extracted CDFG (states q00 to q05 and q0e; transition labels: -/y1 = p0, y2 = p1; y1 == y2/-; !(y1 == y2)/-; y1 > y2; !(y1 > y2))

3.3 FSMD formation after scheduling

The scheduling phase of HLS may move operations into different control steps according to the heuristic followed. Scheduling thus groups operations into sets, each of which can be represented by a state in the FSMD of the scheduled CDFG. The FSMD formed after scheduling in SAST depends on two things. The first is the control-flow information between the blocks, which is inherited from the initial CDFG extracted from VHDL. The second is the control steps assigned to the operations within each block after scheduling. The FSMD after scheduling is a direct mapping of these two pieces of information.

Suppose a basic block $i$ in the scheduled CDFG is scheduled in $\mathit{steps}$ control steps. For each operation $j$ in basic block $i$, denoted $O_{ij}$, the schedule is given as

$f_{step} : O_{ij} \rightarrow [n_1, n_2], \quad n_1 \leq n_2, \quad 1 \leq n_1 \leq \mathit{steps}, \quad 1 \leq n_2 \leq \mathit{steps}$

$f_{step}$ is a function which maps each operation $O_{ij}$ to a range of control steps $[n_1, n_2]$, where

- $n_1$ is the start time of operation $O_{ij}$, and

- $n_2$ is the end time of operation $O_{ij}$.

$n_1$ and $n_2$ are control steps relative to the beginning of the basic block $i$.
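The constraints on the step range can be checked mechanically. The following is a small sketch under the assumption that a schedule entry is stored as a pair of block-relative control steps; the names `step_range` and `step_range_valid` are hypothetical and do not come from SAST.

```c
#include <stdbool.h>

/* Hypothetical record for f_step(O_ij) = [n1, n2]. */
struct step_range {
    int n1;  /* start control step, relative to the start of the block */
    int n2;  /* end control step, relative to the start of the block */
};

/* True when 1 <= n1 <= n2 <= steps, which is equivalent to the three
   conditions n1 <= n2, 1 <= n1 <= steps and 1 <= n2 <= steps. */
bool step_range_valid(struct step_range r, int steps)
{
    return 1 <= r.n1 && r.n1 <= r.n2 && r.n2 <= steps;
}
```

A multi-cycle operation has n1 < n2, while a single-cycle operation has n1 equal to n2.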

3.3.1 Methodology:

The steps involved in FSMD formation after scheduling are given below in the form of pseudo-code.

formFSMDschd( CFGfrmCDFG, Control-Steps )
do
    for each block visited in CFGfrmCDFG
    do
        assign (hence create) a new state for each Control-Step assigned to the block
        update the transition information depicting the control and data flow
    end do
end do

CFGfrmCDFG is the control information of the blocks extracted from the initial CDFG, which remains preserved, and Control-Steps is the schedule information for the design, i.e. which operation is scheduled in which control step, kept separately for each block. In this method, as each block is traversed, the FSMD states are formed from the block's Control-Step information. The FSMD formed after scheduling for GCD is shown in Figure 3.3.
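The state-creation part of this method can be sketched as follows. This is a simplified illustration and not SAST's implementation: it assumes the blocks are visited in array order, that the number of control steps per block is already known, and it only numbers the states, leaving out the transition information. The names `assign_states` and `state_of` are hypothetical.

```c
#include <stddef.h>

/* Assigns consecutive FSMD state numbers to the control steps of each
   block, in the order the blocks are visited. steps_per_block[i] is the
   number of control steps the scheduler gave block i; first_state[i]
   receives the state number of that block's first control step.
   Returns the total number of states created. */
int assign_states(const int *steps_per_block, size_t num_blocks,
                  int *first_state)
{
    int next_state = 0;
    for (size_t i = 0; i < num_blocks; i++) {
        first_state[i] = next_state;
        next_state += steps_per_block[i];
    }
    return next_state;
}

/* State number of control step `step` (1-based, relative to the block). */
int state_of(const int *first_state, size_t block, int step)
{
    return first_state[block] + (step - 1);
}
```

With three blocks scheduled in 1, 2 and 3 control steps, this creates six states, and the second control step of the third block maps to state 4.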

Figure 3.3 FSMD formed after scheduling Phase (states q0 to q4 and qe; transition labels: -/y1 = p0, y2 = p1; y1 == y2/-; y1 == y2/z = y1; !(y1 == y2) && y1 > y2/-; !(y1 == y2) && y1 > y2/y1 = y1 - y2; !(y1 == y2) && !(y1 > y2)/y2 = y2 - y1; -/p0 = z)

3.4 FSMD formation after Register Allocation and binding

We form this FSMD from the register allocation and binding results. In a typical HLS process, the data path of the input specification is formed after allocation and binding. The data path consists of three kinds of RTL components: functional units, storage units and interconnections. Each input variable or signal is mapped to a storage unit in the data path, namely a register. Each functional unit performs the operations assigned to it in each control step according to its composition. Data transfers between the functional and interconnection units in the data path come from or go to storage units.

Moreover, the same register can be used to hold different variables, provided they are live over non-overlapping intervals. The FSMD formed after register allocation and binding is obtained by mapping the variable and signal names to register names. It is used to verify correct register sharing among the variables in the CDFG. FSMD formation at this stage amounts to a function whose inputs are the scheduled CDFG and the register liveness information (the intervals during which variables are bound to a given register), i.e. the register sharing information.

3.4.1 Methodology:

In SAST, two types of variables are involved in the data path, namely permanent and temporary. Permanent variables are those which are bound permanently to a register in an Alu-block for their entire lifetime. Temporary variables, on the other hand, can be bound to more than one register in different Alu-blocks during their lifetime, as they are required as operands by functional operations scheduled in other Alu-blocks. So we first extract, on a per-block basis, the lifetimes of the variables together with the registers they span and the control steps in which they are bound to those registers. An example of such extracted register lifetime information for GCD is:

NV  q-1  q-1  R1
y1  q0   q1   R0
y1  q-1  q2   R0
y1  q0   q3   R0
y1  q1   q3   R0
y1  q3   q4   R0
y1  q2   q5   R0
y1  q3   q5   R0
y2  q0   q1   R1
y2  q-1  q2   R1
y2  q0   q3   R1
y2  q1   q4   R1
y2  q2   q4   R1
y2  q4   q5   R1
z   q5   q6   R1

After extracting the variable-register binding information, we traverse the already formed scheduled FSMD and replace each variable with its register counterpart according to the state (control step) in which it occurs. This yields the FSMD with variables mapped to registers. The two FSMDs, namely the scheduled FSMD and this one, are then used to verify correct register sharing. The FSMD formed after scheduling for GCD is shown in Figure 3.3, and Figure 3.4 shows the FSMD formed after allocation and binding for GCD.
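The substitution step described above can be sketched as a rewrite of one action string for a given state. This is only an illustration under stated assumptions: the `Binding` record, `lookup_register` and `map_action` are hypothetical names, lifetimes are taken as inclusive state ranges, and the sketch does not distinguish a variable that is read from one that is written in the same state, which a real implementation would have to do when a source and a destination share a register.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical binding record: `var` lives in register `reg` in
 * control steps from..to (inclusive, an assumption of this sketch). */
typedef struct {
    const char *var;
    int from;
    int to;
    const char *reg;
} Binding;

/* Return the register holding the identifier var[0..len) in `state`,
 * or NULL if no binding covers that state. */
const char *lookup_register(const Binding *b, size_t n,
                            const char *var, size_t len, int state)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(b[i].var) == len && strncmp(b[i].var, var, len) == 0 &&
            b[i].from <= state && state <= b[i].to)
            return b[i].reg;
    }
    return NULL;
}

/* Copy `act` to `out`, replacing every identifier that has a binding in
 * `state` by its register name. Unbound identifiers (for example the
 * ports p0, p1) and all other characters are copied unchanged.
 * Returns 0 on success and -1 if `out` is too small. */
int map_action(const char *act, int state, const Binding *b, size_t n,
               char *out, size_t outsz)
{
    size_t o = 0;
    assert(outsz > 0);
    while (*act != '\0') {
        const char *src = act;  /* text to emit */
        size_t in_len = 1;      /* characters consumed from act */
        size_t out_len = 1;     /* characters emitted to out */
        if (isalpha((unsigned char)*act) || *act == '_') {
            /* Scan the whole identifier before looking it up. */
            while (isalnum((unsigned char)act[in_len]) || act[in_len] == '_')
                in_len++;
            out_len = in_len;
            const char *reg = lookup_register(b, n, act, in_len, state);
            if (reg != NULL) {
                src = reg;
                out_len = strlen(reg);
            }
        }
        if (o + out_len >= outsz)
            return -1;
        memcpy(out + o, src, out_len);
        o += out_len;
        act += in_len;
    }
    out[o] = '\0';
    return 0;
}
```

With bindings that place y1 in r0 and y2 in r1 for the states concerned, the action `y1 = y1 - y2` becomes `r0 = r0 - r1`, which has the same form as the labels in Figure 3.4.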

[State diagram with states q0, q1, q2, q3, q4 and qe. Transition labels, written as condition / action:]

- / r0 = p0, r1 = p1
!r0 == r1 && !r0 > r1 / r1 = r1 - r0
!r0 == r1 && r0 > r1 / r0 = r0 - r1
!r0 == r1 && r0 > r1 / -
r0 == r1 / -
r0 == r1 / z = r0
- / p0 = r1

Figure 3.4 FSMD formed after allocation and binding phase

3.5 Implementation

This section gives details of the data structures used to represent the FSMD and of the modules involved in the process.

3.5.1 Data Structures:

To model the FSMD in a data structure, we need a list of states. For each state, we need the number of outgoing transitions and, for each outgoing transition, its condition, its data transfers and its next state. The following data structures model the FSMD as just described.

Structure fsmdSTATE contains the information about each state in the FSMD.

    struct fsmdSTATE {
        int st;
        int out;
        struct LINK *tran;
        struct fsmdSTATE *next;
    };

The fields of fsmdSTATE are explained below.

st: state number in the FSMD,

out: number of outgoing transitions for a state,

tran: pointer to the list of outgoing transitions from this state, and

next: pointer to the next state in the list of states.

Structure LINK contains the list of outgoing transitions for a state.

    struct LINK {
        char cond[100];
        char act[100];
        int st;
        struct LINK *next;
    };

The fields of LINK are explained below.

cond: condition of execution of the outgoing transition,

act: data transfer operations on the transition,

st: next state information, and

next: pointer to the next outgoing transition in the list.
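To show how these two structures fit together, the following minimal sketch builds a state list and prints it in a textual form similar to the FSMD listings of Chapter 4. The helper functions `add_state`, `add_transition`, `print_fsmd` and `free_fsmd` are hypothetical and are not taken from the tool. The structures are restated so that the block compiles on its own, with the `next` field of the state typed as a pointer to `struct fsmdSTATE`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct LINK {
    char cond[100];          /* condition of the transition */
    char act[100];           /* data transfers on the transition */
    int st;                  /* next state number */
    struct LINK *next;       /* next outgoing transition */
};

struct fsmdSTATE {
    int st;                  /* state number */
    int out;                 /* number of outgoing transitions */
    struct LINK *tran;       /* list of outgoing transitions */
    struct fsmdSTATE *next;  /* next state in the list */
};

/* Hypothetical helper: append a state numbered `st` to the list and
 * return it, or NULL if allocation fails. */
struct fsmdSTATE *add_state(struct fsmdSTATE **head, int st)
{
    struct fsmdSTATE *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->st = st;
    while (*head != NULL)
        head = &(*head)->next;
    *head = s;
    return s;
}

/* Hypothetical helper: append the transition "cond / act -> next_st"
 * to state `s`. Returns 0 on success, -1 on failure. */
int add_transition(struct fsmdSTATE *s, const char *cond,
                   const char *act, int next_st)
{
    assert(s != NULL);
    struct LINK *t = calloc(1, sizeof *t);
    if (t == NULL)
        return -1;
    if (strlen(cond) >= sizeof t->cond || strlen(act) >= sizeof t->act) {
        free(t);             /* text does not fit the fixed buffers */
        return -1;
    }
    strcpy(t->cond, cond);
    strcpy(t->act, act);
    t->st = next_st;
    struct LINK **p = &s->tran;
    while (*p != NULL)
        p = &(*p)->next;
    *p = t;
    s->out++;
    return 0;
}

/* Print one line per state: "q<st> <out> cond / act q<next> ...". */
void print_fsmd(const struct fsmdSTATE *s, FILE *fp)
{
    for (; s != NULL; s = s->next) {
        fprintf(fp, "q%d %d", s->st, s->out);
        for (const struct LINK *t = s->tran; t != NULL; t = t->next)
            fprintf(fp, " %s / %s q%d", t->cond, t->act, t->st);
        fputc('\n', fp);
    }
}

/* Release every state and every transition of the list. */
void free_fsmd(struct fsmdSTATE *s)
{
    while (s != NULL) {
        struct fsmdSTATE *next_state = s->next;
        while (s->tran != NULL) {
            struct LINK *next_tran = s->tran->next;
            free(s->tran);
            s->tran = next_tran;
        }
        free(s);
        s = next_state;
    }
}
```

A caller would create the states with `add_state`, attach transitions with `add_transition`, and call `print_fsmd(fsmd, stdout)` to obtain one line per state. Appending at the tail keeps states and transitions in insertion order, at the cost of a list walk per insertion, which is acceptable for FSMDs of the size shown here.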

3.5.2 Functional Modules:

The top-level functional modules that construct the FSMD model are as follows.

- Find_state: computes the starting state of each basic block in the scheduled CDFG from the scheduling information produced by SAST. It also attaches the condition of execution to each basic block when control passes to its logical successor blocks.

- BehReport: takes the CDFG extracted from the VHDL source as input and returns a pointer to the FSMD of the behavioral specification.

- Schedule Report: takes the scheduled CDFG as input and returns a pointer to the FSMD containing the scheduling results.

- RegAllocReport: takes the scheduled CDFG and the register timespans as input and returns the FSMD with registers mapped to variables.


Chapter 4 Experimentation and Results:

4.1 Results

The generation of the FSMD after the register allocation and binding phase was implemented and tested successfully on various HLS benchmarks, both control-intensive and data-intensive. The inputs for this FSMD are the scheduled CDFG and the register timespans. The following benchmarks were used for experimentation.

DIFFEQ: a differential equation solver used by Gajski as well as Paulin et al. to illustrate their synthesis algorithms. It includes control constructs, namely conditional and looping statements.

GCD: computes the greatest common divisor of two given numbers X and Y. It includes control constructs involving conditional and looping statements.

DCT: the discrete cosine transform, a data-intensive benchmark that involves many data-related operations.

The detailed results for DIFFEQ are as follows.


Differential equation solver

The differential equation solver (DIFFEQ) is used to solve a system of first order differential equations. It is a benchmark problem in high-level synthesis. Its algorithmic description consists of conditional expressions and loops. The CDFG derived from the algorithmic description is depicted in Figure 4.1.

[CDFG diagram with nodes I, B1, C1 and B2, and edge labels T and F.]

Figure 4.1: Control data flow graph (CDFG) of the DIFFEQ example

VHDL behavioral specification of DIFFEQ:

entity diffeq is
  port (dx, u, y, x, a: in bit;
        x1, u1, y1: out bit);
end diffeq;

architecture behav of diffeq is
begin
  process(x, y, u)
    variable v0, v1, v2, v3, v4, v5, v6 : int ;
  begin
    while( x < a )
    loop
      v0 := u*dx ;
      v1 := 3*x ;
      x := x+dx ;
      v2 := v0*v1 ;
      v3 := 3*y ;
      v4 := u-v2 ;
      v5 := dx*v3 ;
      v6 := u*dx ;
      u := v4-v5 ;
      y := y+v6 ;
    end loop;
    x1 <= x;
    u1 <= u;
    y1 <= y;
  end process;
end behav;

The CDFG extracted by the translation scheme is as follows. It has 4 basic blocks.

4

B0 5

read (p0 , dx )

read (p1 , u )

read (p2 , y )


read (p3 , x )

read (p4 , a )

C0 1

x < a

B1 10

v0 = u * dx

v1 = 3 * x

x = x + dx

v2 = v0 * v1

v3 = 3 * y

v4 = u - v2

v5 = dx * v3

v6 = u * dx

u = v4 - v5

y = y + v6

B2 6

x1 = x

write (p2 , x1 )

u1 = u

write (p1 , u1 )

y1 = y

write (p0 , y1 )

4

B0 1 C0

C0 2 0 B1 1 B2

B1 1 C0

B2 0

FSMD formed from CDFG: The FSMD at this stage is a direct mapping of the extracted CDFG to an FSMD.

diffeq

q00 1 - / dx = p0, u = p1, y = p2, x = p3, a = p4 q01

q01 2 !x < a / - q02 x < a / - q03


q02 1 - / v0 = u * dx , v1 = 3 * x , x = x + dx , v3 = 3 * y , v6 = u * dx , q04

q04 1 - / v2 = v0 * v1 , v5 = dx * v3 , y = y + v6 , q05

q05 1 - / v4 = u - v2 , q06

q06 1 - / u = v4 - v5 , q01

q03 1 - / x1 = x , u1 = u , y1 = y , q07

q07 1 - / p2 = x1 , p1 = u1 , p0 = y1 , q08

q08 0

FSMD formed after scheduling: DIFFEQ is scheduled over 3 A-blocks with 2 global buses and 1 access link (access width) per A-block. The types of operations present in DIFFEQ are assignment, addition, condition, loop and multiplication. Assignment and addition operations each take a single control step to execute. The scheduled FSMD output for DIFFEQ is shown below.

q0 1 - / dx = p0, y = p2 q1

q1 1 - / x = p3, a = p4 q2

q2 1 - / u = p1 q3

q3 2 !x < a / v0 = u * dx,v1 = 3 * x,v3 = 3 * y q4 x < a / - q9

q4 1 - / - q5

q5 1 - / v2 = v0 * v1,v5 = dx * v3,v6 = u * dx q6

q6 1 - / - q7

q7 1 - / v4 = u - v2,y = y + v6 q8

q8 1 - / x = x + dx,u = v4 - v5 q3

q9 1 x < a / x1 = x,y1 = y q10

q10 1 - / p2 = x1,u1 = u q11

q11 1 - / p1 = u1, p0 = y1 q12

q12 0

DIFFEQ is scheduled in 13 control steps.

FSMD formed after register allocation and binding: The variables in DIFFEQ are a mixture of temporary variables and program variables. The FSMD extracted after register allocation and binding is:


q0 1 - / r0 = p0, r3 = p2 q1

q1 1 - / r10 = p3, r4 = p4 q2

q2 1 - / r5 = p1 q3

q3 2 !r10 < r4 / r1 = r1 * r8,r2 = 3 * r10,r7 = 3 * r12 q4 r10 < r4 / - q9

q4 1 - / - q5

q5 1 - / r9 = r1 * r2,r7 = r8 * r7,r12 = r12 * r8 q6

q6 1 - / - q7

q7 1 - / r9 = r5 - r9,r3 = r12 + r12 q8

q8 1 - / r10 = r10 + r13,r5 = r9 - r7 q3

q9 1 r10 < r4 / r1 = r10 ,r9 = r3 q10

q10 1 - / p2 = r1,r12 = r5 q11

q11 1 - / p1 = r12, p0 = r9 q12

q12 0

Variable mapping to registers in all A-blocks is shown below.

-------- A-Block 0 --------

Number of registers 3

r_dx

r_u_v0_x1

r_v1

-------- A-Block 1 --------


r_y

r_a

r_u

r_3

r_v3_v5

r_dx

r_v2_v4_y1

-------- A-Block 2 --------


r_x

r_3

r_u_v6_y_u1

r_dx
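The register labels in this listing appear to encode the names that share a register, separated by underscores after an `r_` prefix (for example `r_u_v0_x1` for u, v0 and x1, and `r_3` for the constant 3). Assuming that reading of the output, a label can be split back into its names as sketched below. The function name and the size limit `MAX_NAME` are hypothetical choices for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 16  /* assumed upper bound on the length of one name */

/* Split a register label such as "r_u_v0_x1" into the names that follow
 * the "r_" prefix ("u", "v0", "x1"). Returns the number of names stored
 * in `names`, or -1 if the label is malformed or does not fit. */
int split_register_label(const char *label, char names[][MAX_NAME],
                         int max_names)
{
    assert(label != NULL && names != NULL);
    if (strncmp(label, "r_", 2) != 0)
        return -1;
    const char *p = label + 2;
    int count = 0;
    while (*p != '\0') {
        size_t len = strcspn(p, "_");   /* length up to the next '_' */
        if (len == 0 || len >= MAX_NAME || count == max_names)
            return -1;
        memcpy(names[count], p, len);
        names[count][len] = '\0';
        count++;
        p += len;
        if (*p == '_') {
            p++;
            if (*p == '\0')
                return -1;              /* trailing underscore */
        }
    }
    return count > 0 ? count : -1;
}
```

Applied to `r_u_v6_y_u1`, the function reports four names, which matches the register in A-Block 2 that is shared by u, v6, y and u1.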


4.2 Conclusions:

This work is concerned with the development of an HLS tool that synthesizes structured architectures with a simple and predictable layout structure and generates synthesizable RTL code from a VHDL behavioral specification.

In this work we presented FSMD formation methodologies for phase-wise verification of results. They translate the CDFG information used by HLS into FSMDs for verification. These methodologies work on different phases of HLS and interface the HLS tool with a verification tool developed in parallel. We completed the generation of the FSMD after register allocation and binding.

4.3 Future work

SAST takes a VHDL behavioral description and produces an RTL description in synthesizable Verilog. To make it more effective and efficient, the following enhancements can be made.

• Register interconnection optimization reduces the number of interconnection switches by using an optimization method to map variables to registers. It also optimizes the control signal count.

• Compiler optimizations such as common subexpression elimination, constant propagation and tree balancing can be included in the translation methodology or in a preprocessor.


Bibliography

[1] Daniel D. Gajski, Nikil D. Dutt, Allen C-H Wu, and Steve Y-L Lin, High level synthesis: Introduction to chip and System Design, Kluwer Academic Publishers, 1992.

[2] C.R. Mandal, P.P. Chakrabarti, and S. Ghose, Gabind: a ga approach to allocation and binding for the high-level synthesis of data paths, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 6, pp. 747-750, 2000.

[3] C.R. Mandal, R.M. Zimmer, A Genetic Algorithm for Synthesis of Structured Data Paths, Proceedings of the 13th International Conference on VLSI Design, 2006.

[4] C.R. Mandal, P.P. Chakrabarti, and S. Ghose, Allocation and binding for data path synthesis using a genetic approach, in Proceedings of VLSI Design '96, pp. 122-125, 1996.

[5] Ramachandan, N. Gajski, D.D Chaiyakul, An algorithm for array variable clustering, in Proceedings of EUROASIC, The European Event in ASIC Design on European Design and Test Conference, 1994.

[6] M. Rahmouni and A. A. Jerraya, "Formulation and evaluation of scheduling techniques for control flow graphs", in Proceedings of EuroDAC'95, (Brighton), pp. 386-391, 18-22 September 1995.

[7] C. Tseng and D.P. Siewiorek, FACET: A procedure for the Automated Synthesis of Digital Systems, 20th Design Automation Conference, 1983.

[8] Holmes, N.D. Gajski, D.D Architectural exploration for data paths with memory hierarchy, in Proceedings of ED & TC on European Design and Test Conference, 1995.

[9] Herman Schmit, Donald E. Thomas, Synthesis of application-specific memory designs, in Proceedings of IEEE Transactions on VLSI Systems, 1997.

[10] Peeter Ellervee, Ahmed Hemani, Bengt Sventesson, High level Synthesis of Control and Memory Intensive Applications, in Proceedings of IEEE International Conference, 1995.

[11] D. Gajski and L. Ramachandran, "Introduction to high-level synthesis," IEEE Transactions on Design and Test of Computers, pp. 44-54, 1994.

[12] N.-S. Woo, "A global, dynamic register allocation and binding for data path synthesis system," in Procs. of 27th DAC, pp. 505-510, 1990.

[13] C. Blank, "Formal verification of register binding," in Procs. of Workshop on Advances in Verification (WAVE) 2000, 2000.

[14] C. Karfa, C. Mandal, D. Sarkar, S. Pentakota, and C. Reade, "A formal verification method of scheduling in high-level synthesis," in Proc. ISQED '06, pp. 71-78, March 2006.
