An introduction to Xilinx System Generator

Miroslav Knežević

[email protected]

ESAT/SCD-COSIC, Room 01.62

Reviewed by: Pieter Nuyts, Tom Redant and Nele Reynders


1 Before We Start

Before we start with the introductory session, download the following files from Toledo, under Cursus Informatie, section Introductiesessies / Fase3 - System Generator, and place them on your Desktop:

• sysgen intro.pdf - This file.

• Files Sysgen Impact USB.zip - Contains files necessary for testing the FPGA board.

• Dragon FPGA programming and testing.pdf - A document describing the FPGA testing.

To have the correct versions of Matlab and Impact running, you need to perform the initial setup by following the steps below. These steps need to be done only ONCE; the next time you log into an ESAT machine you can skip them.

(i) Open the terminal and first back up your old .bashrc file by typing

cp .bashrc .bashrc_old

(ii) Then for 64-bit machines type

echo -e "$(cat .bashrc)\nsource ~micasusr/design/scripts/xilinx_ise_12.2_64bit.rc" > .bashrc

and for 32-bit machines type

echo -e "$(cat .bashrc)\nsource ~micasusr/design/scripts/xilinx_ise_12.2_32bit.rc" > .bashrc

(iii) Finally, type

source .bashrc

Now you are ready to launch the System Generator and Xilinx Impact tools by typing sysgen and impact in your terminal, respectively.

2 Introduction

Xilinx System Generator is the industry's leading high-level tool for designing high-performance Digital Signal Processing (DSP) systems using FPGAs. Its close integration with the MATLAB/Simulink software makes the implementation of complex hardware designs an easy task for engineers. Previous experience with Xilinx FPGAs or RTL design methodologies is not required when using System Generator. Designs are captured in the DSP-friendly Simulink modeling environment using a Xilinx-specific blockset. All of the downstream FPGA implementation steps, including synthesis and place and route, are performed automatically to generate an FPGA programming file.

The purpose of this exercise session is to get students acquainted with the Xilinx System Generator tool, which will be used further in the Dragon project (H01Q6a). After reading the following sections, the students should understand the basics of System Generator and be able to complete the exercises given in Section 4.

During the preparation of this document, the author found useful information in [1] and [2].

3 Design Creation Basics

Over 90 DSP building blocks are provided in the Xilinx DSP blockset for Simulink. These include common DSP building blocks such as adders, multipliers, multiplexers, registers, and others, as shown in Figure 1. Also included is a set of complex DSP building blocks such as forward error correction blocks, FFTs, filters and memories. These blocks leverage the Xilinx IP core generators to deliver optimized results for the selected device.

Figure 1: The Xilinx DSP Block Set.

The Xilinx DSP blockset is accessed via the Simulink Library Browser, which can be launched from the standard MATLAB toolbar (Figure 2). The blocks are separated into sub-categories for easier searching. One sub-category, Index, includes all the blocks and is often the quickest way to access a block you are already familiar with.

Figure 2: Launching Simulink.


3.1 Defining the FPGA boundary

System Generator works with standard Simulink models. Two blocks named Gateway In and Gateway Out define the boundary between the FPGA and the Simulink simulation model. Double-clicking a block opens the properties editor, where the block properties can be fully specified (Figure 3).

Figure 3: Gateway In and Gateway Out blocks.

The Gateway In block converts inputs of type Simulink integer, double and fixed-point to a Xilinx fixed-point number. The Xilinx fixed-point types are

• Boolean.

• Signed (two’s complement).

• Unsigned.

If the chosen type is Signed or Unsigned, the Number of bits and the Binary point need to be specified. Number of bits represents the input width, while the Binary point parameter indicates the number of bits to the right of the binary point (i.e. the size of the fraction). The Binary point position must be between zero and the specified Number of bits.

While converting a Simulink type to a System Generator fixed-point type, the Gateway In block uses the selected quantization and overflow options. For quantization, the options are

• Round – round to the nearest representable value (or to the value furthest from zero if there are two equidistant nearest representable values).

• Truncate – discard bits to the right of the least significant representable bit.

For overflow, the options are

• Wrap – discard bits to the left of the most significant representable bit.

• Saturate – saturate to the largest positive/smallest negative value.

• Flag as error – flag an overflow as a Simulink error during simulation.

It is important to realize that overflow and quantization for the Gateway In blocks do not take place in hardware: they take place in the block software itself, before entering the hardware phase. In hardware, the Gateway In blocks become top-level input ports.
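The quantization and overflow options listed above can be modeled in a few lines of plain MATLAB. This is only a sketch for a signed (two's complement) type; the variable names and the chosen values are illustrative assumptions, and the code does not call any Xilinx function.

```matlab
% Plain-MATLAB model of the Gateway In options for a signed type.
% All names and values are illustrative assumptions.
nBits = 6;          % Number of bits
binPt = 2;          % Binary point (number of fractional bits)
value = 1.625;      % 1.625 * 2^2 = 6.5, halfway between two codes

scaled = value * 2^binPt;

% Quantization
rawTruncate = floor(scaled);                            % drop the low bits
rawRound    = sign(scaled) * floor(abs(scaled) + 0.5);  % ties go away from zero

% Overflow, applied to a raw value that does not fit in nBits
rawMin = -2^(nBits - 1);
rawMax =  2^(nBits - 1) - 1;
rawBig = 40;                                            % 40 / 4 = 10 is out of range
rawSaturate = min(max(rawBig, rawMin), rawMax);         % clip to the range
rawWrap     = mod(rawBig - rawMin, 2^nBits) + rawMin;   % keep only the low nBits bits

truncated = rawTruncate / 2^binPt;   % 1.5
rounded   = rawRound    / 2^binPt;   % 1.75
saturated = rawSaturate / 2^binPt;   % 7.75
wrapped   = rawWrap     / 2^binPt;   % -6
```

Wrapping turns the out-of-range value 10 into -6, which is why Saturate or Flag as error is often the safer choice while a design is still being debugged.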

The Gateway Out block converts Xilinx fixed-point inputs into outputs of type Simulink integer, double or fixed-point. In hardware, these blocks become top-level output ports or are discarded, depending on how they are configured.

3.2 Adding the System Generator Token

Every System Generator diagram requires that at least one System Generator token is placed on the diagram. This block is not connected to anything but serves to drive the FPGA implementation process. The property editor for this block allows specification of the target netlist, device, performance targets and system period (Figure 4). System Generator will issue an error if this block is absent.

Figure 4: System Generator Token.

Some of the parameters specific to the System Generator block are as follows (it is best to leave them unchanged):

• Compilation – Specifies the type of compilation result that should be produced when the code generator is invoked. Default settings: Bitstream.

• Part – Defines the FPGA part to be used. Default settings: Spartan3E xc3s250e-4tq144.

• Target directory – Defines where System Generator stores the compilation results. It is important that the target directory is placed on the PC's local hard drive and not on the network, since compiling to a network drive makes the compiler run very slowly. Once the file is compiled, you can safely move it to your home directory on the network drive. Default settings: /tmp/netlist.

• Synthesis tool – Specifies the tool to be used to synthesize the design. The possibilities are Synplicity's Synplify Pro and Synplify, and Xilinx's XST. Default settings: XST.

• Hardware Description Language – Specifies the HDL to be used for compilation of the design. The possibilities are VHDL and Verilog. Default settings: VHDL.

• FPGA clock period (ns) – Defines the period in nanoseconds of the hardware clock. The period is passed to the Xilinx implementation tools through a constraints file, where it is used as the global PERIOD constraint. Default settings: 50.

• Clock pin location – Specifies the location of the clock pin on the FPGA. Default settings: P125.

• Block icon display – Specifies the type of information to be displayed on the block icon. The block icon is updated with the selected display option after the design has been compiled. Default settings: Default.

• Simulink system period – Important: The system period is NOT given in seconds but in units of the "FPGA clock period"! If both the FPGA and the Simulink frequency need to be 20 MHz, then "FPGA clock period (ns)" should be set to 50, and "Simulink system period" to 1. If, for example, the Simulink frequency needs to be only 10 MHz, then "Simulink system period" should be set to 2. Default settings: 1.
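The relation between the two period settings can be checked with a short calculation. The sketch below follows the convention described in this document (system period expressed in FPGA clock periods); the variable names are made up for illustration.

```matlab
% Relation between the two period settings, following the convention
% described in the text. Variable names are illustrative only.
fpgaClockPeriodNs    = 50;  % "FPGA clock period (ns)"
simulinkSystemPeriod = 2;   % "Simulink system period", in FPGA clock periods

fpgaClockMHz    = 1e3 / fpgaClockPeriodNs;                           % 20 MHz
simulinkRateMHz = 1e3 / (fpgaClockPeriodNs * simulinkSystemPeriod);  % 10 MHz

fprintf('FPGA clock: %g MHz, Simulink rate: %g MHz\n', ...
        fpgaClockMHz, simulinkRateMHz);
```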

3.3 Creating the DSP Design

Once the FPGA boundaries have been established using the Gateway blocks, the DSP design can be constructed using blocks from the Xilinx DSP blockset. Standard Simulink blocks are not supported for use between the Gateway In and Gateway Out blocks. You will find a rich set of filters, FFTs, FEC cores, memories, arithmetic, logical and bitwise blocks available for use in constructing DSP designs. Each of these blocks is cycle and bit accurate.

Once the design is completed, the hardware implementation files can be generated using the Generate button in the System Generator token properties editor. One option is to select HDL Netlist as the Compilation target, which allows the FPGA implementation steps of RTL synthesis and place and route to be performed interactively using tool-specific user interfaces. Alternatively, you can select Bitstream as the Compilation target and System Generator will automatically perform all implementation steps.

3.4 Creating Input Vectors using MATLAB

Simulink is built on top of MATLAB, allowing the use of the full MATLAB language for input signal generation and output analysis. You can use the From Workspace and To Workspace blocks from the Simulink Source and Sink libraries. Input values must be specified as an n row × 2 column matrix where the first column is the simulation time and the second column contains the input values. This is a very popular way of generating input vectors for System Generator designs (Figure 5).
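As an illustration, an input matrix of the required shape could be built as follows. This is a sketch: the signal, its length and the workspace variable name simin are assumptions, and the name must match whatever is entered in the From Workspace block.

```matlab
% Build an n-by-2 input matrix for a From Workspace block.
% The variable name "simin" and the test signal are assumptions.
nSamples = 64;
t = (0:nSamples-1)';          % column 1: simulation time, one step per sample
x = 0.9 * sin(2*pi*t/16);     % column 2: input values (a sine kept inside [-1, 1))
simin = [t, x];               % n rows, 2 columns

disp(size(simin))             % number of rows and columns
```

Keeping the amplitude below 1 avoids overflow if the Gateway In block uses a signed type with a single integer (sign) bit.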

Figure 5: Creating Input Vectors using MATLAB.

3.5 MCode Block

One of the blocks that deserves a special introduction is the MCode block. It is a container for executing a user-supplied MATLAB function within Simulink. The block executes the M-code to calculate the block outputs during a Simulink simulation. The block's Simulink interface is derived from the MATLAB function definition and from the block mask parameters. There is one input port for each parameter of the function and one output port for each value the function returns. Port names and ordering correspond to the names and ordering of the parameters and return values.

The MCode block supports a limited subset of the MATLAB language that is useful for implementing arithmetic functions, finite state machines and control logic. It has the following three primary coding guidelines that must be followed:

• All block inputs and outputs must be of Xilinx fixed-point type.

• The block must have at least one output port.

• The code for the block must exist on the MATLAB path or in the same directory as the model file that uses the block.

To illustrate the functionality of the MCode block, we show a simple example that implements the function z = max(x, y). The file xlmax.m contains the function xlmax, given as:

function z = xlmax(x, y)
  if x > y
    z = x;
  else
    z = y;
  end

An MCode block based on the function xlmax will have input ports x and y and output port z. Figure 6 shows how to set up an MCode block to use the function xlmax.

Figure 6: MCode Block Properties.

Some of the MATLAB language constructs that MCode block supports are:

• Assignment statements.

• Simple and compound if/else/elseif statements.

• switch statements.

• Arithmetic expressions involving only addition and subtraction.

• Addition, subtraction, multiplication, and division by a power of two.

• Relational operators.

• Logical operators.

For the other MATLAB constructs and functions that can be used in an MCode file, please refer to the Xilinx System Generator help documentation.

To further illustrate the functionality of the MCode block, we give an example of constructing a simple Finite State Machine (FSM) that detects the pattern 1101 in an input stream of bits. Figure 7 shows the behavior of the FSM.

Figure 7: FSM for detecting 1101 pattern. (States: seen_none = 0, seen_1 = 1, seen_11 = 2, seen_110 = 3; transitions are labeled input/output.)

The M-function used by the MCode block contains a transition function, which computes the next state from the current state and the current input. The M-function in this example defines a persistent state variable to store the state of the finite state machine in the MCode block. The following M-code, which defines the function detect_1101, is contained in the file fsm.m:

% This FSM detects the 1101 sequence
% Bits are loaded in a serial manner

function matched = detect_1101(d_in)

  seen_none = 0;
  seen_1 = 1;
  seen_11 = 2;
  seen_110 = 3;

  % the state is a 2-bit register
  persistent state, state = xl_state(seen_none, {xlUnsigned, 2, 0});

  matched = 0;

  switch state
    case seen_none
      if d_in == 1
        state = seen_1;
      else
        state = seen_none;
      end
    case seen_1
      if d_in == 1
        state = seen_11;
      else
        state = seen_none;
      end
    case seen_11
      if d_in == 1
        state = seen_11;
      else
        state = seen_110;
      end
    case seen_110
      if d_in == 1
        state = seen_1;
        matched = 1;
      else
        state = seen_none;
      end
    otherwise
      state = seen_none;
  end
end

The M-code above has an internal state variable that holds its value from one simulation step to the next. A state variable is declared with the MATLAB keyword persistent, and its first assignment must be the result of an xl_state function call. The xl_state function takes two arguments. The first is the initial value and must be a constant. The second is the precision of the state variable.

In our example, the line persistent state, state = xl_state(seen_none, {xlUnsigned, 2, 0}) defines the variable state as a 2-bit unsigned variable (register) with the binary point at position 0 and initializes it with the value seen_none. Figure 8 shows the complete solution to the previous example.

Note that, since we have 4 states, 2 bits are sufficient for the state variable. There is no need to allocate more bits. However, if we had 5 states, we would have to allocate 3 bits, giving 8 possible states. In other words, we would have 3 unused states. Even though they are unused, it is very important to include them in the switch-case block: suppose you only define states 0 to 4 and, after power-up, the FPGA starts in state 6. The FSM will then remain there forever, since the transition from state 6 is not defined. Therefore, you must also define states 5 to 7. A logical implementation is to switch to a well-defined state, for example the starting state. Instead of adding a separate case block for every unused state, you can also use the otherwise block. This ensures that none of the unused states is forgotten. In the code example above, the otherwise block has been added even though, with four states, there are no unused states. It is nevertheless good practice to always include one, since it avoids errors when states are added or removed later on.
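The width of the state register and the set of unused state codes follow directly from the number of states. The sketch below reproduces the 5-state case from the paragraph above; the variable names are illustrative.

```matlab
% Register width and unused codes for an FSM with nStates states.
% Variable names are illustrative.
nStates     = 5;
nBits       = ceil(log2(nStates));   % 3 bits are needed
nCodes      = 2^nBits;               % 8 possible state codes
nUnused     = nCodes - nStates;      % 3 codes are never assigned
unusedCodes = nStates:(nCodes - 1);  % 5 6 7, to be caught by "otherwise"

fprintf('%d states need %d bits, leaving %d unused codes\n', ...
        nStates, nBits, nUnused);
```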

A more detailed explanation of the MCode block is given in the Xilinx System Generator help documentation.
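Before placing the detector in an MCode block, its transition logic can be tried out in plain MATLAB. The script below is a sketch that replaces the persistent xl_state register with an ordinary variable and steps through a test bit stream chosen for illustration; it is not the MCode function itself.

```matlab
% Plain-MATLAB model of the 1101 detector (no Xilinx functions needed).
% The test stream is an arbitrary choice for illustration.
seen_none = 0; seen_1 = 1; seen_11 = 2; seen_110 = 3;

bits    = [1 1 0 1 1 0 1 0];     % serial input stream, one bit per step
matched = zeros(size(bits));     % detector output for every step
state   = seen_none;             % stands in for the persistent register

for k = 1:numel(bits)
    d_in = bits(k);
    switch state
        case seen_none
            if d_in == 1, state = seen_1;  else, state = seen_none; end
        case seen_1
            if d_in == 1, state = seen_11; else, state = seen_none; end
        case seen_11
            if d_in == 1, state = seen_11; else, state = seen_110;  end
        case seen_110
            if d_in == 1
                state = seen_1;
                matched(k) = 1;  % pattern 1101 completed on this bit
            else
                state = seen_none;
            end
        otherwise
            state = seen_none;   % recover from any undefined state
    end
end

disp(matched)
```

The stream contains two overlapping occurrences of 1101, so the output is 1 at steps 4 and 7: the final 1 of the first match also serves as the first 1 of the next one.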

3.6 Reinterpret, Convert, Concat, Slice and BitBasher blocks

Figure 8: Finite State Machine for detecting 1101 pattern.

Besides the MCode block, we would like to give a closer overview of a few more blocks that will be useful during the Dragon project. Since some of their features are not obvious at first glance, we outline their properties and try to avoid any ambiguity.

Figure 9: Reinterpret, Convert, Concat, Slice and BitBasher blocks.

3.6.1 Reinterpret block

As its name states, the Reinterpret block forces the bits of an input signal to a new type without any regard for the numerical value or the location of the binary point. It basically reinterprets the data type of the input signal. The block allows unsigned data to be reinterpreted as signed data and vice versa. It also allows the data's scaling to be reinterpreted by repositioning the binary point within the data. It is important to note that this block does not change the number of bits (the number of bits at the output is always the same as the number of bits at the input). In hardware, this block does not consume any resources.

An example of this block's use is as follows: if the input type is 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be unsigned with 0 fractional bits, then an input of -1.5 (1110.10 in binary, two's complement) is translated into an output of 58 (111010 in binary).
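The reinterpretation in this example can be reproduced with integer arithmetic in plain MATLAB. This is a sketch with made-up variable names; it models the bit pattern only and does not use any Xilinx function.

```matlab
% Reinterpret a signed 6-bit value with 2 fractional bits as an
% unsigned integer. Names are illustrative.
nBits   = 6;
inBinPt = 2;
value   = -1.5;

raw     = mod(value * 2^inBinPt, 2^nBits);  % two's complement bit pattern: 58
bitsStr = dec2bin(raw, nBits);              % '111010'
asUnsigned = raw / 2^0;                     % unsigned, binary point at 0: 58

fprintf('%g is stored as %s and reads as %d when reinterpreted\n', ...
        value, bitsStr, asUnsigned);
```

The bits never change; only the rule used to read them does.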

The block parameters are:

• Force Arithmetic Type – When checked, the output type is forced to the arithmetic type chosen in the Output Arithmetic Type parameter. When unchecked, the arithmetic type of the output is unchanged from the arithmetic type of the input.

• Force Binary Point – When checked, the binary point position of the output is forced to the position supplied in the Output Binary Point parameter. When unchecked, the binary point position of the output is unchanged from the binary point position of the input.

3.6.2 Convert block

The Convert block converts each input signal to a value of a desired arithmetic type. For example, a number can be converted to a signed (two's complement) or unsigned value. In contrast to the Reinterpret block, this block may change the number of bits and can convert any input type to any output type. In short, the block tries to preserve the input value if possible.

An example of this block's use is as follows: if the input type is 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be signed, 6 bits wide with 4 fractional bits, then an input of 1110.10 (-1.5 in decimal) is translated into an output of 10.1000 (again -1.5 in decimal).

We provide another example to illustrate that one has to be careful when using the Convert block: if the input type is again 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be unsigned, 6 bits wide with 0 fractional bits, then the same input 1110.10 (-1.5 in decimal) is translated into an output of 111110 (62 in decimal). The input is first quantized (truncated) to 1110, as the output is specified to have 0 fractional bits. Then sign extension is applied to expand the value to 6 bits, and the final result is 111110.
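Both Convert examples can be checked numerically. The sketch below assumes the Truncate and Wrap options, which match the behavior described in the second example; the variable names are illustrative.

```matlab
% The two Convert examples, modeled with Truncate and Wrap
% (an assumption about the block settings). Names are illustrative.
value = -1.5;
nBits = 6;

% Example 1: output signed, 6 bits, 4 fractional bits
raw1  = floor(value * 2^4);               % -24, value is preserved
bits1 = dec2bin(mod(raw1, 2^nBits), nBits);  % '101000', i.e. 10.1000
out1  = raw1 / 2^4;                       % -1.5

% Example 2: output unsigned, 6 bits, 0 fractional bits
raw2  = mod(floor(value * 2^0), 2^nBits); % floor(-1.5) = -2 wraps to 62
bits2 = dec2bin(raw2, nBits);             % '111110'

fprintf('Example 1: %s = %g, Example 2: %s = %d\n', bits1, out1, bits2, raw2);
```

In the second example the value changes from -1.5 to 62 because an unsigned type cannot represent a negative number, which is the pitfall the text warns about.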

The block parameters are:

• Output Precision – Determines the arithmetic type of the output signal. You can choose the Boolean, Unsigned or Signed (two's complement) type. If the Unsigned or Signed format is chosen, you can further specify the total Number of bits and the Binary point.

• Quantization – Quantization errors occur when the number of fractional bits is insufficient to represent the fractional portion of a value. The options are to Round to the nearest representable value (or to the value furthest from zero if there are two equidistant nearest representable values), or to Truncate (i.e. to discard bits to the right of the least significant representable bit).

• Overflow – Overflow errors occur when a value lies outside the representable range. For overflow the options are to Saturate to the largest positive/smallest negative value, to Wrap (i.e. to discard bits to the left of the most significant representable bit), or to Flag as error (flag an overflow as a Simulink error) during simulation. Flag as error is a simulation-only feature. The hardware generated is the same as when Wrap is selected.

In hardware, rounding and saturating require resources; truncating and wrapping do not.

3.6.3 Concat block

The Concat block concatenates n input ports (2 ≤ n ≤ 1024) into a single output port at the bit level. The first and the last input ports are labeled hi and lo, respectively. Input ports between these two ports are not labeled. The input to the hi port occupies the most significant bits of the output and the input to the lo port occupies the least significant bits of the output. All inputs need to be of Unsigned type with the binary point at position zero. There is only one block parameter, Number of inputs, which specifies the number of input ports n.

The Reinterpret block provides signed-to-unsigned conversion capabilities that can extend the functionality of the Concat block.
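For two inputs, bit-level concatenation amounts to shifting the hi input left by the width of the lo input and adding the lo input. A minimal sketch, with input values and widths chosen only for illustration:

```matlab
% Bit-level concatenation of two unsigned inputs, as done by Concat.
% Input values and widths are illustrative.
hi = 5;  hiBits = 3;    % 101
lo = 9;  loBits = 4;    % 1001

out     = hi * 2^loBits + lo;             % hi occupies the upper bits
outBits = dec2bin(out, hiBits + loBits);  % '1011001'

fprintf('{%s, %s} = %s (%d)\n', ...
        dec2bin(hi, hiBits), dec2bin(lo, loBits), outBits, out);
```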

3.6.4 Slice block

The Slice block allows you to slice off a sequence of bits from your input data and create a new data value. This value is presented as the output of the block. The output data type is unsigned with its binary point at zero. The block provides several mechanisms by which the sequence of bits can be specified. Parameters specific to the block are as follows:

• Width of slice (Number of bits) – Specifies the number of bits to extract.

• Boolean output – Tells whether single-bit slices should be of type Boolean.

• Specify range as – Allows you to specify either the bit locations of both end-points of the slice (Two bit locations) or one end-point along with the number of bits to be taken in the slice (Upper bit location + width or Lower bit location + width).

• Offset of top bit – Specifies the offset of the top (most significant) bit of the slice from the LSB, MSB or binary point.

• Relative to – Specifies whether the top or the bottom bit position of the slice is given relative to the MSB, LSB or binary point of the input.

• Offset of bottom bit – Specifies the offset of the bottom (least significant) bit of the slice from the LSB, MSB or binary point.

An example of this block's use is as follows: if the input signal is 16 bits wide and signed with 13 fractional bits, then each of the following settings results in slicing off the same sequence of bits, as represented in Figure 10:

• Specify range as: Two bit locations; Offset of top bit: 11, Relative to: LSB of input; Offset of bottom bit: 7, Relative to: LSB of input.

• Specify range as: Two bit locations; Offset of top bit: -2, Relative to: Binary point of input; Offset of bottom bit: -6, Relative to: Binary point of input.

• Specify range as: Two bit locations; Offset of top bit: -4, Relative to: MSB of input; Offset of bottom bit: -8, Relative to: MSB of input.

• Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: 11, Relative to: LSB of input.

• Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: -2, Relative to: Binary point of input.

• Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: -4, Relative to: MSB of input.

• Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: 7, Relative to: LSB of input.

• Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: -6, Relative to: Binary point of input.

• Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: -8, Relative to: MSB of input.

Figure 10: An example for the Slice block. (The 16-bit input word Sp p1 p0 p-1 ... p-13 occupies bits 15 down to 0; the sliced bits are bits 11 down to 7, i.e. p-2 ... p-6.)
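The equivalence of the three reference points, and the slice itself, can be verified with a few lines of plain MATLAB. The input word below is an arbitrary bit pattern picked for illustration, and the reference positions are inferred from the offsets listed above (binary point reference at bit 13, MSB at bit 15).

```matlab
% Slice bits 11 down to 7 out of a 16-bit word.
% The input pattern is arbitrary; names are illustrative.
nBits = 16;
binPt = 13;

% The same top and bottom bits, expressed from three reference points
topFromLsb = 11;               botFromLsb = 7;
topFromBp  = binPt + (-2);     botFromBp  = binPt + (-6);
topFromMsb = (nBits - 1) + (-4);
botFromMsb = (nBits - 1) + (-8);

word   = bin2dec('1010111010010110');
width  = topFromLsb - botFromLsb + 1;                    % 5 bits
sliced = mod(floor(word / 2^botFromLsb), 2^width);       % shift right, then mask
slicedBits = dec2bin(sliced, width);                     % '11101'

fprintf('bits %d..%d of the word are %s (%d)\n', ...
        topFromLsb, botFromLsb, slicedBits, sliced);
```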

3.6.5 BitBasher block

The BitBasher block performs slicing, concatenation and augmentation of the inputs attached to the block. The operation to be performed is described using Verilog syntax, and the block may have up to four output ports. The number of output ports is equal to the number of expressions. The block does not cost anything in hardware.

The block parameters dialog box can be invoked by double-clicking the icon in your Simulink model. The parameter specific to the Basic tab is BitBasher Expression, which specifies a bitwise manipulation expression based on Verilog syntax. Multiple expressions (limited to a maximum of 4) can be specified using a new line as the separator between expressions. Parameters specific to the Output tab are as follows:

• Output: Refers to the port on which the data type is specified.

• Output type: Arithmetic type to be forced onto the corresponding output.

• Binary Point: Binary point location to be forced onto the corresponding output.

For further help on the BitBasher block and Verilog syntax, please refer to the Xilinx System Generator help documentation.

4 Exercises

The first three exercises of this section are intended to deepen the students' understanding of areas such as fixed-point arithmetic, finite state machines and multi-rate systems. The last exercise will be used to test the FPGA board. All four exercises need to be completed using the Xilinx System Generator tool.

4.1 Fixed-Point Arithmetic

The most convenient way to represent a real number $r$ is by means of the floating-point representation, as shown in the following equation:

$$ r = m \times b^{e} . $$

Here, $m$ is known as the mantissa, $b$ is the base and $e$ is the exponent. However, the majority of commercially available processors on the market today have no hardware support for floating-point arithmetic. This is due to the cost of the extra silicon that needs to be added for the Floating Point Unit (FPU). By implementing algorithms using fixed-point (integer) arithmetic, a significant improvement in execution speed can be observed, as most processors have efficient Arithmetic Logic Units (ALUs) that support fixed-point arithmetic.

In the real world, most physical signals around us are represented using the real number (floating-point) representation. On the other hand, making an efficient algorithm that processes these signals implies the use of efficient fixed-point arithmetic. The question that naturally arises is: "How do we represent a floating-point (real) number by means of fixed-point arithmetic?"

We need to make sure that the signals are represented accurately and that we do not lose any significant information. The problem can easily be solved by using large amounts of memory to store the required information with enough accuracy. This, of course, is not a good approach, as the increased need for storage elements slows down the whole computation process and increases the hardware size. Hence, it is of crucial importance to estimate precisely what accuracy we really need when performing fixed-point arithmetic.

Figure 11 depicts a fixed-point representation of a real number $r$. To be more specific, this is a signed two's complement fixed-point representation. The most significant bit $S$ is the sign bit and is equal to 0 for positive and 1 for negative numbers. The whole number representation has $w$ bits, where $I$ bits, including the sign bit, are used for the integer part and $F$ bits represent the fractional part of the number. The value of $r$ can be evaluated as:

$$ r = -2^{I-1} S + \sum_{i=-F}^{I-2} 2^{i} r_i . $$

Figure 11: Two's complement fixed-point representation. (Word length $w$, bit positions $w-1$ down to 0: sign bit $S$, integer bits $r_{I-2} \ldots r_0$, binary point, fraction bits $r_{-1} \ldots r_{-F}$, with bit weights from $2^{I-1}$ down to $2^{-F}$.)

Next, we will learn how to evaluate the number of bits necessary to represent a given real number $r$, $|r| \le R$, with the desired accuracy. First, we consider the two's complement fixed-point representation shown in Figure 11. Let us assume that the number of bits used to represent the integer part of the number (including the sign bit) is equal to $I$ and the number of bits used for the fractional part is equal to $F$. To ensure that the integer part is large enough to store the value of $r$, the following inequality needs to be satisfied:

$$ -2^{I-1} \le \mathrm{Int}(r) \le 2^{I-1} - 1 , $$

where $\mathrm{Int}(r)$ represents the integer part of $r$. Solving this inequality gives the size of the integer part:

$$ I = \lfloor \log_2 R \rfloor + 2 . \qquad (1) $$

Second, if the real number $r$ is greater than or equal to 0 ($0 \le r \le R$), we can use an unsigned number representation, as shown in Figure 12.

Figure 12: Unsigned fixed-point representation. (Word length $w$, bit positions $w-1$ down to 0: integer bits $r_{I-1} \ldots r_0$, binary point, fraction bits $r_{-1} \ldots r_{-F}$, with bit weights from $2^{I-1}$ down to $2^{-F}$.)

The value of $r$ can be evaluated as:

$$ r = \sum_{i=-F}^{I-1} 2^{i} r_i . $$

Again, to ensure that the integer part is large enough, the following inequality needs to be satisfied:

$$ 0 \le \mathrm{Int}(r) \le 2^{I} - 1 . $$

Since in this case we do not need an additional bit for the sign information, the solution of the previous inequality is given as:

$$ I = \lfloor \log_2 R \rfloor + 1 . \qquad (2) $$

To evaluate the number of bits for the fractional part of the real number $r$, we need to define a resolution for a fixed-point variable. The resolution (denoted by $\varepsilon$) is governed by the following equation:

$$ \varepsilon = \frac{1}{2^{F}} , $$

where $F$ is the number of bits required for a particular resolution. Therefore, the minimal number of bits for the fractional part is given by:

$$ F = \left\lceil \log_2 \left( \frac{1}{\varepsilon} \right) \right\rceil . \qquad (3) $$

Equation 3 is used both for the two's complement and for the unsigned fixed-point representation.
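Equations 1 to 3 translate directly into MATLAB. The sketch below evaluates them for a range limit and a resolution chosen purely as an illustration; the variable names are not part of any toolbox.

```matlab
% Word-length estimation following Equations 1, 2 and 3.
% R and resolution are example values; names are illustrative.
R          = 3;      % range limit, |r| <= R
resolution = 0.01;   % desired resolution (epsilon)

I_signed   = floor(log2(R)) + 2;          % Equation 1: 3 integer bits (incl. sign)
I_unsigned = floor(log2(R)) + 1;          % Equation 2: 2 integer bits
F          = ceil(log2(1 / resolution));  % Equation 3: 7 fractional bits

fprintf('signed: %d + %d bits, unsigned: %d + %d bits\n', ...
        I_signed, F, I_unsigned, F);
```

With 7 fractional bits the actual resolution is 1/128, slightly finer than the requested 0.01, because the number of bits has to be rounded up.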

Example 1 - Fixed-Point Arithmetic

Let us consider a simple example with two 8-bit variables $-1 \le x < 1$ and $-2 \le y < 2$, each with as much resolution as possible. The first variable $x$ ranges from $-1$ to $0.9921875$ and the second variable $y$ ranges from $-2$ to $1.984375$. Therefore, for $x$ we can write $I_x = 1$, $F_x = 7$, while for $y$ it holds that $I_y = 2$, $F_y = 6$, where $I_x$ ($I_y$) is the number of integer bits and $F_x$ ($F_y$) is the number of fractional bits of variable $x$ ($y$). Figure 13 further illustrates the properties of variables $x$ and $y$.

Figure 13: Two's complement fixed-point representation of x and y. (x: sign bit $S_x$ with weight $2^{0}$ followed by fraction bits $x_{-1} \ldots x_{-7}$; y: sign bit $S_y$ with weight $2^{1}$ followed by $y_0$ and fraction bits $y_{-1} \ldots y_{-6}$.)

Next, we will learn how to add and multiply two signed fixed-point variables. An addition is a pure integer operation, but care must be taken to align the binary points and to handle overflow of the addition. To calculate $z = x + y$ we first need to estimate the number of bits for the integer and the fractional part of $z$ ($I_z$ and $F_z$). Given the boundaries for $x$ and $y$, the variable $z$ ranges from $-3$ to $2.9765625$ ($-3 \le z < 3$). Following Equations 1 and 3 we can estimate $I_z$ and $F_z$ as:

$$ I_z = \lfloor \log_2 3 \rfloor + 2 = 3 , $$

$$ F_z = \left\lceil \log_2 \left( \frac{1}{\varepsilon} \right) \right\rceil = 7 . $$

We choose $\varepsilon$ such that we have as much resolution as possible. This basically means that $F_z$ is evaluated as $F_z = \max(F_x, F_y)$.

To avoid errors due to overflow of the addition, we need to apply sign extension to both variables $x$ and $y$, and to align the binary points by shifting the variables to the right. Figure 14 shows the variables $x$ and $y$ after these transformations.

Figure 14: Fixed-point addition z = x + y. (x and y are sign-extended to the weight $2^{2}$ and their binary points are aligned; the sum z consists of the sign bit $S_z$ with weight $2^{2}$ followed by $z_1$, $z_0$ and fraction bits $z_{-1} \ldots z_{-7}$.)

As shown in Figure 14, the result $z$ is represented using 10 bits, which is more than the 8 bits we had for representing variables $x$ and $y$. Therefore, we need to reduce the resolution of $z$ by truncating its 2 least significant bits ($z_{-7}$ and $z_{-6}$). This truncation lowers the accuracy of the result $z$, which will then range from $-3$ to $2.96875$.
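The alignment, addition and truncation steps can be followed numerically on the raw integer values. The sketch below uses the largest values of x and y from Example 1; the variable names are illustrative.

```matlab
% Fixed-point addition z = x + y on raw integers, using the largest
% x and y from Example 1. Names are illustrative.
Fx = 7;  Fy = 6;
Fz = max(Fx, Fy);                 % 7 fractional bits for the full result

x = 0.9921875;   xRaw = x * 2^Fx; % 127
y = 1.984375;    yRaw = y * 2^Fy; % 127

% Align the binary points: express both operands with Fz fractional bits
xAligned = xRaw * 2^(Fz - Fx);    % 127
yAligned = yRaw * 2^(Fz - Fy);    % 254

zRaw = xAligned + yAligned;       % 381, needs the 10-bit word
z    = zRaw / 2^Fz;               % 2.9765625

% Return to an 8-bit word by truncating the 2 least significant bits
z8Raw = floor(zRaw / 2^2);        % 95
z8    = z8Raw / 2^(Fz - 2);       % 2.96875

fprintf('z = %.7f, truncated to 8 bits: %.5f\n', z, z8);
```

Although both raw operands are 127, they stand for different values; adding them without first aligning the binary points would give a meaningless result.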

A fixed-point multiplication is simpler than the fixed-point addition. Whenperforming a fixed-point multiplication p = xy the product p is 2w bits longif the multiplier x and multiplicand y are w bits long. The integer and thefractional bits of the product are calculated in a very simple manner:

$$I_p = I_x + I_y = 1 + 2 = 3 ,$$

$$F_p = F_x + F_y = 7 + 6 = 13 .$$

The fixed-point multiplication is again a pure integer operation and it is depicted in Figure 15. Again, if there were only 8 bits for representing the result, we would need to truncate the product by discarding the 8 least significant bits (p−6 . . . p−13).

[Figure: bit layout of x, y and p. The operand x spans bit positions 2^0 (sign bit Sx) down to 2^−7, y spans 2^1 (sign bit Sy) down to 2^−6, and the product p spans 2^2 (sign bit Sp) down to 2^−13.]

Figure 15: Fixed-point multiplication p = x × y.
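As a rough illustration, the multiplication and the truncation to 8 bits can be modeled on raw integers in Python. This is a sketch under the formats assumed in this section (x with 7 fractional bits, y with 6); the name `mul_fixed` is hypothetical and the code does not represent a System Generator block.

```python
F_X, F_Y = 7, 6   # fractional bits of x and y (assumed 8-bit words)

def mul_fixed(x_raw, y_raw):
    """Multiply raw operands and keep the 8 most significant bits of the product."""
    p_full = x_raw * y_raw   # 16-bit product at scale 2**-(F_X + F_Y) = 2**-13
    return p_full >> 8       # discard the 8 LSBs; result is at scale 2**-5

x_raw = round(0.5 * 2 ** F_X)   # x = 0.5 -> raw value 64
y_raw = round(1.5 * 2 ** F_Y)   # y = 1.5 -> raw value 96
p_raw = mul_fixed(x_raw, y_raw)
print(p_raw / 2 ** 5)           # 0.75
```

Note that the scale of the truncated product (2^−5) differs from the scales of both operands, so the binary point has to be tracked separately from the raw integer.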

4.1.1 Exercise 1 – Fixed-Point Arithmetic

Using Xilinx System Generator, implement the Arithmetic Unit (AU) that performs the following arithmetic operation:

A = B × C + D (4)

where the range limits of the inputs are given as |B| ≤ 1.8, |C| ≤ 1 and |D| ≤ 2.8. All inputs are represented as 8-bit signed two's complement numbers. The word size of the AU is 8 bits. Each operand needs to be represented with as much resolution as possible.


4.2 Finite State Machine

A finite state machine (FSM) is a model of behavior composed of a finite number of states, where the next state is computed from the current state and the current input. Finite state machines are widely used for controlling complex systems.

4.2.1 Exercise 2 – Finite State Machine

The goal of this exercise is to build a simple FSM that controls traffic lights. The FSM has four states (RED, GREEN, ORANGE and IDLE) and the behavior described in Figure 16. The default state of the traffic lights is IDLE, and it changes to RED when the start signal arrives. If the reset signal is generated, the state of the traffic lights has to change to IDLE. Both the start and reset signals are provided as impulse signals and are active on 1. The other transitions between states are determined by the value of the counter: the RED state lasts 10 cycles, GREEN lasts 20 cycles and ORANGE lasts 2 cycles. The FSM should be implemented as an MCode block with three inputs (rst, start, cntr) and two outputs (light, rst_cntr). The counter should be implemented as a separate block.

[Figure: state diagram with the states IDLE, RED, GREEN and ORANGE. Transition labels: start=0, start=1, cntr<T_RED, cntr=T_RED, cntr<T_GREEN, cntr=T_GREEN, cntr<T_ORANGE, cntr=T_ORANGE, and rst=1 (three times).]

Figure 16: The FSM of the traffic lights.
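Before writing the MCode block, it can help to check the intended behavior with a small software model. The Python sketch below is such a model, not MCode and not a reference solution. It rests on assumptions that the text does not state: ORANGE is followed by RED, the counter restarts at 1 whenever rst_cntr is asserted, and the counter is held in reset while the FSM is IDLE. The names `fsm_step` and `simulate` are hypothetical.

```python
IDLE, RED, GREEN, ORANGE = "IDLE", "RED", "GREEN", "ORANGE"
DURATION = {RED: 10, GREEN: 20, ORANGE: 2}       # state durations in cycles
SUCCESSOR = {RED: GREEN, GREEN: ORANGE, ORANGE: RED}  # ORANGE -> RED is an assumption

def fsm_step(state, rst, start, cntr):
    """Return (next_state, rst_cntr) for one clock cycle."""
    if rst == 1:                      # reset has priority in every state
        return IDLE, 1
    if state == IDLE:                 # wait for the start impulse
        return (RED, 1) if start == 1 else (IDLE, 1)
    if cntr == DURATION[state]:       # duration elapsed: advance and restart counter
        return SUCCESSOR[state], 1
    return state, 0                   # stay and keep counting

def simulate(n_cycles, start_at=0):
    """Run the FSM with a separate counter; return the state seen in each cycle."""
    state, cntr, trace = IDLE, 1, []
    for t in range(n_cycles):
        trace.append(state)
        start = 1 if t == start_at else 0
        state, rst_cntr = fsm_step(state, 0, start, cntr)
        cntr = 1 if rst_cntr else cntr + 1   # counter modeled as a separate block
    return trace

trace = simulate(34)
print(trace.count(GREEN), trace.count(ORANGE))   # 20 2
```

Keeping the counter outside `fsm_step` mirrors the requirement that the counter be a separate block; the FSM only sees its value and drives its reset.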


4.3 Multi-Rate Systems

A typical example of a multi-rate system is a base-station receiver in a mobile phone network, as shown in Figure 17. The tower has multiple antennas to provide full coverage of the area. The diagram shows that this results in two receiver channels. In each of these channels there is some form of complex mixing, resulting in real and imaginary channels. DSP systems such as this one often need to down sample the input signals prior to the digital filtering steps performed during equalization and demodulation. Doing so can simplify the filter design and hardware significantly.

Figure 17: Base-station receiver in a mobile phone network.

4.3.1 Exercise 3 – Multi-Rate Systems

After completing this exercise, you will be able to change the sample rates in a DSP system, convert a serial stream of data into a parallel word and convert a parallel word into a serial stream of data.

Open a new Simulink diagram and create the simple diagram shown in Figure 18. Use the Counter Limited block from the Simulink Sources library and set the upper limit of the counter to 10. Set the quantization of the Gateway In block to Fix_8_0. Simulate the counter for 10 simulation cycles and observe the results.

Figure 18: Initial set-up.

As shown in Figure 19, add a Down Sample block from the Xilinx Blockset Index library between the Gateway In and Gateway Out blocks, then re-simulate the design. What do you observe?


Figure 19: Down sampling.

Replace the Down Sample block with an Up Sample block and re-simulate the design. The System Generator token is going to generate an error that indicates your sample rate is incorrect. Why?

Double-click on the System Generator token and change the Simulink System Period to 1/2. Re-simulate the design. Add Sample Time probes from the Xilinx Blockset Index library before and after the Up Sample block and connect the outputs of the probes to the Simulink Sinks as shown in Figure 20. These probes do not add any hardware to the design, but offer a powerful debugging tool for complex multi-rate systems. Re-simulate the design to observe the sample rate in the Display sinks.

Figure 20: Up sampling.
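The effect of the two rate-changing steps on a sample stream and on its sample period can be sketched in plain Python. This is a generic model, not the behavior of the Xilinx blocks themselves: it assumes that down sampling keeps the first sample of each frame and that up sampling repeats each sample, whereas the actual blocks have settings that this sketch does not model. All function names are hypothetical.

```python
from fractions import Fraction

def down_sample(samples, rate):
    """Keep one sample out of every `rate` (here: the first of each frame)."""
    return samples[::rate]

def up_sample(samples, rate):
    """Repeat every sample `rate` times."""
    return [s for s in samples for _ in range(rate)]

def output_period(input_period, rate, up):
    """Sample period after the rate change."""
    return input_period / rate if up else input_period * rate

counter = list(range(11))                        # limited counter 0..10
print(down_sample(counter, 2))                   # [0, 2, 4, 6, 8, 10]
print(up_sample([0, 1, 2], 2))                   # [0, 0, 1, 1, 2, 2]
print(output_period(Fraction(1), 2, up=True))    # 1/2
```

With an input period of 1, up sampling by 2 yields an output period of 1/2, which is the value entered as the Simulink System Period in the step above.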

In the next two steps, you will explore the rate changing effects of using the Serial to Parallel and Parallel to Serial blocks from the Xilinx Blockset. Open a new blank model, then create the design shown in Figure 21. Set the limit on the Counter Limited block to 1. This will simply generate a sequence 01010101010101 . . . Set the output of the Serial to Parallel block to UFix_8_0. Explain the diagrams you get on the scope. What happens if you change the input of Serial to Parallel to be 1-bit (by changing the Gateway In parameters)?

Now, replace the Serial to Parallel block with the Parallel to Serial block. Leave the output quantization at the default UFix_1_0 and change the sample rate in the System Generator token from 1 to 1/8. Re-simulate the design and record the input and output sample rates. Explain the diagrams on the scope.


Figure 21: Serial-to-Parallel conversion.
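The two conversions can be illustrated with a small Python model. It is a sketch only: the bit order is exposed as a parameter because the order used in the actual design depends on the block settings, which are not modeled here, and the function names are hypothetical.

```python
def serial_to_parallel(bits, width=8, msb_first=True):
    """Pack consecutive groups of `width` one-bit samples into unsigned words."""
    words = []
    for i in range(0, len(bits) - width + 1, width):
        chunk = bits[i:i + width]
        if not msb_first:
            chunk = chunk[::-1]       # first received bit becomes the LSB
        word = 0
        for b in chunk:
            word = (word << 1) | b    # shift in one bit at a time
        words.append(word)
    return words

def parallel_to_serial(words, width=8, msb_first=True):
    """Unpack unsigned words into a stream of one-bit samples."""
    bits = []
    for w in words:
        chunk = [(w >> k) & 1 for k in range(width - 1, -1, -1)]  # MSB first
        bits.extend(chunk if msb_first else chunk[::-1])
    return bits

stream = [0, 1] * 8                                  # 0101... from the limited counter
print(serial_to_parallel(stream))                    # [85, 85]
print(serial_to_parallel(stream, msb_first=False))   # [170, 170]
```

One output word is produced for every 8 input bits, so in this model the parallel side runs at one eighth of the serial sample rate.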

4.4 FPGA Board Testing

The goal of this exercise is to learn the basics of the FPGA board that will be used later, during the Dragon project. For that purpose we will use additional documents that describe the FPGA board. The material can be found on Toledo, under Cursus Informatie, section Introductiesessies / Fase3 - System Generator. Two simple exercises will be demonstrated to you. First, we will check the functionality of the on-board LEDs and second, the USB connection will be tested.

5 Practical suggestions

When creating System Generator files for the Dragon project, start from the file template.mdl, which can be found on Toledo. This file already defines all the correct settings for the System Generator token and makes some pin connections to avoid problems.

References

[1] Xilinx, Xilinx System Generator Help Documentation, http://www.xilinx.com/

[2] Erick L. Oberstar, Fixed-Point Representation and Fractional Math, Oberstar Consulting, 2007.
