32-bit signed multiplication_report


UNIVERSITI TEKNOLOGI MALAYSIA

Faculty of Electrical Engineering

HW/SW Co-design of a Nios II-based Embedded System

32-bits Signed Multiplication

Report from a project conducted on 11st Sept 2009

as part of SEW 4722 at the ECAD Laboratory

SEW 4722, Section 1, Group No. 3

Eunice Ng Hui Xian

Lee Chen Cheak

Mohd Firdaus

Teoh Shu Wen



HW/SW Co-design of a Nios II-based Embedded System

32-bit Signed Multiplication

Eunice Ng Hui Xian, Lee Chen Cheak, Mohd Firdaus, Teoh Shu Wen

Faculty of Electrical Engineering

Universiti Teknologi Malaysia

81310 UTM Skudai, Johor, Malaysia

Abstract: Before the laboratory work, the process of creating, compiling, and downloading a Nios II-based embedded system onto the Altera DE2 board was completed for system verification. In this paper, an embedded system application is written in the C++ programming language and run on the Nios II-based embedded system. A hardware accelerator is then designed to perform the same operation. A system bus interface and a firmware device driver for the hardware accelerator are also designed and integrated into the Nios II-based embedded system.

I. INTRODUCTION

In today's world, embedded systems are everywhere: homes, offices, cars, factories, hospitals, planes and consumer electronics. Their huge numbers and new complexity call for a new design approach, one that emphasizes high-level tools and hardware/software tradeoffs rather than low-level assembly-language programming and logic design.

An embedded system is a system designed to perform one or a few dedicated functions, which often involve real-time computing. Because an embedded system is designed to perform only its dedicated function(s), engineers can optimize it, reducing the size and cost of the product. Examples of embedded systems are PDAs, MP3 players, mobile phones, digital cameras, DVD players, GPS receivers and printers.

This PBL lab project designs an embedded 32-bit signed multiplier with Field Programmable Gate Array (FPGA) based hardware acceleration for the multiplication of random numbers. This is done by designing the 32-bit signed multiplication hardware core. All the hardware cores are integrated into an embedded system implemented as a System-on-Chip (SoC).

With the state-of-the-art very large scale integration (VLSI) technology available today, many embedded systems, or substantial parts of them, can be integrated on a single programmable platform. In other words, these embedded systems are implemented as a System-on-Chip, which from here on will be referred to as an SoC design or SoC embedded system.

An SoC integrates all the hardware components of the embedded system into a single integrated circuit (IC) chip. The result is a single chip with no external connections to other chips, which reduces the size and packaging of the product.

This lab also introduces the Altera SOPC (System-on-Programmable-Chip) Builder, which is used together with Quartus II and the Nios II IDE software to develop a Nios II-based embedded system. The lab aims to design the software partition and the hardware partition (hardware IP core) of an embedded system. It also aims to perform design-space exploration between the hardware and software partitions when performing a specific computation. The performance metrics are logic cost and computation cycle count.

II. METHODOLOGY

This project is divided into two design parts: software and hardware. The software part is to write an RNG and a 32-bit signed multiplication in the embedded system application using the C++ programming language, and the hardware part is to design a mul_coprocessor. After both parts are designed, they are downloaded onto the Altera DE2 board.

Fig. 1 Work flow of lab project



For the software design part, two tasks need to be completed: a Random Number Generator (RNG) and a 32-bit signed multiplication. The RNG, written in C++, has to generate 25 sets of 32-bit signed random numbers as input operands. The random seed is based on user input. When the user enters a value, the software generates 25 sets of random numbers, including negative numbers. The software then performs the 32-bit signed multiplication and displays the generated random numbers and the multiplication output.

Fig. 2 shows a flowchart of the function. Once the program starts, it asks the user to enter a seed number. The input is stored as the seed, and 25 sets of random numbers are generated from it. After each set of random numbers is generated, the multiplication operation is performed. Two 32-bit signed operands produce a 64-bit product.

Appendix 1 shows the software code for the 32-bit signed multiplication function. The random numbers are generated with rand(), and rand()%2 is used to check whether a generated number is odd or even. If the expression is true, a remainder exists and the number is odd. All odd numbers are made negative.
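The software flow described above can be sketched as follows. This is a minimal illustration and not the code from Appendix 1: the names MulSample, signed_random and generate_and_multiply are hypothetical, and the sketch assumes that a value is negated when rand() returns an odd number, as the text suggests. Note that rand() only returns values up to RAND_MAX, so on some platforms the operands cover far less than the full 32-bit range.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// One operand pair and its 64-bit product (hypothetical helper type).
struct MulSample {
    std::int32_t a;
    std::int32_t b;
    std::int64_t product;
};

// Assumption: draw one value from rand() and make it negative when it is
// odd (value % 2 != 0), following the description in the text.
std::int32_t signed_random() {
    std::int32_t value = static_cast<std::int32_t>(std::rand());
    return (value % 2 != 0) ? -value : value;
}

// Seed the generator, then produce `count` operand pairs and their products.
std::vector<MulSample> generate_and_multiply(unsigned seed, int count = 25) {
    std::srand(seed);
    std::vector<MulSample> samples;
    for (int i = 0; i < count; ++i) {
        std::int32_t a = signed_random();
        std::int32_t b = signed_random();
        // Widen one operand first so the product is computed in 64 bits.
        samples.push_back({a, b, static_cast<std::int64_t>(a) * b});
    }
    return samples;
}
```

Widening before the multiplication matters: multiplying two 32-bit values directly would overflow before the result is stored in the 64-bit variable.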

For the second part of the project, VHDL code is written based on the provided algorithm. The algorithm for signed multiplication is as follows.

Input : x, y (signed numbers)
Output : P, where P = x * y

A = x, B = y, P = 0
for i = 0 to 31 do
    if B_i = 1 then
        P = P + A
    end if
    A = A << 1
end for
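The shift-and-add loop can be modelled in C++ to see how it behaves. The sketch below is a software model under stated assumptions, not the VHDL design: the function name shift_add is hypothetical, A is taken to be the multiplicand sign-extended to 64 bits, and the multiplier bits are scanned from bit 0 upward.

```cpp
#include <cstdint>

// Shift-and-add model: adds the shifted multiplicand for every set bit of
// the 32-bit multiplier. The multiplier bits are treated as plain
// (unsigned) bits, exactly as the loop in the text does.
std::int64_t shift_add(std::int32_t x, std::uint32_t y) {
    // A: x sign-extended to 64 bits. It is held as unsigned so that the
    // shifts and the wrap-around additions are well defined in C++.
    std::uint64_t a = static_cast<std::uint64_t>(static_cast<std::int64_t>(x));
    std::uint64_t p = 0;
    for (int i = 0; i < 32; ++i) {
        if ((y >> i) & 1u) {
            p += a;      // P = P + A when bit i of B is 1
        }
        a <<= 1;         // A = A << 1
    }
    // Reinterpret the 64-bit pattern as a signed value (two's complement).
    return static_cast<std::int64_t>(p);
}
```

In this model a negative multiplicand gives the right product, but a negative multiplier does not, because its bit pattern is read as a large unsigned number. This is why the sign of the inputs needs extra handling.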



The signed multiplication operation requires several considerations regarding the MSB of the inputs. This is shown in Fig. 5 below.

A          B          Operation
Positive   Positive   No change
Positive   Negative   Swap A and B
Negative   Positive   No change
Negative   Negative   2's complement A and B

This is needed so that the algorithm works on signed numbers. The modification was made by adding a combinational block, named the convert block, to the data path unit. It detects the signs of the inputs and changes them accordingly. In addition, input A is sign-extended with the appropriate bit when it is loaded. This is coded in the data path unit.
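A software model of the sign handling helps to check the four cases. The sketch below is an illustration under stated assumptions, not a translation of the VHDL: the names negate32, convert and signed_multiply are hypothetical, and the loop is the shift-and-add scheme with A sign-extended to 64 bits. One corner case is not covered by this simple model: when both inputs are negative and one of them is -2^31, its two's complement wraps back to -2^31.

```cpp
#include <cstdint>
#include <utility>

// Two's complement negation done in unsigned arithmetic to avoid overflow.
std::int32_t negate32(std::int32_t v) {
    return static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(v));
}

// Model of the convert block: applies the adjustment for each sign
// combination and returns the (A, B) pair that feeds the multiplier.
std::pair<std::int32_t, std::int32_t> convert(std::int32_t a, std::int32_t b) {
    if (a >= 0 && b < 0) return {b, a};                      // swap A and B
    if (a < 0 && b < 0) return {negate32(a), negate32(b)};   // negate both
    return {a, b};                                           // no change
}

// Signed multiply: convert the inputs, sign-extend A, then shift and add.
std::int64_t signed_multiply(std::int32_t x, std::int32_t y) {
    std::pair<std::int32_t, std::int32_t> ab = convert(x, y);
    std::uint64_t a =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(ab.first));
    std::uint32_t b = static_cast<std::uint32_t>(ab.second);  // never negative here
    std::uint64_t p = 0;
    for (int i = 0; i < 32; ++i) {
        if ((b >> i) & 1u) p += a;
        a <<= 1;
    }
    return static_cast<std::int64_t>(p);
}
```

After conversion the multiplier B is never negative, so scanning its bits as an unsigned number is safe, while any remaining negative sign is carried by the sign-extended A.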

Waveform simulation was then performed to verify the functionality. The results are shown in Appendix 4. It can be observed that the results are valid for all combinations of signed numbers.

To connect the multiplier designed earlier to the system interconnect fabric, an Avalon Memory-Mapped bus interface is needed. As shown in Appendix 5, MU_interface connects the multiplier to the system interconnect fabric. The combination of MU_interface and mul_MU forms MU_avalon, which is the hardware accelerator. Note that only some of the I/O ports of mul_MU are connected to MU_interface.

The VHDL code in Appendices 6 and 7 shows the functional mechanism of MU_interface and MU_avalon respectively.

Referring to both Appendix 5 and 6, we first discuss the architecture of MU_interface. When reset equals one, the readdata output is zero. The start output is also zero, that is, the multiplier is in the off state. When chipselect equals one, a 2-bit value is loaded into address. If address is 00, the start output equals 1, which starts the multiplier (mul_MU). When address equals 01, the data1 output equals the writedata input. When address equals 10, the data2 output equals the writedata input. When address equals 11, the readdata output equals the result input.
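The register map described above, and the access sequence a driver would follow, can be sketched with a small software model. Everything here is illustrative: MuInterfaceModel and multiply_via_registers are hypothetical names, the model computes the product instantly instead of waiting for the multiplier, and returning 0 for reads at other addresses is an assumption. It is not the firmware for the Nios II system.

```cpp
#include <cstdint>

// Software model of the four-address register map (chipselect assumed 1).
class MuInterfaceModel {
public:
    void reset() { data1_ = 0; data2_ = 0; result_ = 0; start_ = false; }

    void write(unsigned address, std::int32_t writedata) {
        switch (address & 0x3u) {
            case 0: start_ = true; run(); break;   // 00: start the multiplier
            case 1: data1_ = writedata; break;     // 01: load data1
            case 2: data2_ = writedata; break;     // 10: load data2
            default: break;                        // 11: read-only
        }
    }

    // 11: return the result; other addresses return 0 (assumption).
    std::int32_t read(unsigned address) const {
        return ((address & 0x3u) == 3u) ? result_ : 0;
    }

private:
    // Stand-in for mul_MU: keep the lower 32 bits of the 64-bit product.
    void run() {
        std::int64_t product = static_cast<std::int64_t>(data1_) * data2_;
        result_ = static_cast<std::int32_t>(static_cast<std::uint32_t>(product));
    }

    std::int32_t data1_ = 0;
    std::int32_t data2_ = 0;
    std::int32_t result_ = 0;
    bool start_ = false;
};

// Driver-style sequence: load both operands, start, then read the result.
std::int32_t multiply_via_registers(MuInterfaceModel& dev,
                                    std::int32_t x, std::int32_t y) {
    dev.write(1, x);
    dev.write(2, y);
    dev.write(0, 0);
    return dev.read(3);
}
```

On the real system the same four steps would be bus accesses at the peripheral's base address, and the driver would also have to wait until the multiplier has finished before reading the result.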

Next, the architecture of MU_avalon is discussed. The connections are shown in Appendix 5. As mul_MU takes two 32-bit inputs, the result of the multiplication is 64 bits wide. Because the Avalon bus is restricted to a width of 32 bits, the following approach was taken: the user input is restricted to the range -32768 to 32767, which is the range of signed 16-bit numbers. These inputs are loaded into mul_MU as 32-bit data. After multiplication, a 64-bit result is obtained. However, the upper 32 bits are always 00000000000000000000000000000000 or 11111111111111111111111111111111 because the inputs are limited to the 16-bit range. Hence, the upper 32 bits can be omitted. The result input of MU_interface loads only the lower 32 bits from the result output of mul_MU. Therefore, the readdata output, which takes the value of the result input, is 32 bits wide, which satisfies the Avalon bus width restriction.
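The claim about the upper 32 bits can be checked numerically. The largest magnitude product of two signed 16-bit numbers is (-32768) x (-32768) = 2^30, which fits in a signed 32-bit value, so the upper half of the 64-bit result only repeats the sign bit. A minimal sketch of this check, with hypothetical function names:

```cpp
#include <cstdint>

// Product of two 16-bit operands, computed in 64 bits.
std::int64_t widen_multiply(std::int16_t x, std::int16_t y) {
    return static_cast<std::int64_t>(x) * y;
}

// True when the upper 32 bits are all zeros or all ones and match the
// sign bit of the lower 32 bits, i.e. they carry no extra information.
bool upper_half_is_sign_extension(std::int64_t product) {
    std::uint64_t bits = static_cast<std::uint64_t>(product);
    std::uint32_t upper = static_cast<std::uint32_t>(bits >> 32);
    std::uint32_t lower = static_cast<std::uint32_t>(bits);
    std::uint32_t expected = (lower & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return upper == expected;
}
```

The check holds at the corners of the 16-bit range, and it fails as soon as the operands leave that range: 65536 x 65536 = 2^32 has a nonzero upper half.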

The disadvantage of the previous approach is that only numbers within the 16-bit range are accepted. To expand the input range to the full 32-bit signed range, an alternative was suggested. An extra register has to be added to MU_interface to hold the 64-bit value from the result output of mul_MU. The 64-bit data is then sent through the 32-bit readdata output in two parts: first the upper 32 bits, then the lower 32 bits. If this method is used, an extra section must be added to the firmware so that it reads the readdata output in a loop.
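The two-part readout can be illustrated in software. The sketch below only shows how a 64-bit result splits into two 32-bit words and how the firmware side could put them back together. The names ResultWords, split_result and join_result are hypothetical, and the upper-word-first order follows the description above.

```cpp
#include <cstdint>

// The two 32-bit words that would appear on readdata, in read order.
struct ResultWords {
    std::uint32_t upper;  // read first
    std::uint32_t lower;  // read second
};

// Hardware side: split the 64-bit product into two bus-width words.
ResultWords split_result(std::int64_t product) {
    std::uint64_t bits = static_cast<std::uint64_t>(product);
    return {static_cast<std::uint32_t>(bits >> 32),
            static_cast<std::uint32_t>(bits)};
}

// Firmware side: rebuild the 64-bit value from the two words.
std::int64_t join_result(ResultWords words) {
    std::uint64_t bits =
        (static_cast<std::uint64_t>(words.upper) << 32) | words.lower;
    return static_cast<std::int64_t>(bits);
}
```

With this scheme a product such as 2^62, which needs the full 64 bits, survives the transfer over a 32-bit bus unchanged.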

Fig.4 Functional block diagram

Fig.5 Multiplication operation



III. RESULT AND ANALYSIS

The first part of the project required a software program in C++ that implements a Random Number Generator (RNG) with a 32-bit signed multiplication function. We had no major problems writing this program, since we were already familiar with the language. The only problem faced at the beginning was that a wrong type was assigned to a variable, so the random 32-bit numbers could not be obtained. Fig.xx below shows the output of the 32-bit signed multiplication.

We then continued to integrate the mul_coprocessor into a Nios II SoC as a user peripheral. This part was easily done because we had done the same thing in our pre-lab.

The next part was to write the firmware device driver of the mul_coprocessor, which is executed by the Nios II CPU to compute the signed multiplication. Due to the time limitation and the problems we faced in the previous part, we could not write the firmware, which requires understanding the code in system.h. We were only able to finish up to this part.

Since we could not finish the firmware, we could not proceed to the combination in Question 3. Instead, we tried to compare the hardware and the software theoretically. We predict that the multiplication computed by the hardware mul_coprocessor will be faster than the software multiplication. As the hardware multiplication uses an SoC as its platform, it has all the advantages of an SoC system. In terms of the design trade-off, the hardware has a higher cost than the software. This can be determined from the LE (logic element) cost before and after the simulation.

CONCLUSION

In conclusion, this lab let us brush up on our C programming. We also got to know more about VHDL and learned a more proper way to use an HDL to produce a design. In addition, we learned about the SOPC system. Although we were unable to finish the entire lab, we learned a lot through this lab session. The most important thing we gained from this lab is the realization of how important cooperation is, and that things cannot be done by one person without the others. Besides gaining knowledge, we gained even more on the soft-skill side. We learned how to communicate with each other and the importance of communication.

REFERENCES

[1] Mohamed Khalil Hani, Starter's Guide to Digital Systems VHDL & Verilog Design, 2nd ed., Pearson Prentice Hall.

[2] S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd ed., McGraw-Hill Higher Education, 2005.

APPENDICES

Fig. 6 Output of 32-bits signed multiplication



APPENDIX 1 - C++ code for the RNG

#include

#include

#include

using namespace std;

int main()

{

short i;

int seed;

int set1[25];

int set2[25];

long long mul[25];

cout > seed;

srand(seed);

for (i=0; i



RTL Control Sequence Table

DU control vector bit order: Psel ldP ctrlA ldA ctrlB ldB

State S1
  RTL operations: P <- 0; (Start)/ A <- MSB & dataA; (Start)/ B <- dataB; (Start)/A go to S1
  Activated control signals: Psel ldP; ctrlA ldA; ctrlB ldB
  DU control vectors: 0 1 0 0 0 0; 0 1 1 1 1 1

State S2
  RTL operations: A <- A << 1; (Z b0)/ P <- P + A; Z/ go to S2
  Activated control signals: ctrlA ldA; ctrlB ldB; Psel ldP; ctrlA ldA
  DU control vectors: 0 0 0 1 0 1; 1 1 0 1 0 1; 0 0 0 1 0 1

State S3
  RTL operations: done <- 1; (Start)/ go to S1; (Start)/ go to S3
  DU control vectors: 0 0 0 0 0 0

APPENDIX 3



VHDL Codes

64-bit Register

library ieee;

use ieee.std_logic_1164.all;

entity mul_Reg64 is

port (clk, en, rst : in std_logic;

d : in std_logic_vector (63 downto 0);

Q : buffer std_logic_vector (63 downto 0));

end mul_Reg64;

architecture arch_reg of mul_Reg64 is

begin

process (clk, rst) begin

if rst = '1' then Q <= (others => '0');

elsif (clk'event and clk = '1') then

if en = '1' then Q



library ieee;


entity mul_shiftRreg32 is

port (d : in std_logic_vector (31 downto 0);

ldsh, en, w, clk, rst : in std_logic;

q : buffer std_logic_vector (31 downto 0));

end mul_shiftRreg32;

architecture Shift_arch of mul_shiftRreg32 is

begin

process (clk, rst) begin

if rst = '1' then q <= (others => '0');

elsif (clk'event and clk = '1') then

if en = '1' then

if ldsh = '1' then q



LIBRARY ieee;

USE ieee.std_logic_1164.all;

USE ieee.std_logic_unsigned.all;

entity mul_convert is

port (dataA, dataB : in std_logic_vector (31 downto 0);

outA, outB : buffer std_logic_vector (31 downto 0));

end mul_convert;

architecture arch_convert of mul_convert is

signal z: std_logic_vector(1 downto 0);

signal tempA: std_logic_vector(31 downto 0);

signal tempB: std_logic_vector(31 downto 0);

begin

process (z, dataA, dataB, outA, outB,tempA,tempB) begin

z(1)



library ieee;

use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity mul_DU is
  port( clk, rst : in std_logic;
        in_dataA, in_dataB : in std_logic_vector(31 downto 0);
        P : buffer std_logic_vector(63 downto 0);
        Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;
        A_tp : out std_logic_vector(63 downto 0);
        B_tp : out std_logic_vector(31 downto 0);
        dataP_tp : out std_logic_vector(63 downto 0);
        z, b0 : out std_logic);
end mul_DU;

architecture DU_arch of mul_DU is
  signal Ain, A, sum, dataP : std_logic_vector(63 downto 0);
  signal B : std_logic_vector(31 downto 0);
  signal zero1 : std_logic;
  signal dataA, dataB : std_logic_vector(31 downto 0);

  component mul_Reg64 port (
    d : in std_logic_vector(63 downto 0);
    en, clk, rst : in std_logic;
    q : buffer std_logic_vector(63 downto 0));
  end component;

  component mul_ShiftLreg64 port (
    d : in std_logic_vector(63 downto 0);
    ldsh, en, w, clk, rst : in std_logic;
    q : buffer std_logic_vector(63 downto 0));
  end component;

  component mul_ShiftRreg32 port (
    d : in std_logic_vector(31 downto 0);
    ldsh, en, w, clk, rst : in std_logic;
    q : buffer std_logic_vector(31 downto 0));
  end component;

  component mul_convert port (
    dataA, dataB : in std_logic_vector (31 downto 0);
    outA, outB : buffer std_logic_vector (31 downto 0));
  end component;

begin

zero1



library ieee;


use ieee.std_logic_arith.all;

entity mul_CU is port(

clk, rst, start, b0,z : in std_logic;

done : out std_logic;

state : out std_logic_vector(1 downto 0);

CtrlVector : out std_logic_vector(5 downto 0));

end mul_CU;

architecture fsm of mul_CU is

signal y: std_logic_vector (1 downto 0);

constant S1: std_logic_vector (1 downto 0):= "00";



begin

fsm_transitions:process (clk, rst) begin

if (rst='1')then

y



LIBRARY ieee;

USE ieee.std_logic_1164.all;

USE ieee.std_logic_signed.all;

entity mul_MU is port(

clock, start : in std_logic;

dataA, dataB : in std_logic_vector(31 downto 0);

result : buffer std_logic_vector(63 downto 0);

reset : in std_logic;

CtrlVector : out std_logic_vector (5 downto 0);

done : out std_logic;


tpA : out std_logic_vector(63 downto 0);

tpB : out std_logic_vector (31 downto 0);

tpdataP : out std_logic_vector(63 downto 0));

end mul_MU;

architecture MU_arch of mul_MU is

signal intb0, intz : std_logic;

signal intCtrlVec : std_logic_vector(5 downto 0);

component mul_CU port(

clk, rst, start : in std_logic;

b0, z : in std_logic;

done : out std_logic;

state : out std_logic_vector(1 downto 0);

CtrlVector :out std_logic_vector(5 downto 0));

end component;

component mul_DU port(

clk, rst : in std_logic;

in_dataA, in_dataB: in std_logic_vector(31 downto 0);

P : buffer std_logic_vector(63 downto 0);

Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;

A_tp : out std_logic_vector(63 downto 0);

B_tp :out std_logic_vector(31 downto 0);

dataP_tp : out std_logic_vector(63 downto 0);

z, b0 : out std_logic);

end component;

begin

U_CU:mul_CU port map(clock, reset, start, intb0, intz, done, state, intCtrlVec);

CtrlVector



Signed Multiplier Waveform Simulation (Timing)

Example 1: -1024 x 272 = -278528

Example 2: -512 x -112 = 57344

Example 3: 15360 x -16368 = -251412480



Example 4: 0 x 272 = 0

APPENDIX 5



APPENDIX 6



library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.std_logic_arith.all;

ENTITY MU_interface IS

PORT ( reset : IN STD_LOGIC;

clk : IN STD_LOGIC;

chipselect : IN STD_LOGIC;

address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

write : IN STD_LOGIC;

writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

start : OUT STD_LOGIC;

data1 : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

data2 : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));

END MU_interface;

ARCHITECTURE arch OF MU_interface IS

BEGIN

process (reset, clk)

begin

if reset = '1' then

readdata <= (others => '0');

start



library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.std_logic_arith.all;

ENTITY MU_avalon IS

PORT ( reset : IN STD_LOGIC;

clk : IN STD_LOGIC;

chipselect : IN STD_LOGIC;

address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

write : IN STD_LOGIC;

writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)

);

END MU_avalon;

ARCHITECTURE avalon_arch OF MU_avalon IS

signal lineA : std_logic_vector (63 downto 0);

signal lineB : std_logic_vector (63 downto 0);

signal start_signal : std_logic;

signal result_MU : std_logic_Vector (63 downto 0);

COMPONENT MU_interface IS

PORT ( reset : IN STD_LOGIC;

clk : IN STD_LOGIC;

chipselect : IN STD_LOGIC;

address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

write : IN STD_LOGIC;

writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

start : OUT STD_LOGIC;

data1 : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

data2 : OUT STD_LOGIC_VECTOR (63 downto 0);

result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));

END COMPONENT;

COMPONENT mul_MU IS

PORT ( clock, start : in std_logic;

dataA, dataB : in std_logic_vector(31 downto 0);

result : buffer std_logic_vector(63 downto 0);

reset : in std_logic;

CtrlVector : out std_logic_vector (5 downto 0);

done : out std_logic;


tpA : out std_logic_vector(63 downto 0);

tpB : out std_logic_vector (31 downto 0);

tpdataP : out std_logic_vector(63 downto 0));

END component;

BEGIN

U_MuUnit: mul_MU

port map ( clock => clk,

dataA => lineA,

dataB => lineB,

start => start_signal,



result => result_MU,

reset => reset

);

U_Interface_MU: MU_interface

port map ( clk => clk,

reset => reset,

chipselect => chipselect,

address => address,

write => write,

writedata => writedata,

readdata => readdata,

result => result_MU,

data1 => lineA(31 downto 0),

data2 => lineB(31 downto 0),

start => start_signal

);

END avalon_arch;
