32-bit signed multiplication_report
8/9/2019 32-Bit Signed Multiplication_report
1/19
UNIVERSITI TEKNOLOGI MALAYSIA
Faculty of Electrical Engineering
HW/SW Co-design of a Nios II-based Embedded System
32-bits Signed Multiplication
Report from a project conducted on 11 Sept 2009
as part of SEW 4722 at the ECAD Laboratory
SEW 4722, Section 1, Group No. 3
Eunice Ng Hui Xian
Lee Chen Cheak
Mohd Firdaus
Teoh Shu Wen
HW/SW Co-design of a Nios II-based Embedded System
32-bit Signed Multiplication
Eunice Ng Hui Xian, Lee Chen Cheak, Mohd Firdaus, Teoh Shu Wen
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
81310 UTM Skudai, Johor, Malaysia
Abstract: Before the laboratory work began, a Nios II-based
embedded system was created, compiled, and downloaded to an
Altera DE2 board for system verification. In this paper, an embedded
system application is written in the C++ programming language
and run on the Nios II-based embedded system. A hardware
accelerator is then designed to perform the same operation. A
system bus interface and a firmware device driver for the hardware
accelerator are also designed and integrated into the Nios II-based
embedded system.
I. INTRODUCTION
In today's world, embedded systems are everywhere: homes,
offices, cars, factories, hospitals, planes and consumer
electronics. Their huge numbers and growing complexity call for a
new design approach, one that emphasizes high-level tools and
hardware/software tradeoffs rather than low-level
assembly-language programming and logic design.
An embedded system is a system designed to perform one
or a few dedicated functions, which often involve real-time
computing. Because an embedded system is designed only to perform
dedicated functions, engineers can optimize it, reducing the
size and cost of the product. Examples of embedded systems
are PDAs, MP3 players, mobile phones, digital cameras, DVD
players, GPS receivers and printers.
This PBL lab project designs an embedded 32-bit signed
multiplier with Field Programmable Gate Array (FPGA) based
hardware acceleration for the multiplication of random numbers.
This is done by designing a 32-bit signed multiplication
hardware core. All the hardware cores are integrated into an
embedded system implemented as a System-on-Chip (SoC).
With the state-of-the-art very large scale integration (VLSI)
technology available today, many of the embedded systems or
substantial parts of the systems can be integrated on a single,
programmable platform. In other words, these embedded
systems are implemented as System-on-Chip, which from here
on will be referred to as a SoC design or SoC embedded
system.
An SoC integrates all the hardware components of the
embedded system into a single integrated circuit (IC) chip.
The result is a single chip with no external connections to
other chips, which reduces the size and packaging of the
product.
This lab also introduces the Altera SOPC
(System-on-Programmable-Chip) Builder, which is used together
with Quartus II and the Nios II IDE to develop a Nios II-based
embedded system. The lab aims to design the software and hardware
partitions (the hardware IP core) of an embedded system, and to
perform design-space exploration between the hardware and
software partitions when performing a specific computation.
The performance metrics are logic cost and computation cycle count.
II. METHODOLOGY
This project is divided into two design parts: software and
hardware. The software part is to write an RNG and a 32-bit
signed multiplication in the embedded system application using
the C++ programming language. The hardware part is to design a
mul_coprocessor. After both parts are designed, they are
downloaded to the Altera DE2 board.
Fig. 1 Work flow of lab project
The software design part consists of two tasks: a Random
Number Generator (RNG) and a 32-bit signed multiplication. The
RNG, written in C++, has to generate 25 sets of 32-bit signed
random numbers as input operands. The random seed is based on
user input. When the user enters a value, the software
generates 25 sets of random numbers, including negative
numbers. The software then performs the 32-bit signed
multiplication and displays the generated random numbers and
the multiplication output.
Fig. 2 shows a flowchart of the function. Once the program
starts, it asks the user to enter a seed number. The input is
stored as the seed and is used to generate 25 sets of random
numbers. After each set of random numbers is generated, the
multiplication is performed. Multiplying two 32-bit signed
inputs produces a 64-bit output.
Appendix 1 shows the software code for the 32-bit signed
multiplication function. To generate each random number,
rand() is called and rand()%2 is used to test whether the
generated number is odd or even. A nonzero result means that a
remainder exists, so the number is odd. All odd numbers are
made negative.
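The generation step described above can be sketched as follows. This is a minimal sketch rather than the Appendix 1 listing: the names Product, next_operand, multiply_signed and generate_products are hypothetical, and it assumes that the parity test is applied to the value just drawn by rand(). Each operand is widened to 64 bits before multiplying so that the product cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// One generated set: two operands and their 64-bit product.
struct Product {
    std::int32_t a;
    std::int32_t b;
    std::int64_t p;
};

// Hypothetical helper: draw a value with rand() and negate it when it is odd
// (assumption: the parity test is applied to the value just drawn).
std::int32_t next_operand() {
    std::int32_t value = std::rand();
    assert(value >= 0);            // rand() returns 0 .. RAND_MAX
    if (value % 2 != 0) {          // nonzero remainder means odd
        value = -value;            // odd numbers become negative
    }
    return value;
}

// Widen both operands to 64 bits before multiplying.
std::int64_t multiply_signed(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
}

// Seed the generator once, then produce `count` operand pairs and products.
std::vector<Product> generate_products(unsigned seed, int count = 25) {
    std::srand(seed);
    std::vector<Product> out;
    for (int i = 0; i < count; ++i) {
        const std::int32_t a = next_operand();
        const std::int32_t b = next_operand();
        out.push_back({a, b, multiply_signed(a, b)});
    }
    return out;
}
```

Calling generate_products(seed) with the user-supplied seed returns the 25 sets, which can then be printed one per line.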
For the second part of the project, VHDL code is written
based on the provided algorithm. The algorithm for signed
multiplication is as follows.
Input : x, y (signed numbers)
Output : P, where P = x * y
A = x, B = y, P = 0
for i = 0 to 31 do
    if B_i = 1 then
        P = P + A
    end if
    A = A << 1
end for
The signed multiplication operation requires several
considerations regarding the MSB (sign bit) of the inputs.
This is shown in Fig. 5 below.
A          B          Operation
Positive   Positive   No change
Positive   Negative   Swap A and B
Negative   Positive   No change
Negative   Negative   2's complement A and B
This is needed so that the algorithm works on signed
numbers. The modification was made by adding a combinational
block, named the convert block, to the data path unit. It
detects the signs of the inputs and changes them accordingly.
In addition, input A is sign-extended with the appropriate bit
when it is loaded. This is coded in the data path unit.
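A software model of the shift-and-add loop together with the sign handling may help make these steps concrete. The sketch below is not the VHDL design. The function name shift_add_multiply is hypothetical, and the sign handling is done on 64-bit values, an assumption made so that negating -2^31 cannot overflow (the hardware convert block operates on 32-bit values). After the convert step B is never negative, so adding the sign-extended, shifted A for every set bit of B gives the signed product.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Software model (a sketch) of the shift-and-add multiplier with the
// sign handling from the table above.
std::int64_t shift_add_multiply(std::int32_t x, std::int32_t y) {
    // Widen first so that the 2's complement step cannot overflow
    // (assumption of this sketch, not taken from the VHDL).
    std::int64_t a = x;
    std::int64_t b = y;

    // Convert block: make sure the multiplier B is non-negative.
    if (a >= 0 && b < 0) {
        std::swap(a, b);           // positive A, negative B: swap A and B
    } else if (a < 0 && b < 0) {
        a = -a;                    // both negative: 2's complement A and B
        b = -b;
    }
    assert(b >= 0);

    // A is sign-extended to 64 bits; unsigned arithmetic wraps modulo 2^64,
    // which matches the behaviour of a 64-bit hardware register.
    std::uint64_t A = static_cast<std::uint64_t>(a);
    const std::uint32_t B = static_cast<std::uint32_t>(b);
    std::uint64_t P = 0;

    for (int i = 0; i < 32; ++i) {
        if ((B >> i) & 1u) {       // B_i = 1
            P += A;                // P = P + A
        }
        A <<= 1;                   // A = A << 1
    }
    // Reinterpret the 64-bit pattern as a signed value (2's complement).
    return static_cast<std::int64_t>(P);
}
```

For instance, shift_add_multiply(-1024, 272) returns -278528, which agrees with Example 1 of the waveform simulation.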
Waveform simulation was then performed to verify the
functionality. The results are shown in Appendix 4. It can be
observed that the results are valid for all combinations of
signed numbers.
To connect the multiplier designed earlier to the system
interconnect fabric, an Avalon Memory-Mapped bus interface
is needed. As shown in Appendix 5, the MU_interface
connects the multiplier to the system interconnect fabric. The
combination of MU_interface and mul_MU forms
MU_avalon, which is the hardware accelerator. Note that only
some of the I/O of mul_MU is connected to the
MU_interface.
The VHDL code in Appendices 6 and 7 shows the
functional mechanism of MU_interface and MU_avalon
respectively.
Referring to Appendices 5 and 6, we first discuss the
architecture of the MU_interface. When reset equals one, the
readdata output is zero. The start output is also zero, which
means the multiplier is in the off state. When chipselect
equals one, the 2-bit address input is decoded. If address is
00, the start output is set to 1, which starts the multiplier
(mul_MU). When address is 01, the data1 output is set to the
writedata input. When address is 10, the data2 output is set to
the writedata input. When address is 11, the readdata output
is set to the result input.
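The address decoding described above amounts to a small register map, which can be modelled in software as follows. This is a behavioural sketch under several assumptions: the class name MuInterfaceModel and its methods are hypothetical, an access to address 00 is treated as a write that starts the multiplier, the multiplication is modelled as completing immediately, and only the lower 32 bits of the result are returned.

```cpp
#include <cassert>
#include <cstdint>

// Behavioural model (a sketch) of the register map:
//   00: start, 01: data1, 10: data2, 11: result (read)
class MuInterfaceModel {
public:
    // reset = 1: start and readdata go to zero.
    void reset() {
        start_ = false;
        data1_ = 0;
        data2_ = 0;
        result_ = 0;
    }

    // A write with chipselect asserted.
    void write(unsigned address, std::uint32_t writedata) {
        assert(address < 4);       // the address input is 2 bits wide
        switch (address) {
            case 0: start_ = true; run(); break;   // start the multiplier
            case 1: data1_ = writedata; break;     // first operand
            case 2: data2_ = writedata; break;     // second operand
            default: break;                        // 11 is read-only here
        }
    }

    // A read with chipselect asserted; only address 11 returns data.
    std::uint32_t read(unsigned address) const {
        assert(address < 4);
        return address == 3 ? result_ : 0;
    }

private:
    // Stand-in for mul_MU: the real unit takes several clock cycles.
    void run() {
        const std::int64_t product =
            static_cast<std::int64_t>(static_cast<std::int32_t>(data1_)) *
            static_cast<std::int64_t>(static_cast<std::int32_t>(data2_));
        // Keep only the lower 32 bits of the 64-bit product.
        result_ = static_cast<std::uint32_t>(static_cast<std::uint64_t>(product));
    }

    bool start_ = false;
    std::uint32_t data1_ = 0;
    std::uint32_t data2_ = 0;
    std::uint32_t result_ = 0;
};
```

A driver following this map would write the two operands to addresses 01 and 10, write to address 00 to start the multiplier, and then read address 11.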
Next, the architecture of MU_avalon is discussed. The
connection is shown in Appendix 5. Because mul_MU takes two
32-bit inputs, the result of the multiplication is 64 bits
wide. The Avalon bus, however, is restricted to a width of 32
bits, so the following approach was tried: the user input is
limited to the range -32768 to 32767, which is the range of
signed 16-bit numbers. These inputs are loaded into mul_MU as
32-bit data. After multiplication, a 64-bit result is
obtained. However, the upper 32 bits are always
00000000000000000000000000000000 or
11111111111111111111111111111111 because the inputs are
limited to the 16-bit range. Hence, the upper 32 bits can be
omitted. The result input of the MU_interface loads only the
lower 32 bits of the result output of mul_MU. Therefore the
readdata output, which takes the value of the result input, is
32 bits wide, which satisfies the Avalon bus width
restriction.
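The claim about the upper 32 bits can be checked with a short sketch. The function names below are hypothetical. The check splits the 64-bit product into two words and tests whether the upper word is simply the sign bit of the lower word repeated 32 times, which is the condition under which the upper word can be dropped without losing information.

```cpp
#include <cassert>
#include <cstdint>

// True when the upper 32 bits of a*b only repeat the sign bit of the
// lower 32 bits, i.e. the product fits in a signed 32-bit word.
bool upper_word_is_sign_extension(std::int32_t a, std::int32_t b) {
    const std::int64_t product =
        static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
    const std::uint64_t bits = static_cast<std::uint64_t>(product);
    const std::uint32_t upper = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t lower = static_cast<std::uint32_t>(bits);
    const std::uint32_t expected = (lower & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return upper == expected;
}

// Corner cases of the signed 16-bit input range used in the text.
void check_16bit_corner_cases() {
    assert(upper_word_is_sign_extension(-32768, -32768));  // largest product, 2^30
    assert(upper_word_is_sign_extension(32767, -32768));   // most negative product
    assert(upper_word_is_sign_extension(32767, 32767));
    assert(!upper_word_is_sign_extension(65536, 65536));   // outside the range: 2^32
}
```

The largest magnitude reachable with 16-bit inputs is (-32768) x (-32768) = 2^30, which is below 2^31, so every product in that range fits in the lower word.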
The disadvantage of this approach is that only numbers
within the 16-bit range are accepted. To expand the input range
to the full 32-bit signed range, an alternative was suggested.
An extra register has to be added to the MU_interface to hold
the 64-bit value from the result output of mul_MU. The 64-bit
data is then sent through the 32-bit readdata output in two
parts: first the upper 32 bits, then the lower 32 bits. If
this method is used, an extra section must be added to the
firmware so that it reads the readdata output in a loop.
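The two-part read can be illustrated from the firmware side with a small sketch. The names ResultWords, split_result and join_result are hypothetical. split_result models what the extra 64-bit register would present on readdata (upper word first, then lower word), and join_result shows how the firmware could rebuild the signed 64-bit product from the two 32-bit reads.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit words delivered over readdata, upper word first.
struct ResultWords {
    std::uint32_t upper;
    std::uint32_t lower;
};

// Hardware side (model): split the 64-bit product into two 32-bit words.
ResultWords split_result(std::int64_t product) {
    const std::uint64_t bits = static_cast<std::uint64_t>(product);
    return {static_cast<std::uint32_t>(bits >> 32),
            static_cast<std::uint32_t>(bits)};
}

// Firmware side: rebuild the signed 64-bit product from the two reads.
std::int64_t join_result(std::uint32_t upper, std::uint32_t lower) {
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(upper) << 32) | lower;
    return static_cast<std::int64_t>(bits);
}

// Round-trip check using a product that does not fit in 32 bits.
void check_round_trip() {
    const std::int64_t product = -10737418240LL;   // 5 * -2147483648
    const ResultWords w = split_result(product);
    assert(join_result(w.upper, w.lower) == product);
}
```

Reading the upper word first means the firmware must shift it left by 32 bits before combining it with the second read.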
Fig.4 Functional block diagram
Fig.5 Multiplication operation
III. RESULT AND ANALYSIS
The first part of the project required writing a software
program in C++ that implements a Random Number Generator (RNG)
with a 32-bit signed multiplication function. This part posed
no major problem, since the language was already familiar to
us. The only problem faced at the beginning was that a wrong
type was assigned to a variable, so the random 32-bit number
could not be obtained. Fig. 6 in the Appendices shows the
output of the 32-bit signed multiplication.
We then integrated the mul_coprocessor into a Nios II SoC
as a user peripheral. This part was easily done because we had
done the same thing in our pre-lab.
The next part was to write the firmware device driver of the
mul_coprocessor, executed by the Nios II CPU, to compute the
signed multiplication. Due to the time limitation and the
problems we faced in the previous part, we could not write the
firmware, which required understanding the code in system.h.
We were only able to complete the work up to this part.
Since we could not finish the firmware, we could not proceed
to the combination in Question 3. Instead, we compared the
hardware and the software theoretically. We predict that the
multiplication computed by the hardware mul_coprocessor will be
faster than the software implementation. As the hardware
multiplication uses an SoC as its platform, it has all the
advantages of an SoC system. In terms of design trade-off, the
hardware has a higher cost than the software. This can be
determined from the LE cost before and after the simulation.
CONCLUSION
In conclusion, this lab let us brush up on our C
programming. We also got to know more about VHDL and learned a
more proper way to use an HDL to produce a design. In addition,
we learned about the SOPC system. Although we were unable to
finish the entire lab, we learned a lot from this lab session.
The most important thing we gained from this lab is an
understanding of how important cooperation is: the work cannot
be done by one person without the others. Besides gaining
knowledge, we gained more on the soft-skill side. We learned
how to communicate with each other and the importance of
communication.
APPENDICES
Fig. 6 Output of 32-bits signed multiplication
APPENDIX 1: C++ code for the RNG
#include
#include
#include
using namespace std;
int main()
{
short i;
int seed;
int set1[25];
int set2[25];
long long mul[25];
cout > seed;
srand(seed);
for (i=0; i
RTL Control Sequence Table
DU control vector bit order: Psel ldP ctrlA ldA ctrlB ldB

S1
  RTL operations: P <- 0; (Start)/A <- MSB & dataA; (Start)/B <- dataB; (Start)/go to S1
  Activated control signals: Psel ldP; ctrlA ldA; ctrlB ldB
  DU control vectors: 0 1 0 0 0 0; 0 1 1 1 1 1
S2
  RTL operations: A <- A << 1; (Zb0)/P <- P + A; Z/go to S2
  Activated control signals: ctrlA ldA; ctrlB ldB; Psel ldP; ctrlA ldA
  DU control vectors: 0 0 0 1 0 1; 1 1 0 1 0 1; 0 0 0 1 0 1
S3
  RTL operations: done <- 1; (Start)/go to S1; (Start)/go to S3
  DU control vector: 0 0 0 0 0 0
APPENDIX 3
VHDL Codes
64-bit Register
library ieee;
use ieee.std_logic_1164.all;
entity mul_Reg64 is
port (clk, en, rst : in std_logic;
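d : in std_logic_vector (63 downto 0);
Q : buffer std_logic_vector (63 downto 0));
end mul_Reg64;
architecture arch_reg of mul_Reg64 is begin
process (clk, rst) begin
-- asynchronous reset clears the register
if rst = '1' then Q <= (others => '0');
elsif (clk'event and clk = '1') then
-- load d on the rising clock edge when enabled
if en = '1' then Q <= d;
end if;
end if;
end process;
end arch_reg;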
library ieee;
use ieee.std_logic_1164.all;
entity mul_shiftRreg32 is
port (d : in std_logic_vector (31 downto 0);
ldsh, en, w, clk, rst : in std_logic;
q : buffer std_logic_vector (31 downto 0));
end mul_shiftRreg32;
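architecture Shift_arch of mul_shiftRreg32 is begin
process (clk, rst) begin
if rst = '1' then q <= (others => '0');
elsif (clk'event and clk = '1') then
if en = '1' then
if ldsh = '1' then q <= d;     -- parallel load
else q <= w & q(31 downto 1);  -- shift right, w enters at the MSB (assumed)
end if;
end if;
end if;
end process;
end Shift_arch;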
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;
entity mul_convert is
port (dataA, dataB : in std_logic_vector (31 downto 0);
outA, outB : buffer std_logic_vector (31 downto 0));
end mul_convert;
architecture arch_convert of mul_convert is
signal z: std_logic_vector(1 downto 0);
signal tempA: std_logic_vector(31 downto 0);
signal tempB: std_logic_vector(31 downto 0);
begin
process (z, dataA, dataB, outA, outB,tempA,tempB) begin
z(1)
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity mul_DU is
port( clk, rst : in std_logic;
in_dataA, in_dataB: in std_logic_vector(31 downto 0);
P : buffer std_logic_vector(63 downto 0);
Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;
A_tp : out std_logic_vector(63 downto 0);
B_tp :out std_logic_vector(31 downto 0);
dataP_tp : out std_logic_vector(63 downto 0);
z, b0 : out std_logic);
end mul_DU;
architecture DU_arch of mul_DU is
signal Ain, A, sum, dataP : std_logic_vector(63 downto 0);
signal B : std_logic_vector(31 downto 0);
signal zero1 : std_logic;
signal dataA, dataB : std_logic_vector(31 downto 0);
component mul_Reg64 port (
d : in std_logic_vector(63 downto 0);
en, clk, rst : in std_logic;
q : buffer std_logic_vector(63 downto 0));
end component;
component mul_ShiftLreg64 port (
d : in std_logic_vector(63 downto 0);
ldsh, en, w, clk, rst : in std_logic;
q : buffer std_logic_vector(63 downto 0));
end component;
component mul_ShiftRreg32 port (
d : in std_logic_vector(31 downto 0);
ldsh, en, w, clk, rst : in std_logic;
q : buffer std_logic_vector(31 downto 0));
end component;
component mul_convert port (
dataA, dataB : in std_logic_vector (31 downto 0);
outA, outB : buffer std_logic_vector (31 downto 0));
end component;
begin
zero1
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
entity mul_CU is port(
clk, rst, start, b0,z : in std_logic;
done : out std_logic;
state : out std_logic_vector(1 downto 0);
CtrlVector : out std_logic_vector(5 downto 0));
end mul_CU;
architecture fsm of mul_CU is
signal y: std_logic_vector (1 downto 0);
constant S1: std_logic_vector (1 downto 0):= "00";
constant S2: std_logic_vector (1 downto 0):= "01";
constant S3: std_logic_vector (1 downto 0):= "10";
begin
fsm_transitions:process (clk, rst) begin
if (rst='1')then
y
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_signed.all;
entity mul_MU is port(
clock, start : in std_logic;
dataA, dataB : in std_logic_vector(31 downto 0);
result : buffer std_logic_vector(63 downto 0);
reset : in std_logic;
CtrlVector : out std_logic_vector (5 downto 0);
done : out std_logic;
state : out std_logic_vector(1 downto 0);
tpA : out std_logic_vector(63 downto 0);
tpB : out std_logic_vector (31 downto 0);
tpdataP : out std_logic_vector(63 downto 0));
end mul_MU;
architecture MU_arch of mul_MU is
signal intb0, intz : std_logic;
signal intCtrlVec : std_logic_vector(5 downto 0);
component mul_CU port(
clk, rst, start : in std_logic;
b0, z : in std_logic;
done : out std_logic;
state : out std_logic_vector(1 downto 0);
CtrlVector :out std_logic_vector(5 downto 0));
end component;
component mul_DU port(
clk, rst : in std_logic;
in_dataA, in_dataB: in std_logic_vector(31 downto 0);
P : buffer std_logic_vector(63 downto 0);
Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;
A_tp : out std_logic_vector(63 downto 0);
B_tp :out std_logic_vector(31 downto 0);
dataP_tp : out std_logic_vector(63 downto 0);
z, b0 : out std_logic);
end component;
begin
U_CU:mul_CU port map(clock, reset, start, intb0, intz, done, state, intCtrlVec);
CtrlVector
Signed Multiplier Waveform Simulation (Timing)
Example 1: -1024 x 272 = -278528
Example 2: -512 x -112 = 57344
Example 3: 15360 x -16368 = -251412480
Example 4: 0 x 272 = 0
APPENDIX 5
APPENDIX 6
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
ENTITY MU_interface IS
PORT ( reset : IN STD_LOGIC;
clk : IN STD_LOGIC;
chipselect : IN STD_LOGIC;
address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);
write : IN STD_LOGIC;
writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);
start : OUT STD_LOGIC;
data1 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);
data2 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);
result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));
END MU_interface;
ARCHITECTURE arch OF MU_interface IS
BEGIN
process (reset, clk)
begin
if reset = '1' then
readdata <= (others => '0');
start
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
ENTITY MU_avalon IS
PORT ( reset : IN STD_LOGIC;
clk : IN STD_LOGIC;
chipselect : IN STD_LOGIC;
address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);
write : IN STD_LOGIC;
writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)
);
END MU_avalon;
ARCHITECTURE avalon_arch OF MU_avalon IS
signal lineA : std_logic_vector (63 downto 0);
signal lineB : std_logic_vector (63 downto 0);
signal start_signal : std_logic;
signal result_MU : std_logic_Vector (63 downto 0);
COMPONENT MU_interface IS
PORT ( reset : IN STD_LOGIC;
clk : IN STD_LOGIC;
chipselect : IN STD_LOGIC;
address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);
write : IN STD_LOGIC;
writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);
start : OUT STD_LOGIC;
data1 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);
data2 : OUT STD_LOGIC_VECTOR (63 downto 0);
result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));
END COMPONENT;
COMPONENT mul_MU IS
PORT ( clock, start : in std_logic;
dataA, dataB : in std_logic_vector(31 downto 0);
result : buffer std_logic_vector(63 downto 0);
reset : in std_logic;
CtrlVector : out std_logic_vector (5 downto 0);
done : out std_logic;
state : out std_logic_vector(1 downto 0);
tpA : out std_logic_vector(63 downto 0);
tpB : out std_logic_vector (31 downto 0);
tpdataP : out std_logic_vector(63 downto 0));
END component;
BEGIN
U_MuUnit: mul_MU
port map ( clock => clk,
dataA => lineA,
dataB => lineB,
start => start_signal,
result => result_MU,
reset => reset
);
U_Interface_MU: MU_interface
port map ( clk => clk,
reset => reset,
chipselect => chipselect,
address => address,
write => write,
writedata => writedata,
readdata => readdata,
result => result_MU,
data1 => lineA(31 downto 0),
data2 => lineB(31 downto 0),
start => start_signal
);
END avalon_arch;