32-bit signed multiplication_report

Upload: pidot9w2kda

Post on 30-May-2018


  • 8/9/2019 32-Bit Signed Multiplication_report

    1/19

    UNIVERSITI TEKNOLOGI MALAYSIA

    Faculty of Electrical Engineering

    HW/SW Co-design of a Nios II-based Embedded System

    32-bits Signed Multiplication

    Report from a project conducted on 11st Sept 2009

    as part of SEW 4722 at the ECAD Laboratory

    SEW 4722, Section 1, Group No. 3

    Eunice Ng Hui Xian

    Lee Chen Cheak

    Mohd Firdaus

    Teoh Shu Wen


    HW/SW Co-design of a Nios II-based Embedded System

    32-bits Signed Multiplication

    Eunice Ng Hui Xian, Lee Chen Cheak, Mohd Firdaus, Teoh Shu Wen

    Faculty of Electrical Engineering

    Universiti Teknologi Malaysia

    81310 UTM Skudai, Johor, Malaysia

    Abstract: Before the laboratory work began, the process of
    creating, compiling, and downloading a Nios II-based embedded
    system onto the Altera DE2 board was completed for system
    verification. In this paper, an embedded system application is
    written in the C++ programming language and run on the Nios
    II-based embedded system. A hardware accelerator is then designed
    to perform the same operation. A system bus interface and a
    firmware device driver for the hardware accelerator are also
    designed and integrated into the Nios II-based embedded system.

    I. INTRODUCTION

    In today's world, embedded systems are everywhere: homes,
    offices, cars, factories, hospitals, planes and consumer
    electronics. Their huge numbers and growing complexity call for a
    new design approach, one that emphasizes high-level tools and
    hardware/software tradeoffs rather than low-level
    assembly-language programming and logic design.

    An embedded system is a system designed to perform one or a few
    dedicated functions, which often involve real-time computing.
    Because an embedded system is designed only to perform dedicated
    functions, engineers can optimize it, reducing the size and cost
    of the product. Examples of embedded systems are PDAs, MP3
    players, mobile phones, digital cameras, DVD players, GPS
    receivers and printers.

    This PBL lab project designs an embedded 32-bit signed multiplier
    with Field Programmable Gate Array (FPGA) based hardware
    acceleration for the multiplication of random numbers. This is
    done by designing a 32-bit signed multiplication hardware core.
    All the hardware cores are integrated into an embedded system
    implemented as a System-on-Chip (SoC).

    With the state-of-the-art very large scale integration (VLSI)

    technology available today, many of the embedded systems or

    substantial parts of the systems can be integrated on a single,

    programmable platform. In other words, these embedded

    systems are implemented as System-on-Chip, which from here

    on will be referred to as a SoC design or SoC embedded

    system.

    An SoC integrates all the hardware components of the embedded
    system into a single integrated circuit (IC) chip. The result is
    a single chip that needs no external connections to other chips,
    which reduces the size and packaging of the product.

    This lab also introduces the Altera SOPC (System-on-Programmable
    Chip) Builder, which is used together with Quartus II and the
    Nios II IDE software to develop a Nios II-based embedded system.
    The lab aims to design the software and hardware partitions (the
    hardware IP core) of an embedded system. It also aims to perform
    design-space exploration between the hardware and software
    partitions when performing a specific computation. Performance is
    measured in logic cost and computation cycle count.

    II. METHODOLOGY

    This project is divided into two design parts: software and
    hardware. The software part is to write an RNG and a 32-bit
    signed multiplication in the embedded system application using
    the C++ programming language. The hardware part is to design a
    mul_coprocessor. After both parts are designed, they are
    downloaded onto the Altera DE2 board.

    Fig. 1 Work flow of lab project


    For the software design part, two tasks need to be completed: a
    Random Number Generator (RNG) and a 32-bit signed multiplication.
    The RNG, written in the C++ programming language, has to generate
    25 sets of 32-bit signed random numbers as input operands. The
    random seed is based on user input. When the user enters a value,
    the software generates 25 sets of random numbers, including
    negative numbers. The software then performs the 32-bit signed
    multiplication and displays the generated random numbers and the
    multiplication output.

    Fig. 2 shows a flowchart of the function. When the program
    starts, it asks the user to enter a seed number. The input is
    stored as the seed and used to generate 25 sets of random
    numbers. After each set of random numbers is generated, the
    multiplication operation is performed. Multiplying two 32-bit
    signed operands produces a 64-bit product.
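
    The 32-bit by 32-bit to 64-bit relationship above can be
    illustrated with a minimal C++ sketch. The function name mul32x32
    is hypothetical and is not taken from the report's code; the
    point is only that one operand is widened before the multiply so
    that the product is computed in 64 bits.

```cpp
#include <cstdint>

// Hypothetical helper: multiply two 32-bit signed operands into a
// 64-bit product. Widening one operand first makes the compiler carry
// out the multiplication in 64-bit arithmetic, so large products do
// not overflow a 32-bit intermediate.
std::int64_t mul32x32(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) * b;
}
```

    For example, mul32x32(2147483647, -2) gives -4294967294, a value
    that does not fit in 32 bits.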

    Appendix 1 shows the software code for the 32-bit signed
    multiplication function. The random numbers are generated with
    rand(), and rand()%2 is used to check whether a generated number
    is odd or even. A non-zero remainder means that the number is
    odd. All odd numbers are made negative.
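
    The sign rule described above can be sketched as follows. This
    is one reading of the description, not the code from Appendix 1:
    the names signed_random and generate_set are hypothetical, and
    the sketch assumes that the parity of the value returned by
    rand() decides the sign. Note that rand() is only guaranteed to
    return values up to RAND_MAX, which may be far smaller than the
    32-bit range on some platforms.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: draw one operand from rand() and negate it when
// it is odd, so that every odd value becomes a negative number.
std::int32_t signed_random() {
    std::int32_t value = static_cast<std::int32_t>(std::rand());
    if (value % 2 != 0) {
        value = -value;  // safe: rand() never returns a negative value
    }
    return value;
}

// Hypothetical helper: seed the generator with the user's value and
// produce `count` operands (25 in the report).
std::vector<std::int32_t> generate_set(unsigned seed, int count = 25) {
    std::srand(seed);
    std::vector<std::int32_t> operands;
    for (int i = 0; i < count; ++i) {
        operands.push_back(signed_random());
    }
    return operands;
}
```

    Because the generator is reseeded with the user's value, the
    same seed reproduces the same 25 operands.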

    For the second part of the project, VHDL code is written based
    on the provided algorithm. The algorithm for signed
    multiplication is as follows.

    Input : x, y (signed number)

    Output : P, where P = x * y

    A = x, B = y, P = 0

    for i = 0 to 31 do

    If Bi = 1 then

    P = P + A

    End if

    A


    The signed multiplication operation requires several

    considerations to be taken regarding the MSB of the inputs.

    This is shown in Fig.5 below.

    A          B          Operation
    Positive   Positive   No change
    Positive   Negative   Swap A and B
    Negative   Positive   No change
    Negative   Negative   2's complement A and B

    This is needed so that the algorithm works on signed numbers.
    The modification was made by adding a combinational block, named
    the convert block, to the data path unit. It detects the signs of
    the inputs and changes them accordingly. In addition, when input
    A is loaded, it is sign-extended with the appropriate bit. This
    is coded in the data path unit.
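
    The shift-and-add loop, the sign handling from the table and the
    sign extension of A can be modelled in software as a rough
    behavioural sketch. This is not a translation of the VHDL in
    Appendix 3; the function name shift_add_mul is hypothetical, and
    the per-iteration shifts of A (left) and B (right) are assumed
    from the 64-bit left-shift and 32-bit right-shift registers
    declared in the data path unit.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical behavioural model of the signed shift-and-add multiplier.
std::int64_t shift_add_mul(std::int32_t x, std::int32_t y) {
    std::uint32_t a = static_cast<std::uint32_t>(x);
    std::uint32_t b = static_cast<std::uint32_t>(y);
    const bool a_neg = (a >> 31) != 0;  // MSB of input A
    const bool b_neg = (b >> 31) != 0;  // MSB of input B

    if (!a_neg && b_neg) {
        std::swap(a, b);                // positive x negative: swap A and B
    } else if (a_neg && b_neg) {
        a = 0u - a;                     // negative x negative:
        b = 0u - b;                     // 2's complement both operands
    }

    // Sign-extend A to 64 bits using its MSB.
    std::uint64_t A = a;
    if ((a >> 31) != 0) {
        A |= 0xFFFFFFFF00000000ULL;
    }
    std::uint32_t B = b;
    std::uint64_t P = 0;

    for (int i = 0; i < 32; ++i) {
        if (B & 1u) {
            P += A;                     // add when the current bit of B is 1
        }
        A <<= 1;                        // assumed: A shifts left each step
        B >>= 1;                        // assumed: B shifts right each step
    }
    return static_cast<std::int64_t>(P);
}
```

    With the operands of the waveform examples in the appendices,
    the model gives -1024 x 272 = -278528, -512 x -112 = 57344 and
    15360 x -16368 = -251412480. One limitation of this sketch is
    that the most negative 32-bit value has no positive 2's
    complement, so a negative-by-negative product that involves it
    is not handled correctly.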

    Waveform simulation was then performed to verify the
    functionality. The results are shown in Appendix 4. It can be
    observed that the results are valid for all combinations of
    signed numbers.

    To connect the multiplier designed earlier to the system
    interconnect fabric, an Avalon Memory-Mapped bus interface is
    needed. As shown in Appendix 5, the MU_interface connects the
    multiplier to the system interconnect fabric. The combination of
    MU_interface and mul_MU forms MU_avalon, which is the hardware
    accelerator. Note that only some of the I/O of mul_MU is
    connected to the MU_interface.

    The VHDL code in Appendices 6 and 7 shows the functional
    mechanism of MU_interface and MU_avalon respectively.

    Referring to Appendices 5 and 6, we first discuss the
    architecture of the MU_interface. When reset equals one, the
    readdata output is zero. The start output is also zero, that is,
    the multiplier is in the off state. When chipselect equals one,
    the 2-bit address input is decoded. If address is 00, the start
    output is set to 1, which starts the multiplier (mul_MU). When
    address equals 01, the data1 output takes the value of the
    writedata input. When address equals 10, the data2 output takes
    the value of the writedata input. When address equals 11, the
    readdata output takes the value of the result input.
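
    The address decoding described above can be summarised in a
    small behavioural model. This is an illustrative sketch, not the
    VHDL of Appendix 6: the struct name MuInterfaceModel and its
    member functions are hypothetical, the role of the write signal
    is not modelled, and the text does not say whether start is
    cleared again after the multiplier has been started.

```cpp
#include <cstdint>

// Hypothetical software model of the MU_interface register map.
struct MuInterfaceModel {
    std::uint32_t data1 = 0;     // operand forwarded to the multiplier
    std::uint32_t data2 = 0;     // operand forwarded to the multiplier
    std::uint32_t readdata = 0;  // value returned to the bus
    bool start = false;          // start signal to mul_MU

    // reset = 1: readdata and start are forced to zero.
    void reset() {
        readdata = 0;
        start = false;
    }

    // One bus access with chipselect asserted. `result` stands for the
    // value currently driven on the result input by the multiplier.
    void access(unsigned address, std::uint32_t writedata,
                std::uint32_t result) {
        switch (address & 0x3u) {
            case 0: start = true;        break;  // 00: start the multiplier
            case 1: data1 = writedata;   break;  // 01: load first operand
            case 2: data2 = writedata;   break;  // 10: load second operand
            case 3: readdata = result;   break;  // 11: return the result
        }
    }
};
```

    A typical sequence would write address 01 and 10 with the two
    operands, access address 00 to start the multiplication, and
    then access address 11 to fetch the result.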

    Next, the architecture of MU_avalon is discussed. The connection
    is shown in Appendix 5. Because mul_MU takes two 32-bit inputs,
    the result of the multiplication is 64 bits wide. The Avalon bus
    is restricted to a width of 32 bits, so the following approach
    was taken: the user input is limited to the range -32768 to
    32767, which is the range of signed 16-bit numbers. These inputs
    are loaded into mul_MU as 32-bit data. After multiplication, a
    64-bit result is obtained. However, the upper 32 bits will always
    be 00000000000000000000000000000000 or
    11111111111111111111111111111111 because the inputs are limited
    to the 16-bit range. Hence, the upper 32 bits can be omitted. The
    result input of the MU_interface loads only the lower 32 bits of
    the result output of mul_MU. The readdata output, which takes
    the value of the result input, is therefore 32 bits wide, which
    satisfies the Avalon bus width restriction.
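
    The claim about the upper 32 bits can be checked numerically.
    The largest magnitude that two signed 16-bit operands can
    produce is 32768 x 32768 = 2^30, which fits in a signed 32-bit
    word, so the upper word of the 64-bit product only repeats the
    sign bit. The sketch below uses a hypothetical helper,
    upper_word, that is not part of the report's code.

```cpp
#include <cstdint>

// Hypothetical helper: return the upper 32 bits of the 64-bit product
// of two 32-bit signed operands.
std::uint32_t upper_word(std::int32_t a, std::int32_t b) {
    const std::int64_t product = static_cast<std::int64_t>(a) * b;
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(product) >> 32);
}
```

    For operands in the 16-bit range the helper returns either
    0x00000000 (for example 32767 x 32767) or 0xFFFFFFFF (for
    example -32768 x 32767). For operands outside that range, such
    as 65536 x 65536, the upper word carries real data and can no
    longer be dropped.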

    The disadvantage of this approach is that only numbers within
    the 16-bit range are accepted. To expand the input range to the
    full 32-bit signed range, an alternative was suggested. An extra
    register has to be added to the MU_interface to hold the 64-bit
    value from the result output of mul_MU. The 64-bit data is then
    sent through the 32-bit readdata output in two parts: first the
    upper 32 bits, then the lower 32 bits. If this method is used,
    an extra section must be added to the firmware so that it reads
    the readdata output in a loop.
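
    The two-part transfer suggested above can be sketched in
    software. This alternative was not implemented in the project,
    so the code is only an illustration under stated assumptions:
    split_result stands for what the extra register would present
    on readdata (upper word first), join_result stands for the
    firmware loop that reassembles the value, and both names are
    hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical hardware side: present the 64-bit result as two 32-bit
// words, upper word first, as proposed in the text.
std::array<std::uint32_t, 2> split_result(std::int64_t result) {
    const std::uint64_t bits = static_cast<std::uint64_t>(result);
    return {static_cast<std::uint32_t>(bits >> 32),
            static_cast<std::uint32_t>(bits & 0xFFFFFFFFu)};
}

// Hypothetical firmware side: read the words in a loop and rebuild the
// 64-bit signed product.
std::int64_t join_result(const std::array<std::uint32_t, 2>& words) {
    std::uint64_t bits = 0;
    for (std::uint32_t word : words) {
        bits = (bits << 32) | word;
    }
    return static_cast<std::int64_t>(bits);
}
```

    Reading both words doubles the number of bus reads per
    multiplication, which is the cost of accepting the full 32-bit
    operand range.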

    Fig.4 Functional block diagram

    Fig.5 Multiplication operation


    III. RESULT AND ANALYSIS

    The first part of the project required a software program in C++
    that implements a Random Number Generator (RNG) with a 32-bit
    signed multiplication function. Writing this program caused no
    major problems, since the language was already familiar. The
    only problem faced at the beginning was that the wrong data type
    had been assigned to a variable, so the random 32-bit numbers
    could not be obtained. Fig.xx below shows the output of the
    32-bit signed multiplication.

    We then integrated the mul_coprocessor into a Nios II SoC as a
    user peripheral. This part was done easily because the same
    procedure had been carried out in the pre-lab.

    The next part was to write the firmware device driver of the
    mul_coprocessor, which is executed by the Nios II CPU to compute
    the signed multiplication. Due to the time limitation and the
    problems faced in the previous part, we could not write the
    firmware, which required an understanding of the code in
    system.h. We were only able to complete the work up to this
    point.
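
    For orientation only, the access sequence that such a driver
    would follow can be sketched from the register map described in
    Section II. The firmware was not written in this project, so
    everything below is hypothetical: the Bus type, the register
    constants and the function mul_coprocessor are invented for
    illustration, and the bus accessors are passed in rather than
    tied to any real base address from system.h. The interface as
    described exposes no status register, so a real driver would
    also need some way to wait until the multiplier has finished.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical bus accessors supplied by the caller.
struct Bus {
    std::function<void(unsigned reg, std::uint32_t value)> write;
    std::function<std::uint32_t(unsigned reg)> read;
};

// Register numbers follow the 2-bit address decoding of MU_interface.
enum : unsigned { REG_START = 0, REG_DATA1 = 1, REG_DATA2 = 2, REG_RESULT = 3 };

// Hypothetical driver routine for the 16-bit operand range, where the
// product fits in the 32-bit readdata word.
std::int32_t mul_coprocessor(const Bus& bus, std::int16_t a, std::int16_t b) {
    // Operands are sign-extended to 32 bits before being written.
    bus.write(REG_DATA1, static_cast<std::uint32_t>(static_cast<std::int32_t>(a)));
    bus.write(REG_DATA2, static_cast<std::uint32_t>(static_cast<std::int32_t>(b)));
    bus.write(REG_START, 1u);  // any access to address 00 starts mul_MU
    // Assumption: the result is valid by the time it is read back.
    return static_cast<std::int32_t>(bus.read(REG_RESULT));
}
```

    Passing the accessors as parameters keeps the sequence testable
    on a host machine with a fake bus before it is moved to the
    board.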

    Since the firmware could not be finished, we could not proceed
    to the combination in Question 3. Instead, we compare the
    hardware and the software theoretically. We predict that the
    multiplication computed by the hardware mul_coprocessor will be
    faster than the multiplication computed in software. Because the
    hardware multiplier uses an SoC as its platform, it has all the
    advantages of an SoC system. In terms of the design trade-off,
    the hardware has a higher cost than the software. This can be
    determined by comparing the LE cost before and after the
    simulation.

    IV. CONCLUSION

    In conclusion, this lab let us brush up on our C programming. We
    also got to know more about VHDL and learned a more proper way
    of using an HDL to produce a design. In addition, we learned
    about the SOPC system. Although we were unable to finish the
    entire lab, we learned a lot from this lab session. The most
    important thing we gained from this lab is an understanding of
    how important cooperation is: the work cannot be done by one
    person without the others. Besides gaining knowledge, we gained
    even more in soft skills. We learned how to communicate with
    each other and the importance of communication.


    APPENDICES

    Fig. 6 Output of 32-bits signed multiplication


    APPENDIX 1: C++ code for the RNG

    #include

    #include

    #include

    using namespace std;

    int main()

    {

    short i;

    int seed;

    int set1[25];

    int set2[25];

    long long mul[25];

    cout > seed;

    srand(seed);

    for (i=0; i


    RTL Control Sequence Table

    DU control vector bit order: Psel ldP ctrlA ldA ctrlB ldB

    S1  RTL operations:      P ← 0; (Start)/A ← MSB & dataA;
                             (Start)/B ← dataB; (Start)/A; go to S1
        Activated signals:   Psel ldP; ctrlA ldA; ctrlB ldB
        DU control vectors:  0 1 0 0 0 0
                             0 1 1 1 1 1

    S2  RTL operations:      A ← A << 1; (Z b0)/P ← P + A; Z/go to S2
        Activated signals:   ctrlA ldA; ctrlB ldB; Psel ldP; ctrlA ldA
        DU control vectors:  0 0 0 1 0 1
                             1 1 0 1 0 1
                             0 0 0 1 0 1

    S3  RTL operations:      done ← 1; (Start)/go to S1; (Start)/go to S3
        DU control vectors:  0 0 0 0 0 0

    APPENDIX 3


    VHDL Codes

    64-bit Register

    library ieee;

    use ieee.std_logic_1164.all;

    entity mul_Reg64 is

    port (clk, en, rst : in std_logic;

    d : in std_logic_vector (63 downto 0);
    Q : buffer std_logic_vector (63 downto 0));

    end mul_Reg64;

    architecture arch_reg of mul_Reg64 is begin

    process (clk, rst) begin

    if rst = '1' then Q <= (others => '0');

    elsif (clk'event and clk = '1') then

    if en = '1' then Q


    library ieee;

    use ieee.std_logic_1164.all;

    entity mul_shiftRreg32 is

    port (d : in std_logic_vector (31 downto 0);

    ldsh, en, w, clk, rst : in std_logic;

    q : buffer std_logic_vector (31 downto 0));

    end mul_shiftRreg32;

    architecture Shift_arch of mul_shiftRreg32 is begin

    process (clk, rst) begin

    if rst = '1' then q <= (others => '0');

    elsif (clk'event and clk = '1') then

    if en = '1' then

    if ldsh = '1' then q


    LIBRARY ieee;

    USE ieee.std_logic_1164.all;

    USE ieee.std_logic_unsigned.all;

    entity mul_convert is

    port (dataA, dataB : in std_logic_vector (31 downto 0);

    outA, outB : buffer std_logic_vector (31 downto 0));

    end mul_convert;

    architecture arch_convert of mul_convert is

    signal z: std_logic_vector(1 downto 0);

    signal tempA: std_logic_vector(31 downto 0);

    signal tempB: std_logic_vector(31 downto 0);

    begin

    process (z, dataA, dataB, outA, outB,tempA,tempB) begin

    z(1)


    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity mul_DU is
    port( clk, rst : in std_logic;
          in_dataA, in_dataB: in std_logic_vector(31 downto 0);
          P : buffer std_logic_vector(63 downto 0);
          Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;
          A_tp : out std_logic_vector(63 downto 0);
          B_tp : out std_logic_vector(31 downto 0);
          dataP_tp : out std_logic_vector(63 downto 0);
          z, b0 : out std_logic);
    end mul_DU;

    architecture DU_arch of mul_DU is
    signal Ain, A, sum, dataP : std_logic_vector(63 downto 0);
    signal B : std_logic_vector(31 downto 0);
    signal zero1 : std_logic;
    signal dataA, dataB : std_logic_vector(31 downto 0);

    component mul_Reg64 port (
          d : in std_logic_vector(63 downto 0);
          en, clk, rst : in std_logic;
          q : buffer std_logic_vector(63 downto 0));
    end component;

    component mul_ShiftLreg64 port (
          d : in std_logic_vector(63 downto 0);
          ldsh, en, w, clk, rst : in std_logic;
          q : buffer std_logic_vector(63 downto 0));
    end component;

    component mul_ShiftRreg32 port (
          d : in std_logic_vector(31 downto 0);
          ldsh, en, w, clk, rst : in std_logic;
          q : buffer std_logic_vector(31 downto 0));
    end component;

    component mul_convert port (
          dataA, dataB : in std_logic_vector (31 downto 0);
          outA, outB : buffer std_logic_vector (31 downto 0));
    end component;

    begin

    zero1


    library ieee;

    use ieee.std_logic_1164.all;

    use ieee.std_logic_arith.all;

    entity mul_CU is port(

    clk, rst, start, b0,z : in std_logic;

    done : out std_logic;

    state : out std_logic_vector(1 downto 0);

    CtrlVector : out std_logic_vector(5 downto 0));

    end mul_CU;

    architecture fsm of mul_CU is

    signal y: std_logic_vector (1 downto 0);

    constant S1: std_logic_vector (1 downto 0):= "00";

    constant S2: std_logic_vector (1 downto 0):= "01";

    constant S3: std_logic_vector (1 downto 0):= "10";

    begin

    fsm_transitions:process (clk, rst) begin

    if (rst='1')then

    y


    LIBRARY ieee;

    USE ieee.std_logic_1164.all;

    USE ieee.std_logic_signed.all;

    entity mul_MU is port(

    clock, start : in std_logic;

    dataA, dataB : in std_logic_vector(31 downto 0);

    result : buffer std_logic_vector(63 downto 0);

    reset : in std_logic;

    CtrlVector : out std_logic_vector (5 downto 0);

    done : out std_logic;

    state : out std_logic_vector(1 downto 0);

    tpA : out std_logic_vector(63 downto 0);

    tpB : out std_logic_vector (31 downto 0);

    tpdataP : out std_logic_vector(63 downto 0));

    end mul_MU;

    architecture MU_arch of mul_MU is

    signal intb0, intz : std_logic;

    signal intCtrlVec : std_logic_vector(5 downto 0);

    component mul_CU port(

    clk, rst, start : in std_logic;

    b0, z : in std_logic;

    done : out std_logic;

    state : out std_logic_vector(1 downto 0);

    CtrlVector :out std_logic_vector(5 downto 0));

    end component;

    component mul_DU port(

    clk, rst : in std_logic;

    in_dataA, in_dataB: in std_logic_vector(31 downto 0);

    P : buffer std_logic_vector(63 downto 0);

    Psel, ldP, ctrlA, ldA, ctrlB, ldB : in std_logic;

    A_tp : out std_logic_vector(63 downto 0);

    B_tp :out std_logic_vector(31 downto 0);

    dataP_tp : out std_logic_vector(63 downto 0);

    z, b0 : out std_logic);

    end component;

    begin

    U_CU:mul_CU port map(clock, reset, start, intb0, intz, done, state, intCtrlVec);

    CtrlVector


    Signed Multiplier Waveform Simulation (Timing)

    Example 1: -1024 x 272 = -278528

    Example 2: -512 x -112 = 57344

    Example 3: 15360 x -16368 = -251412480


    Example 4: 0 x 272 = 0

    APPENDIX 5


    APPENDIX 6


    library IEEE;

    use IEEE.std_logic_1164.all;

    use IEEE.std_logic_arith.all;

    ENTITY MU_interface IS

    PORT ( reset : IN STD_LOGIC;

    clk : IN STD_LOGIC;

    chipselect : IN STD_LOGIC;

    address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

    write : IN STD_LOGIC;

    writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

    readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

    start : OUT STD_LOGIC;

    data1 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);

    data2 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);

    result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));

    END MU_interface;

    ARCHITECTURE arch OF MU_interface IS

    BEGIN

    process (reset, clk)

    begin

    if reset = '1' then

    readdata <= (others => '0');

    start


    library IEEE;

    use IEEE.std_logic_1164.all;

    use IEEE.std_logic_arith.all;

    ENTITY MU_avalon IS

    PORT ( reset : IN STD_LOGIC;

    clk : IN STD_LOGIC;

    chipselect : IN STD_LOGIC;

    address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

    write : IN STD_LOGIC;

    writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

    readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)

    );

    END MU_avalon;

    ARCHITECTURE avalon_arch OF MU_avalon IS

    signal lineA : std_logic_vector (63 downto 0);

    signal lineB : std_logic_vector (63 downto 0);

    signal start_signal : std_logic;

    signal result_MU : std_logic_Vector (63 downto 0);

    COMPONENT MU_interface IS

    PORT ( reset : IN STD_LOGIC;

    clk : IN STD_LOGIC;

    chipselect : IN STD_LOGIC;

    address : IN STD_LOGIC_VECTOR (1 DOWNTO 0);

    write : IN STD_LOGIC;

    writedata : IN STD_LOGIC_VECTOR (63 DOWNTO 0);

    readdata : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);

    start : OUT STD_LOGIC;

    data1 : OUT STD_LOGIC_VECTOR(63 DOWNTO 0);

    data2 : OUT STD_LOGIC_VECTOR (63 downto 0);

    result : IN STD_LOGIC_VECTOR (63 DOWNTO 0));

    END COMPONENT;

    COMPONENT mul_MU IS

    PORT ( clock, start : in std_logic;

    dataA, dataB : in std_logic_vector(31 downto 0);

    result : buffer std_logic_vector(63 downto 0);

    reset : in std_logic;

    CtrlVector : out std_logic_vector (5 downto 0);

    done : out std_logic;

    state : out std_logic_vector(1 downto 0);

    tpA : out std_logic_vector(63 downto 0);

    tpB : out std_logic_vector (31 downto 0);

    tpdataP : out std_logic_vector(63 downto 0));

    END component;

    BEGIN

    U_MuUnit: mul_MU

    port map ( clock => clk,

    dataA => lineA,

    dataB => lineB,

    start => start_signal,


    result => result_MU,

    reset => reset

    );

    U_Interface_MU: MU_interface

    port map ( clk => clk,

    reset => reset,

    chipselect => chipselect,

    address => address,

    write => write,

    writedata => writedata,

    readdata => readdata,

    result => result_MU,

    data1 => lineA(31 downto 0),

    data2 => lineB(31 downto 0),

    start => start_signal

    );

    END avalon_arch;