
  • University of Southampton

    Faculty of Engineering, Science and Mathematics

    School of Electronics and Computer Science

    Design of a Pipelined PowerPC Processor using Verilog

    by

    Chidhambaranathan Rajamanikkam (canr1g09)

    24 September 2010

    A dissertation submitted in partial fulfillment of the degree of

    MSc Microelectronics Systems Design

    by examination and dissertation

    Project Supervisor: B Iain McNally

    Second Examiner: Dr. Koushik Maharatna

  • School of Electronics and Computer Science

    Canr1g09

    1

    Abstract

    The PowerPC processor architecture was designed by IBM together with Apple and Motorola. It is widely used in embedded systems because of its low power consumption. PowerPC is a RISC (Reduced Instruction Set Computing) instruction set architecture.

    This project implements a 32-bit pipelined processor that is capable of executing PowerPC instructions. The supported instructions include the PowerPC fixed point integer instructions, branch instructions and integer load/store instructions. The processor is described in the Verilog hardware description language, and its modules have been tested successfully using the NC Verilog simulator.

    This report describes the instruction set, the instruction forms and the architecture of the PowerPC processor. It also covers the pipelining approach adopted and the data and control hazards associated with it. The designed processor handles both data and control hazards.

    Acknowledgement

    I would like to thank my project supervisor, B Iain McNally, for his valuable guidance and support during this project. I would also like to extend my gratitude to my second examiner, Dr. Koushik Maharatna, for his support.

    Contents

    Abstract
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
    Chapter 2: Background
    2.1 PowerPC
    2.2 PowerPC Registers
    2.2.1 General Purpose Registers
    2.2.2 Exception Register
    2.2.3 Count Register
    2.2.4 Condition Register
    2.2.5 Link Register
    2.3 PowerPC Data Types
    2.4 PowerPC Branch Instructions
    2.4.1 Addressing Modes
    2.5 PowerPC Load/Store Instructions
    2.5.1 Addressing Modes
    2.5.2 Load Instructions
    2.5.3 Store Instructions
    2.6 PowerPC Fixed Point Integer Instructions
    2.6.1 Arithmetic Instructions
    2.6.2 Logical Instructions
    2.6.3 Sign-Extension Instructions
    2.6.4 Rotate Instructions
    2.6.4.2 PowerPC Rotate Instructions
    2.6.5 Shift Instructions
    2.6.5.1 Logical Left Shift Instructions
    2.6.5.2 Logical Right Shift Instructions
    2.6.5.3 Algebraic Shift Instructions
    2.7 Pipelining Overview
    2.7.1 Pipelining Hazards
    2.7.1.1 Structural Hazards
    2.7.1.2 Control Hazards
    2.7.1.3 Data Hazards
    Chapter 3: Design
    3.1 Initial Datapath
    3.2 Instruction Set Design
    3.2.1 Fixed Point Integer Instructions
    3.2.1.1 Fixed Point Arithmetic Instructions
    3.2.1.2 Fixed Point Logical Instructions
    3.2.1.3 Fixed Point Shift Instructions
    3.2.1.4 Fixed Point Rotate Instructions
    3.2.1.5 Fixed Point Compare Instructions
    3.2.2 Load/Store Instructions
    3.2.2.1 Load Instructions
    3.2.2.2 Store Instructions
    3.2.3 Branch Instructions
    3.2.4 Data Forwarding and Load-Use
    3.2.5 System Call Instruction (sc)
    Chapter 4: Testing
    4.1 Creating Instructions
    4.2 Fixed Point Integer Instructions
    4.2.1 Fixed Point Arithmetic Instructions
    4.2.2 Fixed Point Logical Instructions
    4.2.3 Fixed Point Shift Instructions
    4.2.4 Fixed Point Rotate Instructions
    4.2.5 Fixed Point Compare Instructions
    4.3 Load/Store Instructions
    4.3.1 Load Instructions
    4.3.2 Store Instructions
    4.4 Branch Instructions
    4.5 Data Forwarding and Load-Use
    4.6 Sign-Extension Instructions
    Chapter 5: Project Work Plan and Milestones
    Chapter 6: Conclusion
    6.1 Achievements
    6.2 Limitations
    Chapter 7: Future Work

    List of Figures

    Figure 1: Indirect Addressing Mode, sourced from [3]
    Figure 2: Indirect Indexed Addressing, sourced from [3]
    Figure 3: Mask (MB < ME), sourced from [4]
    Figure 4: Mask (MB > ME), sourced from [4]
    Figure 5: 5 Stage Pipeline, sourced from [1]
    Figure 6: Control Hazards
    Figure 7: Data Forwarding (Data Hazards)
    Figure 8: Load-Use
    Figure 9: Initial Datapath
    Figure 10: Fixed Point Arithmetic Instruction
    Figure 11: Fixed Point Logical Instruction
    Figure 12: Fixed Point Shift Instruction
    Figure 13: Fixed Point Rotate Instruction
    Figure 14: Fixed Point Compare Instruction
    Figure 15: Load Instruction
    Figure 16: Store Instruction
    Figure 17: Branch Instructions
    Figure 18: Data Forwarding and Load-Use
    Figure 19: Design Browser Window
    Figure 20: addi Instruction
    Figure 21: ori Instruction
    Figure 22: Shift Instruction (sraw)
    Figure 23: rlwimi Instruction
    Figure 24: Compare Instruction
    Figure 25: Load (lwz) Instruction
    Figure 26: Store with Update Instruction
    Figure 27: Branch Instructions
    Figure 28: Load-Use and Data Forwarding
    Figure 29: Sign-Extension Instructions

    List of Tables

    Table 1: Initial Gantt Chart
    Table 2: Final Gantt Chart

    Chapter 1: Introduction

    The main aim of the project is to design a 32-bit pipelined PowerPC processor. The modules are written in the Verilog hardware description language. Both the instructions and the registers are 32 bits long. The modules are simulated and the final simulation results are analysed. The designed processor runs fixed point integer instructions, branch instructions, integer load/store instructions and sign-extension instructions. The fixed point integer instructions include arithmetic, logical, compare, shift and rotate instructions.

    Chapter 2 covers the background study of the PowerPC processor: its registers, its instruction formats, the pipelining approach and the hazards that occur in a pipelined processor.

    Chapter 3 covers the design architecture for the different instructions. The datapaths for fixed point integer instructions, load/store instructions and branch instructions are designed and explained in this chapter.

    Chapter 4 covers the testing of the design in the NC Verilog simulator. The final results of the datapath design are discussed together with their waveforms.

    The implemented instructions are listed in Appendix [3], and the instructions that are not implemented in the design are listed in Appendix [4]. The program used for testing is shown in Appendix [5].

    Chapter 2: Background

    2.1 PowerPC:

    The PowerPC processor is a 32-bit processor capable of executing floating point, fixed point, control and memory management instructions. The fixed point instructions include arithmetic, logical, compare, shift and rotate instructions. PowerPC has general purpose registers and various special purpose registers such as the program counter (also called the Next Instruction Pointer (NIP) or Instruction Address Pointer), the link register and the count register [5]. Some PowerPC processors also have 32 floating point registers, each 32 or 64 bits wide. PowerPC is an example of a RISC architecture, which has the following consequences [5]:

    All instructions in the PowerPC processor have a fixed length of 32 bits [5].

    Data is loaded from memory into registers, operated on there, and then written back to memory. No instructions other than the load and store instructions manipulate memory directly [5].

    2.2 PowerPC Registers:

    The PowerPC processor has 32 general purpose registers, a count register, a link register, a Next Instruction Pointer, an exception register and a condition register.

    2.2.1 General Purpose Registers:

    The general purpose registers are 32 bits long and are used by the fixed point integer instructions. A general purpose register is selected by a 5-bit address in a register field of the instruction [4]. Any of the general purpose registers can store the result of the operation performed by an instruction. All data manipulation is done in these registers, which are internal to the processor [7].
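    As a rough illustration of how a 5-bit register field selects one of the 32 registers, the following Python sketch splits a 32-bit instruction word into fields. It is a behavioural model only, not the Verilog used in this project. The field layout assumed here is the PowerPC D-form (opcode in bits 0-5, rD in bits 6-10, rA in bits 11-15, immediate in bits 16-31, with bit 0 the most significant), and the function name `decode_d_form` is hypothetical.

```python
# Behavioural sketch (not the dissertation's Verilog): split a 32-bit
# D-form instruction word into its fields. PowerPC numbers bits from the
# most significant end, so bits 0-5 are the top six bits of the word.

def decode_d_form(word):
    """Return (opcode, rD, rA, simm) for a 32-bit D-form instruction."""
    word &= 0xFFFFFFFF
    opcode = (word >> 26) & 0x3F   # bits 0-5: primary opcode
    rd = (word >> 21) & 0x1F       # bits 6-10: 5-bit destination register
    ra = (word >> 16) & 0x1F       # bits 11-15: 5-bit source register
    simm = word & 0xFFFF           # bits 16-31: 16-bit immediate
    if simm & 0x8000:              # sign-extend the immediate
        simm -= 0x10000
    return opcode, rd, ra, simm


# addi r3, r4, 5 encodes as 0x38640005 (primary opcode 14)
opcode, rd, ra, simm = decode_d_form(0x38640005)
```

    Because each register field is 5 bits wide, it can address exactly 2^5 = 32 general purpose registers.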

    2.2.2 Exception Register:

    The exception register (XER) is 32 bits long in the 32-bit processor implementation [4]. It is updated by arithmetic operations that produce an overflow or a carry. It also indicates the number of bytes to be transferred by the load/store string indexed instructions [4], [7]. The bit representation of the exception register is shown in Appendix [2].

    The CA field, XER[2], can be modified by the add-carrying, subtract-from-carrying, add-extended and subtract-from-extended instructions. The CA bit is set to 1 whenever an arithmetic operation produces a carry out of the most significant bit. The algebraic shift right instructions also update the carry bit. The mtspr and mcrxr instructions can be used to clear the CA bit [4].

    The OV bit, XER[1], is updated only when the OE bit in the instruction is set to 1. The add, subtract and negate instructions set OV if the carry out of the msb (bit 0) is not equal to the carry out of bit 1 (msb+1 in PowerPC's big-endian bit numbering); otherwise OV is cleared. For the multiply and divide instructions, OV is set to 1 if the result cannot be represented in 32 bits. The mtspr and mcrxr instructions can be used to clear the OV bit [4].

    The SO bit, XER[0], is set to 1 whenever an instruction sets the overflow bit, and it stays set until it is cleared explicitly. This bit can be cleared by the mtspr instruction [4].
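    The carry and overflow rules above can be sketched as a small behavioural model in Python (an illustration only, not the Verilog written for this project). The function name `add_with_flags` is hypothetical. The model always computes CA, OV and SO; in the real instruction set, OV and SO are written only when OE = 1 and CA only by the carrying and extended forms.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b, carry_in=0, so=0):
    """32-bit add returning (result, CA, OV, SO).

    CA is the carry out of the most significant bit.
    OV is set when the carry into the msb differs from the carry out of it.
    SO is sticky: once set it stays set until cleared explicitly.
    """
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    ca = total >> 32                                   # carry out of the msb
    # carry produced by the low 31 bits, i.e. the carry into the msb
    carry_into_msb = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF) + carry_in) >> 31
    ov = ca ^ carry_into_msb
    so = so | ov
    return result, ca, ov, so


# Largest positive value plus one overflows as a signed add but has no carry
overflow_case = add_with_flags(0x7FFFFFFF, 1)
# All ones plus one carries out of the msb but is not a signed overflow
carry_case = add_with_flags(0xFFFFFFFF, 1)
```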

    2.2.3 Count Register:

    The count register (CTR) is a 32-bit register that can be used by the branch instructions. The contents of the count register can be used as the branch target address. The bit representation of the count register is shown in Appendix [2].

    2.2.4 Condition Register:

    The condition register (CR) is a 32-bit register that reflects the results of certain instructions and is used for testing and conditional branching. It is divided into eight 4-bit fields, CR0-CR7 [4], [7]. The field specification of the condition register is shown in Appendix [2]. Each CR field contains the bits LT, GT, EQ and SO, which are updated by the results of the compare instructions. The CR0 field is modified by the result of a fixed point instruction whenever the Rc field in the instruction is set. Instructions such as addic., andi. and andis. always modify the CR0 field. The bit definitions for the CR0 field are as follows [4]:

    LT is set to 1 when the result is negative; otherwise it is cleared.

    GT is set to 1 when the result is positive; otherwise it is cleared.

    EQ is set to 1 when the result is zero; otherwise it is cleared.

    SO is a copy of the SO bit in the exception register.

    The CR1 field is modified by the floating point instructions. The remaining fields of the condition register are modified by the compare instructions. The bit definitions for the CRn (CR2-CR7) fields are [4], [7]:

    LT is set to 1 when register rA is less than the immediate value or register rB. The immediate value can be signed or unsigned.

    GT is set to 1 when register rA is greater than the immediate value or register rB. The immediate value can be signed or unsigned.

    EQ is set to 1 when register rA is equal to the immediate value or register rB.

    SO is a copy of the XER[SO] bit.
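    The bit definitions above can be summarised in a short behavioural sketch. It is written in Python for illustration and is not the Verilog of this design. The function names `cr_field` and `cr0_from_result` are hypothetical, and the field is packed with LT as the most significant of the four bits, followed by GT, EQ and SO.

```python
def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def cr_field(a, b, signed=True, so=0):
    """Return the 4-bit CR field (LT, GT, EQ, SO) for comparing a with b."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    lt = int(a < b)
    gt = int(a > b)
    eq = int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)

def cr0_from_result(result, so=0):
    """CR0 after a record-form instruction: the result is compared with zero."""
    return cr_field(result, 0, signed=True, so=so)


# 0xFFFFFFFF is -1 as a signed value but the largest unsigned value
signed_field = cr_field(0xFFFFFFFF, 1, signed=True)     # LT set
unsigned_field = cr_field(0xFFFFFFFF, 1, signed=False)  # GT set
```

    The same operands give different fields for signed and unsigned comparison, which is why the compare instructions come in both variants.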

  • School of Electronics and Computer Science

    Canr1g09

    10

    2.2.5 Link Register:

    The link register (LR) is a 32-bit register that is used by the branch instructions and for subroutine linkage. The field specification of the link register is shown in Appendix [2]. Branch instructions use the link register in two ways [4], [7]. Branch Conditional to Link Register (bclrx) instructions read the branch target address from the link register. If the link register update option (LK) bit is set in a branch instruction, the effective address of the instruction following the branch instruction is loaded into the link register.

    2.3 PowerPC Data Types:

    The load and store instructions in the PowerPC processor support 8-bit (byte), 16-bit (halfword), 32-bit (word) and 64-bit (doubleword) data. Either little-endian or big-endian byte ordering can be used [3]. The unsigned byte can be used for logical or integer arithmetic operations. It is loaded from memory into a general purpose register by zero-extending it on the left to the 32-bit register size [3]. The signed halfword is used for arithmetic operations. It is loaded from memory into a 32-bit register by sign-extending it on the left to 32 bits [3]. The unsigned word is 32 bits long and can be used for logical operations and as an address pointer [3]. The signed word is used to perform arithmetic operations [3]. The unsigned doubleword can be used as an address pointer [3].

    2.4 PowerPC Branch Instructions:

    There are two types of branch: conditional and unconditional. Both alter the program flow in the forward or backward direction, and both use the AA bit to select how the target address is formed [4]. The function of the AA bit is explained in section 2.4.1. The branch target address can also be taken from the link register or the count register. One of the functions of the link register is to store the return address of a branch instruction. A conditional branch instruction tests a bit in the condition register. If the condition is true, the program counter is modified; otherwise the program flow is not altered [4]. A branch instruction can also affect the contents of the count register: the count register is decremented by 1 and the new value is then tested by the branch instruction [4]. Branch instructions use three types of addressing: absolute, indexed and relative. The branch instructions implemented in the design are listed in Appendix [3] and the instruction format is shown in Appendix [1].

    The unconditional branch instruction modifies the program counter without testing any bit [4]. The LI field of the instruction is extended to 32 bits by appending two 0 bits on the right and sign-extending on the left. The resulting value is used to form the branch target address.

    2.4.1 Addressing Modes:

    Branch instructions use three addressing modes to calculate the branch target address; they are explained in this section. Both conditional and unconditional branch instructions can use absolute addressing [3]. For an unconditional branch, the effective address of the next instruction is derived from the 24-bit immediate value in the instruction, which is extended to 32 bits by appending two 0 bits on the right and sign-extending on the left. For a conditional branch, the effective address of the next instruction is derived from the 14-bit immediate value (the BD field) in the instruction [3], which is extended to 32 bits in the same way.

    Like absolute addressing, relative addressing is used by both conditional and unconditional branches. The immediate value is extended as for absolute addressing, and the result is added to the address of the current instruction to produce the next instruction address [3].

    Indexed addressing is used only by the conditional branch instructions. The effective address of the next instruction is taken from either the link register or the count register [3]. In this case the count register holds the branch target address. Alternatively, the count register can hold the count for looping.
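    The absolute and relative address calculations can be sketched as follows. This is a behavioural Python model for illustration, not the Verilog of this design, and the names `branch_target` and `link_value` are hypothetical. The width of the immediate field is passed as a parameter, so the same function covers the 24-bit LI field and the shorter immediate of the conditional branch.

```python
MASK32 = 0xFFFFFFFF

def branch_target(imm, imm_bits, aa, cia):
    """Return the target address of an immediate-form branch.

    imm      -- raw immediate field from the instruction
    imm_bits -- width of that field in bits (24 for the LI field)
    aa       -- 1 for absolute addressing, 0 for relative addressing
    cia      -- address of the current (branch) instruction
    """
    width = imm_bits + 2
    offset = (imm & ((1 << imm_bits) - 1)) << 2   # append two 0 bits on the right
    if offset & (1 << (width - 1)):               # sign-extend on the left
        offset -= 1 << width
    base = 0 if aa else cia                       # relative mode adds the current address
    return (base + offset) & MASK32

def link_value(cia):
    """Address of the instruction after the branch, loaded into LR when LK = 1."""
    return (cia + 4) & MASK32


# A relative branch with LI = -1 jumps back by one instruction (4 bytes)
backward = branch_target(0xFFFFFF, 24, aa=0, cia=0x100)
# The same immediate value of 4 used as an absolute address gives 0x10
absolute = branch_target(0x000004, 24, aa=1, cia=0x100)
```

    Appending two 0 bits keeps every target word-aligned, which matches the fixed 32-bit instruction length.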

    2.5 PowerPC Load/Store Instructions:

    The fixed point integer load and store instructions are used to move data from data memory to a specified general purpose register and from a general purpose register to data memory. The load/store instructions implemented in the design are listed in Appendix [3] and the instruction format is shown in Appendix [1].

    2.5.1 Addressing Modes:

    The PowerPC has two addressing modes for the load/store instructions. In register indirect addressing mode, the instruction includes a 16-bit displacement that is added to the contents of the base register [3]. In addition, the effective address can be fed back to the base register, replacing its current contents. The other addressing mode for the load/store instructions is register indirect indexed addressing [3]. In this mode, the instruction names a base register and an index register, either of which may be any of the general purpose registers. The effective address is calculated by adding the contents of the base register and the index register. If update is enabled, the effective address is loaded into the base register. Figure 1 shows the indirect addressing mode for the load/store instructions [3], and figure 2 shows the indirect indexed addressing mode.

    Figure 1: Indirect Addressing Mode, sourced from [3] (diagram: the 16-bit signed displacement is added to the base register (GPR) to form the logical address, which goes to address translation and, with update, back to the base register)

    Figure 2: Indirect Indexed Addressing, sourced from [3] (diagram: the base register (GPR) and index register (GPR) are added to form the logical address, which goes to address translation and, with update, back to the base register)

    The register indirect addressing mode can be represented in RTL as:

    Effective address ← [base register] + displacement

    The register indirect indexed addressing mode can be represented in RTL as:

    Effective address ← [base register] + [index register]
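    The two RTL statements, together with the optional update of the base register, can be sketched in Python as below. This is a behavioural illustration rather than the project's Verilog. The register file is modelled as a plain list of 32 integers, and the function names `ea_indirect` and `ea_indexed` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ea_indirect(gpr, ra, disp16, update=False):
    """Effective address = [base register] + sign-extended 16-bit displacement."""
    disp = disp16 & 0xFFFF
    if disp & 0x8000:              # the displacement is a signed value
        disp -= 0x10000
    ea = (gpr[ra] + disp) & MASK32
    if update:                     # "with update": write EA back to the base register
        gpr[ra] = ea
    return ea

def ea_indexed(gpr, ra, rb, update=False):
    """Effective address = [base register] + [index register]."""
    ea = (gpr[ra] + gpr[rb]) & MASK32
    if update:
        gpr[ra] = ea
    return ea


gpr = [0] * 32
gpr[4] = 0x1000                    # base register
gpr[5] = 0x20                      # index register
ea_plain = ea_indirect(gpr, 4, 0xFFFC)            # displacement of -4
ea_indexed_plain = ea_indexed(gpr, 4, 5)
ea_updated = ea_indirect(gpr, 4, 8, update=True)  # also changes gpr[4]
```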

    2.5.2 Load Instructions:

    The fixed point integer load instructions read data from the data memory and store it in any of the general purpose registers. The load and zero instructions read data from memory and clear the remaining high-order bits of the destination register to zero [4]. The load and algebraic instructions read data from memory and fill the remaining high-order bits with copies of the sign bit of the loaded value [4]. The load and update instructions load data from memory and, in addition, update the base register with the memory address.
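    The difference between the load and zero and the load and algebraic forms can be sketched for a halfword load. This is a minimal Python illustration, assuming a byte-addressed memory held in a bytes object and big-endian byte order; the function name is hypothetical.

```python
def load_halfword(memory, address, algebraic=False):
    """Read a big-endian halfword and extend it to 32 bits.

    algebraic=False models the load and zero form (high bits cleared);
    algebraic=True models the load and algebraic form (sign bit copied
    into the high-order 16 bits).
    """
    half = (memory[address] << 8) | memory[address + 1]
    if algebraic and (half & 0x8000):
        return half | 0xFFFF0000
    return half
```

    With the bytes 0x80 0x01 at the target address, the zero form returns 0x00008001 and the algebraic form returns 0xFFFF8001.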

    2.5.3 Store Instructions:

    The fixed point integer store instructions read data from a general purpose register and store it in the data memory [4]. PowerPC supports several types of store instruction. The store and update instructions write data to memory and, in addition, update the base register with the memory address.

    2.6 PowerPC Fixed Point Integer Instructions:

    The fixed point integer instructions use the general purpose registers for their operands and for storing the result. The source operands are obtained either from general purpose registers or from an immediate value. These instructions do not access data memory. Both signed and unsigned integers can be used as source operands. Depending on the instruction, the condition register and the exception register are updated [4].

    The PowerPC architecture supports several types of integer instructions [4]:

    Arithmetic Instructions

    Logical Instructions

    Rotate Instructions

    Compare Instructions

    Shift Instructions

    2.6.1 Arithmetic Instructions:

    The arithmetic instructions perform addition, subtraction, negation, multiplication, and division. These instructions use general purpose registers as their source and destination operands, and some take an immediate value as a source operand. Integer arithmetic instructions support both signed and unsigned operations [4]. A carry out of the operation is stored in the carry bit of the exception register. If the record bit (Rc) in the instruction is set to 1, the CR0 field of the condition register is updated: if the result of the arithmetic operation is zero, the zero bit is set to 1, and for a signed operation the negative bit is set to 1 when the MSB of the result is 1 [4]. The arithmetic instructions implemented in the design are listed in Appendix [3], and the instruction formats are shown in Appendix [1].

    The negation instruction computes the two's complement of its operand. The source operand is two's complemented and the result is stored in the destination register [4]. If the record bit in the instruction is set, the condition register is updated.

    The multiply instructions multiply two 32-bit operands to produce a 64-bit product. The source operands can be either register values or an immediate value. In the Multiply Low-Word and Multiply Low-Word Immediate instructions, the destination register is loaded with the low 32 bits of the product [4]. In the Multiply High-Word instructions, the destination register is loaded with the high 32 bits of the product [4]. The exception and condition registers are updated.

    The divide instructions perform division. The source and destination operands must all be general purpose registers, and the quotient is loaded into the destination register. In the Divide-Word instructions, the two 32-bit operands are divided and the low 32 bits of the quotient are loaded into the destination register. In the Divide-Word Unsigned instructions, the source operands are interpreted as unsigned integers and the destination register is again loaded with the low 32 bits of the quotient. The exception and condition registers are updated [4].
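    The low-word and high-word multiplies and the signed and unsigned divides can be sketched as follows. This is a behavioural Python illustration only; the function names follow common PowerPC mnemonics, but the helper to_signed32 and the choice to leave division by zero unhandled (Python raises ZeroDivisionError) are assumptions of this sketch, not statements about the design.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mullw(a, b):
    """Low 32 bits of the 64-bit signed product."""
    return (to_signed32(a) * to_signed32(b)) & MASK32


def mulhw(a, b):
    """High 32 bits of the 64-bit signed product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


def divw(a, b):
    """Signed quotient, truncated toward zero, as a 32-bit pattern."""
    sa, sb = to_signed32(a), to_signed32(b)
    quotient = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    return quotient & MASK32


def divwu(a, b):
    """Unsigned quotient of two 32-bit patterns."""
    return (a & MASK32) // (b & MASK32)
```

    The same bit pattern gives different quotients in the two divide forms: 0xFFFFFFF9 divided by 2 is 0xFFFFFFFD when read as minus seven, but 0x7FFFFFFC when read as an unsigned value.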

    2.6.2 Logical Instructions:

    The logical instructions perform logical operations such as OR, AND, NAND, NOR, and XOR on 32-bit operands. If an operand is a 16-bit immediate value, it is extended to 32 bits either by appending 16 zero bits on the right (immediate shifted) or by prepending 16 zero bits on the left (unsigned immediate) [4]. A set record bit is indicated by a "." suffix on the instruction mnemonic. If the record bit (Rc) in the instruction is set, the result of the logical instruction updates the condition register [4]. The exception register is not updated by the logical instructions. The logical instructions implemented in the design are listed in Appendix [3], and the instruction formats are shown in Appendix [1].

    2.6.3 Sign-Extension Instructions:

    PowerPC supports two sign-extension instructions, extsh and extsb. The extsh instruction reads the lower halfword of the source register, sign-extends it from bit 16 (the most significant bit of that halfword in PowerPC bit numbering, where bit 0 is the MSB) to 32 bits, and writes the result to the destination register. Similarly, extsb reads the lower byte of the source register and sign-extends it from bit 24 to 32 bits [4]. The sign-extension instructions implemented in the design are listed in Appendix [3], and the instruction formats are shown in Appendix [1].
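    The behaviour of the two sign-extension instructions can be sketched in a few lines of Python. The function names mirror the mnemonics, and the sketch models only the 32-bit result, not the condition register update.

```python
def extsh(rs):
    """Sign-extend the lower halfword of rs to 32 bits."""
    half = rs & 0xFFFF
    return (half | 0xFFFF0000) if half & 0x8000 else half


def extsb(rs):
    """Sign-extend the lower byte of rs to 32 bits."""
    byte = rs & 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte
```

    For instance, extsh applied to 0x12348000 yields 0xFFFF8000, because the upper halfword of the source is discarded and the sign bit of the lower halfword is replicated.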

    2.6.4 Rotate Instructions:

    Rotate instructions use general purpose registers for source and destination. The data is rotated left, from the LSB towards the MSB, and each bit rotated out of the MSB re-enters at the LSB. If the Rc field in a rotate instruction is set, the result updates the condition register field CR0. The rotate instructions also require a mask to be generated [4]. All the rotate instructions are implemented in the design and listed in Appendix [3], and the instruction formats are shown in Appendix [1].

    2.6.4.1 Mask Generation:

    The mask is a 32-bit value generated from the two 5-bit fields MB and ME. If the value of MB is less than the value of ME, the mask bits from MB to ME are set to 1 and the remaining bits are set to 0. If the value of ME is less than the value of MB, the mask bits between ME and MB are set to 0 and the remaining bits are set to 1, so the run of ones wraps around from bit 31 to bit 0. Figure 3 shows the mask generated when MB < ME [4]:

    0 0 0 ... 0 1 1 1 ... 1 0 0 0 ... 0
    0           MB        ME         31

    Figure 3: Mask (MB < ME) sourced from [4]

    Figure 4 shows the mask generated when ME < MB [4]:

    1 1 1 ... 1 0 0 0 ... 0 1 1 1 ... 1
    0         ME          MB         31

    Figure 4: Mask (MB > ME) sourced from [4]
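    The mask rule can be expressed compactly in software. The following Python sketch is an illustration under stated assumptions rather than the Verilog used in the design: bit 0 is taken as the MSB, the end positions MB and ME are treated as part of the run of ones, and the case MB = ME is treated as a single-bit mask. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotate_mask(mb, me):
    """Build the 32-bit rotate mask from the 5-bit MB and ME fields.

    Bit 0 is the most significant bit. When mb <= me the ones run from
    mb to me; otherwise the ones wrap around, covering mb..31 and 0..me.
    """
    head = MASK32 >> mb                    # ones in bit positions mb..31
    tail = MASK32 & ~(MASK32 >> (me + 1))  # ones in bit positions 0..me
    return (head & tail) if mb <= me else (head | tail)
```

    With MB = 8 and ME = 15 the mask is 0x00FF0000, and with MB = 24 and ME = 7 the ones wrap around to give 0xFF0000FF.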

    2.6.4.2 PowerPC Rotate Instructions:

    PowerPC supports three rotate instructions: rlwimi, rlwnm, and rlwinm. The rlwimi instruction rotates the source register left by the number of bits specified in the 5-bit SH field [4] and inserts the rotated data into the destination register at the bit positions where the mask is 1; the remaining bits of the destination register are unchanged. The rlwnm instruction rotates the source register rS left by the number of bits specified in rB[27:31] [4]; the rotated data is ANDed with the mask and the result is stored in the destination register. The rlwinm instruction rotates the source register left by the number of bits specified in the 5-bit SH field [4]; the rotated data is ANDed with the mask and the result is stored in the destination register.
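    The three rotate instructions differ only in where the rotate amount comes from and in how the mask is applied. A behavioural Python sketch, assuming bit 0 is the MSB and using hypothetical helper names rotl32 and rotate_mask, is given below; the first argument of rlwimi stands for the current value of the destination register.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, n):
    """Rotate a 32-bit value left by n bit positions (n taken modulo 32)."""
    value &= MASK32
    n &= 31
    return ((value << n) | (value >> (32 - n))) & MASK32


def rotate_mask(mb, me):
    """Ones from mb to me (bit 0 = MSB), wrapping around when mb > me."""
    head = MASK32 >> mb
    tail = MASK32 & ~(MASK32 >> (me + 1))
    return (head & tail) if mb <= me else (head | tail)


def rlwinm(rs, sh, mb, me):
    """Rotate left by the SH field, then AND with the mask."""
    return rotl32(rs, sh) & rotate_mask(mb, me)


def rlwnm(rs, rb, mb, me):
    """Rotate left by the low five bits of rB, then AND with the mask."""
    return rotl32(rs, rb & 0x1F) & rotate_mask(mb, me)


def rlwimi(ra, rs, sh, mb, me):
    """Insert rotated bits into ra where the mask is 1; keep ra elsewhere."""
    mask = rotate_mask(mb, me)
    return (rotl32(rs, sh) & mask) | (ra & ~mask & MASK32)
```

    As a check, rotating 0x12345678 left by 8 with a full mask gives 0x34567812, and inserting the same rotation into 0xAAAAAAAA under the mask for bits 24 to 31 gives 0xAAAAAA12.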

    2.6.5 Shift Instructions:

    Shift instructions shift the contents of the source register either left or right. They operate on 32-bit operands [4]. All the shift instructions are implemented in the design and listed in Appendix [3], and the instruction formats are shown in Appendix [1].

    2.6.5.1 Logical Left Shift Instructions:

    The logical shift left instructions use three general purpose registers. They shift the bits of the source register rS from the LSB towards the MSB by the number of bits specified in rB[27:31] [4]. Bits shifted out of the MSB are lost and the vacated positions at the LSB end are filled with zeros. The condition register field CR0 is updated when the Rc bit is set to 1.

    2.6.5.2 Logical Right Shift Instructions:

    The logical shift right instructions use three general purpose registers. They shift the bits of the source register rS from the MSB towards the LSB by the number of bits specified in rB[27:31]. Bits shifted out of the LSB are lost and the vacated positions at the MSB end are filled with zeros [4]. The condition register field CR0 is updated when the Rc bit is set to 1.

    2.6.5.3 Algebraic Shift Instructions:

    PowerPC provides two algebraic shift instructions, sraw and srawi. The sraw instruction shifts the data in the source register rS right by the number of bits specified in rB[27:31]. The MSB of rS is replicated to fill the vacated bit positions on the left, and the bits shifted out of the LSB are lost. The result is stored in the destination register. The srawi instruction behaves in the same way, except that the shift amount is given by the 5-bit SH field [4].
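    The three kinds of shift described in this section can be compared in a short Python sketch. It follows the text in taking the shift amount from the low five bits of rB; the function names mirror the mnemonics and the helper to_signed32 is an assumption of the sketch.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slw(rs, rb):
    """Logical shift left: zeros enter at the LSB end."""
    return (rs << (rb & 0x1F)) & MASK32


def srw(rs, rb):
    """Logical shift right: zeros enter at the MSB end."""
    return (rs & MASK32) >> (rb & 0x1F)


def sraw(rs, rb):
    """Algebraic shift right: copies of the sign bit enter at the MSB end."""
    return (to_signed32(rs) >> (rb & 0x1F)) & MASK32
```

    Shifting 0x80000000 right by four positions gives 0x08000000 with srw but 0xF8000000 with sraw, since the algebraic form replicates the sign bit.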

    2.7 Pipelining Overview:

    Pipelining is an implementation technique in which the execution of more than one instruction is overlapped [1], which increases instruction throughput. The pipeline used here has five stages and assumes a Harvard architecture, with separate instruction and data memories. The five stages are Instruction Fetch (IF), Instruction Decode (ID), Instruction Execute (EX), Memory (MEM), and Write Back (WB) [1].

    Instruction Fetch (IF) stage fetches the instruction to be executed from the instruction memory.

    Instruction Decode (ID) stage decodes the instruction and reads the operand values from the registers. In PowerPC, register read and decode occur in the same stage.

    Instruction Execute (EX) stage calculates the data memory address for a load/store instruction; for other instructions it executes the operation and calculates the result.

    Memory (MEM) stage is used by load/store instructions to read data from, or write data to, the data memory.

    Write Back (WB) stage stores the result into the destination register.

    Figure 5 shows the five pipeline stages and which stage each instruction occupies in each clock cycle.

    In Figure 5, five instructions are executed in sequential order. Instruction 1 is fetched by the IF stage and passed to the ID stage. While instruction 1 is being decoded, instruction 2 is fetched [1]. When instruction 1 is executing, instruction 2 is decoded and, at the same time, instruction 3 is fetched, and so on.

    CLK      1    2    3    4    5    6
    Inst 1   IF   ID   EX   MEM  WB
    Inst 2        IF   ID   EX   MEM  WB
    Inst 3             IF   ID   EX   MEM
    Inst 4                  IF   ID   EX
    Inst 5                       IF   ID

    Figure 5: 5 Stage Pipeline sourced from [1]
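    The overlap shown in Figure 5 follows a simple rule: instruction i enters stage s in clock cycle i + s. The Python sketch below generates such a timing table for an ideal pipeline with no stalls or flushes; the function name and the use of None for idle slots are choices made for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return one row per instruction, listing its stage in each clock cycle.

    Assumes an ideal pipeline: one instruction is fetched per cycle and
    no hazards occur. Idle slots are marked with None.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [None] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s at cycle i + s
        rows.append(row)
    return rows
```

    For five instructions the table spans nine cycles, and in the fifth cycle all five stages are busy at once, each with a different instruction.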

    2.7.1 Pipelining Hazards:

    Pipeline hazards occur when the next instruction cannot execute in the following clock cycle. There are three kinds of hazards: structural hazards, control hazards, and data hazards [1].

    2.7.1.1 Structural Hazards:

    A structural hazard occurs when the hardware cannot support the combination of instructions that are to be executed in the same clock cycle [1]. If there are two memories, one for instructions and another for data, this structural hazard can be avoided [9].

    2.7.1.2 Control Hazards:

    A control hazard arises from the need to make a decision based on the result of one instruction while other instructions are executing. When a branch instruction is executed, the branch address is calculated in either the second or the third stage [1]. Once the branch is determined to be taken, the instructions in the IF and ID stages must not execute. These two instructions have to be flushed from the pipeline [9], as shown in Figure 6.

    Figure 6: Control Hazards (the diagram marks the flushed instructions and the point at which the branch is taken)
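    The flush can be modelled as replacing the contents of the IF and ID stages with a no-operation when the branch in the EX stage is taken. The Python sketch below is only an illustration; the dictionary representation of the pipeline, the NOP marker, and the function name are assumptions and do not describe the signals of the design.

```python
NOP = "nop"


def flush_on_taken_branch(pipeline, branch_taken):
    """Return the pipeline contents after a branch is resolved.

    pipeline maps a stage name to the instruction it currently holds.
    When the branch is taken, the two younger instructions in IF and ID
    are replaced by bubbles so that they never execute.
    """
    flushed = dict(pipeline)
    if branch_taken:
        flushed["IF"] = NOP
        flushed["ID"] = NOP
    return flushed
```

    Instructions older than the branch, in the MEM and WB stages, are unaffected and complete normally.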

    2.7.1.3 Data Hazards:

    Data hazards occur when an instruction depends on the result of a previous instruction that is still in the pipeline. This is called a data dependency [1]. If the data is not yet available, a stale value is read and an incorrect result is produced. There are two cases of data hazard, which are handled by the data forwarding unit and the load-use unit respectively [9].

    2.7.1.3.1 Data Forwarding Hazard:

    This hazard is avoided by the data forwarding unit. The results at the EX, MEM, and WB stages are each fed to the data forwarding unit. For example, consider addi r2, r4, 1010h followed by xoris r6, r2, 1100h. In the second instruction, the value of r2 depends on the result of the previous instruction. This value is forwarded by the data forwarding unit [9], so r2 is available to the next instruction, as shown in Figure 7.

    Figure 7: Data Forwarding (Data Hazards)
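    The selection performed by a forwarding unit can be sketched as a priority choice between the newest in-flight result and the register file. The following Python illustration is a generic model, not the unit implemented in the design; each in-flight result is represented as a hypothetical (destination register, value) pair.

```python
def forward_operand(src_reg, regfile_value, ex_mem=None, mem_wb=None):
    """Choose the value of a source operand for the EX stage.

    ex_mem and mem_wb are (dest_reg, value) pairs for results still in
    the pipeline, or None. The younger result (ex_mem) takes priority
    because it is the most recent write to that register.
    """
    for latch in (ex_mem, mem_wb):
        if latch is not None and latch[0] == src_reg:
            return latch[1]
    return regfile_value
```

    In the example from the text, the xoris instruction reads r2 while the addi result is still in the pipeline, so the forwarded value is used instead of the stale register file contents.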

    2.7.1.3.2 Load-Use Hazard:

    The load-use hazard occurs when an instruction depends on the result of the immediately preceding load instruction, for example lwz r15, 0010h followed by or r10, r15, r12. To avoid this hazard, the ID stage is stalled for one clock cycle. The result of the load instruction is available in the MEM stage and is passed to the data forwarding unit. In the example above, the ID stage of the or instruction is stalled for one clock cycle; at the end of the MEM stage, r15 is forwarded to the waiting or instruction [9], as shown in Figure 8.

    Figure 8: Load-use (data is forwarded from the end of the MEM stage to the beginning of the EX stage)
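    The condition that triggers the one-cycle stall can be written as a single test: the instruction ahead is a load, and its destination register is one of the sources of the instruction being decoded. The Python sketch below is a generic illustration with hypothetical argument names, not the load-use unit of the design.

```python
def load_use_stall(ex_is_load, ex_dest_reg, id_source_regs):
    """Return True when the instruction in ID must wait one cycle.

    ex_is_load     -- the instruction one stage ahead is a load
    ex_dest_reg    -- register number that load will write
    id_source_regs -- register numbers read by the instruction in ID
    """
    return bool(ex_is_load and ex_dest_reg in id_source_regs)
```

    For the example in the text, the load writes r15 and the following or instruction reads r15 and r12, so the test is true and one stall cycle is inserted.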

    The 32-bit PowerPC processor is designed as a set of Verilog hardware description language modules. An initial datapath was designed and then expanded to suit the PowerPC instruction set. The main aim of the project is to design a pipelined processor that supports the PowerPC instruction set. Within the available time, the designed processor supports the fixed point integer instructions, load/store instructions, and branch instructions. The design started with the implementation and testing of the basic arithmetic and logical instructions. It was then expanded by adding the load/store instructions, data forwarding, and load-use handling.

    3.1 Initial Datapath:

    The initial datapath is shown in Figure 9. It consists of five pipeline stages: IF, ID, EX, MEM, and WB. The pipelining approach and its stages are explained in section 2.7. The dashed lines in Figure 9 mark the five pipeline stages. The pipeline registers on these dashed lines store the values produced by one stage so that they can be used as inputs to the next stage. The processor operates on the positive edge of the clock, and the values in these registers are updated on each positive clock edge. Two memories are used, one for data and one for instructions.

    Multiplexers select the register contents for the different instructions. The reg0_add and reg1_add fields are used as the sources for the arithmetic and load/store instructions. The regshift_add and reg1_add fields are used as the sources for the logical, shift, and rotate instructions. The write back register is selected between reg0_add and regshift_add, depending on the type of instruction. The data coming out of MEM_Phase is sent to the register bank to be stored in the registers, which acts as WB_Phase. The ALU module calculates the memory address or the result of each instruction. For calculating the memory address, the ALU uses the register contents as its source. The data from MEM_Phase can be either the result of the ALU or data read from the memory. The register bank sends data to the data memory for store instructions. The NIP in the datapath is the Next Instruction Pointer, which stores the address of the next instruction to be executed. The IP in the datapath is the Instruction Pointer, which stores the address of the instruction currently being executed.

    Figure 9: Initial Datapath (block diagram: NIP and IP, instruction memory, register bank, ALU, and data memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers)

    3.2 Instruction Set Design:

    The processor design supports fixed point integer instructions, load/store instructions, and branch instructions. Data hazards and control hazards were encountered in the earlier part of the design; they are handled by adding the data forwarding unit, the load-use unit, and the branch prediction unit.

    3.2.1 Fixed Point Integer Instructions:

    3.2.1.1 Fixed Point Arithmetic Instructions:

    Figure 10 shows the design for the fixed point arithmetic instructions. An arithmetic instruction is fetched from the instruction memory and sent to the control module. The control module separates the instruction into the opcode field, register fields, immediate value, record bit, and extended opcode field. The register fields are sent to the register bank module to read the contents of the registers. The reg0_add and reg1_add fields select the two operands for the arithmetic operation. The final result of the arithmetic operation is written back to the register selected by the regshift_add field. The alu_operands module acts as a decoder that provides the inputs to the ALU module. There are pipeline registers at the end of each phase to store values. ID_op0 and ID_op1 are the two operands sent to the ALU module as inputs. The ALU module calculates the arithmetic result and writes it to the out register. All the addition, subtraction, multiplication, and division instructions are executed in this design.

    The wb_reg field identifies the write back register in which the final result is stored. At the end of EX_Phase, the result of the arithmetic operation is stored in the EX_out register. The condition register, CR, is updated if the Rc bit in the instruction is set to 1. The value in EX_out is passed to MEM_Phase and stored in the MEM_wb_data register. At the end of MEM_Phase, the data in MEM_wb_data is written back to the register identified by MEM_wb_reg. The simm field is the 16-bit immediate field sign-extended to 32 bits. The imm_is (immediate shifted) field is the 16-bit immediate field extended to 32 bits by concatenating 16 zero bits to the right of the immediate data.
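    The two immediate forms described above can be sketched as simple bit operations. The Python functions below borrow the signal names simm and imm_is from the text, but they are a behavioural illustration and not the Verilog of the control module.

```python
def simm(imm16):
    """Sign-extend a 16-bit immediate field to 32 bits."""
    value = imm16 & 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def imm_is(imm16):
    """Immediate shifted: place the 16-bit field in the upper halfword,
    with 16 zero bits concatenated on the right."""
    return (imm16 & 0xFFFF) << 16
```

    For the immediate 0x8000, simm gives 0xFFFF8000 while imm_is gives 0x80000000.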

    Figure 10: Fixed Point Arithmetic Instruction (block diagram: Control, Reg_bank, Alu_operands, and ALU modules with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers and the CR0 field of the CR)

    3.2.1.2 Fixed Point Logical Instructions:

    Figure 11 shows the processor design that supports the fixed point logical instructions. The IF_Phase module fetches the instruction from the instruction memory. The 32-bit instruction is sent to the control module, which separates it into the opcode field, register fields, immediate value, record bit, and extended opcode field. The register fields reg1_add and regshift_add are passed to the register bank module. The data in these registers is read and sent to the alu_operands module. The uimm field is the 32-bit value formed by concatenating 16 zero bits to the left of the immediate data in the instruction. The imm_is (immediate shifted) field is formed by concatenating 16 zero bits to the right of the 16-bit immediate data. The immediate data and the register data are passed to the alu_operands module, which selects the ALU operands based on the opcode field. This processor design executes the logical OR, AND, NAND, XOR, and NEG instructions.

    The reg0_add field identifies the write back register, which stores the result of the logical operation; wb_reg is the same as reg0_add. At the beginning of EX_Phase, the input operands are sent to the ALU, which performs the logical operation on the two operands. The result is stored in the out register. The condition register, CR, is updated when the Rc bit in the instruction is set. At the end of EX_Phase, the result of the logical operation is stored in the EX_out register. This data is stored in the MEM_wb_data register at the end of MEM_Phase, and is then written back and stored in the destination register.

    Figure 11: Fixed Point Logical Instruction (block diagram: Control, Reg_bank, Alu_operands, and ALU modules with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers and the CR0 field of the CR)

    3.2.1.3 Fixed Point Shift Instructions:

    Figure 12 shows the processor design that supports the fixed point shift instructions. The instruction is fetched from the instruction memory in the IF_Phase module and stored in the IF_Phase pipeline register. In the next clock cycle, the instruction is sent to ID_Phase. The control module separates the register fields, Sh field, extended opcode field, and record bit (Rc). The register fields are sent to the register bank module to read the 32-bit data from the registers. The immediate Sh field, which gives the number of bit positions to shift, is passed to the alu_operands module. The register data and Sh are decoded by the alu_operands module. The reg0_add field identifies the write back register. At the end of ID_Phase, op0 and op1 are stored in the pipeline registers. The ALU module shifts op0 by the number of bits specified by op1[27:31], and the shifted data is stored in the out register. At the end of EX_Phase, the result is stored in the EX_out register. This data is passed to MEM_Phase and stored in MEM_wb_data. In the next clock cycle, the data is written back and stored in the destination register. The condition register, CR, is updated when the Rc bit in the instruction is set.

    Figure 12: Fixed Point Shift Instruction (block diagram: Control, Reg_bank, Alu_operands, and ALU modules with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, the Sh and extended opcode fields, and the CR0 field of the CR)

    3.2.1.4 Fixed Point Rotate Instructions:

    Figure 13 shows the processor design that supports the fixed point rotate instructions. The instruction is fetched from the instruction memory and sent to the control module in ID_Phase. The control module separates the register fields, the 5-bit ME field, the 5-bit MB field, the extended opcode field, the Sh field, and the record bit (Rc). The register fields reg1_add and regshift_add are sent to the register bank to read the 32-bit data from the registers. The register field reg0_add identifies the write back register, in which the final result of the rotate instruction is stored; wb_reg is the same as reg0_add. The alu_operands module acts as a multiplexer that selects either the Sh field or the register data value_op1, depending on the rotate instruction. The value_op0, value_op1, and value_rs signals are taken from the register bank. At the end of ID_Phase, op0, op1, and wb_reg are stored in the pipeline registers.

    The mask is generated from the signals ID_ME and ID_MB at the beginning of EX_Phase. If ID_MB is less than ID_ME, the bits in the mask between MB and ME are filled with ones and the remaining bits are filled with zeros. Similarly, if ID_MB is greater than ID_ME, the bits in the mask between ME and MB are filled with zeros and the remaining bits are filled with ones. A 32-bit mask is generated in this way for every rotate instruction.

    The inputs to the rotate module are the two 32-bit input data values and the Sh field, which gives the number of bits to rotate. ID_op0 is rotated by the number of bits specified by either the Sh field or ID_op1[27:31]. Depending on the rotate instruction, either the rotated data is ANDed with the mask, or the bits of the rotated data are inserted into the write back register data at the positions where the mask bits are set to 1.

    The rotated data is stored in the out register of the EX_Phase module. At the end of EX_Phase, the result of the rotate instruction is stored in the EX_out register. This data is moved to MEM_Phase and written to the MEM_wb_data register. In WB_Phase, the data in the MEM_wb_data register is written back to the register module and stored in the destination register. If the Rc bit in the instruction is set, the condition register (CR) is updated with the result of the rotate instruction.

    Figure 13: Fixed Point Rotate Instruction (block diagram: Control, Reg_bank, Alu_operands, Rotate, and Mask Generation modules with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, the MB, ME, and Sh fields, and the CR0 field of the CR)

    3.2.1.5 Fixed Point Compare Instructions:

    Figure 14 shows the processor design for the fixed point compare instructions. The instruction is fetched from the instruction memory and stored in the pipeline register in IF_Phase. In the next clock cycle, the instruction is sent to the control module in ID_Phase. The control module separates the opcode field, register fields, extended opcode field, immediate value, and crfD field. The crfD field in the instruction selects the condition register field (CR0 to CR7) that is written back. The instruction format is shown in Appendix [1]. The 9th and 10th bits of the instruction must be 0, otherwise the instruction is invalid. The register fields reg0_add and reg1_add are sent to the register bank module. The values value_op0 and value_op1 are read from the selected registers and passed to the alu_operands module. The 16-bit immediate field is sign-extended to 32 bits and is also sent to the alu_operands module. The alu_operands module selects the operands based on the compare instruction.

    In EX_Phase, ID_op0 and ID_op1 are subtracted in the alu module. If the result of the subtraction is zero, the EQ bit is set to 1; otherwise it is set to 0. If ID_op0 is less than ID_op1, lt is set to 1; if ID_op0 is greater than ID_op1, gt is set to 1. If any overflow occurs, the summary overflow (SO) bit is set to 1. These bits are stored in the pipeline registers of EX_Phase. In the next clock cycle, the condition register is updated: the CRn field selected in ID_Phase is set from EX_lt, EX_gt, EX_EQ, and EX_SO and written to the condition register.
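    The four bits written to the selected CRn field can be sketched for a signed compare. The Python function below is a behavioural illustration: it packs the bits in the order lt, gt, EQ, SO from the most significant bit of the field, takes the SO bit as an input rather than computing it, and uses hypothetical names throughout.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def compare_signed(op0, op1, so=0):
    """Return the 4-bit condition field (lt, gt, EQ, SO) for op0 vs op1."""
    a, b = to_signed32(op0), to_signed32(op1)
    lt = int(a < b)
    gt = int(a > b)
    eq = int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)
```

    Exactly one of lt, gt, and EQ is set for any pair of operands; for example, comparing 0xFFFFFFFF (minus one) with 1 sets only lt.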

    Figure 14: Fixed Point Compare Instruction (block diagram: Control, Reg_bank, and Alu_operands modules with the IF/ID and ID/EX pipeline registers, the crfD field, and the lt, gt, EQ, and SO bits written to the CRn field of the CR)


    3.2.2 Load/Store Instructions:

    The processor design supports PowerPC integer load and store instructions. The integer load

    and store instructions support different data types such as byte, halfword, and word. Only load and

    store instructions access the data memory.

    3.2.2.1 Load Instructions:

    Figure 15 shows the processor design that supports the load instructions. There are two
    addressing modes for calculating the memory address. A load instruction uses three register
    fields: two are used to calculate the memory address and the third is the write back register.
    The instruction is fetched from the instruction memory and stored in the pipeline register. In the
    ID_Phase, the instruction is passed to the control module, which separates the opcode field, the
    register fields, the displacement, and the extendedopcode field. The regshift_add field identifies
    the write back register in which the data from the memory is stored. The register fields reg0_add
    and reg1_add are sent to the register bank module, which reads the data in those registers. The
    values value_op0 and value_op1 are the base address and the index address. Depending on the
    instruction, either value_op1 or the displacement is selected. The two operands are stored in the
    pipeline register at the end of the ID_Phase.

    In the EX_Phase, ID_op0 (base address) and ID_op1 (displacement or index address) are added
    to give the effective address of the memory access. If the update signal is enabled, the base
    register is updated with the effective address. For load instructions, the ld signal is set to 1.
    At the end of the EX_Phase, the effective address is stored in the pipeline register. The
    effective address is passed to the memory in the MEM_Phase, and the data at that address is
    fetched and stored in the MEM_wb_data register. In the WB_Phase, the data in the MEM_wb_data
    register is written back to the destination register in the register bank module. If the
    instruction data type is byte, 8 bits are fetched from the memory; if it is halfword, 16 bits are
    fetched; and if it is word, 32 bits are fetched.
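The address calculation and the size-dependent fetch can be illustrated with a short behavioural sketch. It assumes a byte-addressed, big-endian data memory and the zero-extending load forms; neither detail is stated in the text, and the names `effective_address` and `load` are hypothetical.

```python
def effective_address(id_op0, id_op1):
    """EX_Phase: base address plus displacement or index, modulo 2**32."""
    return (id_op0 + id_op1) & 0xFFFFFFFF


def load(memory, ea, size):
    """MEM_Phase: read `size` bytes (1, 2 or 4) at `ea`, zero-extended.

    `memory` is a bytes-like object; big-endian byte order is an assumption.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    return int.from_bytes(memory[ea:ea + size], "big")
```

With a memory holding the bytes 10h to 1Fh, a base address of 4 and a displacement of 4 give an effective address of 8, from which a byte, halfword, or word load returns 18h, 1819h, or 18191A1Bh.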

    [Figure 15 (block diagram): IF_Phase to MEM_Phase datapath for the load instructions, showing
    the Control, Reg_bank, and Alu_operands modules, the adder that forms EA[0:31], the data memory
    (Address, Data out), the ld and update signals, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline
    registers carrying Wb_reg and MEM_wb_data.]

    Figure 15: Load Instruction


    3.2.2.2 Store Instructions:

    Figure 16 shows the processor design that supports the PowerPC store instructions. The memory
    address is calculated by adding either the base address and the displacement or the base address
    and the index register. The instruction is fetched from the instruction memory in the IF_Phase. In
    the ID_Phase, the instruction is sent to the control module, which separates the opcode field, the
    extendedopcode field, the register fields, and the displacement. The three register fields are
    passed to the register bank module to obtain value_op0, value_op1, and mem_data. The mem_data
    value is the 32-bit data to be written to the data memory; value_op0 and value_op1 are the 32-bit
    values used to calculate the memory address. Depending on the type of instruction, either the
    displacement or value_op1 is selected. The operands op0 and op1 are stored in the pipeline
    registers at the end of the ID_Phase. For a store instruction with update, the update signal is
    set to 1 and the memory address is written to the base register. The st signal shown in figure 16
    is set to 1 while a store instruction is being executed. The EX_EA register holds the address
    into the data memory. In the MEM_Phase, mem_data is written to the data memory at that address.
    If the instruction data type is byte, 8 bits are written to the memory; if it is halfword, 16
    bits are written; and if it is word, 32 bits are written.
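The size-dependent write and the update form can be sketched as follows. The sketch assumes a byte-addressed, big-endian data memory and that the byte and halfword stores write the low-order bits of mem_data. The names `store` and `store_with_update` are hypothetical.

```python
def store(memory, ea, mem_data, size):
    """MEM_Phase: write the low-order `size` bytes (1, 2 or 4) of mem_data at `ea`."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    low = mem_data & ((1 << (8 * size)) - 1)
    memory[ea:ea + size] = low.to_bytes(size, "big")   # big-endian is an assumption


def store_with_update(memory, regs, base_reg, offset, mem_data, size):
    """Store, then write the effective address back to the base register."""
    ea = (regs[base_reg] + offset) & 0xFFFFFFFF
    store(memory, ea, mem_data, size)
    regs[base_reg] = ea
    return ea
```

Here `memory` is a `bytearray` and `regs` is a list of 32 register values; both stand in for the data memory and register bank modules.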

    [Figure 16 (block diagram): IF_Phase to MEM_Phase datapath for the store instructions, showing
    the Control, Reg_bank, and Alu_operands modules, the adder that forms EA[0:31], the data memory
    (Address, Data in), the st and update signals, Mem_data and ID_Mem_data, and the IF/ID, ID/EX, and
    EX/MEM pipeline registers.]

    Figure 16: Store Instruction


    3.2.3 Branch Instructions:

    Figure 17 shows the processor design for the branch instructions. The instruction is fetched
    from the instruction memory and stored in the pipeline registers of the IF_Phase module. At the
    beginning of the ID_Phase, the control module determines whether the instruction is a branch.
    There are two branch instructions, branch conditional (bc) and branch unconditional (b). For
    branch unconditional, the control module separates the opcode field, the AA bit, the LK bit, and
    the LI field, and the LI field is sign-extended to 32 bits. For branch conditional, the control
    module separates the opcode field, the BO field, the BI field, the AA bit, the LK bit, and the BD
    field, and the BD field is likewise sign-extended to 32 bits. The LI and BD fields give the branch
    target address. The 5-bit BO field specifies the condition to be tested by the branch conditional
    instruction and the 5-bit BI field selects the bit in the condition register.

    The branch instruction tests the condition at the beginning of the EX_Phase. The branch
    target address is also calculated in this phase; the branch_addr signal in figure 17 carries it.
    If the AA signal is 0, the branch address is formed by adding BD or LI to the current IP. If the
    condition is true, branch_addr is moved to the IP in the register bank module, which changes the
    program counter value. The instructions already in the IF_Phase and EX_Phase must then be removed
    from the pipeline. Once the condition is satisfied, the branch_cond signal is set to 1, and this
    signal deletes the instructions in the IF_Phase and EX_Phase modules.

    If the condition is not satisfied, the instructions in the pipeline are executed in sequential
    order and the program counter is not modified. If the instruction is an unconditional branch, the
    program counter is always loaded with branch_addr. If the LK signal is enabled, the branch_addr is
    stored in the link register.
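The target address calculation and the condition test can be sketched in a few lines. The sketch follows the standard PowerPC encodings of b (opcode 18) and bc (opcode 16), in which LI and BD are concatenated with two zero bits before sign extension. Several things are assumptions rather than statements from the text: the absolute case (AA = 1), reading BO[0] as "ignore the condition" and BO[1] as the required bit value, and leaving out the count register and link register handling. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide field to a (possibly negative) integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_addr(inst, ip):
    """Target of b (opcode 18) or bc (opcode 16); bit 0 is the MSB."""
    opcode = (inst >> 26) & 0x3F
    aa = (inst >> 1) & 1
    if opcode == 18:
        offset = sign_extend(inst & 0x03FFFFFC, 26)   # LI || 0b00
    elif opcode == 16:
        offset = sign_extend(inst & 0x0000FFFC, 16)   # BD || 0b00
    else:
        raise ValueError("not a branch instruction")
    base = 0 if aa else ip     # AA = 1 (absolute) is not covered by the text
    return (base + offset) & 0xFFFFFFFF


def branch_cond(inst, cr):
    """1 if a bc instruction is taken, using only the CR test (CTR ignored)."""
    bo = (inst >> 21) & 0x1F
    bi = (inst >> 16) & 0x1F
    cr_bit = (cr >> (31 - bi)) & 1   # CR bit 0 is the MSB
    ignore_cr = (bo >> 4) & 1        # BO[0]
    wanted = (bo >> 3) & 1           # BO[1]
    return int(ignore_cr or cr_bit == wanted)
```

As an illustration, the word 41820020h (branch if the EQ bit of CR0 is set, displacement 20h) fetched at IP 200h gives a target of 220h and is taken only when bit 2 of the condition register is 1.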

    [Figure 17 (block diagram): branch datapath showing the NIP and IP registers, the instruction
    memory, the +4 adder, the control module, the Branch_check block with the CR input, the link
    register, and the IF/ID, ID/EX, and EX/MEM pipeline registers, with the signals AA, LK, LI[0:31],
    BD[0:31], PC_addr, ID_PC_addr, brtgt_addr, Branch_addr[0:31], and Branch_cond.]

    Figure 17: Branch Instructions


    3.2.4 Data forwarding and Load-Use:

    The data forwarding and load-use modules are used to avoid data hazards. Figure 18 shows the
    processor design that avoids data hazards. Both modules are implemented in the ID_Phase module.
    The output of the alu module, the out signal, is fed back as an input to the data forwarding
    module, as are the outputs of the EX_Phase and MEM_Phase modules. The data forwarding module
    checks whether a source register field is the same as a write back register field. If the
    register fields are the same, the data from that phase is forwarded as the operand value. The
    instruction must be a fixed point integer instruction. The data forwarding module compares the
    register field with the write back register fields of all the phases and forwards only when one
    of them matches.
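The selection made by the data forwarding module can be modelled as a priority search over the later phases. The pairing of each write back register with its data follows the signal names in figure 18. Giving priority to the youngest producer is an assumption, as is the use of `None` for a phase that writes no register. The function name is hypothetical.

```python
def forward_operand(reg_add, file_value, stages):
    """Pick the operand value for one source register field.

    stages: (wb_reg, data) pairs ordered youngest first, for example
    [(ID_Wb_reg, out), (EX_Wb_reg, EX_out), (MEM_Wb_reg, MEM_wb_data)].
    wb_reg is None when that phase does not write a register.
    """
    for wb_reg, data in stages:
        if wb_reg is not None and wb_reg == reg_add:
            return data            # youngest matching producer wins
    return file_value              # no hazard: use the register bank value
```

When no write back register matches, the value read from the register bank is passed on unchanged, which corresponds to the case with no data hazard.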

    If a load instruction is followed by a fixed point integer instruction whose source register
    field is the same as the write back register of the load, the data is not yet available to that
    instruction. The data becomes available only at the end of the MEM_Phase, because the load
    fetches its write back data from the data memory in the MEM_Phase. At that point the following
    instruction has already reached the EX_Phase and would use the old register value instead of the
    correct one. To obtain the correct value, the instruction has to wait until the data is
    available: the signals in the EX_Phase are held, and the IF_Phase and ID_Phase of the pipeline
    are stalled for 1 clock cycle. The data is then available at the end of that clock cycle.

    The stall_PC signal in figure 18 is used to stall the pipeline: while stall_PC is enabled, the
    pipeline is stalled. The op0_sel, op1_sel, and rs_sel signals are the control signals that select
    the data from the MEM_Phase. The load-use module stalls the pipeline by enabling the stall_PC
    signal. The instruction pointer is also stalled for 1 clock cycle; while the pipeline is stalled,
    the current instruction pointer value is held until stall_PC is cleared to 0. For example, if the
    data for op1 is not available, op1 has to wait until it is, and the other signals in the EX_Phase
    also have to wait. Once the data is available, the op1_sel signal selects the data from the
    memory and the EX_Phase executes the instruction.
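The load-use check and the held instruction pointer can be sketched as two small functions. The exact comparison made by the load-use module is not given in the text, so the sketch assumes that the write back register of a load in the EX_Phase (ID_ld, ID_Wb_reg) is compared with the two source register fields of the instruction being decoded. The function names are hypothetical.

```python
def load_use_stall(id_ld, id_wb_reg, reg0_add, reg1_add):
    """stall_PC = 1 when the decoded instruction reads the register that the
    load currently in the EX_Phase has not yet fetched from memory."""
    return int(bool(id_ld) and id_wb_reg in (reg0_add, reg1_add))


def next_ip(ip, stall_pc):
    """Hold the instruction pointer while stall_PC is set, else advance by 4."""
    return ip if stall_pc else (ip + 4) & 0xFFFFFFFF
```

A load into r5 followed by an instruction that reads r5 raises stall_PC for one cycle, during which the instruction pointer keeps its value.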

    [Figure 18 (block diagram): full pipeline with hazard handling, showing the Ins_mem, control,
    Register bank, Load-Use, Data Forward, Alu_operands, ALU, and Data Memory blocks, the IP and NIP
    registers, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, with the signals Op0_sel,
    Op1_sel, rs_sel, Stall_PC, out, EX_out, MEM_wb_data, ID_Wb_reg, EX_Wb_reg, and MEM_Wb_reg.]

    Figure 18: Data Forwarding and Load Use


    3.2.5 System Call Instruction (sc):

    Due to the time limit, the system call (sc) instruction is not implemented in the design, and
    neither are the special purpose registers MSR, EVPR, SRR0, and SRR1. The instruction is used to
    raise the system call exception. When the system call exception occurs, the data in the machine
    state register (MSR) is moved to SRR1 (save/restore register 1). SRR0 (save/restore register 0)
    is used to store the address of the instruction that follows the system call instruction. The
    system call instruction also modifies bits in the MSR.

    The exception vector address (EVA) is moved to the next instruction pointer (NIP), so the
    program flow sequence is changed. The EVA is formed by concatenating the high word of the
    exception vector prefix register (EVPR) on the left with the offset of the exception vector. The
    MSR contents are modified when the instructions are fetched from the NIP.
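Although the instruction is not implemented in the design, the register movements described above can be summarised in a short sketch. It assumes that the "high word" of the EVPR is its upper 16 bits and that the vector offset occupies the lower 16 bits. The changes to the MSR are left out because the text does not say which bits are affected. The function name and the `vector_offset` parameter are hypothetical.

```python
def system_call_entry(msr, ip, evpr, vector_offset):
    """Return (srr0, srr1, nip) for a system call exception."""
    srr1 = msr                                            # MSR is saved in SRR1
    srr0 = (ip + 4) & 0xFFFFFFFF                          # address of the next instruction
    nip = (evpr & 0xFFFF0000) | (vector_offset & 0xFFFF)  # EVA = EVPR high bits || offset
    return srr0, srr1, nip
```

The offset passed in is a property of the target processor; any value used with this sketch is an illustrative input, not one taken from this design.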


    This chapter covers how the PowerPC processor instructions are created and simulated, and
    presents the final result for each design. The NC Verilog simulator is used to simulate the
    instructions and the output signal waveforms are verified. The design is not synthesized.

    4.1 Creating Instructions:

    The instructions are written in hex code format and stored in the instruction memory module
    of the IF_Phase module. The hex coded instructions are fetched from the instruction memory and
    passed to the control module. For example, the hex code for the instruction andis. r9,r12,F0F0h
    is 7589F0F0h. Similarly, all the instructions are converted to hex code based on the instruction
    formats shown in the Appendix [5]. Due to the time limit, floating point instructions,
    exceptions, interrupts, management instructions, and control instructions are not implemented.
    The instructions that are implemented are listed in the Appendix [3], and those that are not
    implemented in the design are listed in the Appendix [4].
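The conversion to hex code can be reproduced with a small helper for instructions that consist of an opcode, two register fields, and a 16-bit immediate. The helper `encode_d_form` is hypothetical and is not part of the design; the primary opcode numbers (29 for andis., 14 for addi, 24 for ori) are those of the PowerPC architecture.

```python
def encode_d_form(opcode, reg_a, reg_b, imm16):
    """Pack opcode[0:5], two 5-bit register fields and a 16-bit immediate."""
    return (((opcode & 0x3F) << 26) | ((reg_a & 0x1F) << 21)
            | ((reg_b & 0x1F) << 16) | (imm16 & 0xFFFF))


# Field order: andis. and ori place rS first and rA second;
# addi places rD first and rA second.
ANDIS_R9_R12 = encode_d_form(29, 12, 9, 0xF0F0)   # andis. r9,r12,F0F0h
ADDI_R8_R2 = encode_d_form(14, 8, 2, 0x0080)      # addi r8,r2,0080h
ORI_R11_R19 = encode_d_form(24, 19, 11, 0x00FF)   # ori r11,r19,00FFh
```

These three calls yield 7589F0F0h, 39020080h, and 626B00FFh, the hex codes quoted in this chapter for the same instructions.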

    4.2 Fixed Point Integer Instructions:

    In this section, the fixed point instructions are tested and the final waveforms are discussed.
    The integer instructions cover arithmetic, logical, compare, rotate, and shift instructions. All
    the instructions are run in the NC Verilog simulator. The command used for the simulation is:

    ncv_gui PPC_processor_stim.v PPC_processor.v

    The Design Browser window will open. Select the signals that are affected by the instruction and
    view them in the waveform window.

    PPC_processor is the top level module. It connects the pipeline modules IF_Phase, ID_Phase,
    EX_Phase, and MEM_Phase; the fifth phase, write back, is implemented in the ID_Phase. Each module
    takes 1 clock cycle to execute because it is a sequential block, so each instruction takes a
    total of five clock cycles to complete. Figure 19 shows the design browser window.
    PPC_processor_stim is the test bench module, PowerPC is the instance name of the top level
    module, and IF, ID, EX, and MEM are the sub modules of the top level module.


    Figure 19: Design Browser Window


    4.2.1 Fixed Point Arithmetic Instructions:

    The design of the fixed point instructions is simulated in the NC Verilog simulator.
    Arithmetic, logical, shift, rotate, and compare instructions are simulated. Figure 20 shows the
    simulated waveform of the addi instruction. The inst signal in figure 20 is the 32-bit
    instruction fetched from the instruction memory. ID_op0 and ID_op1 are the source operand
    registers, which store the values of the two operands. The out register holds the result of the
    arithmetic instruction, and this output is stored in the EX_out register at the end of the
    EX_Phase module. Since arithmetic instructions do not access the memory, the data simply passes
    through the MEM_Phase for 1 clock cycle: the MEM_wb_data register stores the data coming from
    the EX_out register in the MEM_Phase module. The data is then moved back to the register bank
    module, which stores the result in the destination register.

    For example, consider the addition instruction addi r8,r2,0080h. The hex code of this
    instruction is 39020080h. In figure 20, TimeA marks the instruction being fetched. In the next
    clock cycle, ID_op0 and ID_op1 hold the input operands of the instruction. The immediate value
    0080h is separated from the instruction by the control module. The value in register r2 is 0 and
    is moved to ID_op0; the immediate value is moved to ID_op1. This is shown in figure 20. In the
    EX_Phase module, i.e. the 3rd clock cycle, ID_op0 and ID_op1 are added and the result is stored
    in the EX_out register. In the 4th clock cycle, the result of the addition is moved to the
    MEM_wb_data register and the write back register address is moved to the MEM_wb_reg register.
    The data in MEM_wb_data is then stored in register r8. This instruction takes 5 clock cycles to
    complete. CR and XER are the registers in which the carry and overflow bits are updated.
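The five steps of this example can be followed in a short behavioural trace that uses the pipeline register names from the text. It is a sketch only: each assignment stands for one clock cycle, the CR and XER updates are omitted, and the rule that a register field of 0 supplies the value 0 comes from the PowerPC definition of addi rather than from this report. The function name `run_addi` is hypothetical.

```python
def run_addi(inst, regs):
    """Follow one addi through the pipeline registers named in the text."""
    trace = {"inst": inst}                               # cycle 1: IF_Phase
    rd, ra = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    simm = inst & 0xFFFF
    if simm & 0x8000:
        simm |= 0xFFFF0000                               # sign-extend to 32 bits
    trace["ID_op0"] = regs[ra] if ra else 0              # cycle 2: ID_Phase
    trace["ID_op1"] = simm
    trace["EX_out"] = (trace["ID_op0"] + trace["ID_op1"]) & 0xFFFFFFFF  # cycle 3
    trace["MEM_wb_data"] = trace["EX_out"]               # cycle 4: MEM_Phase
    trace["MEM_wb_reg"] = rd
    regs[rd] = trace["MEM_wb_data"]                      # cycle 5: write back
    return trace
```

Calling `run_addi(0x39020080, [0] * 32)` leaves 00000080h in EX_out, MEM_wb_data, and register 8, with MEM_wb_reg equal to 8.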


    Figure 20: addi Instruction


    4.2.2 Fixed Point Logical Instructions:

    Figure 21 shows the final result of the simulated logical ori instruction. The inst signal is
    the 32-bit instruction held in the pipeline registers of the IF_Phase module. ID_op0 and ID_op1
    are used to store the two operands for the alu module. The output of the logical instruction is
    stored in the EX_out register at the end of the EX_Phase module. The data is moved to the next
    stage, the MEM_Phase module, and stored in the MEM_wb_data register. Finally, the data is moved
    to the write back register in the register bank module in the ID_Phase module.

    Figure 21 shows the signal waveform of the logical ori instruction. The hex coded value of
    the ori r11,r19,00FFh instruction is 626B00FFh. TimeA in figure 21 marks the instruction being
    fetched from the memory. The unsigned immediate data is sent as the second operand and stored in
    ID_op1. The data in the source register, r19, is sent to the ID_op0 register in the ID_Phase
    module. In the EX_Phase module, the two operands in ID_op0 and ID_op1 are sent to the alu module.
    The alu module performs the logical OR and stores the result in the EX_out register. This is
    shown in figure 21. The result of the ori instruction is sent to the write back register, r11,
    through the MEM_Phase module. The instruction takes 5 clock cycles to complete.
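The operand handling of ori differs from addi in that the immediate is treated as unsigned. A minimal sketch, assuming the PowerPC field order for ori (source register in bits 6 to 10, destination in bits 11 to 15), is shown here. The function name is hypothetical, and the register contents used to exercise it are illustrative rather than read from the waveform.

```python
def execute_ori(inst, regs):
    """ori rA,rS,UIMM: OR rS with the zero-extended 16-bit immediate."""
    rs, ra = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    id_op0 = regs[rs]              # value of the source register
    id_op1 = inst & 0xFFFF         # unsigned immediate, upper 16 bits are 0
    regs[ra] = (id_op0 | id_op1) & 0xFFFFFFFF
    return regs[ra]
```

Because the immediate is zero-extended, an immediate of 8000h sets only bit 15 of the result and leaves the upper halfword of the source value unchanged.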


    Figure 21: ORI Instruction


    4.2.3 Fixed Point Shift Instructions:

    The fixed point shift instructions are tested by simulating them in the NC Verilog simulator.
    The shift instructions are fetched from the instruction memory in the IF_Phase module. ID_op0
    and ID_op1 are two pipeline registers used to store the two operand values, which are passed to
    the alu module. ID_op0 is shifted right by the number of bits specified by ID_sh. The shifted
    data is stored in the EX_out register at the end of the EX_Phase module. This result is moved to
    the next phase, the MEM_Phase, and stored in the MEM_wb_data register. MEM_wb_reg identifies the
    write back register in which the result of the shift instruction is stored. The sraw
    r17,r8,sh[00111] instruction is simulated in the NC Verilog simulator and the relevant signals
    are shown in the waveform in figure 22. The hex code for this instruction is 7D113E70h, which is
    the encoding of the immediate form, srawi, in which the shift amount is taken from the sh field.
    TimeA in figure 22 marks the instruction being fetched from the memory.

    The data 00000080h in register r8 is stored in the ID_op0 register. The value of sh, 07h, is
    stored in the ID_sh register. ID_op0 is shifted right by the number of bits specified by ID_sh.
    The shifted data, 00000001h, is stored in the EX_out register at the end of the EX_Phase module.
    This data is moved to the MEM_Phase, stored in the MEM_wb_data register, and written back to
    register r17.
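The shift itself can be checked with a few lines of code. The sketch models an algebraic (sign-preserving) right shift of a 32-bit word and also returns the carry defined by the PowerPC architecture for sraw and srawi, which is set when a negative operand loses any 1 bits. Whether the design updates the carry in this way is not stated in the text, and the function name is hypothetical.

```python
def shift_right_algebraic(id_op0, id_sh):
    """Arithmetic right shift of a 32-bit word by id_sh (0-31) bits.

    Returns (result, carry).
    """
    id_op0 &= 0xFFFFFFFF
    negative = id_op0 >> 31
    signed = id_op0 - (1 << 32) if negative else id_op0
    result = (signed >> id_sh) & 0xFFFFFFFF     # Python >> keeps the sign
    lost = id_op0 & ((1 << id_sh) - 1)          # bits shifted out
    return result, int(bool(negative and lost))
```

With ID_op0 = 00000080h and ID_sh = 07h the function returns 00000001h with a carry of 0, in agreement with the EX_out value quoted above.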


    Figure 22: Shift Instruction (sraw)


    4.2.4 Fixed Point Rotate Instructions:

    The operation of the rotate instructions is described in section 2.6.4. For the rotate
    instructions, a mask has to be generated. The mask is generated at the beginning of the EX_Phase
    module. The rotate module in the EX_Phase module is used to execute the rotate instructions. The
    inputs to this module are the two operands from ID_op0 and ID_op1. The number of bits to be
    rotated is specified in either ID_op1 or ID_sh. Figure 23 shows the result of the rlwimi
    instruction.

    The instruction is converted to hex code. The hex code of the rlwimi r3,r10,sh,mb,me
    instruction is 51434195h. TimeA in figure 23 marks the instruction being fetched from the
    memory. The value of r10 is 868EFF7Fh and the values of ID_MB, ID_ME, and ID_sh are 06h, 0Ah,
    and 08h. The value of the destination register, r3, is 00000080h and is moved to ID_op1. The
    data in ID_op1 is rotated by the number of bits specified by ID_sh (08h). The generation of the
    mask is explained in section 2.6.4.1. The generated value of the mask is 03C00000h. The rotate
    module in the EX_Phase rotates ID_op1, and ID_op0 is moved to the out register. The rotated data
    is inserted into the out register wherever the corresponding bit in the mask register is 1;
    where a mask bit is 0, the corresponding bit of the rotated data is not inserted. Figure 23
    shows that wherever the bits in the mask are 1, the roted_out data is inserted in the out
    register. The data in EX_out is moved to the next phase, the MEM_Phase. Finally, the data in the
    MEM_wb_data register is written back to the destination register, r3.
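The rotate-and-insert step can be sketched as follows. The sketch follows the PowerPC definition of rlwimi, in which the source register is rotated left and inserted into the previous contents of the destination register under the mask. The mask is taken as an input because its generation is covered in section 2.6.4.1. The function names are hypothetical.

```python
def rotl32(value, sh):
    """Rotate a 32-bit word left by sh bits (0-31)."""
    value &= 0xFFFFFFFF
    sh &= 31
    return ((value << sh) | (value >> (32 - sh))) & 0xFFFFFFFF


def rotate_insert(source, destination, sh, mask):
    """Take rotated source bits where mask is 1, keep destination bits where it is 0."""
    rotated = rotl32(source, sh)
    return (rotated & mask) | (destination & ~mask & 0xFFFFFFFF)
```

With the values quoted above (source 868EFF7Fh, destination 00000080h, sh = 08h, mask 03C00000h) the sketch returns 02C00080h. This figure is computed by the sketch and has not been checked against the waveform in figure 23.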


    Figure 23: rlwimi Instruction


    4.2.5 Fixed Point