
University of Southampton

Faculty of Engineering, Science and Mathematics

School of Electronics and Computer Science

Design of a Pipelined PowerPC Processor using Verilog

by

Chidhambaranathan Rajamanikkam (canr1g09)

24 September 2010

A dissertation submitted in partial fulfillment of the degree of

MSc Microelectronics Systems Design

by examination and dissertation

Project Supervisor: B Iain McNally

Second Examiner: Dr. Koushik Maharatna


Canr1g09

1

ABSTRACT

The PowerPC architecture was developed by IBM together with Apple and Motorola. PowerPC processors are widely used in embedded systems because of their low power consumption. PowerPC is a RISC (Reduced Instruction Set Computing) instruction set architecture.

This project describes the implementation of a 32-bit pipelined processor that executes a subset of the PowerPC instruction set: fixed point integer instructions, branch instructions and integer load/store instructions. The processor is described in the Verilog hardware description language, and its modules were tested successfully with the NC Verilog simulator.

This report describes the PowerPC instruction set, its instruction formats and the architecture of the PowerPC processor. It also covers the pipelining approach adopted and the data and control hazards associated with it. The designed processor resolves both data and control hazards.



ACKNOWLEDGEMENT

I would like to thank my project supervisor, B Iain McNally, for his valuable guidance and support throughout this project. I would also like to extend my gratitude to my second examiner, Dr. Koushik Maharatna, for his support.



CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
CHAPTER 2: BACKGROUND
2.1 POWERPC
2.2 POWERPC REGISTERS
2.2.1 General Purpose Registers
2.2.2 Exception Register
2.2.3 Count Register
2.2.4 Condition Register
2.2.5 Link Register
2.3 POWERPC DATA TYPES
2.4 POWERPC BRANCH INSTRUCTIONS
2.4.1 Addressing Modes
2.5 POWERPC LOAD/STORE INSTRUCTIONS
2.5.1 Addressing Modes
2.5.2 Load Instructions
2.5.3 Store Instructions
2.6 POWERPC FIXED POINT INTEGER INSTRUCTIONS
2.6.1 Arithmetic Instructions
2.6.2 Logical Instructions
2.6.3 Sign-Extension Instructions
2.6.4 Rotate Instructions
2.6.4.2 PowerPC Rotate Instructions
2.6.5 Shift Instructions
2.6.5.1 Logical Left Shift Instructions
2.6.5.2 Logical Right Shift Instructions
2.6.5.3 Algebraic Shift Instructions
2.7 PIPELINING OVERVIEW
2.7.1 Pipelining Hazards
2.7.1.1 Structural Hazards
2.7.1.2 Control Hazards



2.7.1.3 Data Hazards
CHAPTER 3: DESIGN
3.1 INITIAL DATAPATH
3.2 INSTRUCTION SET DESIGN
3.2.1 Fixed Point Integer Instructions
3.2.1.1 Fixed Point Arithmetic Instructions
3.2.1.2 Fixed Point Logical Instructions
3.2.1.3 Fixed Point Shift Instructions
3.2.1.4 Fixed Point Rotate Instructions
3.2.1.5 Fixed Point Compare Instructions
3.2.2 Load/Store Instructions
3.2.2.1 Load Instructions
3.2.2.2 Store Instructions
3.2.3 Branch Instructions
3.2.4 Data forwarding and Load-Use
3.2.5 System Call Instruction (sc)
CHAPTER 4: TESTING
4.1 CREATING INSTRUCTIONS
4.2 FIXED POINT INTEGER INSTRUCTIONS
4.2.1 Fixed Point Arithmetic Instructions
4.2.2 Fixed Point Logical Instructions
4.2.3 Fixed Point Shift Instructions
4.2.4 Fixed Point Rotate Instructions
4.2.5 Fixed Point Compare Instructions
4.3 LOAD/STORE INSTRUCTIONS
4.3.1 Load Instructions
4.3.2 Store Instructions
4.4 BRANCH INSTRUCTIONS
4.5 DATA FORWARDING AND LOAD-USE
4.6 SIGN-EXTENSION INSTRUCTIONS
CHAPTER 5: PROJECT WORK PLAN AND MILESTONES
CHAPTER 6: CONCLUSION
6.1 ACHIEVEMENTS
6.2 LIMITATIONS
CHAPTER 7: FUTURE WORK



LIST OF FIGURES

FIGURE 1: INDIRECT ADDRESSING MODE SOURCED FROM [3]
FIGURE 2: INDIRECT INDEXED ADDRESSING SOURCED FROM [3]
FIGURE 3: MASK (MB < ME) SOURCED FROM [4]
FIGURE 4: MASK (MB > ME) SOURCED FROM [4]
FIGURE 5: 5 STAGE PIPELINE SOURCED FROM [1]
FIGURE 6: CONTROL HAZARDS
FIGURE 7: DATA FORWARDING (DATA HAZARDS)
FIGURE 8: LOAD-USE
FIGURE 9: INITIAL DATAPATH
FIGURE 10: FIXED POINT ARITHMETIC INSTRUCTION
FIGURE 11: FIXED POINT LOGICAL INSTRUCTION
FIGURE 12: FIXED POINT SHIFT INSTRUCTION
FIGURE 13: FIXED POINT ROTATE INSTRUCTION
FIGURE 14: FIXED POINT COMPARE INSTRUCTION
FIGURE 15: LOAD INSTRUCTION
FIGURE 16: STORE INSTRUCTION
FIGURE 17: BRANCH INSTRUCTIONS
FIGURE 18: DATA FORWARDING AND LOAD USE
FIGURE 19: DESIGN BROWSER WINDOW
FIGURE 20: ADDI INSTRUCTION
FIGURE 21: ORI INSTRUCTION
FIGURE 22: SHIFT INSTRUCTION (SRAW)
FIGURE 23: RLWIMI INSTRUCTION
FIGURE 24: COMPARE INSTRUCTION
FIGURE 25: LOAD (LWZ) INSTRUCTION
FIGURE 26: STORE WITH UPDATE INSTRUCTION
FIGURE 27: BRANCH INSTRUCTIONS
FIGURE 28: LOAD-USE AND DATA FORWARDING
FIGURE 29: SIGN-EXTENSION INSTRUCTIONS



LIST OF TABLES

TABLE 1: INITIAL GANTT CHART
TABLE 2: FINAL GANTT CHART



CHAPTER 1: INTRODUCTION

The main aim of the project is to design a 32-bit pipelined PowerPC processor. The modules are written in the Verilog hardware description language (HDL). Both the instructions and the registers are 32 bits long. The modules are simulated and the simulation results are analysed. The designed processor executes fixed point integer instructions, branch instructions, integer load/store instructions and sign-extension instructions. The fixed point integer instructions include arithmetic, logical, compare, shift and rotate instructions.

Chapter 2 covers the background study of the PowerPC processor, including its registers and instruction formats. It also covers the pipelining approach and the hazards that occur in a pipelined processor.

Chapter 3 covers the design architecture for the different instruction classes. The datapaths for the fixed point integer, load/store and branch instructions are designed and explained in this chapter.

Chapter 4 covers testing the design in the NC Verilog simulator. The final results of the datapath design are discussed together with their waveforms.

The list of implemented instructions is shown in Appendix [3]. The instructions that are not implemented in the design are shown in Appendix [4]. The program used for testing is shown in Appendix [5].



CHAPTER 2: BACKGROUND

2.1 PowerPC:

A 32-bit PowerPC processor can execute floating point, fixed point, control (branch) and memory management instructions. The fixed point instructions include arithmetic, logical, compare, shift and rotate instructions.

PowerPC has general purpose registers and various special purpose registers [5]. These include:

the program counter, also called the Next Instruction Pointer (NIP) or Instruction Address Pointer;

the link register;

the count register.

PowerPC processors that implement floating point also have 32 floating point registers. These registers are 64 bits wide in both 32-bit and 64-bit implementations.

PowerPC is an example of a RISC architecture. Its RISC design has the following characteristics [5]:

All PowerPC instructions have a fixed length of 32 bits [5].

Data is loaded from memory into registers, operated on in the registers, and then written back to memory. Apart from the load and store instructions, no instructions access memory directly [5].

2.2 PowerPC Registers:

The PowerPC processor has the following registers:

32 general purpose registers (GPRs);

a count register (CTR);

a link register (LR);

the Next Instruction Pointer (NIP);

an exception register (XER);

a condition register (CR).

2.2.1 General Purpose Registers:

The general purpose registers are 32 bits long and are used by the fixed point integer instructions. A general purpose register is selected by a 5-bit register field in the instruction, which can address any of the 32 registers [4]. The general purpose registers hold the source operands and the results of the operations performed by the instructions. All data manipulation is done in these registers, which are internal to the processor [7].
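A small behavioural model illustrates how the 5-bit register fields select GPRs. The Python sketch below was written for this explanation and is not part of the report's Verilog design. The function name `decode_xo_form` is hypothetical. The sketch uses PowerPC bit numbering, where bit 0 is the most significant bit. It follows the XO-form layout used by instructions such as add, in which three 5-bit fields select the destination register and the two source registers.

```python
MASK32 = 0xFFFFFFFF

def decode_xo_form(insn):
    """Split a 32-bit XO-form instruction word into its fields.

    PowerPC numbers bits 0 (MSB) to 31 (LSB), so a field that ends
    at bit b is obtained by shifting right by (31 - b).
    """
    insn &= MASK32
    return {
        "opcd": (insn >> 26) & 0x3F,   # bits 0-5: primary opcode
        "rd":   (insn >> 21) & 0x1F,   # bits 6-10: destination GPR
        "ra":   (insn >> 16) & 0x1F,   # bits 11-15: first source GPR
        "rb":   (insn >> 11) & 0x1F,   # bits 16-20: second source GPR
        "oe":   (insn >> 10) & 0x1,    # bit 21: overflow enable
        "xo":   (insn >> 1) & 0x1FF,   # bits 22-30: extended opcode
        "rc":   insn & 0x1,            # bit 31: record bit
    }

# Example: add r3, r4, r5 assembles to 0x7C642A14.
fields = decode_xo_form(0x7C642A14)
gpr = [0] * 32                         # register file: 32 x 32-bit GPRs
gpr[4], gpr[5] = 10, 20
gpr[fields["rd"]] = (gpr[fields["ra"]] + gpr[fields["rb"]]) & MASK32
```

The encoding 0x7C642A14 follows from these field values: opcode 31, rD = 3, rA = 4, rB = 5 and XO = 266. In hardware, this extraction amounts to fixed bit slices of the instruction word.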

2.2.2 Exception Register:

The exception register (XER) is 32 bits long in a 32-bit implementation [4]. It is updated by arithmetic operations that produce an overflow or a carry. It also holds the number of bytes to be transferred by the load/store string indexed instructions [4], [7]. The bit representation of the exception register is shown in Appendix [2].

The CA bit of the exception register, XER[2], can be modified by the following instructions:

add carrying;

subtract from carrying;

add extended;

subtract from extended.

CA is set to 1 whenever the arithmetic operation produces a carry out of the most significant bit. Among the shift and rotate instructions, only the shift right algebraic instructions (sraw, srawi) affect the carry bit. The mtspr and mcrxr instructions can be used to clear the CA bit [4].

The OV bit, XER[1], is updated only when the OE bit of the instruction is set to 1.

For add, subtract and negate instructions, OV is set if the carry out of the most significant bit (msb) differs from the carry out of bit msb+1, that is, from the carry into the msb. Otherwise, OV is cleared.

For multiply and divide instructions, OV is set to 1 if the result cannot be represented in 32 bits.

The mtspr and mcrxr instructions can be used to clear the OV bit [4].

The summary overflow (SO) bit, XER[0], is set to 1 whenever an instruction sets the OV bit. It remains set until it is cleared by an mtspr (or mcrxr) instruction [4].
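The CA, OV and SO rules above can be checked with a short Python sketch of 32-bit addition. It is an illustrative model, not the report's Verilog, and the function name `add_with_xer` is hypothetical. The model always computes all three bits. The architecture, by contrast, updates CA only for the carrying and extended forms, and updates OV and SO only when OE = 1.

```python
MASK32 = 0xFFFFFFFF

def add_with_xer(a, b, ca_in=0, so=0):
    """32-bit addition returning (result, CA, OV, SO).

    ca_in is the incoming XER[CA] (only the extended forms use it);
    so is the previous XER[SO], which is sticky.
    """
    a &= MASK32
    b &= MASK32
    full = a + b + ca_in
    result = full & MASK32
    ca = full >> 32                                    # carry out of the MSB
    # Carry into the MSB = carry out of the low 31 bits.
    carry_into_msb = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF) + ca_in) >> 31
    ov = ca ^ carry_into_msb                           # signed overflow
    so = so | ov                                       # SO stays set once set
    return result, ca, ov, so
```

Two examples show the difference between the flags. 0x7FFFFFFF + 1 gives 0x80000000 with OV = 1, because there is a carry into the MSB but none out of it. 0xFFFFFFFF + 1 gives 0 with CA = 1 and OV = 0.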

2.2.3 Count Register:

The count register (CTR) is a 32-bit register used by the branch instructions. Its contents can be used as the branch target address, or as a loop counter that a conditional branch decrements and tests. The bit representation of the count register is shown in Appendix [2].

2.2.4 Condition Register:

The condition register (CR) is a 32-bit register that reflects the results of certain instructions. It is used for testing and conditional branching. The register is divided into eight 4-bit fields, CR0-CR7 [4], [7]. The field specification of the condition register is shown in Appendix [2].

Each CR field contains the LT, GT, EQ and SO bits, which are updated by the results of the compare instructions. The CR0 field is also modified by the result of a fixed point instruction whenever the Rc bit of the instruction is set. Instructions such as addic., andi. and andis. always modify the CR0 field. The bits of the CR0 field are defined as follows [4]:

LT is set to 1 when the result is negative; otherwise it is cleared.

GT is set to 1 when the result is positive (greater than zero); otherwise it is cleared.

EQ is set to 1 when the result is zero; otherwise it is cleared.

SO is a copy of the SO bit in the exception register.

The CR1 field is modified by the floating point instructions, and the remaining fields are modified by the compare instructions. A compare instruction selects its target field through the crfD field of the instruction, so it can in fact write any of CR0-CR7. For a compare instruction, the bits of the target field CRn are defined as follows [4], [7]:

LT is set to 1 when register rA is less than the immediate value or register rB.

GT is set to 1 when register rA is greater than the immediate value or register rB.

EQ is set to 1 when register rA is equal to the immediate value or register rB.

SO is a copy of the XER[SO] bit.

The comparison can be signed (cmp, cmpi) or unsigned (cmpl, cmpli). The immediate value is interpreted accordingly.
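The following Python sketch shows how a 4-bit CR field could be formed from a comparison. It also shows that CR0 for a record-form instruction is the same as a signed comparison of the result with zero. It is an illustrative model with hypothetical function names. The field encoding follows the bit order LT, GT, EQ, SO given above. CR0 occupies the four most significant bits of the 32-bit CR, as in PowerPC bit numbering.

```python
MASK32 = 0xFFFFFFFF
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001   # bit weights inside one CR field

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def compare_field(a, b, signed=True, xer_so=0):
    """4-bit CR field for cmp/cmpi (signed) or cmpl/cmpli (unsigned)."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    bits = LT if a < b else GT if a > b else EQ
    return bits | (SO if xer_so else 0)

def cr0_field(result, xer_so=0):
    """CR0 for a record-form (Rc = 1) instruction: signed compare with zero."""
    return compare_field(result, 0, signed=True, xer_so=xer_so)

def set_cr_field(cr, n, field):
    """Write a 4-bit field into CRn; CR0 is the most significant nibble."""
    shift = 4 * (7 - n)
    return (cr & ~(0xF << shift) & MASK32) | ((field & 0xF) << shift)
```

The same bit pattern 0xFFFFFFFF compares as less than 1 when the comparison is signed, and as greater than 1 when it is unsigned. This is why cmp and cmpl are separate instructions.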



2.2.5 Link Register:

The link register (LR) is a 32-bit register used by the branch instructions and for subroutine linkage. The field specification of the link register is shown in Appendix [2]. Branch instructions use the link register in two ways [4], [7]:

Branch Conditional to Link Register (bclrx) instructions read the branch target address from the link register.

If the link register update option (LK) bit is set in a branch instruction, the effective address of the instruction following the branch is loaded into the link register.

2.3 PowerPC Data Types:

The load and store instructions of the PowerPC processor support 8-bit (byte), 16-bit (halfword), 32-bit (word) and 64-bit (doubleword) data. Either little-endian or big-endian byte order can be used [3].

An unsigned byte can be used for logical or integer arithmetic operations. Some load instructions load an unsigned byte from memory into a general purpose register by zero-extending it on the left to the 32-bit register size [3].

A signed halfword is used for arithmetic operations. Some load instructions load a signed halfword from memory into a 32-bit register by sign-extending it on the left to 32 bits [3].

An unsigned word is 32 bits long and can be used for logical operations and as an address pointer [3]. A signed word is used to perform arithmetic operations [3]. An unsigned doubleword can be used as an address pointer [3].
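Zero extension and sign extension can be sketched in Python as follows. The byte and halfword loads use these operations, as do the extsb and extsh instructions described in section 2.6.3. The function names are hypothetical, and the code is an illustrative model rather than part of the design.

```python
MASK32 = 0xFFFFFFFF

def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper register bits become 0 (as in lbz, lhz)."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Copy bit (bits - 1) into the upper register bits (as in lha, extsb, extsh)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):          # sign bit of the byte/halfword is 1
        value -= 1 << bits
    return value & MASK32                  # return the 32-bit register pattern
```

For example, the byte 0x80 becomes 0x00000080 when zero-extended, but 0xFFFFFF80 when sign-extended.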

2.4 PowerPC Branch Instructions:

There are two types of branches: conditional and unconditional. Both can change the program flow in the forward or backward direction. The AA bit of the instruction selects whether the target address is absolute or relative to the current instruction [4]; its function is explained in section 2.4.1. The branch target address can also be taken from the link register or the count register. One use of the link register is to store the return address of a branch instruction.

A conditional branch instruction tests a bit in the condition register. If the condition is true, the program counter is modified; otherwise the program flow is not altered [4]. A conditional branch can also use the count register [4]: the count register is decremented by 1, and the new value is then tested by the branch instruction [4].

The branch instructions use three types of addressing: absolute, indexed and relative. The branch instructions implemented in the design are shown in Appendix [3], and their instruction formats are shown in Appendix [1].

The unconditional branch instruction modifies the program counter without testing any bit [4]. Its 24-bit LI field is extended to 32 bits by appending two 0 bits on the right and sign-extending its most significant bit to the left. With AA = 1, this value is the branch target address. With AA = 0, it is added to the address of the branch instruction, as described in section 2.4.1.



2.4.1 Addressing Modes:

Branch instructions use three addressing modes to calculate the branch target address. These modes are explained in this section.

Both conditional and unconditional branch instructions can use absolute addressing [3]. For an unconditional branch, the effective address of the next instruction is calculated from the 24-bit immediate value (LI) within the instruction. This value is extended to 32 bits by appending two 0 bits on the right and sign-extending on the left. For a conditional branch, the effective address of the next instruction is calculated from the 14-bit immediate value (BD) within the instruction [3]. This value is likewise extended to 32 bits by appending two 0 bits on the right and sign-extending on the left.

Like absolute addressing, relative addressing is used by both conditional and unconditional branches. The displacement is formed in the same way as for absolute addressing. The result is then added to the address of the current instruction to produce the address of the next instruction [3].
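The target-address calculation for the immediate forms can be summarised in a Python sketch. It is an illustrative model with hypothetical names, and it has three limits:

- It handles only the I-form (b, opcode 18) and B-form (bc, opcode 16) encodings.
- It computes the target address without evaluating the BO/BI branch condition.
- It uses PowerPC bit numbering: LI in bits 6-29, BD in bits 16-29 and AA in bit 30.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(insn, cia):
    """Target address of an I-form or B-form branch located at address cia."""
    opcd = (insn >> 26) & 0x3F
    aa = (insn >> 1) & 1                            # bit 30: absolute address
    if opcd == 18:                                  # b, ba, bl, bla
        disp = sign_extend(insn & 0x03FFFFFC, 26)   # LI || 0b00, sign-extended
    elif opcd == 16:                                # bc, bca, bcl, bcla
        disp = sign_extend(insn & 0x0000FFFC, 16)   # BD || 0b00, sign-extended
    else:
        raise ValueError("not an I-form or B-form branch")
    target = disp if aa else cia + disp             # absolute or relative
    return target & MASK32
```

For example, the word 0x4BFFFFFC (b with LI = -1, so a displacement of -4) branches backwards by one instruction. The same encoding with AA set would instead jump to the absolute address 0xFFFFFFFC.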

Indexed addressing is used only by conditional branch instructions. The effective address of the next instruction is taken from either the link register or the count register [3]. When the count register is used in this way, it holds the branch target address. Alternatively, it can hold the count for a loop.
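The indexed forms and the count-register loop can be sketched in Python as follows. This is a simplified model with hypothetical function names. `bdnz_step` models only the common "decrement CTR and branch if the result is non-zero" case of the BO field, not every BO encoding. `indexed_target` clears the two low-order bits of LR or CTR, which is how the architecture forms a word-aligned target for bclr and bcctr.

```python
MASK32 = 0xFFFFFFFF

def indexed_target(spr_value):
    """Target for bclr/bcctr: LR or CTR with the two low-order bits cleared."""
    return spr_value & 0xFFFFFFFC

def bdnz_step(ctr):
    """One bdnz-style step: decrement CTR, branch if the new value is non-zero."""
    ctr = (ctr - 1) & MASK32
    return ctr, ctr != 0

def count_iterations(initial_ctr):
    """How many times a loop body closed by a bdnz-style branch executes."""
    if (initial_ctr & MASK32) == 0:
        return 1 << 32           # CTR wraps to 0xFFFFFFFF on the first decrement
    iterations, ctr = 0, initial_ctr & MASK32
    while True:
        iterations += 1          # the loop body runs once
        ctr, taken = bdnz_step(ctr)
        if not taken:
            return iterations
```

Loading CTR with N before the loop therefore executes the body N times. The zero case shows why a counted loop should not be entered with CTR = 0.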

2.5 PowerPC Load/Store Instructions:

The fixed point integer load and store instructions move data from the data memory to a specified general purpose register, and from a general purpose register to the data memory. The load/store instructions implemented in the design are shown in Appendix [3], and their instruction formats are shown in Appendix [1].

2.5.1 Addressing Modes:

The PowerPC has two addressing modes for load/store instructions.

In register indirect addressing mode, the instruction includes a 16-bit signed displacement, which is added to the contents of a base register [3]. In the update forms, the effective address is also written back to the base register, replacing its previous contents.

The other addressing mode is register indirect indexed addressing [3]. In this mode, the instruction specifies a base register and an index register, each of which may be any general purpose register. The effective address is calculated by adding the contents of the base register and the index register. If update is enabled, the effective address is also loaded into the base register.

Figure 1 shows the indirect addressing mode for load/store instructions [3], and Figure 2 shows the indirect indexed addressing mode.



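Figure 1: Indirect Addressing Mode sourced from [3] (diagram: the base register (GPR) and the signed 16-bit displacement are added to form the logical address, which goes to address translation; with update, it is also written back to the base register)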

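Figure 2: Indirect Indexed Addressing sourced from [3] (diagram: the base register (GPR) and the index register (GPR) are added to form the logical address, which goes to address translation; with update, it is also written back to the base register)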

In register transfer language (RTL), the register indirect addressing mode can be written as

Effective address ← [base register] + displacement

and the register indirect indexed addressing mode as

Effective address ← [base register] + [index register]
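The two RTL expressions above can be written as a small Python model. It is a sketch with hypothetical names, not the report's Verilog. It also applies one PowerPC rule that the text above does not state: in the non-update forms, a base register field of 0 selects the value 0 rather than the contents of r0.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Sign-extend a 16-bit displacement field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def ea_d_form(gpr, ra, d, update=False):
    """Register indirect: EA = (rA|0) + EXTS(d); rA <- EA for update forms."""
    # rA = 0 reads as the value 0 in non-update forms.
    # Update forms with rA = 0 are invalid and are not checked here.
    base = 0 if (ra == 0 and not update) else gpr[ra]
    ea = (base + sign_extend16(d)) & MASK32
    if update:
        gpr[ra] = ea                       # write the EA back to the base register
    return ea

def ea_x_form(gpr, ra, rb, update=False):
    """Register indirect indexed: EA = (rA|0) + (rB); rA <- EA for update forms."""
    base = 0 if (ra == 0 and not update) else gpr[ra]
    ea = (base + gpr[rb]) & MASK32
    if update:
        gpr[ra] = ea
    return ea
```

With r1 = 0x1000, a displacement field of 0xFFFC (that is, -4) gives the effective address 0x0FFC. Only the update forms change r1.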

2.5.2 Load Instructions:

The fixed point integer load instructions read data from the data memory and store it in any of the general purpose registers. There are three variants:

Load and zero instructions read data from memory and clear the remaining high-order bits of the destination register to zero [4].

Load algebraic instructions read data from memory and fill the remaining high-order bits with copies of the sign bit of the loaded value [4].

Load with update instructions load data from memory and, in addition, update the base register with the effective memory address.
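Load and zero, load algebraic and load with update can be compared in a Python sketch that reads from a big-endian byte array. The function names and the byte-array memory model are assumptions made for illustration. PowerPC provides halfword algebraic loads (lha, lhau) but no byte algebraic load, so `algebraic=True` is only meaningful here with size 2.

```python
MASK32 = 0xFFFFFFFF

def load(mem, ea, size, algebraic=False):
    """Read `size` bytes at `ea` in big-endian order and extend to 32 bits.

    size=1 -> lbz, size=2 -> lhz (lha if algebraic), size=4 -> lwz.
    """
    raw = int.from_bytes(mem[ea:ea + size], "big")
    if algebraic and raw & (1 << (8 * size - 1)):
        raw -= 1 << (8 * size)        # load algebraic: copy the sign bit upwards
    return raw & MASK32               # load and zero: upper bits are already 0

def load_with_update(gpr, mem, rd, ra, d, size, algebraic=False):
    """Update form (e.g. lhau rD,d(rA)): EA = (rA) + EXTS(d); rD <- data; rA <- EA."""
    d &= 0xFFFF
    disp = d - 0x10000 if d & 0x8000 else d
    ea = (gpr[ra] + disp) & MASK32
    gpr[rd] = load(mem, ea, size, algebraic)
    gpr[ra] = ea                      # base register now points at the data
```

Reading the halfword 0x8001 gives 0x00008001 with lhz but 0xFFFF8001 with lha.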

2.5.3 Store Instructions:

The fixed point integer store instructions read data from a general purpose register and store it in the data memory [4]. PowerPC supports several types of store instruction. The store with update instructions write data to memory and, in addition, update the base register with the effective memory address.
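The store and store-with-update behaviour can be sketched in Python as follows. The function names and the bytearray memory model are illustrative assumptions. The sketch reads rS before updating rA, so the original register value is stored.

```python
MASK32 = 0xFFFFFFFF

def store(mem, ea, value, size):
    """Write the low-order `size` bytes of `value` at `ea` in big-endian order.

    size=1 -> stb, size=2 -> sth, size=4 -> stw.
    """
    mem[ea:ea + size] = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "big")

def store_with_update(gpr, mem, rs, ra, d, size):
    """Update form (e.g. stwu rS,d(rA)): EA = (rA) + EXTS(d); store; rA <- EA."""
    d &= 0xFFFF
    disp = d - 0x10000 if d & 0x8000 else d
    ea = (gpr[ra] + disp) & MASK32
    store(mem, ea, gpr[rs], size)     # rS is read before rA is updated
    gpr[ra] = ea
    return ea
```

A store with update and a negative displacement is commonly used to allocate a stack frame, for example stwu r1,-16(r1). Only the low-order byte or halfword of rS is written by stb or sth.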

2.6 PowerPC Fixed Point Integer Instructions:

The fixed point integer instructions use the general purpose registers for their operands and to store their results. Each source operand is obtained either from a general purpose register or from an immediate value. These instructions do not access the data memory. Both signed and unsigned integers can be used as source operands. Depending on the instruction, the condition register and the exception register are also updated [4].

The PowerPC architecture supports several types of integer instructions [4]:

Arithmetic Instructions

Logical Instructions

Rotate Instructions

Compare Instructions

Shift Instructions

2.6.1 Arithmetic Instructions:

The arithmetic instructions perform addition, subtraction, negation, multiplication and division. They use general purpose registers as source and destination operands, and some instructions use an immediate value as a source operand. Integer arithmetic instructions support both signed and unsigned operations [4]. For the carrying and extended forms, the carry out of the operation is stored in the CA bit of the exception register.

If the record bit (Rc) of the instruction is set to 1, the CR0 field of the condition register is updated. If the result is zero, the EQ bit is set to 1. Because the result is treated as signed, the LT bit is set to 1 when the MSB of the result is 1 [4].

The arithmetic instructions implemented in the design are shown in Appendix [3], and their instruction formats are shown in Appendix [1].



The negation instruction computes the two's complement of its operand and stores it in the destination register [4]. If the record bit of the instruction is set, the condition register is updated.
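Negation and subtraction can both be expressed with the two's complement identity -a = ~a + 1, which is how subf computes rB - rA. The Python sketch below is an illustrative model with hypothetical names. It returns the OV condition for neg, and the CA bit for subfc, where CA = 1 means that no borrow occurred.

```python
MASK32 = 0xFFFFFFFF

def neg(ra_val):
    """neg: rD = ~(rA) + 1, returned as (result, OV)."""
    a = ra_val & MASK32
    result = (~a + 1) & MASK32
    ov = int(a == 0x80000000)          # -(-2**31) does not fit in 32 bits
    return result, ov

def subfc(ra_val, rb_val):
    """subfc: rD = ~(rA) + (rB) + 1 = rB - rA, returned as (result, CA)."""
    full = (~ra_val & MASK32) + (rb_val & MASK32) + 1
    return full & MASK32, full >> 32   # CA = 1 means no borrow
```

The most negative number, 0x80000000, is its own two's complement, which is the one case where neg sets OV.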

The multiply instructions multiply two 32-bit operands to produce a 64-bit product. The source operands can be register values or, for the immediate form, an immediate value. In the Multiply Low-Word and Multiply Low-Word Immediate instructions, the destination register is loaded with the low-order 32 bits of the product [4]. In the Multiply High-Word instructions, the destination register is loaded with the high-order 32 bits of the product [4]. The condition and exception registers are updated as specified for each instruction form.
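The low-word/high-word split can be checked with a short Python model. This is an illustrative sketch, not the project's Verilog. mullw and mulli keep the low 32 bits of the product. mulhw and mulhwu keep the high 32 bits of the signed and unsigned 64-bit products respectively. The mulli immediate is assumed to be the raw 16-bit SIMM field, which is sign-extended as the architecture specifies.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mullw(ra, rb):
    """Low 32 bits of the 64-bit product (identical for signed/unsigned)."""
    return (to_signed32(ra) * to_signed32(rb)) & MASK32


def mulli(ra, simm16):
    """Low 32 bits of rA times the sign-extended 16-bit immediate."""
    simm16 &= 0xFFFF
    imm = simm16 - 0x10000 if simm16 & 0x8000 else simm16
    return (to_signed32(ra) * imm) & MASK32


def mulhw(ra, rb):
    """High 32 bits of the signed 64-bit product."""
    # Python's >> is arithmetic on negative ints, so the sign is kept.
    return ((to_signed32(ra) * to_signed32(rb)) >> 32) & MASK32


def mulhwu(ra, rb):
    """High 32 bits of the unsigned 64-bit product."""
    return (((ra & MASK32) * (rb & MASK32)) >> 32) & MASK32
```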

The divide instructions perform division. Their source and destination operands must be general purpose registers, and the quotient is loaded into the destination register. In the Divide Word (divw) instruction, the two 32-bit operands are treated as signed integers and the 32-bit quotient is loaded into the destination register. In the Divide Word Unsigned (divwu) instruction, the source operands are treated as unsigned integers, and the 32-bit quotient is likewise loaded into the destination register. The condition and exception registers are updated as specified for each instruction form [4].
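Below is a minimal Python sketch of divw and divwu. It assumes the architectural rule that the signed quotient is truncated toward zero. Python's // operator rounds toward negative infinity, so the sketch divides the magnitudes and applies the sign separately. The architecture leaves the result undefined for division by zero and for 0x80000000 divided by -1; this sketch simply returns None in those cases.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divw(ra, rb):
    """Signed 32-bit divide with the quotient truncated toward zero.

    Returns None for the architecturally undefined cases
    (divide by zero, and -2**31 / -1).
    """
    a, b = to_signed32(ra), to_signed32(rb)
    if b == 0 or (a == -(1 << 31) and b == -1):
        return None
    q = abs(a) // abs(b)      # divide magnitudes, then apply the sign
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32


def divwu(ra, rb):
    """Unsigned 32-bit divide; None when the divisor is zero."""
    a, b = ra & MASK32, rb & MASK32
    if b == 0:
        return None
    return a // b
```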

2.6.2 Logical Instructions:

The logical instructions perform operations such as logical OR, AND, NAND, NOR, and XOR on 32-bit operands. If one operand is a 16-bit immediate value, it is extended to 32 bits in one of two ways [4]:
- by appending 16 zero bits on the right (the immediate shifted form, as in oris);
- by prepending 16 zero bits on the left (the unsigned immediate form, as in ori).

An instruction with the record bit set is written with a period after its mnemonic (for example, and.). If the record bit (Rc) in the instruction is set, the result of the logical instruction updates the condition register [4]. The logical instructions implemented in the design are listed in Appendix [3], and their instruction formats are shown in Appendix [1]. The exception register is not updated by the logical instructions.
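The two immediate-extension schemes can be summarised in a few lines of Python. The sketch also includes the sign extension used by the arithmetic immediates (the simm signal of Section 3.2.1.1). This is an illustrative model only. The function names mirror the report's uimm, imm_is, and simm signals, and the LOGICAL_OPS table is a hypothetical helper, not part of the design.

```python
MASK32 = 0xFFFFFFFF


def uimm(imm16):
    """Unsigned immediate: 16 zero bits prepended on the left (e.g. ori)."""
    return imm16 & 0xFFFF


def imm_is(imm16):
    """Immediate shifted: 16 zero bits appended on the right (e.g. oris)."""
    return (imm16 & 0xFFFF) << 16


def simm(imm16):
    """Signed immediate: the MSB of the 16-bit field is replicated (e.g. addi)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


# Hypothetical operation table for the register/immediate logical forms.
LOGICAL_OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nand": lambda a, b: ~(a & b) & MASK32,
    "nor": lambda a, b: ~(a | b) & MASK32,
}
```

For instance, ORing 0x12340000 with uimm(0x5678) fills in the low halfword and gives 0x12345678, while imm_is(0x5678) would instead target the high halfword.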

2.6.3 Sign-Extension Instructions:

PowerPC supports two sign-extension instructions, extsh and extsb. Bits are numbered from 0 at the MSB to 31 at the LSB.
- extsh reads the low-order halfword of the source register and replicates its sign bit, bit 16, into bits 0 to 15. The 32-bit result is written to the destination register.
- extsb reads the low-order byte of the source register and replicates its sign bit, bit 24, into bits 0 to 23 [4].

The sign-extension instructions implemented in the design are listed in Appendix [3], and their instruction formats are shown in Appendix [1].
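In the bit numbering used above (bit 0 is the MSB), bit 24 has weight 0x80 and bit 16 has weight 0x8000. The following Python sketch applies that mapping to model extsb and extsh. It is an illustration, not the design's Verilog.

```python
def extsb(rs):
    """Sign-extend the low byte: bit 24 (weight 0x80) fills bits 0-23."""
    b = rs & 0xFF
    return (b | 0xFFFFFF00) if b & 0x80 else b


def extsh(rs):
    """Sign-extend the low halfword: bit 16 (weight 0x8000) fills bits 0-15."""
    h = rs & 0xFFFF
    return (h | 0xFFFF0000) if h & 0x8000 else h
```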

2.6.4 Rotate Instructions:

Rotate instructions use general purpose registers for the source and destination. The data is rotated left, from the LSB towards the MSB, and each bit shifted out of the MSB re-enters at the LSB. If the Rc field of a rotate instruction is set, the result updates the condition register field CR0. Each rotate instruction also requires a mask, which is generated from the MB and ME fields [4]. All the rotate instructions are implemented in the design. They are listed in Appendix [3], and their instruction formats are shown in Appendix [1].

2.6.4.1 Mask Generation:

The mask is a 32-bit value generated from the two 5-bit fields MB and ME.
- If MB is less than or equal to ME, the mask bits from position MB through position ME are set to 1, and the remaining bits are set to 0.
- If ME is less than MB, the mask wraps around. The bits strictly between ME and MB (positions ME+1 to MB-1) are set to 0, and the remaining bits, including positions ME and MB, are set to 1.

Figure 3 shows the mask when MB < ME [4]:

Mask bits 0 to MB-1 = 0, bits MB to ME = 1, bits ME+1 to 31 = 0

Figure 3: Mask (MB < ME) sourced from [4]

Figure 4 shows the mask when ME < MB [4]:

Mask bits 0 to ME = 1, bits ME+1 to MB-1 = 0, bits MB to 31 = 1

Figure 4: Mask (MB > ME) sourced from [4]
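The two mask cases can be expressed directly in code. The Python function below is an illustrative sketch of the mask generation; the name ppc_mask is hypothetical. It converts IBM bit positions, where bit 0 is the MSB, into integer bit weights with 1 << (31 - i).

```python
def ppc_mask(mb, me):
    """Generate the 32-bit rotate mask from the 5-bit MB and ME fields.

    Bits use IBM numbering: bit 0 is the MSB and bit 31 the LSB.
    MB <= ME: bits MB..ME are 1, all others 0.
    MB >  ME: the mask wraps, so bits ME+1..MB-1 are 0, all others 1.
    """
    mask = 0
    for i in range(32):
        if mb <= me:
            selected = mb <= i <= me
        else:
            selected = i <= me or i >= mb
        if selected:
            mask |= 1 << (31 - i)  # IBM bit i has integer weight 2**(31 - i)
    return mask
```

For example, ppc_mask(24, 31) selects the low byte (0x000000FF), while ppc_mask(31, 0) wraps around and keeps only the two end bits (0x80000001).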

2.6.4.2 PowerPC Rotate Instructions:

PowerPC supports three rotate instructions: rlwimi, rlwnm, and rlwinm.
- rlwimi rotates the source register rS left by the number of bits specified in the 5-bit SH field [4]. It inserts the rotated data into the destination register at the positions where the mask bits are 1; the remaining bits of the destination register are unchanged.
- rlwnm rotates rS left by the number of bits specified in bits 27 to 31 of the source register rB, rB[27:31] [4]. The rotated data is ANDed with the mask, and the result is stored in the destination register.
- rlwinm rotates rS left by the number of bits specified in the 5-bit SH field [4]. The rotated data is ANDed with the mask, and the result is loaded into the destination register.
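The three rotate instructions differ only in where the rotate count comes from and in how the mask is applied. The Python sketch below models them behaviourally. rotl32 and the compact ppc_mask are helpers defined for this illustration, and rB[27:31] is taken as the low five bits of rB.

```python
MASK32 = 0xFFFFFFFF


def ppc_mask(mb, me):
    """MASK(MB, ME) with IBM bit numbering (bit 0 = MSB), wrapping if MB > ME."""
    def ones(lo, hi):
        # Set IBM bits lo..hi inclusive; an empty range gives 0.
        return sum(1 << (31 - i) for i in range(lo, hi + 1))
    return ones(mb, me) if mb <= me else ones(0, me) | ones(mb, 31)


def rotl32(x, n):
    """Rotate a 32-bit value left by n (taken modulo 32) positions."""
    x &= MASK32
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def rlwinm(rs, sh, mb, me):
    """Rotate by SH, then AND with the mask."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


def rlwnm(rs, rb, mb, me):
    """Rotate by rB[27:31] (the low five bits of rB), then AND with the mask."""
    return rotl32(rs, rb & 0x1F) & ppc_mask(mb, me)


def rlwimi(ra, rs, sh, mb, me):
    """Rotate by SH and insert into rA where the mask is 1; keep other rA bits."""
    m = ppc_mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)
```

For example, rlwinm(0x12345678, 8, 24, 31) rotates the top byte into the low byte and masks the rest away, giving 0x12.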

2.6.5 Shift Instructions:

Shift instructions shift the contents of the source register left or right. They operate on 32-bit operands [4]. All the shift instructions are implemented in the design. They are listed in Appendix [3], and their instruction formats are shown in Appendix [1].

2.6.5.1 Logical Left Shift Instructions:

The logical left shift instructions use three general purpose registers. They shift the bits of the source register rS from the LSB towards the MSB by the number of bits specified in rB[27:31] [4]. Bits shifted out of the MSB are lost, and the vacated positions at the LSB end are filled with zeros. The condition register field CR0 is updated when the Rc bit is set to 1.

2.6.5.2 Logical Right Shift Instructions:

The logical right shift instructions also use three general purpose registers. They shift the bits of the source register rS from the MSB towards the LSB by the number of bits specified in rB[27:31]. Bits shifted out of the LSB are lost, and the vacated positions at the MSB end are filled with zeros [4]. The condition register field CR0 is updated when the Rc bit is set to 1.
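The Python sketch below models the two logical shifts. It is an illustration, not the design's Verilog. It follows the PowerPC definition of slw and srw, in which the shift count is taken from rB[26:31]. rB[27:31] gives the shift amount 0 to 31, and a set rB[26] (a count of 32 to 63) produces a zero result.

```python
MASK32 = 0xFFFFFFFF


def slw(rs, rb):
    """Shift left word: zeros enter at the LSB end, bits leaving the MSB are lost."""
    n = rb & 0x3F        # rB[26:31]; rB[26] has weight 32
    if n >= 32:          # rB[26] set: every bit is shifted out
        return 0
    return (rs << n) & MASK32


def srw(rs, rb):
    """Shift right word: zeros enter at the MSB end, bits leaving the LSB are lost."""
    n = rb & 0x3F
    if n >= 32:
        return 0
    return (rs & MASK32) >> n
```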

2.6.5.3 Algebraic Shift Instructions:

PowerPC provides two algebraic shift instructions, sraw and srawi. The sraw instruction shifts the data in the source register rS right by the number of bits specified in rB[27:31]. The MSB of rS is replicated to fill the vacated bit positions on the left, and the bits shifted out of the LSB are lost. The result is stored in the destination register. The srawi instruction performs the same operation, but the number of bits is specified by the 5-bit SH field [4].
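The Python sketch below models sraw and srawi. Beyond what is described above, it follows two details of the PowerPC definition. First, the exception register's CA bit is set when rS is negative and any 1 bits are shifted out. Second, for sraw a count of 32 or more in rB[26:31] fills the result with copies of the sign bit. The function names are illustrative, and each returns a (result, ca) pair.

```python
MASK32 = 0xFFFFFFFF


def srawi(rs, sh):
    """Shift right algebraic by SH (0-31); return (result, ca)."""
    rs &= MASK32
    sh &= 31
    signed = rs - (1 << 32) if rs & 0x80000000 else rs
    result = (signed >> sh) & MASK32     # Python's >> on int replicates the sign
    lost_bits = rs & ((1 << sh) - 1)     # bits shifted out of the LSB end
    ca = int(signed < 0 and lost_bits != 0)
    return result, ca


def sraw(rs, rb):
    """Shift right algebraic by rB[26:31]; return (result, ca)."""
    n = rb & 0x3F
    if n >= 32:                          # result is 32 copies of the sign bit
        negative = bool(rs & 0x80000000)
        return (MASK32 if negative else 0), int(negative)
    return srawi(rs, n)
```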

2.7 Pipelining Overview:

Pipelining is an implementation technique in which the execution of multiple instructions is overlapped [1]. It increases instruction throughput, although each individual instruction still passes through every stage. The pipeline used in this design has separate instruction and data memories (a Harvard arrangement) and five stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Write Back (WB) [1].

Instruction Fetch (IF) stage fetches the instruction to be executed from the instruction memory.

Instruction Decode (ID) stage decodes the instruction and reads the source values from the registers. In this design, decoding and register reading take place in the same stage.

Execute (EX) stage calculates the data memory address for a load/store instruction. For other instructions, it executes the instruction and calculates the result.

Memory (MEM) stage is used by load/store instructions: a load reads data from the data memory, and a store writes register data to the data memory.

Write Back (WB) stage writes the result into the destination register.

Figure 5 shows how instructions move through the five pipeline stages, advancing one stage per clock cycle.



In Figure 5, five instructions are executed in sequential order. Instruction 1 is fetched in the IF stage and passed to the ID stage. While instruction 1 is decoded, instruction 2 is fetched [1]. While instruction 1 is executed, instruction 2 is decoded and instruction 3 is fetched, and so on.


Figure 5: 5 Stage Pipeline sourced from [1]
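The stage-per-cycle pattern of Figure 5 can be regenerated with a few lines of Python. This sketch assumes an ideal pipeline with no stalls or flushes, so instruction i (counting from 0) enters IF in cycle i. The function name pipeline_chart is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return, per instruction, the stage occupied in each clock cycle.

    Assumes an ideal pipeline with no stalls or flushes: instruction i
    (counting from 0) enters IF in cycle i and advances one stage per
    cycle. Empty strings mark cycles where the instruction is absent.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        chart.append(row)
    return chart


if __name__ == "__main__":
    for n, row in enumerate(pipeline_chart(5), start=1):
        print(f"Inst {n}: " + " ".join(f"{s:>3}" for s in row))
```

For five instructions the chart spans 5 + 5 - 1 = 9 cycles. Executing them one at a time through all five stages would take 25 cycles.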

2.7.1 Pipelining Hazards:

Pipeline hazards occur when the next instruction cannot execute in its designated clock cycle. There are three kinds of hazards: structural hazards, control hazards, and data hazards [1].

2.7.1.1 Structural Hazards:

A structural hazard occurs when the hardware cannot support the combination of instructions that are to be executed in the same clock cycle [1]. For example, with a single memory, an instruction fetch and a load/store data access in the same cycle would conflict. Using two memories, one for instructions and another for data, avoids this structural hazard [9].



2.7.1.2 Control Hazards:

A control hazard arises from the need to make a decision based on the result of one instruction while other instructions are executing. For a branch instruction, the branch address is calculated in either the second or the third stage [1]. Once the branch is found to be taken, the instructions that have already entered the IF and ID stages must not complete. These two instructions have to be flushed from the pipeline [9], as shown in Figure 6.

Figure 6: Control Hazards (when the branch is taken, the following instructions are flushed)

2.7.1.3 Data Hazards:

Data hazards occur when an instruction depends on the result of a previous instruction; this is called a data dependency [1]. If the required data is not yet available, the wrong data is read and the instruction produces an incorrect result. Two cases of data hazard are considered. They are avoided using a data forwarding unit and a load-use unit respectively [9].

2.7.1.3.1 Data Forwarding Hazard:

This hazard can be avoided by a data forwarding unit, which receives the results in each of the EX, MEM, and WB stages. For example, consider the sequence addi r2, r4, 1010h followed by xoris r6, r2, 1100h. The second instruction reads r2, which depends on the result of the previous instruction. This result is forwarded through the data forwarding unit [9], so r2 is available to the next instruction in time, as shown in Figure 7.

Figure 7: Data Forwarding (Data Hazards)

2.7.1.3.2 Load-Use Hazard:

A load-use hazard occurs when an instruction depends on the result of the preceding load instruction. For example, consider lwz r15, 0010h followed by or r10, r15, r12. The result of a load is available only in the MEM stage, so forwarding alone cannot deliver it in time. To avoid this hazard, the ID stage is stalled for one clock cycle, and the loaded value is then forwarded through the data forwarding unit. In the example above, the ID stage of the or instruction is stalled for one clock cycle. At the end of the MEM stage, r15 is forwarded to the ID stage [9], so that it is available at the beginning of the EX stage of the or instruction, as shown in Figure 8.

Figure 8: Load-use (data forwarded from the end of the MEM stage to the beginning of the EX stage)
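The condition that triggers the one-cycle stall can be written as a single check. The Python sketch below is illustrative only. IdExReg is a hypothetical stand-in for the ID/EX pipeline register, and the check compares the load's destination register with the source registers of the instruction in ID.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IdExReg:
    """Hypothetical subset of the ID/EX pipeline register."""
    is_load: bool            # instruction now in EX is a load
    dest_reg: Optional[int]  # its destination GPR, or None


def load_use_stall(id_ex: IdExReg, src_regs: Sequence[int]) -> bool:
    """Return True if the instruction in ID must stall for one cycle.

    The stall is needed when the instruction in EX is a load whose
    destination register is one of the registers read in ID.
    """
    return (
        id_ex.is_load
        and id_ex.dest_reg is not None
        and id_ex.dest_reg in src_regs
    )
```

Apply this to the example above. lwz r15 is in EX while or r10, r15, r12 is in ID, so load_use_stall(IdExReg(True, 15), (15, 12)) returns True and the or instruction is held in ID for one cycle.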



The 32-bit PowerPC processor is designed as a set of Verilog modules. An initial datapath was designed first and then expanded to support the PowerPC instruction set. The main aim of the project is to design a pipelined processor that supports the PowerPC instruction set. Within the available time, the processor supports the fixed-point integer instructions, the load/store instructions, and the branch instructions. The design started with the implementation and testing of the basic arithmetic and logical instructions. It was then expanded by adding the load/store instructions, data forwarding, and load-use handling.

3.1 Initial Datapath:

The initial datapath is shown in Figure 9. It consists of the five pipeline stages IF, ID, EX, MEM, and WB; the pipelining approach and its stages are explained in Section 2.7. The dashed lines in Figure 9 mark the boundaries between the five stages. The pipeline registers at these boundaries store the values produced by each stage, and the next stage uses these values as inputs. The processor operates on the positive clock edge, and the pipeline registers are updated on every positive edge. Two memories are used: one for data and one for instructions.

Multiplexers select the register fields used by the different instructions.
- reg0_add and reg1_add are the sources for the arithmetic and load/store instructions.
- regshift_add and reg1_add are the sources for the logical, shift, and rotate instructions.
- The write-back register is selected between reg0_add and regshift_add, depending on the type of instruction.

The data leaving the MEM_Phase is sent to the register bank and written into the registers; this forms the WB_Phase. The ALU module calculates the memory address or the result of each instruction. To calculate a memory address, the ALU uses register contents as its sources. The data from the MEM_Phase is either the result from the ALU or the data read from the memory. For store instructions, the register bank sends the data to be stored to the data memory. The NIP in the datapath is the Next Instruction Pointer, which holds the address of the next instruction to be executed. The IP is the Instruction Pointer, which holds the address of the instruction currently being executed.


Figure 9: Initial Datapath (NIP, IP, instruction memory, register bank, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers)

3.2 Instruction Set Design:

The processor design supports the fixed-point integer instructions, the load/store instructions, and the branch instructions. Data hazards and control hazards were encountered in the earlier versions of the design. They are eliminated by adding the data forwarding unit, the load-use unit, and the branch prediction unit.

3.2.1 Fixed Point Integer Instructions:

3.2.1.1 Fixed Point Arithmetic Instructions:

Figure 10 shows the design for the fixed-point arithmetic instructions. An arithmetic instruction is fetched from the instruction memory and sent to the control module. The control module separates the opcode field, the register fields, the immediate value, the record bit, and the extended opcode field. The register fields are sent to the register bank module to read the register contents. The registers addressed by reg0_add and reg1_add supply the two operands of the arithmetic operation. The final result is written back to the register addressed by regshift_add. The alu_operands module acts as a decoder that supplies the inputs of the ALU module. Pipeline registers at the end of each phase store the values. ID_op0 and ID_op1 are the two operands sent to the ALU module, which calculates the arithmetic result and writes it to the out register. All the addition, subtraction, multiplication, and division instructions are executed by this design.

The wb_reg signal is the write-back register in which the final result is stored. At the end of the EX_Phase, the result of the arithmetic operation is stored in the EX_out register. The condition register (CR) is updated if the Rc bit in the instruction is set to 1. The value in EX_out is passed to the MEM_Phase and stored in the MEM_wb_data register. At the end of the MEM_Phase, the data in MEM_wb_data is written back to the register addressed by MEM_wb_reg. The simm signal is the 16-bit immediate field sign-extended to 32 bits. The imm_is (immediate shifted) signal is the 16-bit immediate field extended to 32 bits by concatenating 16 zero bits to its right.


Figure 10: Fixed Point Arithmetic Instruction (Control, Reg_bank, Alu_operands, ALU, and CR blocks with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers)

3.2.1.2 Fixed Point Logical Instructions:

Figure 11 shows the processor design that supports the fixed-point logical instructions. The IF_Phase module fetches the instruction from the instruction memory, and the 32-bit instruction is sent to the control module. The control module separates the opcode field, the register fields, the immediate value, the record bit, and the extended opcode field. The register fields reg1_add and regshift_add are passed to the register bank module, and the data read from these registers is sent to the alu_operands module. The uimm signal is the 32-bit value formed by concatenating 16 zero bits to the left of the 16-bit immediate in the instruction. The imm_is signal is the immediate shifted value, formed by concatenating 16 zero bits to the right of the 16-bit immediate. The immediate data and the register data are passed to the alu_operands module, which decodes them based on the opcode field. This design executes the logical OR, AND, NAND, XOR, and NEG instructions.

The reg0_add field is the write-back register that stores the result of the logical operation; wb_reg indicates this write-back register and is the same as reg0_add. At the beginning of the EX_Phase, the input operands are sent to the ALU. The ALU performs the logical operation between the two operands and stores the result in the out register. The condition register (CR) is updated when the Rc bit in the instruction is set. At the end of the EX_Phase, the result of the logical operation is stored in the EX_out register. At the end of the MEM_Phase, this data is stored in the MEM_wb_data register. In the WB_Phase, it is written back to the register bank.


Figure 11: Fixed Point Logical Instruction (Control, Reg_bank, Alu_operands, ALU, and CR blocks)

3.2.1.3 Fixed Point Shift Instructions:

Figure 12 shows the processor design that supports the fixed-point shift instructions. The instruction is fetched from the instruction memory by the IF_Phase module and stored in the IF_Phase pipeline register. In the next clock cycle, the instruction is sent to the ID_Phase. There, the control module separates the register fields, the SH field, the extended opcode field, and the record bit (Rc). The register fields are sent to the register bank module to read the 32-bit data from the registers. The control module also separates the immediate SH field, which gives the number of bits to shift, and passes it to the alu_operands module. The alu_operands module decodes the register data and SH. The reg0_add field indicates the write-back register. At the end of the ID_Phase, op0 and op1 are stored in the pipeline registers. The ALU module shifts op0 by the number of bits specified by op1[27:31] and stores the shifted data in the out register. At the end of the EX_Phase, the result is stored in the EX_out register. This data is passed to the MEM_Phase and stored in MEM_wb_data. In the next clock cycle, it is written back to the register. The condition register (CR) is updated when the Rc bit in the instruction is set.


Figure 12: Fixed Point Shift Instruction (Control, Reg_bank, Alu_operands, ALU, and CR blocks)

3.2.1.4 Fixed Point Rotate Instructions:

Figure 13 shows the processor design that supports the fixed-point rotate instructions. The instruction is fetched from the instruction memory and sent to the control module in the ID_Phase. The control module separates the register fields, the 5-bit ME field, the 5-bit MB field, the extended opcode field, the SH field, and the record bit (Rc). The register fields reg1_add and regshift_add are sent to the register bank to read the 32-bit data from the registers. The register field reg0_add is the write-back register in which the final result of the rotate instruction is stored. The alu_operands module acts as a multiplexer. It selects either the SH field or the register data value_op1, depending on the rotate instruction. The wb_reg signal indicates the write-back register and is the same as reg0_add. The value_op0, value_op1, and value_rs signals are read from the register bank. At the end of the ID_Phase, op0, op1, and wb_reg are stored in the pipeline registers.

The mask is generated from the signals ID_MB and ID_ME at the beginning of the EX_Phase.
- If ID_MB is less than or equal to ID_ME, the mask bits from MB through ME are set to 1 and the remaining bits are set to 0.
- If ID_MB is greater than ID_ME, the mask bits between ME and MB (positions ME+1 to MB-1) are set to 0 and the remaining bits are set to 1.

In this way the 32-bit mask is generated for the rotate instructions.

The inputs to the rotate logic are the two 32-bit operands and the SH field, which gives the number of bits to rotate. ID_op0 is rotated by the number of bits specified either by the SH field or by ID_op1[27:31]. Depending on the rotate instruction, the rotated data is used in one of two ways. It is either ANDed with the mask, or inserted into the write-back register data at the positions where the mask bits are 1.

The rotated data is stored in the out register of the EX_Phase module. At the end of the EX_Phase, the result of the rotate instruction is stored in the EX_out register. This data is moved to the MEM_Phase and written into the MEM_wb_data register. In the WB_Phase, the data in MEM_wb_data is written back to the register bank module and stored in the destination register. If the Rc bit in the instruction is set, the condition register (CR) is updated with the result of the rotate instruction.


Figure 13: Fixed Point Rotate Instruction (Control, Reg_bank, Alu_operands, Rotate, Mask Generation, and CR blocks)

3.2.1.5 Fixed Point Compare Instructions:

Figure 14 shows the processor design for the fixed-point compare instructions. The instruction is fetched from the instruction memory and stored in the pipeline register in the IF_Phase. In the next clock cycle, the instruction is sent to the control module in the ID_Phase. The control module separates the opcode field, the register fields, the extended opcode field, the immediate value, and the crfD field. The crfD field selects the condition register field (CR0 to CR7) that is written with the comparison result. The instruction format is shown in Appendix [1]. Bits 9 and 10 of the instruction must be 0; otherwise the instruction is invalid. The register fields reg0_add and reg1_add are sent to the register bank module. value_op0 and value_op1 are read from these registers and passed to the alu_operands module. The 16-bit immediate field is sign-extended to 32 bits and also sent to the alu_operands module, which decodes the signals according to the compare instruction.

In the EX_Phase, ID_op1 is subtracted from ID_op0 in the ALU module, and the result sets the comparison bits:
- If the result of the subtraction is zero, the EQ bit is set to 1; otherwise it is set to 0.
- If ID_op0 is less than ID_op1, lt is set to 1; if ID_op0 is greater than ID_op1, gt is set to 1.
- If an overflow occurs, the summary overflow (SO) bit is set to 1.

These bits are stored in the pipeline registers of the EX_Phase, and in the next clock cycle the condition register is updated. The CRn field selected in the ID_Phase is written with EX_lt, EX_gt, EX_EQ, and EX_SO and sent to the condition register.
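The compare result and the CR field update can be modelled as below. This Python sketch is illustrative: compare_field and write_cr_field are hypothetical names. The signed flag distinguishes signed compares (cmp, cmpi) from unsigned ones (cmpl, cmpli). The SO bit is passed in as a parameter; in the PowerPC architecture it is copied from the summary-overflow bit of the exception register. With IBM bit numbering, CR0 occupies bits 0 to 3, the most significant nibble of the CR.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def compare_field(a, b, signed=True, so=0):
    """Return the 4-bit CR field LT|GT|EQ|SO for comparing a with b."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    lt, gt, eq = int(a < b), int(a > b), int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)


def write_cr_field(cr, crfd, field):
    """Replace field crfD (0 = CR0 ... 7 = CR7) of the 32-bit CR.

    CR0 is IBM bits 0-3, the most significant nibble, so field n
    sits 4 * (7 - n) bits above the LSB.
    """
    shift = 4 * (7 - crfd)
    cleared = cr & ~(0xF << shift) & MASK32
    return cleared | ((field & 0xF) << shift)
```

Comparing 0xFFFFFFFF with 1 illustrates why the signed/unsigned distinction matters. The result is "less than" for a signed compare (-1 < 1) but "greater than" for an unsigned one.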


Figure 14: Fixed Point Compare Instruction (Control, Reg_bank, Alu_operands, and CR blocks; lt, gt, EQ, SO written to the CRn field selected by crfD)

3.2.2 Load/Store Instructions:

The processor design supports PowerPC integer load and store instructions. The integer load

and store instructions support different data types such as byte, halfword, and word. Only load and

store instructions access the data memory.

3.2.2.1 Load Instructions:

Figure 15 shows the processor design that supports the load instructions. There are two addressing modes for calculating the memory address: base register plus displacement, and base register plus index register. A load instruction uses three register fields. Two are used to calculate the memory address, and the third is the write-back register. The instruction is fetched from the instruction memory and stored in the pipeline register. In the ID_Phase, the instruction is passed to the control module, which separates the opcode field, the register fields, the displacement, and the extended opcode field. The regshift_add field is the write-back register in which the data from the memory is stored. The register fields reg0_add and reg1_add are sent to the register bank module to read the register contents. value_op0 and value_op1 are the base address and the index address respectively; depending on the instruction, either value_op1 or the displacement is selected. The two operands are stored in the pipeline registers at the end of the ID_Phase.

In the EX_Phase, ID_op0 (the base address) and ID_op1 (the displacement or index address) are added to obtain the effective address in memory. If the update signal is enabled, the base register is updated with the effective address. For load instructions, the ld signal is set to 1. At the end of the EX_Phase, the effective address is stored in the pipeline register. In the MEM_Phase, it is passed to the data memory. The data at that address is read and stored in the MEM_wb_data register. In the WB_Phase, the data in MEM_wb_data is written back to the destination register in the register bank module. The number of bits read from the memory depends on the data type of the instruction:
- byte: 8 bits;
- halfword: 16 bits;
- word: 32 bits.
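The address calculation and the three access sizes can be sketched in Python as follows. This is an illustrative model under stated assumptions:
- memory is a bytearray accessed in big-endian byte order, PowerPC's default;
- loaded bytes and halfwords are zero-extended as in lbz and lhz; the algebraic forms such as lha are not modelled;
- the function names are hypothetical.

The sketch also follows the architectural rule that a base register field of 0 denotes the value 0 rather than the contents of r0.

```python
MASK32 = 0xFFFFFFFF


def effective_address(gpr, ra, offset):
    """EA = (rA|0) + offset, wrapped to 32 bits.

    offset is the sign-extended displacement (as a Python int) or the
    contents of the index register rB. A base field of 0 means the
    value 0, not the contents of r0.
    """
    base = 0 if ra == 0 else gpr[ra]
    return (base + offset) & MASK32


def load(memory, ea, size):
    """Read size bytes (1, 2 or 4) big-endian and zero-extend them."""
    return int.from_bytes(memory[ea:ea + size], "big")


def load_update(gpr, memory, rd, ra, offset, size):
    """Load with update: rD receives the data, rA receives the EA."""
    ea = effective_address(gpr, ra, offset)
    gpr[rd] = load(memory, ea, size)
    gpr[ra] = ea
```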


Figure 15: Load Instruction (Control, Reg_bank, Alu_operands, address adder, and data memory; ld and update control signals)

3.2.2.2 Store Instructions:

Figure 16 shows the processor design that supports the PowerPC store instructions. The memory address is calculated by adding either the base address and the displacement, or the base address and the index register. The instruction is fetched from the instruction memory in the IF_Phase. In the ID_Phase, the instruction is sent to the control module, which separates the opcode field, the extended opcode field, the register fields, and the displacement. The three register fields are passed to the register bank module to read value_op0, value_op1, and mem_data. mem_data is the 32-bit data to be written to the data memory. value_op0 and value_op1 are the 32-bit values used to calculate the memory address. Depending on the type of instruction, either the displacement or value_op1 is selected. op0 and op1 are stored in the pipeline registers at the end of the ID_Phase.

For a store instruction with update, when the update signal is set to 1, the memory address is written back to the base register. The st signal shown in Figure 16 is set to 1 when a store instruction is being executed. The EX_EA register holds the data memory address. In the MEM_Phase, mem_data is written to the data memory at this address. The number of bits written to the memory depends on the data type of the instruction:
- byte: 8 bits;
- halfword: 16 bits;
- word: 32 bits.
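A matching Python sketch for stores is shown below, again assuming a big-endian bytearray memory. store and store_update are hypothetical names. Only the low-order 8, 16, or 32 bits of the source register are written, which is what the byte, halfword, and word forms (stb, sth, stw) do.

```python
MASK32 = 0xFFFFFFFF


def effective_address(gpr, ra, offset):
    """EA = (rA|0) + offset, wrapped to 32 bits."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + offset) & MASK32


def store(memory, ea, value, size):
    """Write the low-order size bytes (1, 2 or 4) of value, big-endian."""
    value &= (1 << (8 * size)) - 1
    memory[ea:ea + size] = value.to_bytes(size, "big")


def store_update(gpr, memory, rs, ra, offset, size):
    """Store with update: write rS to memory, then rA receives the EA."""
    ea = effective_address(gpr, ra, offset)
    store(memory, ea, gpr[rs], size)
    gpr[ra] = ea
```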


Figure 16: Store Instruction (Control, Reg_bank, Alu_operands, address adder, and data memory; st and update control signals)

3.2.3 Branch Instructions:

The figure 17 shows the processor design for the branch instructions. The instruction is

fetched from the instruction memory and stored in the pipeline registers in the IF_Phase module. In

the beginning of the ID_Phase, the control module predicts whether the instruction is branch or not.

There are two branch instructions, conditional branch (bc) and unconditional branch (b). If the

instruction is branch unconditional, the control module separates the opcode field, AA bit, LK bit, and

LI field. The LI field is extended to 32-bit by extending the sign bit. If the instruction is branch

conditional, the control module separates opcode field, BO field, BI field, AA bit, LK bit, and BD

field. The BD field is extended by the sign extending the BD to 32-bit. The LI and BD indicate the

branch target address. The 5-bit BO field indicate the condition to be tested by the branch conditional

instruction and 5-bit BI field indicate the bit in the condition register.

The branch condition is tested at the beginning of the EX_Phase, and the branch target address is calculated in the same phase; the branch_addr signal in Figure 17 carries this address. If the AA bit is 0, the branch address is formed by adding the sign-extended BD or LI to the current IP. If the condition is true, branch_addr is moved to the IP in the register bank module, which changes the program counter. The instructions fetched after the branch, which are in the IF_Phase and ID_Phase, must then be removed from the pipeline. Once the condition is satisfied, the branch_cond signal is set to 1, and this signal deletes the instructions in the IF_Phase and ID_Phase modules.
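A possible reference model for the condition test and target calculation follows the architectural definition of bc. It assumes that BO bits are numbered 0 to 4 from the most significant bit and CR bits 0 to 31 likewise. The count register (CTR) handling is included only for completeness, since the report does not say whether the design implements CTR, and the AA = 1 (absolute address) case is likewise taken from the architecture rather than from the design. The function names are hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def bo_bit(bo: int, n: int) -> int:
    """Bit n of the 5-bit BO field, with bit 0 as the most significant bit."""
    return (bo >> (4 - n)) & 1


def bc_taken(bo: int, bi: int, cr: int, ctr: int = 0) -> Tuple[bool, int]:
    """Return (taken, updated CTR) for a branch conditional."""
    if not bo_bit(bo, 2):                      # BO[2] = 0: decrement CTR first
        ctr = (ctr - 1) & MASK32
    ctr_ok = bool(bo_bit(bo, 2)) or ((ctr != 0) != bool(bo_bit(bo, 3)))
    cr_bit = (cr >> (31 - bi)) & 1             # CR bit BI (bit 0 = MSB)
    cond_ok = bool(bo_bit(bo, 0)) or cr_bit == bo_bit(bo, 1)
    return ctr_ok and cond_ok, ctr


def branch_target(cia: int, offset: int, aa: int) -> int:
    """offset is the sign-extended LI or BD value as a 32-bit word."""
    return offset & MASK32 if aa else (cia + offset) & MASK32


# beq +16 at address 0x100 (BO = 12, BI = 2: branch if CR0[EQ] is set)
print(bc_taken(12, 2, cr=0x20000000))        # (True, 0)
print(bc_taken(12, 2, cr=0x00000000))        # (False, 0)
print(hex(branch_target(0x100, 16, aa=0)))   # 0x110
```

With BO = 12 the CTR is left alone and only CR bit BI is compared with BO[1], which is the typical "branch if equal" case.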

If the condition is not satisfied, the program counter is not modified and the instructions in the pipeline continue to execute in sequential order. For an unconditional branch, the program counter is always loaded with branch_addr. If the LK bit is set, branch_addr is stored in the link register.
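The resulting update of the fetch address and link register can be modelled as below. This is a sketch under stated assumptions: the IP advances by 4 on the sequential path (the +4 adder in Figure 17), the branch resolves while the two younger instructions are still in the IF_Phase and ID_Phase, and the link register value follows the PowerPC architecture, which saves the address of the instruction after the branch (CIA + 4) whenever LK = 1. The function name resolve_branch is hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def resolve_branch(fetch_ip: int, branch_cia: int, taken: bool,
                   target: int, lk: int, lr: int) -> Tuple[int, int, bool]:
    """Return (next fetch IP, new LR, flush) once a branch resolves in EX.

    fetch_ip:   IP of the instruction currently in the IF_Phase
    branch_cia: address of the branch instruction itself
    flush:      True when the IF_Phase and ID_Phase instructions must be
                deleted (branch_cond = 1)
    """
    new_lr = (branch_cia + 4) & MASK32 if lk else lr   # architectural LR update
    if taken:
        return target & MASK32, new_lr, True
    return (fetch_ip + 4) & MASK32, new_lr, False


# b to 0x200 with LK = 1 at address 0x100, while 0x108 is being fetched.
print([hex(v) for v in resolve_branch(0x108, 0x100, True, 0x200, 1, 0)[:2]])
# ['0x200', '0x104']
```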



Figure 17: Branch Instructions



3.2.4 Data forwarding and Load-Use:

The data forwarding and load-use modules are used to avoid data hazards, and Figure 18 shows the corresponding processor design. Both modules are implemented in the ID_Phase module. The output of the alu module, the out signal, is fed back as an input to the data forwarding module, and so are the outputs of the EX_Phase and MEM_Phase modules. The data forwarding module compares each source register field with the write-back register field of the instructions in these later phases. If the fields match, the data from that phase is forwarded as the operand value instead of the value read from the register bank. Forwarding is applied only to fixed-point integer instructions, and only when a source register field matches the write-back register field of one of the later phases.
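The forwarding decision can be sketched as a priority selection over the three sources fed back in Figure 18 (out, EX_out, and MEM_wb_data). The names Producer and forward_operand are hypothetical, and giving the youngest matching instruction priority is an assumption: it is the usual rule when several in-flight instructions write the same register, but the report does not state the priority used.

```python
from typing import NamedTuple, Optional


class Producer(NamedTuple):
    """Result available from one later phase (hypothetical structure)."""
    wb_reg: int       # write-back register number of that instruction
    value: int        # data available in that phase
    writes_reg: bool  # True for fixed-point results that will be written back


def forward_operand(src_reg: int, file_value: int,
                    ex: Optional[Producer], mem: Optional[Producer],
                    wb: Optional[Producer]) -> int:
    """Select the operand value for source register src_reg.

    ex:  instruction in the EX_Phase (alu out)
    mem: instruction in the MEM_Phase (EX_out)
    wb:  instruction waiting for write back (MEM_wb_data)
    """
    for p in (ex, mem, wb):              # youngest producer first
        if p is not None and p.writes_reg and p.wb_reg == src_reg:
            return p.value
    return file_value                    # no match: value from the register bank


# r5 is written by both the EX and MEM instructions; the EX result is newer.
print(hex(forward_operand(5, 0x0, Producer(5, 0x11, True),
                          Producer(5, 0x22, True), None)))   # 0x11
```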

If the instruction following a load is a fixed-point integer instruction that reads the load's destination register, the data is not yet available to it. The loaded value becomes available only at the end of the MEM_Phase, because the load fetches it from the data memory in that phase. By then the dependent instruction has reached the EX_Phase and would use a stale register value instead of the correct one. To obtain the correct value, the instruction must wait until the data is available: its signals are held in the EX_Phase, and the IF_Phase and ID_Phase are stalled for 1 clock cycle, after which the data is available.
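The hazard condition itself reduces to comparing the destination of a load one stage ahead with the source registers of the following instruction. A minimal sketch, with hypothetical argument names, assuming the check is made while the load is in the EX_Phase and the dependent instruction is in the ID_Phase:

```python
from typing import Optional, Sequence


def load_use_stall(ex_is_load: bool, ex_dest: int,
                   id_sources: Sequence[Optional[int]]) -> bool:
    """Return True when stall_PC should be set for one clock cycle.

    ex_is_load / ex_dest describe the instruction in the EX_Phase;
    id_sources lists the registers read by the instruction in the ID_Phase
    (for example rA, rB, and rS for a store), with None for unused fields.
    """
    return ex_is_load and any(r is not None and r == ex_dest for r in id_sources)


# lwz r5,0(r3) followed by add r6,r5,r7: r5 is needed before it is loaded.
print(load_use_stall(True, 5, (5, 7)))    # True
print(load_use_stall(True, 5, (6, 7)))    # False
```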

The stall_PC signal in Figure 18 is used to stall the pipeline: while stall_PC is set, the pipeline is stalled. The load-use module stalls the pipeline by setting stall_PC, which also holds the instruction pointer for 1 clock cycle; the current instruction pointer value is kept until stall_PC is cleared to 0. The op0_sel, op1_sel, and rs_sel signals are the control signals that select the data coming from the MEM_Phase. For example, if the data for op1 is not yet available, op1 and the other signals in the EX_Phase wait until it is. Once the data is available, the op1_sel signal selects the data from the memory and the EX_Phase executes the instruction.
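The effect of one stall cycle can be traced with a small cycle-level model. It assumes that while stall_PC is set the IP and the IF/ID contents are held and a bubble enters the EX_Phase; this matches the behaviour described above but is not taken from the Verilog source, and the instruction tuples and the simulate function are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction record: (text, dest_reg, source_regs, is_load)
Instr = Tuple[str, int, Tuple[int, ...], bool]


def simulate(program: List[Instr], cycles: int) -> List[List[Optional[str]]]:
    """Return, for each cycle, the instruction text in IF, ID, EX, MEM, WB."""
    pipe: List[Optional[Instr]] = [None] * 5   # IF, ID, EX, MEM, WB
    ip = 0
    history = []
    for _ in range(cycles):
        if_i, id_i, ex_i, mem_i, _wb = pipe
        stall_pc = (ex_i is not None and ex_i[3] and id_i is not None
                    and ex_i[1] in id_i[2])
        if stall_pc:
            # Hold IP, IF and ID; a bubble (None) enters the EX_Phase.
            pipe = [if_i, id_i, None, ex_i, mem_i]
        else:
            fetched = program[ip] if ip < len(program) else None
            ip += 1
            pipe = [fetched, if_i, id_i, ex_i, mem_i]
        history.append([p[0] if p is not None else None for p in pipe])
    return history


program: List[Instr] = [
    ("lwz r5,0(r3)", 5, (3,), True),
    ("add r6,r5,r7", 6, (5, 7), False),
    ("addi r8,r2,0x80", 8, (2,), False),
]
for cycle, stages in enumerate(simulate(program, 7)):
    print(cycle, ["--" if s is None else s for s in stages])
```

In this trace the add instruction stays in the ID_Phase during cycles 2 and 3, the bubble appears in the EX column of cycle 3, and add enters the EX_Phase in cycle 4, after the load has left the MEM_Phase.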



Figure 18: Data Forwarding and Load Use



3.2.5 System Call Instruction (sc):

Due to the time limit, the system call (sc) instruction is not implemented in the design, and neither are the special purpose registers it uses, such as MSR, EVPR, SRR0, and SRR1. The sc instruction is used to generate a system call exception. When a system call exception occurs, the contents of the machine state register (MSR) are copied to SRR1 (save/restore register 1), and SRR0 (save/restore register 0) receives the address of the instruction that follows the system call instruction. The system call instruction also modifies bits in the MSR.

The exception vector address (EVA) is then moved to the next instruction pointer (NIP), which changes the program flow. The EVA is formed by concatenating the high-order halfword of the exception vector prefix register (EVPR) to the left of the exception vector offset. The MSR contents are modified when instructions are fetched from the new NIP.
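Although sc is not implemented, the state changes described above can be sketched in Python. The sketch assumes the PowerPC system call vector offset 0x0C00 and leaves out the individual MSR bits that the exception clears; the SprState class and the take_system_call function are hypothetical.

```python
from dataclasses import dataclass

SC_VECTOR_OFFSET = 0x0C00   # assumed system call vector offset


@dataclass
class SprState:
    msr: int
    evpr: int
    srr0: int = 0
    srr1: int = 0


def take_system_call(state: SprState, cia: int) -> int:
    """Save state for an sc at address cia and return the new NIP (the EVA)."""
    state.srr0 = (cia + 4) & 0xFFFFFFFF    # address of the instruction after sc
    state.srr1 = state.msr                 # MSR saved to SRR1
    # EVA: high-order halfword of EVPR concatenated with the vector offset.
    return (state.evpr & 0xFFFF0000) | SC_VECTOR_OFFSET


s = SprState(msr=0x1234, evpr=0xFFF00000)
print(hex(take_system_call(s, 0x1000)))   # 0xfff00c00
```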



This chapter describes how the PowerPC test instructions are created and simulated, and presents the final result of each design. The NC Verilog simulator is used to simulate the instructions and the output signal waveforms are verified; the design is not synthesized.

4.1 Creating Instructions:

The instructions are written in hexadecimal format and stored in the instruction memory of the IF_Phase module. The hex-coded instructions are fetched from the instruction memory and passed to the control module. For example, the hex code for the instruction andis. r9,r12,F0F0h is 7589F0F0h. Similarly, every instruction is converted to hex code according to its instruction format, shown in Appendix [5]. Due to the time limit, floating point instructions, exceptions, interrupts, management instructions, and control instructions are not implemented. The instructions that are implemented are listed in Appendix [3], and those that are not implemented are listed in Appendix [4].
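The hand conversion to hex code can be checked with a short assembler sketch for the D-form instructions used as examples in this report. It follows the architectural D-form layout (6-bit opcode, two 5-bit register fields, 16-bit immediate); the opcode table covers only addi, ori, and andis., and the function name is hypothetical. For the logical immediate instructions the source register rS occupies bits 6 to 10, so the assembler operand order rA, rS is reversed in the encoding.

```python
# Opcodes for the three D-form examples in this report only.
D_FORM_OPCODES = {
    "addi": 14,    # addi   rD, rA, SIMM
    "ori": 24,     # ori    rA, rS, UIMM
    "andis.": 29,  # andis. rA, rS, UIMM
}


def encode_d_form(mnemonic: str, f1: int, f2: int, imm: int) -> int:
    """Pack a D-form word: f1 goes to bits 6-10, f2 to bits 11-15.

    addi:          f1 = rD, f2 = rA
    ori / andis.:  f1 = rS, f2 = rA
    """
    opcode = D_FORM_OPCODES[mnemonic]
    assert 0 <= f1 < 32 and 0 <= f2 < 32, "register fields are 5 bits"
    return (opcode << 26) | (f1 << 21) | (f2 << 16) | (imm & 0xFFFF)


print(f"{encode_d_form('addi', 8, 2, 0x0080):08X}")     # 39020080
print(f"{encode_d_form('ori', 19, 11, 0x00FF):08X}")    # 626B00FF
print(f"{encode_d_form('andis.', 12, 9, 0xF0F0):08X}")  # 7589F0F0
```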

4.2 Fixed Point Integer Instructions:

In this section, the fixed point instructions are tested and the final waveforms are discussed. The integer instructions cover arithmetic, logical, compare, rotate, and shift instructions. All the instructions are run in the NC Verilog simulator using the following command:

ncv_gui PPC_processor_stim.v PPC_processor.v

The Design Browser window then opens. Select the signals affected by the instruction and view them in the waveform window.

PPC_processor is the top-level module. It connects the pipeline modules IF_Phase, ID_Phase, EX_Phase, and MEM_Phase; the fifth stage, write back, is implemented in the ID_Phase module. Each module takes 1 clock cycle to execute because it is a sequential (clocked) block, so each instruction takes five clock cycles in total to complete. Figure 19 shows the Design Browser window. PPC_processor_stim is the test bench module, PowerPC is the instance name of the top-level module, and IF, ID, EX, and MEM are the submodules of the top-level module.
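The five-cycle latency can be made concrete with an idealised occupancy model, assuming no stalls or flushes: instruction i is in stage k during cycle i + k. In this model a single instruction needs five cycles, while n instructions finish in n + 4 cycles because a new instruction enters the IF_Phase every cycle. The function names are hypothetical.

```python
from typing import Optional

STAGES = ("IF", "ID", "EX", "MEM", "WB")


def stage_of(instr: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `instr` in clock cycle `cycle` (both 0-based)."""
    k = cycle - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None


def total_cycles(n_instr: int) -> int:
    """Cycles needed to complete n instructions with no stalls or flushes."""
    return n_instr + len(STAGES) - 1 if n_instr > 0 else 0


# Occupancy chart: one row per instruction, one column per clock cycle.
for i in range(3):
    print(" ".join(f"{stage_of(i, c) or '.':>3}" for c in range(total_cycles(3))))
```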



Figure 19: Design Browser Window



4.2.1 Fixed Point Arithmetic Instructions:

The fixed point instruction design is simulated in the NC Verilog simulator, covering arithmetic, logical, shift, rotate, and compare instructions. Figure 20 shows the simulated waveform of the addi instruction. In Figure 20, inst is the 32-bit instruction fetched from the instruction memory, and ID_op0 and ID_op1 are the source operand registers that hold the values of the two operands. The out register holds the result of the arithmetic instruction, and this result is stored in the EX_out register at the end of the EX_Phase module. Since arithmetic instructions do not access the data memory, the result simply passes through the MEM_Phase, where the MEM_wb_data register holds the data coming from EX_out for 1 clock cycle. The data is then moved back to the register bank module and stored in the destination register.

For example, consider the addition instruction addi r8,r2,0080h, whose hex code is 39020080h. In Figure 20, TimeA marks the instruction being fetched. In the next clock cycle, ID_op0 and ID_op1 receive the input operands of the instruction: the control module separates the immediate value 0080h from the instruction, the value in register r2 (0) is moved to ID_op0, and the immediate value is moved to ID_op1, as shown in Figure 20. In the EX_Phase, i.e. the 3rd clock cycle, ID_op0 and ID_op1 are added and the result is stored in the EX_out register. In the 4th clock cycle, the result of the addition is moved to the MEM_wb_data register and the write-back register number is moved to the MEM_wb_reg register. Finally, the data in MEM_wb_data is stored in register r8, so the instruction takes 5 clock cycles to complete. CR and XER are the registers that hold the condition, carry, and overflow status bits.
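The addi walk-through can be reproduced with a small reference model that decodes 39020080h and computes the write-back value. Only r2 = 0 is taken from the report; the rest of the register file is set to zero for illustration, and the function names are hypothetical. The model also applies the architectural rule that rA = 0 in addi selects the value 0 rather than register r0.

```python
from typing import List, Tuple


def decode_d_form(inst: int) -> Tuple[int, int, int, int]:
    """Return (opcode, rD, rA, SIMM) with SIMM sign-extended."""
    opcode = inst >> 26
    rd = (inst >> 21) & 0x1F
    ra = (inst >> 16) & 0x1F
    simm = inst & 0xFFFF
    if simm & 0x8000:                 # sign-extend the 16-bit immediate
        simm -= 0x10000
    return opcode, rd, ra, simm


def execute_addi(inst: int, regs: List[int]) -> None:
    opcode, rd, ra, simm = decode_d_form(inst)
    assert opcode == 14, "not addi"
    base = 0 if ra == 0 else regs[ra]  # rA = 0 means the value 0, not r0
    regs[rd] = (base + simm) & 0xFFFFFFFF


regs = [0] * 32                        # r2 = 0, as in the simulation
execute_addi(0x39020080, regs)
print(hex(regs[8]))                    # 0x80
```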



Figure 20: addi Instruction



4.2.2 Fixed Point Logical Instructions:

Figure 21 shows the final result of the simulated logical ori instruction. The inst signal is the 32-bit instruction held in the pipeline registers of the IF_Phase module, and ID_op0 and ID_op1 store the two operands for the alu module. The output of the logical instruction is stored in the EX_out register at the end of the EX_Phase module. The data is then moved to the next stage, the MEM_Phase module, and stored in the MEM_wb_data register. Finally, the data is moved to the write-back register in the register bank module of the ID_Phase module.

The hex code of the ori r11,r19,00FFh instruction is 626B00FFh. TimeA in Figure 21 marks the instruction being fetched from the memory. In the ID_Phase module, the unsigned immediate data is sent as the second operand and stored in ID_op1, and the data in the source register r19 is sent to the ID_op0 register. In the EX_Phase module, the two operands in ID_op0 and ID_op1 are sent to the alu module, which performs the logical OR and stores the result in the EX_out register, as shown in Figure 21. The result of the ori instruction is then sent to the write-back registers through the MEM_Phase module. The instruction takes 5 clock cycles to complete.
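A corresponding sketch for ori highlights two details: the 16-bit UIMM is zero-extended (unlike the sign-extended SIMM of addi), and the source register is the one encoded in bits 6 to 10 (r19 here). The contents chosen for r19 are hypothetical, since the report does not give them, and the function name is likewise hypothetical.

```python
from typing import List


def execute_ori(inst: int, regs: List[int]) -> None:
    assert inst >> 26 == 24, "not ori"
    rs = (inst >> 21) & 0x1F      # source register, bits 6-10
    ra = (inst >> 16) & 0x1F      # destination register, bits 11-15
    uimm = inst & 0xFFFF          # zero-extended immediate
    regs[ra] = regs[rs] | uimm


regs = [0] * 32
regs[19] = 0x12345600             # hypothetical contents of r19
execute_ori(0x626B00FF, regs)
print(hex(regs[11]))              # 0x123456ff
```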



Figure 21: ORI Instruction



4.2.3 Fixed Point Shift Instructions:

The fixed point shift instructions are tested by simulating them in the NC Verilog simulator. The shift instruction is fetched from the instruction memory in the IF_Phase module. ID_op0 and ID_op1 are the two pipeline registers that store the operand values, and these values are passed to the alu module. The source operand is shifted right by the number of bits specified by ID_sh, and the shifted data is stored in the EX_out register at the end of the EX_Phase module. This result is moved to the next phase, the MEM_Phase, and stored in the MEM_wb_data register. MEM_wb_reg identifies the write-back register in which the result of the shift instruction is stored.

The srawi r17,r8,sh[00111] instruction is simulated in the NC Verilog simulator, and the relevant signals are shown in the waveform in Figure 22. The hex code for this instruction is 7D113E70h. TimeA in Figure 22 marks the instruction being fetched from the memory. The data 00000080h in register r8 is stored in the ID_op0 register, and the value of sh, 07h, is stored in the ID_sh register. ID_op0 is shifted right by the number of bits specified by ID_sh, and the shifted data, 00000001h, is stored in the EX_out register at the end of the EX_Phase module. This data is moved to the MEM_Phase, stored in the MEM_wb_data register, and written back to register r17.
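The shift can be checked against a reference model of the architectural definition, in which the sign bit is replicated into the vacated positions and XER[CA] is set only when the source is negative and 1 bits are shifted out. Decoding 7D113E70h gives an extended opcode of 824, which is the srawi (immediate shift count) form; sraw, which takes the count from a register, uses 792. The function names are hypothetical.

```python
from typing import Tuple


def decode_x_form(inst: int) -> Tuple[int, int, int, int, int]:
    """Return (opcode, rS, rA, SH, XO) of an X-form word."""
    opcode = inst >> 26
    rs = (inst >> 21) & 0x1F
    ra = (inst >> 16) & 0x1F
    sh = (inst >> 11) & 0x1F
    xo = (inst >> 1) & 0x3FF
    return opcode, rs, ra, sh, xo


def srawi(value: int, sh: int) -> Tuple[int, int]:
    """Return (result, CA) for a 32-bit value shifted right algebraically by sh."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    result = (signed >> sh) & 0xFFFFFFFF   # >> on Python ints is arithmetic
    shifted_out = value & ((1 << sh) - 1)
    ca = int(signed < 0 and shifted_out != 0)
    return result, ca


print(decode_x_form(0x7D113E70))   # (31, 8, 17, 7, 824)
print(srawi(0x00000080, 7))        # (1, 0)
```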



Figure 22: Shift Instruction (srawi)



4.2.4 Fixed Point Rotate Instructions:

The operation of the rotate instructions is described in Section 2.6.4. Rotate instructions require a mask, which is generated at the beginning of the EX_Phase module. The rotate module in the EX_Phase module executes the rotate instructions; its inputs are the two operands from ID_op0 and ID_op1, and the number of bits to be rotated is specified either in ID_op1 or in ID_sh. Figure 23 shows the result of the rlwimi instruction.

The instruction is first converted to hex code: the hex code of the rlwimi r3,r10,sh,mb,me instruction is 51434195h. TimeA in Figure 23 marks the instruction being fetched from the memory. The value of r10 is 868EFF7Fh, and the values of ID_MB, ID_ME, and ID_sh are 06h, 0Ah, and 08h. The value of the destination register r3, 00000080h, is moved to ID_op1. The data in ID_op1 is rotated by the number of bits specified by ID_sh (08h). The generation of the mask is explained in Section 2.6.4.1; the generated mask value is 03C00000h. The rotate module in the EX_Phase rotates ID_op1, and ID_op0 is moved to the out register. The rotated data is inserted into the out register at the bit positions where the corresponding mask bits are 1; where the mask bits are 0, the corresponding rotated bits are not inserted. Figure 23 shows that wherever the mask bits are 1, the roted_out data is inserted into the out register. The data in EX_out is moved to the next phase, the MEM_Phase, and finally the data in the MEM_wb_data register is written back to the destination register r3.
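The rotate-and-insert step can be cross-checked with a reference model of the architectural definition, rA = (ROTL32(rS, SH) AND MASK(MB, ME)) OR (rA AND NOT MASK(MB, ME)). In this definition rS (r10 here) is the rotated register and rA (r3) supplies the bits outside the mask. The model also decodes 51434195h: its least significant bit (Rc) is 1, so this encoding is the recording form rlwimi. The function names are hypothetical, and the printed values are those of the model, which can be compared with the mask and out values observed in the waveform.

```python
def mask(mb: int, me: int) -> int:
    """Ones from bit mb to bit me inclusive (bit 0 = MSB); wraps when mb > me."""
    def ones_from(b: int) -> int:          # bits b..31 set
        return 0xFFFFFFFF >> b
    if mb <= me:
        return ones_from(mb) & ~ones_from(me + 1) & 0xFFFFFFFF
    return (ones_from(mb) | ~ones_from(me + 1)) & 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    n &= 31
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & 0xFFFFFFFF)


inst = 0x51434195
fields = dict(opcode=inst >> 26, rs=(inst >> 21) & 31, ra=(inst >> 16) & 31,
              sh=(inst >> 11) & 31, mb=(inst >> 6) & 31, me=(inst >> 1) & 31,
              rc=inst & 1)
print(fields)
print(hex(mask(6, 10)))                                # 0x3e00000
print(hex(rlwimi(0x00000080, 0x868EFF7F, 8, 6, 10)))   # 0x2e00080
```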



Figure 23: rlwimi Instruction



4.2.5 Fixed Point
