design of digital circuits lecture 13: multi-cycle microarch

96
Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring 2017 6 April 2017

Upload: others

Post on 22-Dec-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Design of Digital Circuits

Lecture 13: Multi-Cycle Microarch.

Prof. Onur Mutlu ETH Zurich Spring 2017 6 April 2017

Page 2: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Agenda for Today & Next Few Lectures !  Single-cycle Microarchitectures

!  Multi-cycle and Microprogrammed Microarchitectures !  Pipelining

!  Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …

!  Out-of-Order Execution

!  Issues in OoO Execution: Load-Store Handling, …

2

Page 3: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Readings for Today !  P&P, Chapter 4, 5

"  Microarchitecture

!  P&P, Revised Appendix C "  Microarchitecture of the LC-3b "  Appendix A (LC-3b ISA) will be useful in following this

!  H&H, Chapter 7.4 (keep reading)

!  Optional "  Maurice Wilkes, “The Best Way to Design an Automatic

Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951.

3

Page 4: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Implementing the ISA: Microarchitecture Basics (Review)

4

Page 5: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

How Does a Machine Process Instructions? !  What does processing an instruction mean? !  We will assume the von Neumann model (for now)

AS = Architectural (programmer visible) state before an instruction is processed

Process instruction

AS’ = Architectural (programmer visible) state after an

instruction is processed

!  Processing an instruction: Transforming AS to AS’ according to the ISA specification of the instruction

5

Page 6: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Von Neumann Model/Architecture !  Also called stored program computer (instructions in

memory). Two key properties:

!  Stored program "  Instructions stored in a linear memory array "  Memory is unified between instructions and data

!  The interpretation of a stored value depends on the control signals

!  Sequential instruction processing "  One instruction processed (fetched, executed, and completed) at a

time "  Program counter (instruction pointer) identifies the current instr. "  Program counter is advanced sequentially except for control transfer

instructions

6

When is a value interpreted as an instruction?

Page 7: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Von Neumann Model/Architecture !  Recommended reading

"  Burks, Goldstein, von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument,” 1946.

!  Required reading "  Patt and Patel book, Chapter 4, “The von Neumann Model”

!  Stored program

!  Sequential instruction processing

7

Page 8: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Von Neumann Model (of a Computer)

8

CONTROL UNIT

IP Inst Register

PROCESSING UNIT

ALU TEMP

MEMORY

Mem Addr Reg

Mem Data Reg

INPUT OUTPUT

Page 9: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Von Neumann Model (of a Computer) !  Q: Is this the only way that a computer can operate?

!  A: No. !  Qualified Answer: But, it has been the dominant way

"  i.e., the dominant paradigm for computing "  for N decades

9

Page 10: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Dataflow Model (of a Computer) !  Von Neumann model: An instruction is fetched and

executed in control flow order "  As specified by the instruction pointer "  Sequential unless explicit control flow instruction

!  Dataflow model: An instruction is fetched and executed in data flow order "  i.e., when its operands are ready "  i.e., there is no instruction pointer "  Instruction ordering specified by data flow dependence

!  Each instruction specifies “who” should receive the result !  An instruction can “fire” whenever all operands are received

"  Potentially many instructions can execute at the same time !  Inherently more parallel

10

Page 11: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Von Neumann vs Dataflow !  Consider a Von Neumann program

"  What is the significance of the program order? "  What is the significance of the storage locations?

!  Which model is more natural to you as a programmer? 11

v<=a+b;w<=b*2;x<=v-wy<=v+wz<=x*y

+ *2

- +

*

a b

z

Sequential

Dataflow

Page 12: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

More on Data Flow !  In a data flow machine, a program consists of data flow

nodes "  A data flow node fires (fetched and executed) when all it

inputs are ready !  i.e. when all inputs have tokens

!  Data flow node and its ISA representation

12

Page 13: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Data Flow Nodes

13

Page 14: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

An Example Data Flow Program

14

OUT

Page 15: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

ISA-level Tradeoff: Instruction Pointer

!  Do we need an instruction pointer in the ISA? "  Yes: Control-driven, sequential execution

!  An instruction is executed when the IP points to it !  IP automatically changes sequentially (except for control flow

instructions)

"  No: Data-driven, parallel execution !  An instruction is executed when all its operand values are

available (data flow)

!  Tradeoffs: MANY high-level ones "  Ease of programming (for average programmers)? "  Ease of compilation? "  Performance: Extraction of parallelism? "  Hardware complexity?

15

Page 16: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

ISA vs. Microarchitecture Level Tradeoff !  A similar tradeoff (control vs. data-driven execution) can be

made at the microarchitecture level

!  ISA: Specifies how the programmer sees instructions to be executed "  Programmer sees a sequential, control-flow execution order vs. "  Programmer sees a data-flow execution order

!  Microarchitecture: How the underlying implementation actually executes instructions "  Microarchitecture can execute instructions in any order as long

as it obeys the semantics specified by the ISA when making the instruction results visible to software !  Programmer should see the order specified by the ISA

16

Page 17: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The “Process Instruction” Step !  ISA specifies abstractly what AS’ should be, given an

instruction and AS "  It defines an abstract finite state machine where

!  State = programmer-visible state !  Next-state logic = instruction execution specification

"  From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution !  One state transition per instruction

!  Microarchitecture implements how AS is transformed to AS’ "  There are many choices in implementation "  We can have programmer-invisible state to optimize the speed of

instruction execution: multiple state transitions per instruction !  Choice 1: AS # AS’ (transform AS to AS’ in a single clock cycle) !  Choice 2: AS # AS+MS1 # AS+MS2 # AS+MS3 # AS’ (take multiple

clock cycles to transform AS to AS’) 17

Page 18: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Very Basic Instruction Processing Engine !  Each instruction takes a single clock cycle to execute !  Only combinational logic is used to implement instruction

execution "  No intermediate, programmer-invisible state updates

AS = Architectural (programmer visible) state at the beginning of a clock cycle

Process instruction in one clock cycle

AS’ = Architectural (programmer visible) state

at the end of a clock cycle

18

Page 19: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Very Basic Instruction Processing Engine !  Single-cycle machine

!  What is the clock cycle time determined by? !  What is the critical path of the combinational logic

determined by?

19

AS’ AS Sequen.alLogic(State)

Combina.onalLogic

Page 20: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Remember: Programmer Visible (Architectural) State

20

M[0]M[1]M[2]M[3]M[4]

M[N-1]Memoryarrayofstorageloca.onsindexedbyanaddress

ProgramCountermemoryaddressofthecurrentinstruc.on

Registers-givenspecialnamesintheISA(asopposedtoaddresses)-generalvs.specialpurpose

Instruc.ons(andprograms)specifyhowtotransformthevaluesofprogrammervisiblestate

Page 21: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Single-cycle vs. Multi-cycle Machines !  Single-cycle machines

"  Each instruction takes a single clock cycle "  All state updates made at the end of an instruction’s execution "  Big disadvantage: The slowest instruction determines cycle time #

long clock cycle time

!  Multi-cycle machines "  Instruction processing broken into multiple cycles/stages "  State updates can be made during an instruction’s execution "  Architectural state updates made only at the end of an instruction’s

execution "  Advantage over single-cycle: The slowest “stage” determines cycle time

!  Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level

21

Page 22: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Instruction Processing “Cycle” !  Instructions are processed under the direction of a “control

unit” step by step. !  Instruction cycle: Sequence of steps to process an instruction !  Fundamentally, there are six steps:

!  Fetch !  Decode !  Evaluate Address !  Fetch Operands !  Execute !  Store Result

!  Not all instructions require all six steps (see P&P Ch. 4) 22

Page 23: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Instruction Processing “Cycle” vs. Machine Clock Cycle

!  Single-cycle machine: "  All six phases of the instruction processing cycle take a single

machine clock cycle to complete

!  Multi-cycle machine: "  All six phases of the instruction processing cycle can take

multiple machine clock cycles to complete "  In fact, each phase can take multiple clock cycles to complete

23

Page 24: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Instruction Processing Viewed Another Way !  Instructions transform Data (AS) to Data’ (AS’) !  This transformation is done by functional units

"  Units that “operate” on data

!  These units need to be told what to do to the data

!  An instruction processing engine consists of two components "  Datapath: Consists of hardware elements that deal with and

transform data signals !  functional units that operate on data !  hardware structures (e.g. wires and muxes) that enable the flow of

data into the functional units and registers !  storage units that store data (e.g., registers)

"  Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

24

Page 25: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Single-cycle vs. Multi-cycle: Control & Data !  Single-cycle machine:

"  Control signals are generated in the same clock cycle as the one during which data signals are operated on

"  Everything related to an instruction happens in one clock cycle (serialized processing)

!  Multi-cycle machine: "  Control signals needed in the next cycle can be generated in

the current cycle "  Latency of control processing can be overlapped with latency

of datapath operation (more parallelism)

!  We will see the difference clearly in microprogrammed multi-cycle microarchitectures

25

Page 26: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Many Ways of Datapath and Control Design

!  There are many ways of designing the data path and control logic

!  Single-cycle, multi-cycle, pipelined datapath and control !  Single-bus vs. multi-bus datapaths !  Hardwired/combinational vs. microcoded/microprogrammed

control "  Control signals generated by combinational logic versus "  Control signals stored in a memory structure

!  Control signals and structure depend on the datapath design

26

Page 27: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Performance Analysis !  Execution time of an instruction

"  {CPI} x {clock cycle time}

!  Execution time of a program "  Sum over all instructions [{CPI} x {clock cycle time}] "  {# of instructions} x {Average CPI} x {clock cycle time}

!  Single cycle microarchitecture performance "  CPI = 1 "  Clock cycle time = long

!  Multi-cycle microarchitecture performance "  CPI = different for each instruction

!  Average CPI # hopefully small

"  Clock cycle time = short 27

Now, we have two degrees of freedom to optimize independently

Page 28: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Review: Complete Single-Cycle Processor

SignImm

CLK

A RDInstruction

Memory

+

4

A1

A3WD3

RD2

RD1WE3

A2

CLK

Sign Extend

RegisterFile

01

01

A RDData

MemoryWD

WE01

PC01

PC' Instr 25:21

20:16

15:0

5:0

SrcB

20:16

15:11

<<2

+

ALUResult ReadData

WriteData

SrcA

PCPlus4

PCBranch

WriteReg4:0

Result

31:26

RegDst

BranchMemWriteMemtoReg

ALUSrc

RegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

ALUControl2:0

ALU

Page 29: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Another Example Single-Cycle Processor

29

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

**Basedonoriginalfigurefrom[P&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.] JAL,JR,JALRomiaed

Page 30: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Evaluating the Single-Cycle Microarchitecture

30

Page 31: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Single-Cycle Microarchitecture !  Is this a good idea/design?

!  When is this a good design?

!  When is this a bad design?

!  How can we design a better microarchitecture?

31

Page 32: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Single-Cycle Microarchitecture: Analysis !  Every instruction takes 1 cycle to execute

"  CPI (Cycles per instruction) is strictly 1

!  How long each instruction takes is determined by how long the slowest instruction takes to execute "  Even though many instructions do not need that long to

execute

!  Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction "  Critical path of the design is determined by the processing

time of the slowest instruction

32

Page 33: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

What is the Slowest Instruction to Process? !  Let’s go back to the basics

!  All six phases of the instruction processing cycle take a single machine clock cycle to complete

"  Fetch "  Decode "  Evaluate Address "  Fetch Operands "  Execute "  Store Result

!  Do each of the above phases take the same time (latency) for all instructions?

33

1. Instruction fetch (IF) 2. Instruction decode and register operand fetch (ID/RF) 3. Execute/Evaluate memory address (EX/AG) 4. Memory operand fetch (MEM) 5. Store/writeback result (WB)

Page 34: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Example Single-Cycle Datapath Analysis !  Assume (for the design in Slide 29)

"  memory units (read or write): 200 ps "  ALU and adders: 100 ps "  register file (read or write): 50 ps "  other combinational logic: 0 ps

34

steps IF ID EX MEM WBDelay

resources mem RF ALU mem RF

R-type 200 50 100 50 400

I-type 200 50 100 50 400LW 200 50 100 200 50 600SW 200 50 100 200 550

Branch 200 50 100 350Jump 200 200

Page 35: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Let’s Find the Critical Path

35

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

Page 36: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

R-Type and I-Type ALU

36

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

200ps 250ps

350ps400ps

100ps

100ps

Page 37: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

LW

37

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

200ps 250ps

350ps600ps

100ps

100ps

550ps

Page 38: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

SW

38

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

200ps 250ps

350ps

100ps

100ps

550ps

Page 39: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Branch Taken

39

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

200ps 250ps350ps

100ps

350ps

200ps

Page 40: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Jump

40

Shift left 2

PC

Instruction memory

Read address

Instruction [31– 0]

Data memory

Read data

Write data

RegistersWrite register

Write data

Read data 1

Read data 2

Read register 1

Read register 2

Instruction [15– 11]

Instruction [20– 16]

Instruction [25– 21]

Add

ALU result

Zero

Instruction [5– 0]

MemtoRegALUOpMemWrite

RegWrite

MemReadBranchJumpRegDst

ALUSrc

Instruction [31– 26]

4

M u x

Instruction [25– 0] Jump address [31– 0]

PC+4 [31– 28]

Sign extend

16 32Instruction [15– 0]

1

M u x

1

0

M u x

0

1

M u x

0

1

ALU control

Control

Add ALU result

M u x

0

1 0

ALU

Shift left 226 28

Address

PCSrc2=BrTaken

PCSrc1=Jump

ALUopera.on

bcond

[BasedonoriginalfigurefromP&HCO&D,COPYRIGHT2004Elsevier.ALLRIGHTSRESERVED.]

200ps

100ps

200ps

Page 41: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

What About Control Logic? !  How does that affect the critical path?

!  Food for thought for you: "  Can control logic be on the critical path? "  A note on CDC 5600: control store access too long…

41

Page 42: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

What is the Slowest Instruction to Process? !  Memory is not magic

!  What if memory sometimes takes 100ms to access?

!  Does it make sense to have a simple register to register add or jump to take {100ms+all else to do a memory operation}?

!  And, what if you need to access memory more than once to process an instruction? "  Which instructions need this? "  Do you provide multiple ports to memory?

42

Page 43: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Single Cycle uArch: Complexity !  Contrived

"  All instructions run as slow as the slowest instruction

!  Inefficient "  All instructions run as slow as the slowest instruction "  Must provide worst-case combinational resources in parallel as required

by any instruction "  Need to replicate a resource if it is needed more than once by an

instruction during different parts of the instruction processing cycle

!  Not necessarily the simplest way to implement an ISA "  Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?

!  Not easy to optimize/improve performance "  Optimizing the common case does not work (e.g. common instructions) "  Need to optimize the worst case all the time

43

Page 44: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

(Micro)architecture Design Principles !  Critical path design

"  Find and decrease the maximum combinational logic delay "  Break a path into multiple cycles if it takes too long

!  Bread and butter (common case) design "  Spend time and resources on where it matters most

!  i.e., improve what the machine is really designed to do

"  Common case vs. uncommon case

!  Balanced design "  Balance instruction/data flow through hardware components "  Design to eliminate bottlenecks: balance the hardware for the

work

44

Page 45: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Single-Cycle Design vs. Design Principles !  Critical path design

!  Bread and butter (common case) design

!  Balanced design

How does a single-cycle microarchitecture fare in light of these principles?

45

Page 46: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Aside: System Design Principles !  When designing computer systems/architectures, it is

important to follow good principles

!  Remember: “principled design” from our first lecture "  Frank Lloyd Wright: “architecture […] based upon principle,

and not upon precedent”

46

Page 47: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Aside: From Lecture 1 !  “architecture […] based upon principle, and not upon

precedent”

47

Page 48: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Aside: System Design Principles !  We will continue to cover key principles in this course !  Here are some references where you can learn more

!  Yale Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proc. of IEEE, 2001. (Levels of transformation, design point, etc)

!  Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. (Flynn’s Bottleneck # Balanced design)

!  Gene M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967. (Amdahl’s Law # Common-case design)

!  Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983. "  http://research.microsoft.com/pubs/68221/acrobat.pdf

48

Page 49: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Key System Design Principle !  Keep it simple

!  “Everything should be made as simple as possible, but no simpler.” "  Albert Einstein

!  And, keep it low cost: “An engineer is a person who can do for a dime what any fool can do for a dollar.”

!  For more, see: "  Butler W. Lampson, “Hints for Computer System Design,” ACM

Operating Systems Review, 1983. "  http://research.microsoft.com/pubs/68221/acrobat.pdf

49

Page 50: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Multi-Cycle Microarchitectures

50

Page 51: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Multi-Cycle Microarchitectures !  Goal: Let each instruction take (close to) only as much time

it really needs

!  Idea "  Determine clock cycle time independently of instruction

processing time "  Each instruction takes as many clock cycles as it needs to take

!  Multiple state transitions per instruction !  The states followed by each instruction is different

51

Page 52: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Remember: The “Process instruction” Step !  ISA specifies abstractly what AS’ should be, given an

instruction and AS "  It defines an abstract finite state machine where

!  State = programmer-visible state !  Next-state logic = instruction execution specification

"  From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution !  One state transition per instruction

!  Microarchitecture implements how AS is transformed to AS’ "  There are many choices in implementation "  We can have programmer-invisible state to optimize the speed of

instruction execution: multiple state transitions per instruction !  Choice 1: AS # AS’ (transform AS to AS’ in a single clock cycle) !  Choice 2: AS # AS+MS1 # AS+MS2 # AS+MS3 # AS’ (take multiple

clock cycles to transform AS to AS’) 52

Page 53: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Multi-Cycle Microarchitecture AS = Architectural (programmer visible) state

at the beginning of an instruction

Step 1: Process part of instruction in one clock cycle

Step 2: Process part of instruction in the next clock cycle

AS’ = Architectural (programmer visible) state at the end of a clock cycle

53

Page 54: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Benefits of Multi-Cycle Design !  Critical path design

"  Can keep reducing the critical path independently of the worst-case processing time of any instruction

!  Bread and butter (common case) design "  Can optimize the number of states it takes to execute “important”

instructions that make up much of the execution time

!  Balanced design "  No need to provide more capability or resources than really

needed !  An instruction that needs resource X multiple times does not require

multiple X’s to be implemented !  Leads to more efficient hardware: Can reuse hardware components

needed multiple times for an instruction

54

Page 55: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Downsides of Multi-Cycle Design !  Need to store the intermediate results at the end of each

clock cycle "  Hardware overhead for registers "  Register setup/hold overhead paid multiple times for an

instruction

55

Page 56: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Remember: Performance Analysis !  Execution time of an instruction

"  {CPI} x {clock cycle time}

!  Execution time of a program "  Sum over all instructions [{CPI} x {clock cycle time}] "  {# of instructions} x {Average CPI} x {clock cycle time}

!  Single cycle microarchitecture performance "  CPI = 1 "  Clock cycle time = long

!  Multi-cycle microarchitecture performance "  CPI = different for each instruction

!  Average CPI # hopefully small

"  Clock cycle time = short 56

We have two degrees of freedom to optimize independently

Not easy to optimize design

Page 57: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Multi-Cycle Microarchitecture A Closer Look

57

Page 58: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

How Do We Implement This? !  Maurice Wilkes, “The Best Way to Design an Automatic

Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951.

!  An elegant implementation: "  The concept of microcoded/microprogrammed machines

58

Page 59: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Multi-Cycle uArch !  Key Idea for Realization

"  One can implement the “process instruction” step as a finite state machine that sequences between states and eventually returns back to the “fetch instruction” state

"  A state is defined by the control signals asserted in it

"  Control signals for the next state determined in current state

59

Page 60: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

The Instruction Processing Cycle

"  Fetch "  Decode "  Evaluate Address "  Fetch Operands "  Execute "  Store Result

60

Page 61: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

A Basic Multi-Cycle Microarchitecture !  Instruction processing cycle divided into “states”

!  A stage in the instruction processing cycle can take multiple states

!  A multi-cycle microarchitecture sequences from state to state to process an instruction !  The behavior of the machine in a state is completely determined by

control signals in that state

!  The behavior of the entire processor is specified fully by a finite state machine

!  In a state (clock cycle), control signals control two things: !  How the datapath should process the data !  How to generate the control signals for the (next) clock cycle

61

Page 62: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

One Example Multi-Cycle Microarchitecture

62

Page 63: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

63

Remember:Single-CycleMIPSProcessor

SignImm

CLK

A RDInstruction

Memory

+

4

A1

A3WD3

RD2

RD1WE3

A2

CLK

Sign Extend

RegisterFile

01

01

A RDData

MemoryWD

WE01

PC01 PC' Instr 25:21

20:16

15:0

5:0

SrcB

20:16

15:11

<<2

+

ALUResult ReadData

WriteData

SrcA

PCPlus4

PCBranch

WriteReg4:0

Result

31:26

RegDst

BranchMemWriteMemtoReg

ALUSrc

RegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

ALUControl2:0

ALU

01

25:0 <<2

27:0 31:28

PCJump

Jump

Page 64: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

64

MulB-cycleMIPSProcessor!  Single-cyclemicroarchitecture:

+ simple-  cycle.melimitedbylongestinstruc.on(lw)-  threeadders/ALUsandtwomemories

!  MulB-cyclemicroarchitecture:+ higherclockspeed+ simplerinstruc.onsrunfaster+ reuseexpensivehardwareonmul.plecycles-  sequencingoverheadpaidmany.mes-  hardwareoverheadforstoringintermediateresults

!  Samedesignsteps:datapath&control

Page 65: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

65

WhatDoWeWantToOpBmize!  SingleCycleArchitectureusestwomemories

$  Onememorystoresinstruc.ons,theotherdata$  Wewanttouseasinglememory(Smallersize)

Page 66: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

66

WhatDoWeWantToOpBmize!  SingleCycleArchitectureusestwomemories

$  Onememorystoresinstruc.ons,theotherdata$  Wewanttouseasinglememory(Smallersize)

!  SingleCycleArchitectureneedsthreeadders$  ALU,PC,Branchaddresscalcula.on$  WewanttousetheALUforallopera.ons(smallersize)

Page 67: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

67

WhatDoWeWantToOpBmize!  SingleCycleArchitectureusestwomemories

$  Onememorystoresinstruc.ons,theotherdata$  Wewanttouseasinglememory(Smallersize)

!  SingleCycleArchitectureneedsthreeadders$  ALU,PC,Branchaddresscalcula.on$  WewanttousetheALUforallopera.ons(smallersize)

!  InSingleCycleArchitectureallinstrucBonstakeonecycle$  Themostcomplexopera.onslowsdowneverything!$  Divideallinstruc.onsintomul.plesteps$  Simplerinstruc.onscantakefewercycles(averagecasemaybe

faster)

Page 68: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

68

ConsiderthelwinstrucBon!  ForaninstrucBonsuchas:lw$t0,0x20($t1)

!  Weneedto:$  Readtheinstruc.onfrommemory$  Thenread$t1fromregisterarray$  Addtheimmediatevalue(0x20)tocalculatethememoryaddress$  Readthecontentofthisaddress$  Writetotheregister$t0thiscontent

Page 69: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

69

MulB-cycleDatapath:instrucBonfetch!  FirstconsiderexecuBnglw

$  STEP1:Fetchinstruc.on

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

RegisterFile

PCPC' Instr

CLK

WD

WE

CLK

EN

IRWrite

readfromthememorylocation[rs]+immtolocation[rt]

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 70: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

70

MulB-cycleDatapath:lwregisterread

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

RegisterFile

PCPC' Instr 25:21

CLK

WD

WE

CLK CLK

A

EN

IRWrite

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 71: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

71

MulB-cycleDatapath:lwimmediate

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

PCPC' Instr 25:21

15:0

CLK

WD

WE

CLK CLK

A

EN

IRWrite

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 72: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

72

MulB-cycleDatapath:lwaddress

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

PCPC' Instr 25:21

15:0

SrcB

ALUResult

SrcA

ALUOut

CLK

ALUControl2:0

ALU

WD

WE

CLK CLK

A CLK

EN

IRWrite

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 73: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

73

MulB-cycleDatapath:lwmemoryread

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

PCPC' Instr 25:21

15:0

SrcB

ALUResult

SrcA

ALUOut

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A CLK

EN

IRWriteIorD

01

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 74: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

74

MulB-cycleDatapath:lwwriteregister

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

PCPC' Instr 25:21

15:0

SrcB20:16

ALUResult

SrcA

ALUOut

RegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A CLK

EN

IRWriteIorD

01

op rs rt imm6 bits 5 bits 5 bits 16 bits

I-Type

Page 75: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

75

MulB-cycleDatapath:incrementPC

PCWrite

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01PCPC' Instr 25:21

15:0

SrcB20:16

ALUResult

SrcA

ALUOut

ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A

00011011

4

CLK

ENEN

ALUSrcB1:0IRWriteIorD

01

Page 76: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

76

MulB-cycleDatapath:sw!  Writedatainrttomemory

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01PC 0

1

PC' Instr 25:21

20:16

15:0

SrcB20:16

ALUResult

SrcA

ALUOut

MemWrite ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A

00011011

4

CLK

ENEN

ALUSrcB1:0IRWriteIorDPCWrite

B

Page 77: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

77

MulB-cycleDatapath:R-typeInstrucBons!  Readfromrsandrt

$  WriteALUResulttoregisterfile$  Writetord(insteadofrt)

01

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01PC 0

1

PC' Instr 25:21

20:16

15:0

SrcB20:16

15:11

ALUResult

SrcA

ALUOut

RegDstMemWrite MemtoReg ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWriteIorDPCWrite

Page 78: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

78

MulB-cycleDatapath:beq!  Determinewhethervaluesinrsandrtareequal

$  Calculatebranchtargetaddress:BTA=(sign-extendedimmediate<<2)+(PC+4)

SignImm

b

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

RegDst BranchMemWrite MemtoReg ALUSrcARegWrite

Zero

PCSrc

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWriteIorD PCWrite

PCEn

Page 79: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

79

CompleteMulB-cycleProcessor

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

Page 80: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

80

ControlUnit

ALUSrcA

PCSrc

Branch

ALUSrcB1:0Opcode5:0

ControlUnit

ALUControl2:0Funct5:0

MainController(FSM)

ALUOp1:0

ALUDecoder

RegWrite

PCWrite

IorD

MemWriteIRWrite

RegDstMemtoReg

RegisterEnables

MultiplexerSelects

Page 81: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

81

MainControllerFSM:Fetch

Reset

S0: Fetch

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

0

1 1

0

X

X

00

01

0100

10

Page 82: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

82

MainControllerFSM:Fetch

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

0

1 1

0

X

X

00

01

0100

10

IorD = 0AluSrcA = 0

ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

Reset

S0: Fetch

Page 83: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

83

MainControllerFSM:DecodeIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

Reset

S0: Fetch S1: Decode

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

X

0 0

0

X

X

0X

XX

XXXX

00

Page 84: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

84

MainControllerFSM:AddressCalculaBonIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

Reset

S0: Fetch

S2: MemAdr

S1: Decode

Op = LWor

Op = SW

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

X

0 0

0

X

X

01

10

010X

00

Page 85: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

85

MainControllerFSM:AddressCalculaBonIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

Reset

S0: Fetch

S2: MemAdr

S1: Decode

Op = LWor

Op = SW

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01 0

1

PC 01

PC' Instr 25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

RegD

st

Branch

MemWrite

MemtoR

eg

ALUSrcARegWrite

OpFunct

ControlUnit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWritePCEn

X

0 0

0

X

X

01

10

010X

00

Page 86: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

86

MainControllerFSM:lwIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemRead

Op = LWor

Op = SW

Op = LW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Page 87: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

87

MainControllerFSM:swIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1 IorD = 1MemWrite

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

Op = LWor

Op = SW

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Page 88: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

88

MainControllerFSM:R-TypeIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

Op = LWor

Op = SWOp = R-type

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Page 89: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

89

MainControllerFSM:beqIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Page 90: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

90

CompleteMulB-cycleControllerFSMIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Page 91: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

91

MainControllerFSM:addiIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

Op = ADDI

S9: ADDIExecute

S10: ADDIWriteback

Page 92: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

92

MainControllerFSM:addiIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 0

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

RegDst = 0MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDIExecute

S10: ADDIWriteback

Page 93: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

93

ExtendedFuncBonality:j

SignImm

CLK

ARD

Instr / DataMemory

A1

A3

WD3

RD2RD1

WE3

A2

CLK

Sign Extend

RegisterFile

01

01PC 0

1

PC' Instr 25:21

20:16

15:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

RegDst BranchMemWrite MemtoReg ALUSrcARegWrite

Zero

PCSrc1:0

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

01

Data

CLK

CLK

AB 00

011011

4

CLK

ENEN

ALUSrcB1:0IRWriteIorD PCWrite

PCEn

000110

<<2

25:0 (jump)

31:28

27:0

PCJump

Page 94: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

94

ControlFSM:jIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 00

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 01

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

RegDst = 0MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDIExecute

S10: ADDIWriteback

Op = J

S11: Jump

Page 95: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Carnegie Mellon

95

ControlFSM:jIorD = 0

AluSrcA = 0ALUSrcB = 01ALUOp = 00PCSrc = 00

IRWritePCWrite

ALUSrcA = 0ALUSrcB = 11ALUOp = 00

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

IorD = 1RegDst = 1

MemtoReg = 0RegWrite

IorD = 1MemWrite

ALUSrcA = 1ALUSrcB = 00ALUOp = 10

ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCSrc = 01

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALUWriteback

S8: Branch

Op = LWor

Op = SWOp = R-type

Op = BEQ

Op = LWOp = SW

RegDst = 0MemtoReg = 1

RegWrite

S4: MemWriteback

ALUSrcA = 1ALUSrcB = 10ALUOp = 00

RegDst = 0MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDIExecute

S10: ADDIWriteback

PCSrc = 10PCWrite

Op = J

S11: Jump

Page 96: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch

Design of Digital Circuits

Lecture 13: Multi-Cycle Microarch.

Prof. Onur Mutlu ETH Zurich Spring 2017 6 April 2017