![Page 1: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/1.jpg)
Design of Digital Circuits
Lecture 13: Multi-Cycle Microarch.
Prof. Onur Mutlu ETH Zurich Spring 2017 6 April 2017
![Page 2: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/2.jpg)
Agenda for Today & Next Few Lectures
- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, …
![Page 3: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/3.jpg)
Readings for Today
- P&P, Chapter 4, 5
  - Microarchitecture
- P&P, Revised Appendix C
  - Microarchitecture of the LC-3b
  - Appendix A (LC-3b ISA) will be useful in following this
- H&H, Chapter 7.4 (keep reading)
- Optional
  - Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951.
![Page 4: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/4.jpg)
Implementing the ISA: Microarchitecture Basics (Review)
![Page 5: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/5.jpg)
How Does a Machine Process Instructions?
- What does processing an instruction mean?
- We will assume the von Neumann model (for now)

AS = Architectural (programmer visible) state before an instruction is processed
Process instruction
AS’ = Architectural (programmer visible) state after an instruction is processed

- Processing an instruction: Transforming AS to AS’ according to the ISA specification of the instruction
![Page 6: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/6.jpg)
The Von Neumann Model/Architecture
- Also called stored program computer (instructions in memory). Two key properties:
- Stored program
  - Instructions stored in a linear memory array
  - Memory is unified between instructions and data
    - The interpretation of a stored value depends on the control signals
- Sequential instruction processing
  - One instruction processed (fetched, executed, and completed) at a time
  - Program counter (instruction pointer) identifies the current instr.
  - Program counter is advanced sequentially except for control transfer instructions

When is a value interpreted as an instruction?
![Page 7: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/7.jpg)
The Von Neumann Model/Architecture
- Recommended reading
  - Burks, Goldstine, von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument,” 1946.
- Required reading
  - Patt and Patel book, Chapter 4, “The von Neumann Model”
- Stored program
- Sequential instruction processing
![Page 8: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/8.jpg)
The Von Neumann Model (of a Computer)
[Figure: block diagram with MEMORY (Mem Addr Reg, Mem Data Reg), PROCESSING UNIT (ALU, TEMP), CONTROL UNIT (IP, Inst Register), INPUT, and OUTPUT.]
![Page 9: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/9.jpg)
The Von Neumann Model (of a Computer)
- Q: Is this the only way that a computer can operate?
- A: No.
- Qualified Answer: But, it has been the dominant way
  - i.e., the dominant paradigm for computing
  - for N decades
![Page 10: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/10.jpg)
The Dataflow Model (of a Computer)
- Von Neumann model: An instruction is fetched and executed in control flow order
  - As specified by the instruction pointer
  - Sequential unless explicit control flow instruction
- Dataflow model: An instruction is fetched and executed in data flow order
  - i.e., when its operands are ready
  - i.e., there is no instruction pointer
  - Instruction ordering specified by data flow dependence
    - Each instruction specifies “who” should receive the result
    - An instruction can “fire” whenever all operands are received
  - Potentially many instructions can execute at the same time
    - Inherently more parallel
![Page 11: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/11.jpg)
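Von Neumann vs Dataflow
- Consider a Von Neumann program
  - What is the significance of the program order?
  - What is the significance of the storage locations?
- Which model is more natural to you as a programmer?

Sequential:
v <= a + b;
w <= b * 2;
x <= v - w;
y <= v + w;
z <= x * y;

Dataflow:
[Figure: dataflow graph. Inputs a and b feed a "+" node and b feeds a "*2" node; both results feed a "-" node and a "+" node, whose results feed a "*" node that produces z.]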
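The firing rule for the five-operation program above can be sketched in a few lines of Python. This is only an illustration: the `GRAPH` encoding, the `run_dataflow` helper, and the input values a = 3, b = 4 are assumptions made for the example, not part of the lecture. Each node fires once all of its operands have tokens, and nodes that become ready together are grouped into the same round.

```python
import operator

# Hypothetical encoding of the slide's graph: node -> (operation, input names).
GRAPH = {
    "v": (operator.add, ("a", "b")),
    "w": (lambda b: b * 2, ("b",)),
    "x": (operator.sub, ("v", "w")),
    "y": (operator.add, ("v", "w")),
    "z": (operator.mul, ("x", "y")),
}


def run_dataflow(graph, inputs):
    """Fire every node whose operands all have tokens; repeat until none remain."""
    tokens = dict(inputs)      # values produced so far
    pending = dict(graph)      # nodes that have not fired yet
    rounds = []                # which nodes fired together
    while pending:
        ready = sorted(
            name for name, (_, srcs) in pending.items()
            if all(src in tokens for src in srcs)
        )
        if not ready:
            raise ValueError("some node can never receive all of its operands")
        results = {}
        for name in ready:
            op, srcs = pending.pop(name)
            results[name] = op(*(tokens[src] for src in srcs))
        tokens.update(results)
        rounds.append(ready)
    return tokens, rounds


tokens, rounds = run_dataflow(GRAPH, {"a": 3, "b": 4})
print(rounds)       # nodes grouped by the round in which they fired
print(tokens["z"])  # value produced by the final "*" node
```

With these inputs, v and w fire in the first round, x and y in the second, and z in the third. No instruction pointer decides the order; the data dependences do.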
![Page 12: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/12.jpg)
More on Data Flow
- In a data flow machine, a program consists of data flow nodes
  - A data flow node fires (is fetched and executed) when all its inputs are ready
    - i.e., when all inputs have tokens
- Data flow node and its ISA representation
![Page 13: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/13.jpg)
Data Flow Nodes
![Page 14: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/14.jpg)
An Example Data Flow Program
![Page 15: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/15.jpg)
ISA-level Tradeoff: Instruction Pointer
- Do we need an instruction pointer in the ISA?
  - Yes: Control-driven, sequential execution
    - An instruction is executed when the IP points to it
    - IP automatically changes sequentially (except for control flow instructions)
  - No: Data-driven, parallel execution
    - An instruction is executed when all its operand values are available (data flow)
- Tradeoffs: MANY high-level ones
  - Ease of programming (for average programmers)?
  - Ease of compilation?
  - Performance: Extraction of parallelism?
  - Hardware complexity?
![Page 16: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/16.jpg)
ISA vs. Microarchitecture Level Tradeoff
- A similar tradeoff (control vs. data-driven execution) can be made at the microarchitecture level
- ISA: Specifies how the programmer sees instructions to be executed
  - Programmer sees a sequential, control-flow execution order vs.
  - Programmer sees a data-flow execution order
- Microarchitecture: How the underlying implementation actually executes instructions
  - Microarchitecture can execute instructions in any order as long as it obeys the semantics specified by the ISA when making the instruction results visible to software
    - Programmer should see the order specified by the ISA
![Page 17: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/17.jpg)
The “Process Instruction” Step
- ISA specifies abstractly what AS’ should be, given an instruction and AS
  - It defines an abstract finite state machine where
    - State = programmer-visible state
    - Next-state logic = instruction execution specification
  - From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
    - One state transition per instruction
- Microarchitecture implements how AS is transformed to AS’
  - There are many choices in implementation
  - We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
    - Choice 1: AS → AS’ (transform AS to AS’ in a single clock cycle)
    - Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS’ (take multiple clock cycles to transform AS to AS’)
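The two implementation choices can be illustrated with a toy machine. The sketch below is hypothetical: the `ArchState` class, the four-field `("add", rd, rs, rt)` instruction tuple, and the split into four steps are assumptions made for illustration only. It shows that both choices end in the same architectural state, while the second passes through programmer-invisible state (`ms`) along the way.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Programmer-visible state of a toy machine (assumed for this sketch)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])


def step_single_cycle(state, program):
    """Choice 1: AS -> AS' in one transition, no intermediate state."""
    _, rd, rs, rt = program[state.pc]
    regs = list(state.regs)
    regs[rd] = regs[rs] + regs[rt]
    return ArchState(pc=state.pc + 1, regs=regs)


def step_multi_cycle(state, program):
    """Choice 2: AS -> AS+MS1 -> AS+MS2 -> AS+MS3 -> AS'."""
    ms = {}                                   # microarchitectural (invisible) state
    ms["ir"] = program[state.pc]              # cycle 1: fetch
    _, rd, rs, rt = ms["ir"]
    ms["a"], ms["b"] = state.regs[rs], state.regs[rt]  # cycle 2: read operands
    ms["alu_out"] = ms["a"] + ms["b"]         # cycle 3: execute
    regs = list(state.regs)
    regs[rd] = ms["alu_out"]                  # cycle 4: update architectural state
    return ArchState(pc=state.pc + 1, regs=regs), 4


program = [("add", 2, 0, 1)]
start = ArchState(pc=0, regs=[5, 7, 0, 0])
single = step_single_cycle(start, program)
multi, cycles = step_multi_cycle(start, program)
print(single == multi, cycles)
```

The architectural state is written only in the last step of `step_multi_cycle`, which matches the requirement that software never observes the intermediate states.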
![Page 18: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/18.jpg)
A Very Basic Instruction Processing Engine
- Each instruction takes a single clock cycle to execute
- Only combinational logic is used to implement instruction execution
  - No intermediate, programmer-invisible state updates

AS = Architectural (programmer visible) state at the beginning of a clock cycle
Process instruction in one clock cycle
AS’ = Architectural (programmer visible) state at the end of a clock cycle
![Page 19: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/19.jpg)
A Very Basic Instruction Processing Engine
- Single-cycle machine
- What is the clock cycle time determined by?
- What is the critical path of the combinational logic determined by?

[Figure: Combinational Logic computes AS’ from AS, which is held in Sequential Logic (State).]
![Page 20: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/20.jpg)
Remember: Programmer Visible (Architectural) State
- Memory: M[0], M[1], M[2], M[3], M[4], …, M[N-1]
  - array of storage locations indexed by an address
- Program Counter
  - memory address of the current instruction
- Registers
  - given special names in the ISA (as opposed to addresses)
  - general vs. special purpose
- Instructions (and programs) specify how to transform the values of programmer visible state
![Page 21: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/21.jpg)
Single-cycle vs. Multi-cycle Machines
- Single-cycle machines
  - Each instruction takes a single clock cycle
  - All state updates made at the end of an instruction’s execution
  - Big disadvantage: The slowest instruction determines cycle time → long clock cycle time
- Multi-cycle machines
  - Instruction processing broken into multiple cycles/stages
  - State updates can be made during an instruction’s execution
  - Architectural state updates made only at the end of an instruction’s execution
  - Advantage over single-cycle: The slowest “stage” determines cycle time
- Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level
![Page 22: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/22.jpg)
Instruction Processing “Cycle”
- Instructions are processed under the direction of a “control unit” step by step.
- Instruction cycle: Sequence of steps to process an instruction
- Fundamentally, there are six steps:
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
- Not all instructions require all six steps (see P&P Ch. 4)
![Page 23: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/23.jpg)
Instruction Processing “Cycle” vs. Machine Clock Cycle
- Single-cycle machine:
  - All six phases of the instruction processing cycle take a single machine clock cycle to complete
- Multi-cycle machine:
  - All six phases of the instruction processing cycle can take multiple machine clock cycles to complete
  - In fact, each phase can take multiple clock cycles to complete
![Page 24: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/24.jpg)
Instruction Processing Viewed Another Way
- Instructions transform Data (AS) to Data’ (AS’)
- This transformation is done by functional units
  - Units that “operate” on data
- These units need to be told what to do to the data
- An instruction processing engine consists of two components
  - Datapath: Consists of hardware elements that deal with and transform data signals
    - functional units that operate on data
    - hardware structures (e.g. wires and muxes) that enable the flow of data into the functional units and registers
    - storage units that store data (e.g., registers)
  - Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data
![Page 25: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/25.jpg)
Single-cycle vs. Multi-cycle: Control & Data
- Single-cycle machine:
  - Control signals are generated in the same clock cycle as the one during which data signals are operated on
  - Everything related to an instruction happens in one clock cycle (serialized processing)
- Multi-cycle machine:
  - Control signals needed in the next cycle can be generated in the current cycle
  - Latency of control processing can be overlapped with latency of datapath operation (more parallelism)
- We will see the difference clearly in microprogrammed multi-cycle microarchitectures
![Page 26: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/26.jpg)
Many Ways of Datapath and Control Design
- There are many ways of designing the data path and control logic
- Single-cycle, multi-cycle, pipelined datapath and control
- Single-bus vs. multi-bus datapaths
- Hardwired/combinational vs. microcoded/microprogrammed control
  - Control signals generated by combinational logic versus
  - Control signals stored in a memory structure
- Control signals and structure depend on the datapath design
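The difference between hardwired and microcoded control can be sketched as the difference between computing control signals and looking them up. The example below is a simplified illustration, not the lecture's design: the instruction class names, the four signals chosen, and the `CONTROL_STORE` table are assumptions, and a real microprogrammed control unit would typically step through several microinstructions per instruction rather than doing one lookup.

```python
# Instruction classes used as keys in this sketch (assumed, not an ISA encoding).
CLASSES = ("R-type", "LW", "SW", "Branch")


def hardwired_control(op):
    """Hardwired style: each signal is a logic function of the instruction class."""
    return {
        "RegWrite": op in ("R-type", "LW"),
        "MemWrite": op == "SW",
        "MemtoReg": op == "LW",
        "Branch": op == "Branch",
    }


# Microcoded style: the same signals stored as words in a memory structure.
CONTROL_STORE = {
    "R-type": {"RegWrite": True, "MemWrite": False, "MemtoReg": False, "Branch": False},
    "LW": {"RegWrite": True, "MemWrite": False, "MemtoReg": True, "Branch": False},
    "SW": {"RegWrite": False, "MemWrite": True, "MemtoReg": False, "Branch": False},
    "Branch": {"RegWrite": False, "MemWrite": False, "MemtoReg": False, "Branch": True},
}


def microcoded_control(op):
    """Look the control word up instead of computing it."""
    return CONTROL_STORE[op]


for op in CLASSES:
    print(op, hardwired_control(op) == microcoded_control(op))
```

Both functions drive the datapath identically here; the design choice is whether changing the control means changing logic or changing the contents of a table.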
![Page 27: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/27.jpg)
Performance Analysis
- Execution time of an instruction
  - {CPI} x {clock cycle time}
- Execution time of a program
  - Sum over all instructions [{CPI} x {clock cycle time}]
  - {# of instructions} x {Average CPI} x {clock cycle time}
- Single cycle microarchitecture performance
  - CPI = 1
  - Clock cycle time = long
- Multi-cycle microarchitecture performance
  - CPI = different for each instruction
    - Average CPI → hopefully small
  - Clock cycle time = short

Now, we have two degrees of freedom to optimize independently
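The two forms of the program execution time formula give the same result, which a short calculation can confirm. The instruction cycle counts and the 200 ps clock below are made-up numbers chosen only to exercise the formulas.

```python
def time_by_sum_ps(cycles_per_instr, cycle_time_ps):
    """Sum over all instructions of {CPI} x {clock cycle time}."""
    return sum(cpi * cycle_time_ps for cpi in cycles_per_instr)


def time_by_average_ps(cycles_per_instr, cycle_time_ps):
    """{# of instructions} x {Average CPI} x {clock cycle time}."""
    count = len(cycles_per_instr)
    average_cpi = sum(cycles_per_instr) / count
    return count * average_cpi * cycle_time_ps


# Hypothetical multi-cycle run: four instructions taking 3, 4, 5, and 4 cycles.
cycles = [3, 4, 5, 4]
print(time_by_sum_ps(cycles, 200))
print(time_by_average_ps(cycles, 200))
```

Both calls evaluate to 3200 ps for this input (the second as a float). Lowering either the average CPI or the cycle time reduces the product, which is the sense in which there are two quantities to optimize.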
![Page 28: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/28.jpg)
Review: Complete Single-Cycle Processor
[Figure: complete single-cycle processor datapath with PC, Instruction Memory, Register File, Sign Extend, ALU, Data Memory, and a Control Unit that decodes Op (31:26) and Funct (5:0) to generate MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite, and PCSrc.]
![Page 29: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/29.jpg)
Another Example Single-Cycle Processor
[Figure: single-cycle datapath and control with PC, instruction memory, registers, sign extend, ALU and ALU control, data memory, and adders for PC+4 and the branch target. Control signals: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc1 = Jump, PCSrc2 = BrTaken.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] JAL, JR, JALR omitted
![Page 30: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/30.jpg)
Evaluating the Single-Cycle Microarchitecture
![Page 31: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/31.jpg)
A Single-Cycle Microarchitecture
- Is this a good idea/design?
- When is this a good design?
- When is this a bad design?
- How can we design a better microarchitecture?
![Page 32: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/32.jpg)
A Single-Cycle Microarchitecture: Analysis
- Every instruction takes 1 cycle to execute
  - CPI (Cycles per instruction) is strictly 1
- How long each instruction takes is determined by how long the slowest instruction takes to execute
  - Even though many instructions do not need that long to execute
- Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
  - Critical path of the design is determined by the processing time of the slowest instruction
![Page 33: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/33.jpg)
What is the Slowest Instruction to Process?
- Let’s go back to the basics
- All six phases of the instruction processing cycle take a single machine clock cycle to complete
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
- Do each of the above phases take the same time (latency) for all instructions?

1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)
![Page 34: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/34.jpg)
Example Single-Cycle Datapath Analysis
- Assume (for the design in Slide 29)
  - memory units (read or write): 200 ps
  - ALU and adders: 100 ps
  - register file (read or write): 50 ps
  - other combinational logic: 0 ps

| Steps (resources) | IF (mem) | ID (RF) | EX (ALU) | MEM (mem) | WB (RF) | Delay (ps) |
|---|---|---|---|---|---|---|
| R-type | 200 | 50 | 100 | | 50 | 400 |
| I-type | 200 | 50 | 100 | | 50 | 400 |
| LW | 200 | 50 | 100 | 200 | 50 | 600 |
| SW | 200 | 50 | 100 | 200 | | 550 |
| Branch | 200 | 50 | 100 | | | 350 |
| Jump | 200 | | | | | 200 |
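The delay column above is just the sum of the resource delays each instruction class passes through, and the single-cycle clock period is the largest of those sums. A small sketch that recomputes the table from the stated assumptions follows; the dictionary names are arbitrary and chosen for this example.

```python
# Resource delays assumed on the slide, in picoseconds.
DELAY_PS = {"mem": 200, "alu": 100, "rf": 50}

# Resources each instruction class uses, in order (IF, ID, EX, MEM, WB).
PATHS = {
    "R-type": ["mem", "rf", "alu", "rf"],
    "I-type": ["mem", "rf", "alu", "rf"],
    "LW": ["mem", "rf", "alu", "mem", "rf"],
    "SW": ["mem", "rf", "alu", "mem"],
    "Branch": ["mem", "rf", "alu"],
    "Jump": ["mem"],
}

# Total delay per instruction class.
totals = {
    name: sum(DELAY_PS[resource] for resource in path)
    for name, path in PATHS.items()
}

slowest = max(totals, key=totals.get)
cycle_time_ps = totals[slowest]   # single-cycle clock must fit the slowest class

for name, delay in totals.items():
    print(f"{name:7s} {delay} ps")
print("slowest:", slowest, "cycle time:", cycle_time_ps, "ps")
```

Under these assumptions LW is the slowest class at 600 ps, so every instruction, including a 200 ps jump, occupies a 600 ps cycle.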
![Page 35: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/35.jpg)
Let’s Find the Critical Path
[Figure: the single-cycle datapath and control from Slide 29, shown again for critical path analysis.]
![Page 36: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/36.jpg)
R-Type and I-Type ALU
[Figure: the single-cycle datapath from Slide 29 with the R-type and I-type ALU path highlighted. Annotated delays: 200 ps, 250 ps, 350 ps, 400 ps, 100 ps, 100 ps.]
![Page 37: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/37.jpg)
LW
[Figure: the single-cycle datapath from Slide 29 with the LW path highlighted. Annotated delays: 200 ps, 250 ps, 350 ps, 550 ps, 600 ps, 100 ps, 100 ps.]
![Page 38: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/38.jpg)
SW
[Figure: the single-cycle datapath from Slide 29 with the SW path highlighted. Annotated delays: 200 ps, 250 ps, 350 ps, 550 ps, 100 ps, 100 ps.]
![Page 39: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/39.jpg)
Branch Taken
[Figure: the single-cycle datapath from Slide 29 with the taken-branch path highlighted. Annotated delays: 200 ps, 250 ps, 350 ps, 100 ps, 350 ps, 200 ps.]
![Page 40: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/40.jpg)
Jump
[Figure: the single-cycle datapath from Slide 29 with the jump path highlighted. Annotated delays: 200 ps, 100 ps, 200 ps.]
![Page 41: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/41.jpg)
What About Control Logic?
- How does that affect the critical path?
- Food for thought for you:
  - Can control logic be on the critical path?
  - A note on CDC 5600: control store access too long…
![Page 42: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/42.jpg)
What is the Slowest Instruction to Process?
- Memory is not magic
- What if memory sometimes takes 100ms to access?
- Does it make sense to have a simple register to register add or jump to take {100ms+all else to do a memory operation}?
- And, what if you need to access memory more than once to process an instruction?
  - Which instructions need this?
  - Do you provide multiple ports to memory?
![Page 43: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/43.jpg)
Single Cycle uArch: Complexity
- Contrived
  - All instructions run as slow as the slowest instruction
- Inefficient
  - All instructions run as slow as the slowest instruction
  - Must provide worst-case combinational resources in parallel as required by any instruction
  - Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle
- Not necessarily the simplest way to implement an ISA
  - Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?
- Not easy to optimize/improve performance
  - Optimizing the common case does not work (e.g. common instructions)
  - Need to optimize the worst case all the time
![Page 44: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/44.jpg)
(Micro)architecture Design Principles
- Critical path design
  - Find and decrease the maximum combinational logic delay
  - Break a path into multiple cycles if it takes too long
- Bread and butter (common case) design
  - Spend time and resources on where it matters most
    - i.e., improve what the machine is really designed to do
  - Common case vs. uncommon case
- Balanced design
  - Balance instruction/data flow through hardware components
  - Design to eliminate bottlenecks: balance the hardware for the work
![Page 45: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/45.jpg)
Single-Cycle Design vs. Design Principles
- Critical path design
- Bread and butter (common case) design
- Balanced design

How does a single-cycle microarchitecture fare in light of these principles?
![Page 46: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/46.jpg)
Aside: System Design Principles
- When designing computer systems/architectures, it is important to follow good principles
- Remember: “principled design” from our first lecture
  - Frank Lloyd Wright: “architecture […] based upon principle, and not upon precedent”
![Page 47: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/47.jpg)
Aside: From Lecture 1
- “architecture […] based upon principle, and not upon precedent”
![Page 48: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/48.jpg)
Aside: System Design Principles
- We will continue to cover key principles in this course
- Here are some references where you can learn more
- Yale Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proc. of IEEE, 2001. (Levels of transformation, design point, etc)
- Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. (Flynn’s Bottleneck → Balanced design)
- Gene M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967. (Amdahl’s Law → Common-case design)
- Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
  - http://research.microsoft.com/pubs/68221/acrobat.pdf
![Page 49: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/49.jpg)
A Key System Design Principle
- Keep it simple
- “Everything should be made as simple as possible, but no simpler.”
  - Albert Einstein
- And, keep it low cost: “An engineer is a person who can do for a dime what any fool can do for a dollar.”
- For more, see:
  - Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
  - http://research.microsoft.com/pubs/68221/acrobat.pdf
![Page 50: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/50.jpg)
Multi-Cycle Microarchitectures
![Page 51: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/51.jpg)
Multi-Cycle Microarchitectures
- Goal: Let each instruction take (close to) only as much time as it really needs
- Idea
  - Determine clock cycle time independently of instruction processing time
  - Each instruction takes as many clock cycles as it needs to take
    - Multiple state transitions per instruction
    - The states followed by each instruction are different
![Page 52: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/52.jpg)
Remember: The “Process instruction” Step
- ISA specifies abstractly what AS’ should be, given an instruction and AS
  - It defines an abstract finite state machine where
    - State = programmer-visible state
    - Next-state logic = instruction execution specification
  - From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
    - One state transition per instruction
- Microarchitecture implements how AS is transformed to AS’
  - There are many choices in implementation
  - We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
    - Choice 1: AS → AS’ (transform AS to AS’ in a single clock cycle)
    - Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS’ (take multiple clock cycles to transform AS to AS’)
![Page 53: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/53.jpg)
Multi-Cycle Microarchitecture
AS = Architectural (programmer visible) state at the beginning of an instruction
Step 1: Process part of instruction in one clock cycle
Step 2: Process part of instruction in the next clock cycle
…
AS’ = Architectural (programmer visible) state at the end of a clock cycle
![Page 54: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/54.jpg)
Benefits of Multi-Cycle Design
- Critical path design
  - Can keep reducing the critical path independently of the worst-case processing time of any instruction
- Bread and butter (common case) design
  - Can optimize the number of states it takes to execute “important” instructions that make up much of the execution time
- Balanced design
  - No need to provide more capability or resources than really needed
    - An instruction that needs resource X multiple times does not require multiple X’s to be implemented
    - Leads to more efficient hardware: Can reuse hardware components needed multiple times for an instruction
![Page 55: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/55.jpg)
Downsides of Multi-Cycle Design
- Need to store the intermediate results at the end of each clock cycle
  - Hardware overhead for registers
  - Register setup/hold overhead paid multiple times for an instruction
![Page 56: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/56.jpg)
Remember: Performance Analysis
- Execution time of an instruction
  - {CPI} x {clock cycle time}
- Execution time of a program
  - Sum over all instructions [{CPI} x {clock cycle time}]
  - {# of instructions} x {Average CPI} x {clock cycle time}
- Single cycle microarchitecture performance
  - CPI = 1
  - Clock cycle time = long
  - Not easy to optimize design
- Multi-cycle microarchitecture performance
  - CPI = different for each instruction
    - Average CPI → hopefully small
  - Clock cycle time = short
  - We have two degrees of freedom to optimize independently
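The execution-time formula above can be tried out with a small calculation. The sketch below is only illustrative: the instruction mix, the per-class cycle counts, and both clock periods are made-up assumptions, not numbers from the lecture.

```python
def average_cpi(mix, cycles):
    """Weighted average CPI for an instruction mix whose fractions sum to 1."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


def execution_time(num_instructions, cpi, clock_cycle_time_ns):
    """{# of instructions} x {Average CPI} x {clock cycle time}, in ns."""
    return num_instructions * cpi * clock_cycle_time_ns


# Hypothetical instruction mix and cycles per instruction class (assumptions).
mix = {"load": 0.25, "store": 0.10, "branch": 0.15, "alu": 0.50}
cycles = {"load": 5, "store": 4, "branch": 3, "alu": 4}

n = 1_000_000

# Single-cycle: CPI = 1, but the clock must fit the slowest instruction.
single_cycle_ns = execution_time(n, 1, 10.0)

# Multi-cycle: CPI differs per instruction, but the clock is short.
multi_cpi = average_cpi(mix, cycles)
multi_cycle_ns = execution_time(n, multi_cpi, 2.0)

print(multi_cpi, single_cycle_ns, multi_cycle_ns)
```

With these assumed numbers the multi-cycle design comes out ahead, but the result flips if the short clock is only slightly shorter (for example 3 ns instead of 2 ns), which is why both average CPI and clock cycle time have to be considered together.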
![Page 57: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/57.jpg)
A Multi-Cycle Microarchitecture: A Closer Look
![Page 58: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/58.jpg)
How Do We Implement This?
- Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951.
- An elegant implementation:
  - The concept of microcoded/microprogrammed machines
![Page 59: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/59.jpg)
Multi-Cycle uArch
- Key Idea for Realization
  - One can implement the “process instruction” step as a finite state machine that sequences between states and eventually returns to the “fetch instruction” state
  - A state is defined by the control signals asserted in it
  - Control signals for the next state are determined in the current state
![Page 60: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/60.jpg)
The Instruction Processing Cycle
- Fetch
- Decode
- Evaluate Address
- Fetch Operands
- Execute
- Store Result
![Page 61: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/61.jpg)
A Basic Multi-Cycle Microarchitecture
- Instruction processing cycle divided into “states”
  - A stage in the instruction processing cycle can take multiple states
- A multi-cycle microarchitecture sequences from state to state to process an instruction
  - The behavior of the machine in a state is completely determined by control signals in that state
- The behavior of the entire processor is specified fully by a finite state machine
- In a state (clock cycle), control signals control two things:
  - How the datapath should process the data
  - How to generate the control signals for the (next) clock cycle
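The two roles of the control signals in a state can be sketched as a table in which every state carries (a) the signals that steer the datapath and (b) a rule for choosing the next state. The three states, the signal names attached to them, and the `nop`/`add` opcodes below are a hypothetical toy machine chosen for illustration, not the processor developed in this lecture.

```python
from typing import Callable, Dict, List, NamedTuple


class State(NamedTuple):
    control_signals: Dict[str, int]      # how the datapath processes data
    next_state: Callable[[str], str]     # how the next state is selected


# Hypothetical toy FSM (assumption): FETCH -> DECODE -> [EXECUTE] -> FETCH.
FSM: Dict[str, State] = {
    "FETCH": State({"IRWrite": 1, "PCWrite": 1}, lambda op: "DECODE"),
    "DECODE": State({}, lambda op: "FETCH" if op == "nop" else "EXECUTE"),
    "EXECUTE": State({"RegWrite": 1}, lambda op: "FETCH"),
}


def run_instruction(opcode: str) -> List[str]:
    """Return the states visited while processing one instruction."""
    trace = []
    state = "FETCH"
    while True:
        trace.append(state)
        state = FSM[state].next_state(opcode)
        if state == "FETCH":             # back at fetch: instruction is done
            return trace


print(run_instruction("add"))   # three states
print(run_instruction("nop"))   # two states
```

Because the number of states visited depends on the opcode, different instructions take a different number of clock cycles, which is the defining property of a multi-cycle design.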
![Page 62: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/62.jpg)
One Example Multi-Cycle Microarchitecture
![Page 63: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/63.jpg)
Carnegie Mellon
Remember: Single-Cycle MIPS Processor
[Figure: single-cycle MIPS datapath and control unit. Instruction memory, register file, sign extend, ALU, and data memory, with separate adders for PCPlus4 and PCBranch and a PCJump path; the Control Unit takes Op (Instr 31:26) and Funct (Instr 5:0) and drives RegDst, Branch, MemWrite, MemtoReg, ALUSrc, RegWrite, Jump, PCSrc, and ALUControl2:0.]
![Page 64: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/64.jpg)
Multi-cycle MIPS Processor
- Single-cycle microarchitecture:
  - (+) simple
  - (-) cycle time limited by longest instruction (lw)
  - (-) three adders/ALUs and two memories
- Multi-cycle microarchitecture:
  - (+) higher clock speed
  - (+) simpler instructions run faster
  - (+) reuse expensive hardware on multiple cycles
  - (-) sequencing overhead paid many times
  - (-) hardware overhead for storing intermediate results
- Same design steps: datapath & control
![Page 67: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/67.jpg)
What Do We Want To Optimize
- Single Cycle Architecture uses two memories
  - One memory stores instructions, the other data
  - We want to use a single memory (smaller size)
- Single Cycle Architecture needs three adders
  - ALU, PC, Branch address calculation
  - We want to use the ALU for all operations (smaller size)
- In Single Cycle Architecture all instructions take one cycle
  - The most complex operation slows down everything!
  - Divide all instructions into multiple steps
  - Simpler instructions can take fewer cycles (average case may be faster)
![Page 68: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/68.jpg)
Consider the lw instruction
- For an instruction such as: lw $t0, 0x20($t1)
- We need to:
  - Read the instruction from memory
  - Then read $t1 from register array
  - Add the immediate value (0x20) to calculate the memory address
  - Read the content of this address
  - Write to the register $t0 this content
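The five steps listed for lw can be walked through in software. This is a minimal behavioral sketch, not a hardware description: memory is modeled as a Python dictionary keyed by byte address, and the starting register and memory contents are made-up values. The instruction word uses the standard MIPS I-type layout (op, rs, rt, imm) with opcode 0x23 for lw, $t0 = register 8, and $t1 = register 9.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_lw(memory, regs, pc):
    """Process one lw instruction step by step; returns the next PC."""
    instr = memory[pc]                        # 1. read the instruction from memory
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    base = regs[rs]                           # 2. read the base register ($t1)
    address = (base + sign_extend_16(instr)) & 0xFFFFFFFF  # 3. add the immediate
    data = memory[address]                    # 4. read the content of this address
    regs[rt] = data                           # 5. write it to the target register ($t0)
    return pc + 4


# lw $t0, 0x20($t1) encoded as op=0x23, rs=9, rt=8, imm=0x20
LW_T0_T1 = (0x23 << 26) | (9 << 21) | (8 << 16) | 0x20

regs = [0] * 32
regs[9] = 0x1000                              # hypothetical base address in $t1
memory = {0x0: LW_T0_T1, 0x1020: 0xDEADBEEF}  # hypothetical memory contents

next_pc = execute_lw(memory, regs, 0x0)
print(hex(regs[8]), next_pc)
```

In the multi-cycle datapath each numbered comment corresponds to work done in a separate clock cycle, with the intermediate values held in registers between cycles.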
![Page 69: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/69.jpg)
Multi-cycle Datapath: instruction fetch
- First consider executing lw
  - STEP 1: Fetch instruction
- lw: read from the memory location [rs]+imm to location [rt]
- I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: datapath with PC, a combined Instr / Data memory, the instruction register (enabled by IRWrite), and the register file not yet connected.]
![Page 70: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/70.jpg)
Multi-cycle Datapath: lw register read
[Figure: datapath extended so that Instr 25:21 (rs) drives register file address A1; the value read on RD1 is stored in register A.]
![Page 71: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/71.jpg)
Multi-cycle Datapath: lw immediate
[Figure: datapath extended with a Sign Extend unit that takes Instr 15:0 and produces SignImm.]
![Page 72: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/72.jpg)
Multi-cycle Datapath: lw address
[Figure: datapath extended with the ALU: SrcA (register A) and SrcB (SignImm) are combined under ALUControl2:0, and ALUResult is stored in register ALUOut.]
![Page 73: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/73.jpg)
Multi-cycle Datapath: lw memory read
[Figure: datapath extended with a multiplexer, selected by IorD, that chooses the memory address Adr between PC and ALUOut; the word read from memory is stored in register Data.]
![Page 74: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/74.jpg)
Multi-cycle Datapath: lw write register
[Figure: datapath extended so that Data is written to the register file through WD3, with Instr 20:16 (rt) driving A3 and RegWrite driving WE3.]
![Page 75: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/75.jpg)
Multi-cycle Datapath: increment PC
[Figure: datapath extended with the ALUSrcA multiplexer (PC or A) and the ALUSrcB1:0 multiplexer (inputs 00, 01, 10, 11, including the constant 4) so that the ALU can compute PC + 4; PCWrite enables the PC register.]
![Page 76: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/76.jpg)
Multi-cycle Datapath: sw
- Write data in rt to memory
[Figure: datapath extended so that Instr 20:16 (rt) drives A2, the value read on RD2 is stored in register B, and B drives the memory write data port WD; MemWrite drives WE.]
![Page 77: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/77.jpg)
Multi-cycle Datapath: R-type Instructions
- Read from rs and rt
  - Write ALUResult to register file
  - Write to rd (instead of rt)
[Figure: datapath extended with the RegDst multiplexer (Instr 20:16 or Instr 15:11 for A3) and the MemtoReg multiplexer (ALUOut or Data for WD3).]
![Page 78: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/78.jpg)
Multi-cycle Datapath: beq
- Determine whether values in rs and rt are equal
  - Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC + 4)
[Figure: datapath extended with a <<2 shifter on SignImm feeding the ALUSrcB multiplexer, a PCSrc multiplexer that chooses between ALUResult and ALUOut for PC', and a PCEn signal derived from PCWrite, Branch, and the ALU Zero output.]
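The branch target formula BTA = (sign-extended immediate << 2) + (PC + 4) can be checked with a few lines of arithmetic. The function names and the example addresses below are illustrative assumptions.

```python
def sign_extend_16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return ((sign_extend_16(imm16) << 2) + (pc + 4)) & 0xFFFFFFFF


# Forward branch over three instructions from a hypothetical address.
print(hex(branch_target(0x00400000, 3)))
# Backward branch with immediate -1 (0xFFFF): targets the branch itself.
print(hex(branch_target(0x00400010, 0xFFFF)))
```

The immediate counts instructions relative to the instruction after the branch, which is why it is shifted left by 2 (words to bytes) and added to PC + 4 rather than to PC.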
![Page 79: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/79.jpg)
Complete Multi-cycle Processor
[Figure: complete multi-cycle MIPS processor: the datapath built up on the preceding slides plus a Control Unit that takes Op (Instr 31:26) and Funct (Instr 5:0) and drives RegDst, Branch, MemWrite, MemtoReg, ALUSrcA, ALUSrcB1:0, RegWrite, IRWrite, IorD, PCWrite, PCSrc, and ALUControl2:0.]
![Page 80: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/80.jpg)
Control Unit
- Main Controller (FSM)
  - Input: Opcode5:0
  - Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
  - Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
  - ALUOp1:0 (to the ALU Decoder)
- ALU Decoder
  - Inputs: ALUOp1:0, Funct5:0
  - Output: ALUControl2:0
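The ALU Decoder is a small combinational block that turns ALUOp1:0 and Funct5:0 into ALUControl2:0. The slide does not list the encodings, so the table in this sketch is an assumption: it follows the encodings commonly given for this H&H-style MIPS design (ALUOp 00 = add, 01 = subtract, 1X = decide from Funct), and should be checked against the textbook before being relied on.

```python
# Assumed Funct5:0 -> ALUControl2:0 mapping (not given on the slide).
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op, funct):
    """Combine ALUOp1:0 from the main FSM with Funct5:0 from the instruction."""
    if alu_op == 0b00:
        return 0b010                      # add (address and PC arithmetic)
    if alu_op == 0b01:
        return 0b110                      # subtract (beq comparison)
    return FUNCT_TO_ALUCONTROL[funct]     # R-type: operation chosen by funct


print(format(alu_decoder(0b10, 0b100010), "03b"))
```

Splitting control this way keeps the main FSM independent of the individual R-type operations: it only says "R-type" through ALUOp = 10, and the decoder handles the rest.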
![Page 82: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/82.jpg)
Main Controller FSM: Fetch
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
[Figure: complete multi-cycle datapath annotated with the control signal values in S0.]
![Page 83: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/83.jpg)
Main Controller FSM: Decode
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0)
[Figure: complete multi-cycle datapath annotated with the control signal values in S1.]
![Page 85: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/85.jpg)
Main Controller FSM: Address Calculation
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0)
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
[Figure: complete multi-cycle datapath annotated with the control signal values in S2.]
![Page 86: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/86.jpg)
Main Controller FSM: lw
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0)
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
![Page 87: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/87.jpg)
Main Controller FSM: sw
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0)
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
![Page 88: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/88.jpg)
Main Controller FSM: R-Type
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0)
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
- S6: Execute (from S1 if Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: ALUWriteback (from S6): RegDst = 1, MemtoReg = 0, RegWrite; then back to S0
![Page 89: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/89.jpg)
Main Controller FSM: beq
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
- S6: Execute (from S1 if Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: ALUWriteback (from S6): RegDst = 1, MemtoReg = 0, RegWrite; then back to S0
- S8: Branch (from S1 if Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch; then back to S0
![Page 90: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/90.jpg)
Complete Multi-cycle Controller FSM
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
- S6: Execute (from S1 if Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: ALUWriteback (from S6): RegDst = 1, MemtoReg = 0, RegWrite; then back to S0
- S8: Branch (from S1 if Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch; then back to S0
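The state transitions of the controller FSM determine how many clock cycles each instruction takes. The sketch below encodes only the next-state logic for states S0 to S8 (the control outputs are omitted), assuming every final state returns to S0; the function names are illustrative.

```python
def next_state(state, op):
    """Next-state logic for the controller FSM covering lw, sw, R-type, beq."""
    if state == "S0":                               # Fetch -> Decode
        return "S1"
    if state == "S1":                               # Decode: dispatch on opcode
        return {"LW": "S2", "SW": "S2", "R-type": "S6", "BEQ": "S8"}[op]
    if state == "S2":                               # MemAdr: load or store
        return {"LW": "S3", "SW": "S5"}[op]
    # S3 -> S4 and S6 -> S7; every other state ends the instruction.
    return {"S3": "S4", "S6": "S7"}.get(state, "S0")


def state_sequence(op):
    """States visited for one instruction, starting at S0: Fetch."""
    seq = ["S0"]
    while True:
        nxt = next_state(seq[-1], op)
        if nxt == "S0":
            return seq
        seq.append(nxt)


for op in ("LW", "SW", "R-type", "BEQ"):
    print(op, state_sequence(op), len(state_sequence(op)), "cycles")
```

Under this FSM lw takes five cycles, sw and R-type instructions take four, and beq takes three, so the average CPI depends on the instruction mix of the program.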
![Page 92: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/92.jpg)
Main Controller FSM: addi
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
- S1: Decode (from S0): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
- S6: Execute (from S1 if Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: ALUWriteback (from S6): RegDst = 1, MemtoReg = 0, RegWrite; then back to S0
- S8: Branch (from S1 if Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch; then back to S0
- S9: ADDIExecute (from S1 if Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S10: ADDIWriteback (from S9): RegDst = 0, MemtoReg = 0, RegWrite; then back to S0
![Page 93: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/93.jpg)
Extended Functionality: j
[Figure: multi-cycle datapath extended for j: PCSrc is widened to PCSrc1:0 and selects PC' among three inputs (00, 01, 10), the new one being PCJump, which is formed from PC 31:28 and Instr 25:0 shifted left by 2 (bits 27:0).]
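The PCJump value in the figure concatenates the upper four bits of the PC with the 26-bit jump field shifted left by 2. A minimal sketch of that bit manipulation, with an illustrative function name and made-up example addresses, is:

```python
def jump_target(pc, instr):
    """PCJump = {PC[31:28], Instr[25:0] << 2}.

    pc is the value currently held in the PC register; in this multi-cycle
    design the PC has already been incremented during the fetch state.
    """
    addr26 = instr & 0x03FFFFFF            # Instr 25:0
    return (pc & 0xF0000000) | (addr26 << 2)


# j with a 26-bit field of 0x0100008 (opcode 2 in bits 31:26).
J_INSTR = (0x02 << 26) | 0x0100008
print(hex(jump_target(0x00400004, J_INSTR)))
print(hex(jump_target(0x90000004, J_INSTR)))
```

Only the low 28 bits come from the instruction, so a jump cannot leave the 256 MB region selected by the upper four PC bits.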
![Page 95: Design of Digital Circuits Lecture 13: Multi-Cycle Microarch](https://reader030.vdocument.in/reader030/viewer/2022012811/61c2420dd5a146407e156fca/html5/thumbnails/95.jpg)
Control FSM: j
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
- S1: Decode (from S0): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2: MemAdr (from S1 if Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3: MemRead (from S2 if Op = LW): IorD = 1
- S4: MemWriteback (from S3): RegDst = 0, MemtoReg = 1, RegWrite; then back to S0
- S5: MemWrite (from S2 if Op = SW): IorD = 1, MemWrite; then back to S0
- S6: Execute (from S1 if Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: ALUWriteback (from S6): RegDst = 1, MemtoReg = 0, RegWrite; then back to S0
- S8: Branch (from S1 if Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch; then back to S0
- S9: ADDIExecute (from S1 if Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S10: ADDIWriteback (from S9): RegDst = 0, MemtoReg = 0, RegWrite; then back to S0
- S11: Jump (from S1 if Op = J): PCSrc = 10, PCWrite; then back to S0