

Multi-cycle CPU

Breaking up is hard to do….

MC - Not in the textbook – we’ll get into some detail in lecture and suggested hw problems

Reading 4.9 just p. 384-386 (stop at pipelined implementation)

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


Single-Cycle CPU Summary

• Easy, particularly the control

• Which instruction takes the longest? By how much? Why is that a problem?

• ET = IC * CPI * CT

• What else can we do?

• When does a multi-cycle implementation make sense?

– e.g., 70% of instructions take 75 ns, 30% take 200 ns?

– suppose 20% overhead for extra latches

• Real machines have much more variable instruction latencies than this.

200 ns vs. (200 × 0.3 + 75 × 0.7) × 1.2 = (60 + 52.5) × 1.2 = 135 ns
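
The comparison above can be checked with a few lines of arithmetic. This is only a sketch: the function and variable names are made up for illustration, and the 20% latch overhead is applied as a flat multiplier, as on the slide.

```python
def average_latency_ns(mix, overhead=0.0):
    """Average time per instruction for a list of (fraction, latency_ns)
    pairs, inflated by a fractional overhead (0.2 means 20%)."""
    base = sum(fraction * latency for fraction, latency in mix)
    return base * (1.0 + overhead)

# Single-cycle: every instruction pays for the slowest one (200 ns).
single_cycle_ns = 200.0

# Multi-cycle: 70% of instructions take 75 ns, 30% take 200 ns,
# plus 20% overhead for the extra latches.
multi_cycle_ns = average_latency_ns([(0.7, 75.0), (0.3, 200.0)], overhead=0.2)

print(f"single-cycle: {single_cycle_ns:.1f} ns per instruction")
print(f"multi-cycle:  {multi_cycle_ns:.1f} ns per instruction")
```

With this mix the multi-cycle design wins even after paying for the latches; a mix dominated by the 200 ns instructions would not.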

You’ve been walking through history

• Someone needed to run a program

• Simple instructions were designed for very simple hardware (limited transistors)

• Someone wants to run a new program, but not create all new hardware

• More instructions added (LAB!)

• More transistors enable more complex hardware

• More complex instructions are desired as instruction memory is limited and costly

The story continues..


Why a Multiple Clock Cycle CPU?

• the problem => single-cycle cpu has a cycle time long enough to complete the longest instruction in the machine

• the solution => break up execution into smaller tasks, each task taking a cycle, different instructions requiring different numbers of cycles or tasks

• other advantages => reuse of functional units (e.g., alu, memory)

• ET = IC * CPI * CT


Breaking Execution Into Clock Cycles

• We will have five execution steps (not all instructions use all five)

– fetch

– decode & register fetch

– execute

– memory access

– write-back


Single Cycle vs. Multi-cycle

[Figure: single-cycle vs. multi-cycle timing for lw, sw, add, and r-type, with CPI and CT compared]

Draw stages and how they get cut up


Cutting up Single Cycle

Draw how we’d most logically cut this up.

Then point out: wait – if I cut the cycle time, how do I keep what I’ve done?


Breaking Execution Into Clock Cycles

• Introduces extra registers when:

– signal is computed in one clock cycle and used in another, AND

– the inputs to the functional block that outputs this signal can change before the signal is written into a state element.

• Significantly complicates control. Why?

• The goal is to balance the amount of work done each cycle.


Multicycle datapath

Intermediate latches.

One ALU

One memory (give hint about self-modifying code)


Multicycle datapath – Load word

Load word, write RTL below per cycle


Summary of execution steps

Step | R-type | Memory | Branch
Instruction fetch | IR = Mem[PC]; PC = PC + 4 | (same) | (same)
Instruction decode / register fetch | A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUout = PC + (sign-extend(IR[15-0]) << 2) | (same) | (same)
Execution, address computation, branch completion | ALUout = A op B | ALUout = A + sign-extend(IR[15-0]) | if (A==B) then PC = ALUout
Memory access or R-type completion | Reg[IR[15-11]] = ALUout | memory-data = Mem[ALUout] or Mem[ALUout] = B |
Write-back | | Reg[IR[20-16]] = memory-data |

•We can use Register-Transfer-Language (RTL) to describe these steps

Talk through each – esp. the early branch computation
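
One way to get a feel for the RTL is to step a single lw through the five steps in software. The sketch below is not the lecture's datapath: the register file is a Python list, memory is a dict keyed by byte address, and the helper names (`bits`, `sign_extend16`, `run_lw`) are invented for illustration.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def run_lw(pc, reg, mem):
    """Execute one lw using the five RTL steps; returns the new PC."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = pc + 4
    # Step 2: decode / register fetch (branch target computed early,
    # even though lw never uses it)
    a = reg[bits(ir, 25, 21)]
    b = reg[bits(ir, 20, 16)]  # fetched but unused by lw
    alu_out = pc + (sign_extend16(bits(ir, 15, 0)) << 2)
    # Step 3: address computation overwrites ALUout
    alu_out = a + sign_extend16(bits(ir, 15, 0))
    # Step 4: memory access
    memory_data = mem[alu_out]
    # Step 5: write-back
    reg[bits(ir, 20, 16)] = memory_data
    return pc

# lw $t2, 4($t3): opcode 0x23, rs = 11 ($t3), rt = 10 ($t2), immediate = 4
instruction = (0x23 << 26) | (11 << 21) | (10 << 16) | 4
reg = [0] * 32
reg[11] = 0x100
mem = {0: instruction, 0x104: 42}
new_pc = run_lw(0, reg, mem)
print(new_pc, reg[10])
```

Step 2 does work that lw throws away, which is the point of the early branch computation: the ALU is idle during decode, so the target is computed before the opcode is known.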


[Summary of execution steps table, repeated from the previous slide]

Peer instruction: Why are the first two the same?

Selection Why are the first two stages always the same (best answer)?

A All instructions do the same thing at the start

B The instruction is not determined until after the 2nd cycle

C To decrease the complexity of the control logic

D Trick question – they aren’t always the same

E None of the above


Complete Multicycle Datapath

(don’t be intimidated – it all makes sense…)


Complete Multicycle Datapath

R-type – 1st cycle Draw active path


Complete Multicycle Datapath

R-type – 2nd cycle


Complete Multicycle Datapath

R-type – 3rd cycle


Complete Multicycle Datapath

R-type – 4th cycle


Which instruction does PCWrite stuck at 1 break?

A. lw

B. R-type

C. beq

D. Both A & B

E. A, B, & C


Multicycle Control

• Single-cycle control used combinational logic

• Multi-cycle control uses a Finite State Machine.

• FSM defines a succession of states, transitions between states (based on inputs), and outputs (based on state)

• First two states same for every instruction, next state depends on opcode
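
The FSM idea can be sketched as a next-state function. The state names below are illustrative labels, not the numbering used in any particular textbook figure, and only lw, sw, R-type, and beq are modeled.

```python
# First opcode-dependent transition, taken after the decode state.
NEXT_AFTER_DECODE = {
    "lw": "MEM_ADDR",
    "sw": "MEM_ADDR",
    "r-type": "EXECUTE",
    "beq": "BRANCH",
}

def next_state(state, op):
    """Next control state given the current state and the instruction class."""
    if state == "FETCH":
        return "DECODE"                 # same for every instruction
    if state == "DECODE":
        return NEXT_AFTER_DECODE[op]    # opcode is first consulted here
    if state == "MEM_ADDR":
        return "MEM_READ" if op == "lw" else "MEM_WRITE"
    if state == "MEM_READ":
        return "WRITE_BACK"
    if state == "EXECUTE":
        return "R_COMPLETE"
    # MEM_WRITE, WRITE_BACK, R_COMPLETE, and BRANCH all end the instruction.
    return "FETCH"

def states_for(op):
    """List the states one instruction visits, starting from FETCH."""
    visited, state = [], "FETCH"
    while True:
        visited.append(state)
        state = next_state(state, op)
        if state == "FETCH":
            return visited

for op in ("lw", "sw", "r-type", "beq"):
    print(op, len(states_for(op)), states_for(op))
```

The number of states visited is the cycle count for that instruction class, which is where the variable CPI comes from.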


IF = 200 ps, ID = 50 ps, EX = 100 ps, M = 200 ps, WB = 50 ps

Breaking a single cycle processor into stages, hardware engineers determine these to be the execution time per stage. The code below is the most commonly executed code by the company.

Loop: lw r1, 0(r2)
      add r2, r3, r4
      sub r5, r1, r2
      beq r5, $zero

Your boss is interested in changing to the MIPS multi-cycle processor. He asks you whether or not this would be a good idea. You say?

Selection Good idea? Reason

A Yes CPI stays the same. CT decreases (factor of 4)

B Yes CPI increases (factor of 4). CT decreases (factor of 5)

C No CPI increases (factor of 4). CT decreases (factor of 3)

D No CPI decreases (factor of 5). CT increases (factor of 5)

E No CPI stays the same. CT stays the same. Complexity increases.

Isomorphic


IF = 200 ps, ID = 200 ps, EX = 200 ps, M = 200 ps, WB = 200 ps

Breaking a single cycle processor into stages, hardware engineers determine these to be the execution time per stage. The code below is the most commonly executed code by the company.

Loop: lw r1, 0(r2)
      add r2, r3, r4
      sub r5, r1, r2
      beq r5, $zero

Your boss is interested in changing to the MIPS multi-cycle processor. He asks you whether or not this would be a good idea. You say?

Selection Good idea? Reason

A Yes CPI stays the same. CT decreases (factor of 4)

B Yes CPI increases (factor of 4). CT decreases (factor of 5)

C No CPI increases (factor of 4). CT decreases (factor of 3)

D No CPI decreases (factor of 5). CT increases (factor of 5)

E No CPI stays the same. CT stays the same. Complexity increases.


Balanced cycles explanation

Draw single-cycle wasted time

Draw multi-cycle potential wasted time (200, 50, 100, 200, 50)
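
The two stage-latency questions can be worked numerically. The sketch below assumes the single-cycle clock equals the sum of the stage times, the multi-cycle clock equals the slowest stage, latch overhead is ignored, and lw, R-type, and beq take 5, 4, and 3 cycles as in the execution-step summary. All names are illustrative.

```python
CYCLES = {"lw": 5, "r-type": 4, "beq": 3}   # steps used per instruction class
LOOP = ["lw", "r-type", "r-type", "beq"]     # lw, add, sub, beq

def compare(stage_ps):
    """Compare single-cycle and multi-cycle time for one loop iteration."""
    single_ct = sum(stage_ps)   # one long cycle must cover every stage
    multi_ct = max(stage_ps)    # clock is set by the slowest stage
    multi_cpi = sum(CYCLES[op] for op in LOOP) / len(LOOP)
    return {
        "ct_ratio": single_ct / multi_ct,                # how much CT shrinks
        "multi_cpi": multi_cpi,                          # single-cycle CPI is 1
        "single_ps": len(LOOP) * 1 * single_ct,          # ET = IC * CPI * CT
        "multi_ps": len(LOOP) * multi_cpi * multi_ct,
    }

unbalanced = compare([200, 50, 100, 200, 50])
balanced = compare([200, 200, 200, 200, 200])
print("unbalanced:", unbalanced)
print("balanced:  ", balanced)
```

Under these assumptions CPI rises by a factor of 4 in both cases, but CT only drops by 3 with the unbalanced stages versus 5 with balanced ones, so the multi-cycle design loses in the first case and wins in the second.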


Multi-cycle Questions

• How many cycles will it take to execute this code?

lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label   # assume not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...

Selection Number of Cycles

A 5

B 21

C 22

D 25

E None of the above
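
Counting cycles is just a table lookup per instruction. A small sketch, assuming the per-class cycle counts from the execution-step summary (lw 5, sw 4, R-type 4, beq 3); the dictionary and variable names are illustrative.

```python
CYCLES_PER_INSTRUCTION = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The branch is not taken, so add and sw both execute.
program = ["lw", "lw", "beq", "add", "sw"]

total_cycles = sum(CYCLES_PER_INSTRUCTION[op] for op in program)
print(total_cycles)
```

beq costs 3 cycles whether or not it is taken; "not taken" only determines which instructions run after it.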


Multi-cycle Questions

• What is going on in cycle 8?

lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label   # assume not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...

Selection Operation

A PC=PC+4; IR=M[pc]

B A=R[t3]; B=R[t3]

C ALUOut=R[t3]+4

D R[t3]=M[ALUOut]

E None of the above
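
To find what is happening in a given cycle, walk the program and subtract each instruction's cycle count. The sketch assumes the same step sequence per instruction class as the execution-step summary; the step labels and function name are made up for illustration.

```python
STEPS = {
    "lw": ["fetch", "decode", "address computation", "memory read", "write-back"],
    "sw": ["fetch", "decode", "address computation", "memory write"],
    "add": ["fetch", "decode", "execute", "R-type completion"],
    "beq": ["fetch", "decode", "branch completion"],
}

def active_step(program, cycle):
    """Return (instruction index, op, step name) for a 1-based cycle number."""
    elapsed = 0
    for index, op in enumerate(program):
        steps = STEPS[op]
        if cycle <= elapsed + len(steps):
            return index, op, steps[cycle - elapsed - 1]
        elapsed += len(steps)
    raise ValueError("cycle is past the end of the program")

program = ["lw", "lw", "beq", "add", "sw"]
print(active_step(program, 8))
```

Cycle 8 lands on the third step of the second lw (lw $t3, 4($t3)), the address computation, which in RTL is ALUOut = R[t3] + 4.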


Suppose you work on an embedded multi-cycle MIPS processor and your software team tells you that every program which executes has to go through memory and zero 1k bytes of data fairly often (averages 10% of ET). You realize you could just have a single instruction do this called zero1k (rs) which does:

M[rs] = 0 … M[rs+1020] = 0.

Your coworker thinks you are crazy. You reply?

Selection Crazy? Reason

A Yes The complexity of such an instruction combined with no performance gain is silly.

B Yes The complexity of such an instruction combined with minimal performance gain (<5%) is silly.

C No The minimal performance gains (<5%) rationalize this simple instruction.

D No The significant performance gains (>5%) rationalize this complex instruction.

E Maybe None of the above.

Remember to ask about single-cycle

Answer - D


Show code, then cycle analysis.
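
A rough version of the cycle analysis for zero1k, under assumptions that are not from the slides: the software loop is one sw, one addi, and one bne per word (4 + 4 + 3 cycles), and zero1k costs fetch and decode plus one memory write per word, with the address increment overlapped with the store. The numbers are illustrative, not measured.

```python
WORDS = 1024 // 4                      # 1k bytes = 256 words

# Assumed software loop: sw (4) + addi (4) + bne (3) cycles per word.
LOOP_CYCLES_PER_WORD = 4 + 4 + 3
software_cycles = WORDS * LOOP_CYCLES_PER_WORD

# Assumed zero1k cost: fetch + decode, then one store cycle per word.
zero1k_cycles = 2 + WORDS

# Zeroing is 10% of execution time; only that fraction speeds up.
fraction = 0.10
new_et = (1 - fraction) + fraction * zero1k_cycles / software_cycles
improvement = 1 - new_et

print(f"software loop: {software_cycles} cycles, zero1k: {zero1k_cycles} cycles")
print(f"overall ET improvement: {improvement:.1%}")
```

Even with a less optimistic cost for zero1k (say two cycles per word), the zeroing portion shrinks by well over half, so the overall gain stays above 5%.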


Finite State Machine for Control

• Implementation:


ROM Implementation

• ROM = "Read Only Memory"

– values of memory locations are fixed ahead of time

• A ROM can be used to implement a truth table

– if the address is m bits, we can address 2^m entries in the ROM.

– our outputs are the bits of data that the address points to.

2^m is the "height", and n is the "width"

m (address) | n (data)
0 0 0 | 0 0 1 1
0 0 1 | 1 1 0 0
0 1 0 | 1 1 0 0
0 1 1 | 1 0 0 0
1 0 0 | 0 0 0 0
1 0 1 | 0 0 0 1
1 1 0 | 0 1 1 0
1 1 1 | 0 1 1 1
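
In software terms a ROM is an indexed lookup. The sketch below encodes the truth table on this slide, read as 3 address bits followed by 4 data bits per row; the list and function names are illustrative.

```python
# Entry i holds the n = 4 output bits for the m = 3 bit address i.
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]

def rom_read(address):
    """Return the data word stored at the given address."""
    return ROM[address]

def rom_read_bits(address, width=4):
    """Return the data word as a zero-padded bit string."""
    return format(ROM[address], f"0{width}b")

for address in range(len(ROM)):
    print(format(address, "03b"), "->", rom_read_bits(address))
```

The height of the ROM is `len(ROM)` = 2^3 = 8 entries and the width is the 4 data bits per entry.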


ROM Implementation

• How many inputs are there?

– 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)

• How many outputs are there?

– 16 datapath-control outputs, 4 state bits = 20 outputs

• ROM is 2^10 x 20 = 20K bits (and a rather unusual size)

• Rather wasteful, since for lots of the entries, the outputs are the same

– i.e., opcode is often ignored
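
The ROM size follows directly from the input and output counts. A minimal sketch with illustrative names:

```python
def rom_size_bits(address_bits, output_bits):
    """Total bits in a ROM with 2**address_bits entries of output_bits each."""
    return (2 ** address_bits) * output_bits

opcode_bits, state_bits = 6, 4
datapath_outputs, next_state_outputs = 16, 4

size = rom_size_bits(opcode_bits + state_bits, datapath_outputs + next_state_outputs)
entries_per_state = 2 ** opcode_bits   # one entry per opcode for each state

print(size, "bits =", size // 1024, "K bits")
print(entries_per_state, "entries per state")
```

Each state owns 64 entries, one per opcode, so any state that ignores the opcode stores the same 20-bit word 64 times.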


Multicycle CPU Key Points

• Performance gain achieved from variable-latency instructions (each instruction takes only the cycles it needs)

• ET = IC * CPI * cycle time

• Required very few new state elements

• More, and more complex, control signals

• Control requires FSM