How Computers Work Lecture 12 Page 1
How Computers Work: Lecture 12
Introduction to Pipelining
A Common Chore of College Life
Propagation Times
Tpd,wash = _______    Tpd,dry = _______
Doing 1 Load
Total Time = _______________
= _______________
Step 1:
Step 2:
Doing 2 Loads: Combinational (Harvard) Method
Step 1:
Step 2:
Step 3:
Step 4:
Total Time
= ________
= ________
Doing 2 Loads: Pipelined (MIT) Method
Step 1:
Step 2:
Step 3:
Total Time
= ________
= ________
Doing N Loads
• Harvard Method:_________________
• MIT Method:____________________
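The N-load totals can be checked with a small simulation. This is only a sketch: the washer and dryer times (30 and 60 minutes) are assumed values, since the slides leave Tpd,wash and Tpd,dry blank, and the function names are made up for illustration. It also assumes a washed load can wait in a basket until the dryer is free.

```python
def harvard_total(n_loads, t_wash, t_dry):
    """Combinational method: wash AND dry one load before starting the next."""
    return n_loads * (t_wash + t_dry)


def mit_total(n_loads, t_wash, t_dry):
    """Pipelined method: wash load i+1 while load i is in the dryer."""
    washer_free = 0  # time at which the washer finishes its current load
    dryer_free = 0   # time at which the dryer finishes its current load
    for _ in range(n_loads):
        washer_free += t_wash                     # washer runs back to back
        start_dry = max(washer_free, dryer_free)  # need a washed load and an empty dryer
        dryer_free = start_dry + t_dry
    return dryer_free


# Assumed example times in minutes (not taken from the slides).
T_WASH, T_DRY = 30, 60
for n in (1, 2, 10):
    print(n, harvard_total(n, T_WASH, T_DRY), mit_total(n, T_WASH, T_DRY))
```

With these assumed times, 10 loads take 900 minutes the Harvard way and 630 minutes the MIT way; the slower machine sets the steady-state pace.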
A Few Definitions
Latency: Time for 1 object to pass through entire system.
(= ________ for Harvard laundry) (= ________ for MIT laundry)
Throughput: Rate of objects going through.
(= ________ for Harvard laundry) (= ________ for MIT laundry)
A Computational Problem
Add 4 Numbers: A + B + C + D
[Figure: adder tree; two adders compute A + B and C + D, and a third adder sums their outputs]
As a Combinational Circuit
[Figure: adder tree; each adder has delay Tpd]
Throughput: 1 / (2 Tpd)
Latency: 2 Tpd
As a Pipelined Circuit
[Figure: adder tree; each adder has delay Tpd, with clocked registers after each level of adders]
Throughput: 1 / Tpd
Latency: 2 Tpd
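A cycle-by-cycle sketch of the pipelined adder tree makes the two numbers concrete. The register placement (one rank after the first level of adders, one on the output) is an assumption read off the two "clock" labels in the figure; `pipelined_add4` is a hypothetical name.

```python
def pipelined_add4(inputs):
    """Simulate the two-stage adder tree, one tuple (A, B, C, D) entering per clock.

    Returns the value held in the output register after each clock edge
    (None while the pipeline is still filling).
    """
    stage1 = None  # registers between the first and second level of adders
    out = None     # register on the final output
    trace = []
    # One extra idle cycle lets the last input drain out of the pipeline.
    for item in list(inputs) + [None]:
        # On a clock edge every register loads from the value its adder computed
        # during the previous cycle, so compute both next-states before updating.
        new_out = None if stage1 is None else stage1[0] + stage1[1]
        new_stage1 = None if item is None else (item[0] + item[1], item[2] + item[3])
        stage1, out = new_stage1, new_out
        trace.append(out)
    return trace


print(pipelined_add4([(1, 2, 3, 4), (10, 20, 30, 40), (5, 5, 5, 5)]))
```

Each sum appears two clock edges after its inputs were presented (latency 2 Tpd), but a new sum appears on every edge after that (throughput 1 / Tpd).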
Simplifying Assumptions
[Figure: pipelined adder tree; each adder has delay Tpd, with clocked registers between stages]
1. Synchronous inputs
2. Ts = Th = 0, Tpd c-q = 0, Tcd c-q = 0
An Inhomogeneous Case (Combinational)
[Figure: two multipliers (Tpd = 2) feeding an adder (Tpd = 1)]
Throughput: 1 / 3
Latency: 3
An Inhomogeneous Case (Pipelined)
[Figure: two multipliers (Tpd = 2) feeding an adder (Tpd = 1), with pipeline registers between stages]
Throughput: 1 / 2
Latency: 4
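The numbers for the inhomogeneous case follow from two rules: a combinational circuit takes the sum of the delays along its path, while a pipeline is clocked at the pace of its slowest stage. A minimal sketch, assuming ideal zero-delay registers and one register boundary after each listed stage; the function names are illustrative.

```python
from fractions import Fraction


def combinational(stage_delays):
    """Latency and throughput when the stages form one combinational path."""
    latency = sum(stage_delays)
    return latency, Fraction(1, latency)


def pipelined(stage_delays):
    """Latency and throughput with a register after every stage."""
    clock = max(stage_delays)            # clock period is set by the slowest stage
    latency = clock * len(stage_delays)  # every stage takes one full clock period
    return latency, Fraction(1, clock)


stages = [2, 1]  # multiplier stage (Tpd = 2), then adder stage (Tpd = 1)
for name, (latency, throughput) in (("comb", combinational(stages)),
                                    ("pipe", pipelined(stages))):
    print(f"{name}: latency {latency}, throughput {throughput}")
```

For the stage delays 2 and 1 this reproduces the slide values: latency 3 and throughput 1/3 combinationally, latency 4 and throughput 1/2 pipelined.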
How about this one?
[Figure: circuit with elements *(1), +(4), +(1), +(4), +(1); delays in parentheses]
Comb. Latency: 6
Comb. Throughput: 1/6
Pipe. Latency: 12
Pipe. Throughput: 1/4
How MIT Students REALLY do Laundry
Steady State Throughput = ____________
Steady State Latency = ____________
Interleaving (an alternative to Pipelining)
For N units of delay Tpd, steady state:
Throughput: N / Tpd
Latency: Tpd
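A round-robin schedule shows where N / Tpd comes from. This is a sketch under the assumption that jobs are handed to the N units in strict rotation, one every Tpd / N; `interleave_schedule` and the example numbers (4 units, Tpd = 8) are made up for illustration.

```python
from fractions import Fraction


def interleave_schedule(n_units, t_pd, n_jobs):
    """Round-robin schedule: job i goes to unit i % n_units.

    Returns one (unit, start, finish) tuple per job. A new job is issued every
    t_pd / n_units, so a unit is free again exactly when its turn comes back.
    """
    issue_interval = Fraction(t_pd, n_units)
    schedule = []
    for i in range(n_jobs):
        start = i * issue_interval
        schedule.append((i % n_units, start, start + t_pd))
    return schedule


sched = interleave_schedule(n_units=4, t_pd=8, n_jobs=8)
for unit, start, finish in sched:
    print(f"unit {unit}: start {start}, finish {finish}")
```

Every job still takes the full Tpd (latency Tpd), but results emerge Tpd / N apart, which is a throughput of N / Tpd.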
Interleaving Parallel Circuits
[Figure: four parallel units (x) numbered 1 to 4, driven by clk1-4, with a select signal sel]
Definition of a Well-Formed Pipeline
• Same number of registers along path from any input to every computational unit
– Ensures that every computational unit sees inputs IN PHASE
• Is true (non-obvious) whenever the # of registers between all inputs and all outputs is the same.
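The definition can be turned into a check: walk the circuit and confirm that every wire into a unit carries data that has crossed the same number of registers. A minimal sketch, assuming an acyclic circuit described as a dictionary; the representation and the name `register_depths` are illustrative, not from the lecture.

```python
def register_depths(circuit, inputs):
    """Return {node: registers between the inputs and that node}, or None if ill-formed.

    circuit maps each computational unit to a list of (source, n_registers)
    pairs, one per input wire; n_registers counts the registers on that wire.
    """
    depth = {name: 0 for name in inputs}

    def visit(node):
        if node in depth:
            return depth[node]
        arrivals = set()
        for src, regs in circuit[node]:
            d = visit(src)
            if d is None:
                return None
            arrivals.add(d + regs)
        if len(arrivals) != 1:
            return None  # the unit's inputs would arrive out of phase
        depth[node] = arrivals.pop()
        return depth[node]

    for node in circuit:
        if visit(node) is None:
            return None
    return depth


# Pipelined adder tree: one register on each wire into the final adder.
good = {
    "add1": [("A", 0), ("B", 0)],
    "add2": [("C", 0), ("D", 0)],
    "add3": [("add1", 1), ("add2", 1)],
}
# Same tree with the register missing on one branch.
bad = dict(good, add3=[("add1", 1), ("add2", 0)])

print(register_depths(good, "ABCD"))
print(register_depths(bad, "ABCD"))
```

The first circuit yields a depth for every unit; the second returns None because the final adder would combine values from different clock cycles.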
Method for Forming Well-Formed Pipelines
• Add registers to system output at will
• Propagate registers from intermediate outputs to intermediate inputs, cloning registers as necessary.
[Figure: example circuit with elements *(2), +(1), +(1), +(1), +(1); delays in parentheses]
Method for Maximizing Throughput
• Pipeline around longest latency element
• Pipeline around other sections with latency as large as possible, but <= longest latency element.
[Figure: example circuit with one *(2) element and six +(1) elements; delays in parentheses]
Comb. Latency: 5
Comb. Throughput: 1/5
Pipe. Latency: 6
Pipe. Throughput: 1/2
A Few Questions
• Assuming a circuit is pipelined for optimum throughput with 0 delay registers, is the pipelined throughput always greater than or equal to the combinational throughput?
– A: Yes
• Is the pipelined latency ever less than combinational latency?
– A: No
• When is the pipelined latency equal to combinational latency?
– A: If contents of all pipeline stages have equal combinational latency
CPU Performance
MIPS = Millions of Instructions Per Second
Freq = Clock Frequency, MHz
CPI = Clocks per Instruction
MIPS = Freq / CPI
To Increase MIPS:
1. DECREASE CPI.
- RISC reduces CPI to 1.0.
- CPI < 1.0? Tough... we’ll see multiple instruction issue machines at end of term.
2. INCREASE Freq.
- Freq limited by delay along longest combinational path; hence
- PIPELINING is the key to improved performance through fast clocks.
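A quick calculation ties the MIPS formula to pipelining. The stage delays below are hypothetical numbers chosen for round results, and both helper names are made up; the sketch also assumes CPI stays at exactly 1.0 after pipelining, which hazards will spoil in practice.

```python
def max_freq_mhz(longest_path_ns):
    """Clock frequency (MHz) allowed by the longest register-to-register path (ns)."""
    return 1000.0 / longest_path_ns


def mips(freq_mhz, cpi):
    """MIPS = Freq / CPI, with Freq in MHz."""
    return freq_mhz / cpi


# Hypothetical delays (ns) of four sections of a datapath.
stage_ns = [2.0, 2.5, 4.0, 1.5]

# Unpipelined: one combinational path through all four sections.
unpipelined = mips(max_freq_mhz(sum(stage_ns)), cpi=1.0)
# Pipelined: the clock only has to cover the slowest section.
pipelined_mips = mips(max_freq_mhz(max(stage_ns)), cpi=1.0)

print(unpipelined, pipelined_mips)
```

With these made-up delays the clock period drops from 10 ns to 4 ns, so MIPS rises from 100 to 250 at the same CPI.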
Review: A Top-Down View of the Beta Architecture
With st(ra,C,rc) : Mem[C+<rc>] <- <ra>
[Figure: Beta datapath with PC, instruction memory, register file, SEXT, ALU, and data memory, controlled by PCSEL, ISEL, ASEL, BSEL, ALUFN, WDSEL, WERF, and WEMEM]
Pipeline Stages
GOAL: Maintain (nearly) 1.0 CPI, but increase clock speed.
APPROACH: structure processor as 4-stage pipeline:
Instruction Fetch stage: Maintains PC, fetches one instruction per cycle and passes it to
Register File stage: Reads source operands from register file, passes them to
ALU stage: Performs indicated operation, passes result to
Write-Back stage: writes result back into register file.
IF -> RF -> ALU -> WB
WHAT OTHER information do we have to pass down the pipeline?
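Sketch of 4-Stage Pipeline
[Figure: Instruction Fetch, Register File (read), ALU, and Write Back (register file write) stages, each with combinational logic (CL); the instruction is passed from stage to stage, operands A and B go to the ALU, and result Y goes to Write Back]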
[Figure: Beta datapath divided into the IF, RF, ALU, and WB pipeline stages]
4-Pipeline Parallelism...
Consider a sequence of instructions:
    ...
    ADDC(r1, 1, r2)
    SUBC(r1, 1, r3)
    XOR(r1, r5, r1)
    MUL(r1, r2, r0)
    ...
Executed on our 4-stage pipeline (time runs left to right):
    ADDC(r1,1,r2)  IF  RF  ALU WB
    SUBC(r1,1,r3)      IF  RF  ALU WB
    XOR(r1,r5,r1)          IF  RF  ALU WB
    MUL(r1,r2,r0)              IF  RF  ALU WB
Read in RF stage: ADDC: R1; SUBC: R1; XOR: R1, R5; MUL: R1, R2
Written in WB stage: ADDC: R2; SUBC: R3; XOR: R1; MUL: R0
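The staggered timing diagram can be generated mechanically, which is handy for trying other instruction sequences. A minimal sketch, assuming instruction k enters IF in cycle k and never stalls; `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "RF", "ALU", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction, with instruction k entering IF in cycle k."""
    width = max(len(text) for text in instructions) + 2
    rows = []
    for k, text in enumerate(instructions):
        # k empty cycle columns, then one 4-character column per stage.
        cells = ["    "] * k + [stage.ljust(4) for stage in STAGES]
        rows.append(text.ljust(width) + "".join(cells).rstrip())
    return rows


program = ["ADDC(r1,1,r2)", "SUBC(r1,1,r3)", "XOR(r1,r5,r1)", "MUL(r1,r2,r0)"]
for row in pipeline_diagram(program):
    print(row)
```

Reading down any one column shows which instruction occupies each stage in that cycle; in the fourth cycle all four stages are busy at once.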
Pipeline Problems
BUT, consider instead:
    LOOP: ADD(r1, r2, r3)
          CMPLEC(r3, 100, r0)
          BT(r0, LOOP)
          XOR(r31, r31, r3)
          MUL(r1, r2, r2)
          ...
Pipeline timing (time runs left to right):
    ADD(r1,r2,r3)      IF  RF  ALU WB
    CMPLEC(r3,100,r0)      IF  RF  ALU WB
    BT(r0,LOOP)                IF  RF  ALU WB
    XOR(r31,r31,r3)                IF  RF  ALU WB
    MUL(r1,r2,r2)                      IF  RF  ALU WB
Pipeline Hazards
PROBLEM: Contents of a register WRITTEN by instruction k are READ by instruction k+1... before they are stored in RF! E.g.:
    ADD(r1, r2, r3)
    CMPLEC(r3, 100, r0)
    MULC(r1, 100, r4)
    SUB(r1, r2, r5)
fails since CMPLEC sees “stale” <r3>.
Pipeline timing (time runs left to right):
    ADD(r1,r2,r3)      IF  RF  ALU WB
    CMPLEC(r3,100,r0)      IF  RF  ALU WB
    BT(r0,LOOP)                IF  RF  ALU WB
    XOR(r31,r31,r3)                IF  RF  ALU WB
    MUL(r1,r2,r2)                      IF  RF  ALU WB
R3 Written: WB stage of ADD. R3 Read: RF stage of CMPLEC, one cycle earlier.
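Hazards of this kind can be found by comparing the cycle in which a register is read with the cycle in which the pending write lands. A sketch under stated assumptions: instruction k enters IF in cycle k, reads happen in the RF stage, writes happen in the WB stage, and a read in the same cycle as the write still sees the old value. The tuple format and `find_hazards` are illustrative.

```python
RF_STAGE, WB_STAGE = 1, 3  # stage positions in IF, RF, ALU, WB


def find_hazards(program):
    """program: list of (text, dest_reg, source_regs).

    Returns (writer_index, reader_index, reg) for every read that happens
    no later than the cycle in which the pending write reaches WB.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        write_cycle = i + WB_STAGE
        for j in range(i + 1, len(program)):
            read_cycle = j + RF_STAGE
            if read_cycle > write_cycle:
                break  # this and all later instructions read after the write
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("ADD(r1,r2,r3)",     "r3", ("r1", "r2")),
    ("CMPLEC(r3,100,r0)", "r0", ("r3",)),
    ("MULC(r1,100,r4)",   "r4", ("r1",)),
    ("SUB(r1,r2,r5)",     "r5", ("r1", "r2")),
]
print(find_hazards(program))
```

For the sequence above it reports a single hazard: instruction 1 (CMPLEC) reads r3 before instruction 0 (ADD) has written it.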
SOLUTIONS: 1. “Program around it”.
... document weirdo semantics, declare it a software problem.
- Breaks sequential semantics!
- Costs code efficiency.
EXAMPLE: Rewrite
    ADD(r1, r2, r3)
    CMPLEC(r3, 100, r0)
    MULC(r1, 100, r4)
    SUB(r1, r2, r5)
as
    ADD(r1, r2, r3)
    MULC(r1, 100, r4)
    SUB(r1, r2, r5)
    CMPLEC(r3, 100, r0)
HOW OFTEN can we do this?
[Figure: pipeline timing diagram in which CMPLEC(r3,100,r0) is moved later, so that its read of R3 comes after ADD(r1,r2,r3) has written R3]
SOLUTIONS: 2. Stall the pipeline.
Freeze IF, RF stages for 2 cycles, inserting NOPs into ALU IR...
DRAWBACK: SLOW
Pipeline timing (time runs left to right):
    ADD(r1,r2,r3)      IF  RF  ALU WB
    NOP                    IF  RF  ALU WB
    NOP                        IF  RF  ALU WB
    CMPLEC(r3,100,r0)              IF  RF  ALU WB
    BT(r0,LOOP)                        IF  RF  ALU WB
    XOR(r31,r31,r3)                        IF  RF  ALU WB
    MUL(r1,r2,r2)                              IF  RF  ALU WB
R3 Written: WB stage of ADD. R3 Read: RF stage of CMPLEC, now one cycle later.
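The effect of stalling can be modeled as NOP insertion on the instruction stream. This is only a software model of what the slide describes as a hardware freeze; the two-slot hazard window matches the 2-cycle stall above, and `insert_nops` and the tuple format are made up for illustration.

```python
NOP = ("NOP", None, ())


def insert_nops(program, hazard_window=2):
    """program: list of (text, dest_reg, source_regs).

    Returns a new list in which enough NOPs precede each instruction that none
    of its source registers is written by the previous hazard_window slots.
    """
    out = []
    for instr in program:
        sources = instr[2]
        # Keep padding while a recent instruction still has a pending write we need.
        while any(prev[1] is not None and prev[1] in sources
                  for prev in out[-hazard_window:]):
            out.append(NOP)
        out.append(instr)
    return out


program = [
    ("ADD(r1,r2,r3)",     "r3", ("r1", "r2")),
    ("CMPLEC(r3,100,r0)", "r0", ("r3",)),
    ("MULC(r1,100,r4)",   "r4", ("r1",)),
    ("SUB(r1,r2,r5)",     "r5", ("r1", "r2")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

Two NOPs land between ADD and CMPLEC, so four instructions now occupy six issue slots; that lost time is the drawback the slide names.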
SOLUTIONS: 3. Bypass Paths.
Add extra data paths & control logic to re-route data in problem cases.
Pipeline timing (time runs left to right):
    ADD(r1,r2,r3)      IF  RF  ALU WB
    CMPLEC(r3,100,r0)      IF  RF  ALU WB
    BT(r0,LOOP)                IF  RF  ALU WB
    XOR(r31,r31,r3)                IF  RF  ALU WB
    MUL(r1,r2,r2)                      IF  RF  ALU WB
<R1>+<R2> Produced: ALU stage of ADD. <R1>+<R2> Used: ALU stage of CMPLEC, one cycle later.
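The idea behind a bypass path is that an operand read checks for newer in-flight results before trusting the register file. A rough behavioral sketch, not the detailed design the lecture defers to next time; `read_operand` and the (dest, value) pairs are hypothetical.

```python
def read_operand(reg, regfile, alu_result, wb_result):
    """Pick the freshest value for reg while an instruction is in the RF stage.

    alu_result / wb_result: (dest_reg, value) for the instruction currently in
    the ALU or WB stage, or None if that stage holds nothing that writes.
    """
    # Check the youngest producer first so the most recent write wins.
    for pending in (alu_result, wb_result):
        if pending is not None and pending[0] == reg:
            return pending[1]
    return regfile[reg]


regfile = {"r1": 5, "r2": 7, "r3": 0}  # r3 still holds its old value
# ADD(r1,r2,r3) is in the ALU stage while CMPLEC(r3,100,r0) is in the RF stage.
alu_result = ("r3", regfile["r1"] + regfile["r2"])

print(read_operand("r3", regfile, alu_result, None))  # bypassed from the ALU
print(read_operand("r1", regfile, alu_result, None))  # read from the register file
```

CMPLEC gets the sum straight from the ALU output instead of the stale register file entry, so no stall cycles are needed for this case.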
Hardware Implementation of Bypass Paths
[Figure: Beta datapath divided into the IF, RF, ALU, and WB pipeline stages]
Next Time:
• Detailed Design of
– Bypass Paths + Control Logic
• What to do when Bypass Paths Don’t Work
– Branch Delays / Tradeoffs
– Load/Store Delays / Tradeoffs
– Multi-Stage Memory Pipeline