EENG449b/Savvides Lec 4.1
1/22/04
January 22, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
EENG 449bG/CPSC 439bG Computer Systems
Lecture 3
Pipelining Part II
Announcements
• Project groups and group meetings
• Project topics
  – A 1-page project proposal is due next Friday, Jan 30 (email it to me)
• Project proposal should include:
  – 1 paragraph project overview. This describes what your project will do.
  – 1 paragraph describing the specific tasks that you need to do
    » E.g., read papers, install tools, learn some special programming language or hardware
  – 1 paragraph on what resources you need for your project
    » E.g., are you using any special hardware?
    » Do you have access to lab/hardware/software?
Instruction Formats Review
Implementing a MIPS Pipeline
We are developing a subset of the MIPS pipeline supporting
– Load/store word
– Branch equal zero
– Integer ALU operations
• Remember MIPS has register-register ALU instructions (e.g., Add R1, R2, R3)
• Attention: In the homework you will have to redesign the pipeline for register-memory ALU instructions (e.g., Add R1, R2, (R3))!
MIPS Datapath Review
Instruction fetch (IF):
  IR ← Mem[PC];
  NPC ← PC + 4;
MIPS Datapath Review
Instruction decode / register fetch (ID):
  A ← Regs[rs];
  B ← Regs[rt];
  Imm ← sign-extended immediate field of IR;
MIPS Datapath Review
Execution / effective address (EX):
  Memory ref:   ALUOutput ← A + Imm;
  Reg-Reg ALU:  ALUOutput ← A func B;
  Reg-Imm ALU:  ALUOutput ← A op Imm;
  Branch:       ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0);
MIPS Datapath Review
Memory access / branch completion (MEM):
  Mem ref:  LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;
  Branch:   if (cond) PC ← ALUOutput;
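The register transfers reviewed on these datapath slides can be traced in a small sketch. The Python below is a minimal, unpipelined model of the subset (load/store word, branch equal zero, integer ALU). The decoded-instruction dicts, the function name `step`, the opcode names, and the final write-back step (not spelled out on these slides) are assumptions made for illustration, not the course's reference implementation.

```python
def step(pc, regs, mem, imem):
    """Run one instruction through IF, ID, EX, MEM, WB and return the next PC.

    imem maps a byte address to a decoded instruction (a dict). This decoded
    form is a hypothetical stand-in for the real 32-bit encoding.
    """
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = imem[pc]
    npc = pc + 4

    # ID: A <- Regs[rs]; B <- Regs[rt]; Imm <- sign-extended immediate
    a = regs[ir.get("rs", 0)]
    b = regs[ir.get("rt", 0)]
    imm = ir.get("imm", 0)

    # EX: one of four cases, depending on the instruction type
    op = ir["op"]
    cond = False
    if op in ("lw", "sw"):        # memory reference
        alu_output = a + imm
    elif op == "add":             # register-register ALU
        alu_output = a + b
    elif op == "addi":            # register-immediate ALU
        alu_output = a + imm
    elif op == "beqz":            # branch
        alu_output = npc + (imm << 2)
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported op: {op}")

    # MEM: load, store, or branch completion
    lmd = None
    next_pc = npc
    if op == "lw":
        lmd = mem[alu_output]
    elif op == "sw":
        mem[alu_output] = b
    elif op == "beqz" and cond:
        next_pc = alu_output

    # WB (assumed): write the ALU result or the loaded value to the register file
    if op == "add":
        regs[ir["rd"]] = alu_output
    elif op == "addi":
        regs[ir["rt"]] = alu_output
    elif op == "lw":
        regs[ir["rt"]] = lmd
    return next_pc
```

Calling `step` repeatedly with the returned PC walks a program one instruction at a time; a pipelined model would instead keep the intermediate values (IR, NPC, A, B, Imm, ALUOutput, LMD) in per-stage registers.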
MIPS Basic Pipeline
Data needs to be written in the registers at the end of each cycle
[Figure: pipelined datapath. The value written back depends on the instruction type: LMD for a load, ALUOutput for an ALU operation.]
Events at every pipe stage
Events at every pipe stage
Hazards Review
From the previous lecture we know the situations that would cause incorrect execution:
• Structural Hazards
• Data Hazards
• Control Hazards
Three Generic Data Hazards
• Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
• Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Three Generic Data Hazards
• Write After Read (WAR): Instr J writes an operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5
Three Generic Data Hazards
• Write After Write (WAW): Instr J writes an operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5
• Will see WAR and WAW in later, more complicated pipes
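The three definitions above can be checked mechanically on a pair of instructions. The following Python is a minimal sketch; the `Instr` tuple and the `classify` function are hypothetical names, and the code only reports register-name dependences. Whether a dependence turns into an actual hazard depends on the pipeline, as the slides note for WAR and WAW.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str                # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction


def classify(i: Instr, j: Instr) -> Set[str]:
    """Return the dependences from earlier instruction i to later instruction j."""
    kinds = set()
    if i.dest in j.srcs:     # j reads what i writes
        kinds.add("RAW")
    if j.dest in i.srcs:     # j writes what i reads
        kinds.add("WAR")
    if i.dest == j.dest:     # both write the same register
        kinds.add("WAW")
    return kinds


# The instruction pairs I, J from the three slides above.
raw = classify(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r3")))
war = classify(Instr("r4", ("r1", "r3")), Instr("r1", ("r2", "r3")))
waw = classify(Instr("r1", ("r4", "r3")), Instr("r1", ("r2", "r3")))
```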
MIPS Basic Pipeline
Instruction issued
IF ID EX MEM WB
Data Hazards can be detected here
Hardware Hazard Detection
• Figure A.20
Logic to Detect Load Interlocks
• Figure A.21
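The figure itself is not reproduced in this transcript. As a rough sketch of the usual check, a load interlock stalls the instruction in ID when the instruction in EX is a load whose destination register is one of the registers the ID instruction reads. The function name, the `"lw"` opcode string, and the way sources are passed in are assumptions for illustration, not a transcription of Figure A.21.

```python
from typing import Iterable


def needs_load_interlock(id_ex_op: str, id_ex_rt: int,
                         if_id_sources: Iterable[int]) -> bool:
    """Return True if the instruction in ID must stall for one cycle.

    id_ex_op      -- opcode of the instruction currently in EX (hypothetical names)
    id_ex_rt      -- destination register of that instruction if it is a load
    if_id_sources -- registers read by the instruction currently in ID
    """
    return id_ex_op == "lw" and id_ex_rt in tuple(if_id_sources)


# lw r1,0(r2) followed by sub r4,r1,r5: the loaded value is needed one cycle too early.
stall = needs_load_interlock("lw", 1, (1, 5))
# lw r1,0(r2) followed by add r4,r2,r3: no overlap, so no stall.
no_stall = needs_load_interlock("lw", 1, (2, 3))
```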
Forwarding of Results to the ALU
Mem output
ALU output
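The two forwarding sources labeled on this slide can be sketched as an operand-selection function: prefer the most recent result (the ALU output held after EX), then the older one (the memory or ALU result about to be written back), and otherwise use the value read from the register file. All names below are hypothetical; the caller is assumed to pass `None` as a destination when that pipeline stage holds no register-writing result that is ready, for example a load whose data has not yet arrived.

```python
from typing import Optional


def select_operand(src_reg: int, regfile_value: int,
                   ex_mem_dest: Optional[int], ex_mem_value: int,
                   mem_wb_dest: Optional[int], mem_wb_value: int) -> int:
    """Pick the value an ALU input should use for source register src_reg."""
    # Newest producer first: result of the instruction one ahead (ALU output).
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return ex_mem_value
    # Next: result of the instruction two ahead (ALU output or loaded data).
    if mem_wb_dest is not None and mem_wb_dest == src_reg:
        return mem_wb_value
    # No producer in flight: the register file value is current.
    return regfile_value
```

Checking the newer stage first matters when two in-flight instructions write the same register: the later write is the one the consumer must see.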
Control Hazards Revisited
A branch causes a 3-cycle stall in the 5-stage pipeline
Branch Instruction IF ID EX MEM WB
Branch Successor+1 IF stall stall IF ID EX MEM WB
Branch Successor+2 IF ID EX MEM WB
Branch Successor+3 IF ID EX MEM WB
Higher overhead than data hazards…
Can HW changes improve that? YES!
• Try to make an early decision whether a branch is taken or not.
Improved Pipeline – Dealing with Branches
Additional adder in ID stage
Write the PC faster
Can detect branch hazard 2 cycles earlier
Improved Pipeline – Dealing with Branches
Additional adder in ID stage
Write the PC faster
Note the change of order in the text! Figure A.11 says a branch hazard would stall for 1 cycle. This is after the optimization in Figure A.24!
Reducing Branch Penalties
1. Freeze the pipeline until the outcome of a branch instruction is known
2. Treat every branch as not taken
   • You have to be careful about how to restore the state of the pipeline to the correct place
3. Treat every branch as taken
   • May make sense on machines where the branch target address is known before the branch outcome
4. Delayed branch
   • Execute some instructions until the outcome is known (branch-delay slots)
Branch-Delay Slots
On a machine that needs n cycles before a branch outcome is known:
    branch instruction
    sequential successor 1
    sequential successor 2
    ……
    sequential successor n
The compiler needs to decide on valid and useful successors.
Typically most processors have 1 delay slot
Limitations of branch delay:
• Restrictions on branch delay instructions
• Ability to predict branch outcome at compile time
  – Most hardware supports a nullifying (canceling) branch, which gives the compiler more flexibility. It can schedule the instruction and later cancel its effects without violating program correctness
Delayed Branch
• Where to get instructions to fill the branch delay slot?
  – Before branch instruction
  – From the target address: only valuable when branch taken
  – From fall through: only valuable when branch not taken
  – Canceling branches allow more slots to be filled
• Compiler effectiveness for single branch delay slot:
  – Fills about 60% of branch delay slots
  – About 80% of instructions executed in branch delay slots are useful in computation
  – About 50% (60% × 80% = 48%) of slots usefully filled
• Delayed branch downside: with 7-8 stage pipelines and multiple instructions issued per clock (superscalar), a single delay slot no longer covers the branch delay
Scheduling Branch Delay
[Figure: ways to schedule the branch delay slot. Surviving labels: “Independent instruction”; “Cannot be used”; “Preferred when branch taken w/ high prob”]
Performance of Branch Schemes
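Assuming an ideal CPI of 1:

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles from branches}}$$

$$\text{Pipeline stall cycles from branches} = \text{Branch frequency} \times \text{Branch penalty}$$

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$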
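To get a feel for the speedup relation on this slide (pipeline depth divided by one plus branch frequency times branch penalty), here is a minimal Python sketch. The function name is an assumption, and the 20% branch frequency used in the example is a made-up illustrative number, not a figure from the lecture; the penalties of 3 and 1 cycles are the ones quoted on the earlier branch slides.

```python
def pipeline_speedup(depth: int, branch_frequency: float,
                     branch_penalty: float) -> float:
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1
    and that branches are the only source of stalls."""
    stall_cycles_from_branches = branch_frequency * branch_penalty
    return depth / (1.0 + stall_cycles_from_branches)


if __name__ == "__main__":
    hypothetical_branch_frequency = 0.20  # illustrative assumption only
    for penalty in (3, 1):
        speedup = pipeline_speedup(5, hypothetical_branch_frequency, penalty)
        print(f"penalty={penalty} cycles -> speedup={speedup:.2f}")
```

Under these assumed numbers, cutting the penalty from 3 cycles to 1 moves the 5-stage pipeline noticeably closer to its ideal speedup of 5, which is the motivation for resolving branches earlier.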
Challenges in Pipeline Implementation
Exceptions: Situations that can disrupt the in-order execution of instructions (interrupt, fault, exception)
• I/O device request
• Invoking an OS service from a user program
• Breakpoint
• Integer arithmetic overflow or FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory access, etc.
Exceptions Requirements
• Synchronous vs. asynchronous
• User requested vs. coerced
• User maskable vs. user non-maskable
• Within vs. between instructions
• Resume vs. terminate
Major challenges:
• Exceptions happening within instructions
• Exceptions that require the instruction to be restarted, as in the case of a page fault
MIPS Exceptions
Pipeline stage   Problem exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory protection violation
WB               None
What’s next?
Next lecture:
– MIPS FP Pipeline & Dynamically Scheduled Pipelines
– An embedded processor architecture: ARM
Lecture 6:
– Sensor networks and applications
– The connection between architecture and networks