Computer Architecture (CS-213)
Main Objectives
• Hazards in Pipelining
• Types of Hazards
• Data Hazards
• Forwarding
• Stalling
• Control Hazards
Revision
• Do you know?
• Ideally, what is the speed-up (performance factor) of a pipelined processor over a non-pipelined one?
• What limits the speed-up?
• How many cycles does the first instruction take to execute in a pipelined processor?
• How do we manage control signals in pipelining?
Pipeline Hazards
• Data Hazards – an instruction uses the result of an earlier instruction. A hazard occurs when an instruction tries to read a register in its ID stage before an earlier instruction has written that register in its WB stage.
• Control Hazards – the address of the next instruction to fetch depends on an earlier instruction (e.g., a branch)
• Structural Hazards – two instructions need to access the same resource in the same cycle
Data Hazards & forwarding
Data Forwarding
• Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle
• For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX, by
  – adding multiplexors to the inputs of the ALU
  – connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage’s Rs and Rt ALU mux inputs
  – adding the proper control hardware to control the new muxes
Data Forwarding Control Conditions
1. EX/MEM hazard (forwards the result from the previous instruction to either input of the ALU):
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
2. MEM/WB hazard (forwards the result from the second previous instruction to either input of the ALU):
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
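The four conditions can be written as predicate checks. This is a minimal Python sketch, not the hardware itself: the `PipeReg` class and its fields (`reg_write`, `rd`, `rs`, `rt`) are hypothetical stand-ins for the pipeline-register fields named above, and the function only reports which conditions hold, without deciding what happens when two of them hold for the same operand.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical subset of the fields latched in a pipeline register."""
    reg_write: bool = False  # RegWrite control bit carried down the pipeline
    rd: int = 0              # destination register number (RegisterRd)
    rs: int = 0              # first source register number (RegisterRs)
    rt: int = 0              # second source register number (RegisterRt)


def writes(stage: PipeReg, reg: int) -> bool:
    """True if the instruction in `stage` will write register `reg` (never $0)."""
    return stage.reg_write and stage.rd != 0 and stage.rd == reg


def forwarding_conditions(id_ex: PipeReg, ex_mem: PipeReg, mem_wb: PipeReg) -> dict:
    """Evaluate each of the four conditions independently."""
    return {
        "ex_mem_a": writes(ex_mem, id_ex.rs),  # would set ForwardA = 10
        "ex_mem_b": writes(ex_mem, id_ex.rt),  # would set ForwardB = 10
        "mem_wb_a": writes(mem_wb, id_ex.rs),  # would set ForwardA = 01
        "mem_wb_b": writes(mem_wb, id_ex.rt),  # would set ForwardB = 01
    }


# Cycle in which sub $4,$1,$5 is in EX and the add that writes $1 is in EX/MEM.
print(forwarding_conditions(PipeReg(rs=1, rt=5), PipeReg(reg_write=True, rd=1), PipeReg()))
```

Here only `ex_mem_a` holds. One cycle later (`and $6,$7,$1` in EX, sub in EX/MEM, add in MEM/WB) only `mem_wb_b` would hold.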
Forwarding Illustration
[Figure: pipeline diagram (instruction order vs. time) for add $1, / sub $4,$1,$5 / and $6,$7,$1, each passing through IM, Reg, ALU, DM, Reg; the result of add reaches sub through EX/MEM hazard forwarding and reaches and through MEM/WB hazard forwarding]
Complication
[Figure: pipeline diagram (instruction order vs. time) for add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4]
• Another potential data hazard occurs when the instructions in the WB stage and in the MEM stage both write the register that the instruction in the EX stage reads (here, $1). Which result should be forwarded?
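Corrected Data Forwarding Control Conditions
2. MEM/WB hazard (forward from MEM/WB only if the more recent instruction in EX/MEM is not already forwarding to the same operand):
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
(Checking only EX/MEM.RegisterRd != ID/EX.RegisterRs would also block MEM/WB forwarding when the EX/MEM instruction does not write a register at all, e.g. a store or a branch whose Rd field happens to match.)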
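A complete forwarding unit can be sketched by giving EX/MEM priority: the MEM/WB case is reached only when the EX/MEM condition fails, which is exactly the `not (...)` clause above. As before, `PipeReg` and its fields are hypothetical stand-ins for the pipeline-register fields; the control values are represented as strings "00", "10" and "01".

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical subset of the fields latched in a pipeline register."""
    reg_write: bool = False  # RegWrite control bit
    rd: int = 0              # RegisterRd
    rs: int = 0              # RegisterRs
    rt: int = 0              # RegisterRt


def writes(stage: PipeReg, reg: int) -> bool:
    """True if the instruction in `stage` will write register `reg` (never $0)."""
    return stage.reg_write and stage.rd != 0 and stage.rd == reg


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Mux control for one ALU operand whose source register is `src`."""
    if writes(ex_mem, src):
        return "10"  # the most recent result (EX/MEM) takes priority
    if writes(mem_wb, src):
        return "01"  # reached only when EX/MEM is not forwarding this operand
    return "00"      # no hazard: use the value read from the register file


def forwarding_unit(id_ex: PipeReg, ex_mem: PipeReg, mem_wb: PipeReg) -> tuple:
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(id_ex.rs, ex_mem, mem_wb),
            forward_select(id_ex.rt, ex_mem, mem_wb))


# add $1,$1,$4 in EX; add $1,$1,$3 in EX/MEM; add $1,$1,$2 in MEM/WB.
print(forwarding_unit(PipeReg(rs=1, rt=4),
                      PipeReg(reg_write=True, rd=1),
                      PipeReg(reg_write=True, rd=1)))
```

For the chain of adds the newer value from EX/MEM is chosen for the first operand. If the EX/MEM instruction has `reg_write=False` (for example a store whose Rd field happens to be 1), the sketch falls through to MEM/WB forwarding, as the corrected condition requires.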
Avoiding/Detecting Data Hazards
• Avoid: add two NOP instructions between the sub and add instructions
• Detect a hazard when any of these holds:
  – EX/MEM.RegisterRd = ID/EX.RegisterRs
  – EX/MEM.RegisterRd = ID/EX.RegisterRt
  – MEM/WB.RegisterRd = ID/EX.RegisterRs
  – MEM/WB.RegisterRd = ID/EX.RegisterRt
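Why two NOPs: in a 5-stage pipeline without forwarding, if the register file is written in the first half of WB and read in the second half of ID, a result can be read only by the third instruction after its producer. The sketch below pads a program accordingly; the `(dest, sources, text)` tuple format and the two example instructions are hypothetical.

```python
def insert_nops(program):
    """Insert NOPs so that no instruction reads a register before it is written.

    `program` is a list of (dest, sources, text) tuples; dest is None for
    instructions that write no register. Assumes a 5-stage pipeline with no
    forwarding, where a register written in WB can be read in ID of the same cycle.
    """
    emitted = []  # (dest, text) pairs already placed, NOPs included
    for dest, sources, text in program:
        # A result is readable only 3 instructions after its producer, so keep
        # padding while either of the last two emitted slots writes a source.
        while any(d not in (None, 0) and d in sources for d, _ in emitted[-2:]):
            emitted.append((None, "nop"))
        emitted.append((dest, text))
    return [text for _, text in emitted]


# Hypothetical pair: add reads $2 right after sub writes it.
for line in insert_nops([(2, (1, 3), "sub $2,$1,$3"), (4, (2, 5), "add $4,$2,$5")]):
    print(line)
```

With one independent instruction already between producer and consumer, a single NOP suffices.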
Forwarding Data
Implementing the forwarding unit
Control values for forwarding multiplexers

| Mux control | Source | Explanation |
|---|---|---|
| ForwardA = 00 | ID/EX | 1st ALU operand comes from the register file |
| ForwardA = 10 | EX/MEM | 1st ALU operand is forwarded from the prior ALU result |
| ForwardA = 01 | MEM/WB | 1st ALU operand is forwarded from data memory or an earlier ALU result |
Control values for forwarding multiplexers

| Mux control | Source | Explanation |
|---|---|---|
| ForwardB = 00 | ID/EX | 2nd ALU operand comes from the register file |
| ForwardB = 10 | EX/MEM | 2nd ALU operand is forwarded from the prior ALU result |
| ForwardB = 01 | MEM/WB | 2nd ALU operand is forwarded from data memory or an earlier ALU result |
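The two tables describe the same 3-input multiplexer placed in front of each ALU input. A minimal sketch of that mux follows; the parameter names are hypothetical, and the unused code 11 is treated as an error.

```python
def forward_mux(forward: str, from_id_ex: int, from_ex_mem: int, from_mem_wb: int) -> int:
    """Select one ALU operand according to a 2-bit forwarding control value."""
    choices = {
        "00": from_id_ex,   # value read from the register file
        "10": from_ex_mem,  # prior ALU result
        "01": from_mem_wb,  # data memory or an earlier ALU result
    }
    if forward not in choices:
        raise ValueError(f"unused forwarding control value: {forward!r}")
    return choices[forward]


print(forward_mux("10", 5, 7, 9))  # picks the EX/MEM value
```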
Modified Datapath for Forwarding
Load Word Hazard and Stall
Hazard Detection Unit
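Forwarding cannot help when a load is immediately followed by an instruction that uses the loaded value, because the data only leaves memory at the end of MEM. As a sketch, assuming the commonly used load-use check (the load in EX is recognized by its MemRead control bit and writes its `rt` register), the hazard detection unit can be modelled as below; the class names, field names and example instructions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    mem_read: bool  # MemRead: the instruction in EX is a load (lw)
    rt: int         # a load writes its rt register


@dataclass
class IfId:
    rs: int  # source registers of the instruction being decoded
    rt: int


def hazard_detection(id_ex: IdEx, if_id: IfId) -> dict:
    """Stall for one cycle when the instruction in ID uses the loaded register."""
    stall = id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)
    return {
        "PCWrite": not stall,          # hold the PC so the same instruction is refetched
        "IFIDWrite": not stall,        # hold IF/ID so the same instruction is decoded again
        "zero_ID_EX_controls": stall,  # insert a bubble (NOP) into EX
    }


# lw $2,20($1) in EX, and $4,$2,$5 in ID: one-cycle stall
print(hazard_detection(IdEx(mem_read=True, rt=2), IfId(rs=2, rt=5)))
```

After the one-cycle bubble, the loaded value is in MEM/WB and the forwarding unit can supply it.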
Software Scheduling to Avoid Load Hazards
Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f are in memory.
Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd
Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
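The gain from reordering can be counted. The sketch below assumes a 5-stage pipeline with full forwarding, so the only stall is a load followed immediately by a user of its result; the `(op, dest, sources)` encoding is hypothetical, and the symbolic memory operands (b, c, ...) are not treated as registers.

```python
def load_use_stalls(code):
    """Count 1-cycle load-use stalls in a list of (op, dest, sources) tuples."""
    return sum(
        1
        for (op, dest, _), (_, _, sources) in zip(code, code[1:])
        if op == "LW" and dest in sources
    )


def total_cycles(code):
    """5-stage pipeline: 5 cycles for the first instruction, +1 per later one, plus stalls."""
    return 5 + (len(code) - 1) + load_use_stalls(code)


slow = [("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
        ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]
fast = [("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
        ("LW", "Rf", ()), ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]

print(load_use_stalls(slow), total_cycles(slow))
print(load_use_stalls(fast), total_cycles(fast))
```

Under these assumptions the slow version stalls before ADD and before SUB (14 cycles), while the reordered version has no stalls (12 cycles).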
Control Hazards
• Branch instructions can cause great performance loss
• Branch instructions need two things:
  – The result of the branch: Taken or Not Taken
  – The branch target:
    • PC + 4 if the branch is NOT taken
    • PC + 4 + immediate*4 if the branch is taken
• A branch instruction is not detected until the ID stage
  – At which point a new instruction has already been fetched
• For our original pipeline:
  – The effective address is not calculated until the EX stage
  – The branch condition gets set in the EX/MEM register (EX/MEM.zero)
  – 3-cycle branch delay
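The two possible next-PC values can be computed as follows. This is a sketch assuming MIPS-style 32-bit instructions with a 16-bit signed immediate that counts words; the branch address 1000 in the example is hypothetical.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit immediate field as a signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc: int, imm16: int, taken: bool) -> int:
    """PC + 4 if not taken, PC + 4 + immediate*4 if taken."""
    pc_plus_4 = pc + 4
    if not taken:
        return pc_plus_4
    return pc_plus_4 + sign_extend16(imm16) * 4


print(next_pc(1000, 100, taken=True))   # beq $1, $3, 100 at address 1000
print(next_pc(1000, 100, taken=False))
```

The immediate is relative to PC + 4, and a negative field (e.g. 0xFFFD = -3) gives a backward branch, as in a loop.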
Branch Delay – CC1
• Consider the pipelined execution of: beq $1, $3, 100
• During the first cycle, beq is fetched in the IF stage
[Figure: pipelined datapath in cycle 1; PC = 1000, beq $1, $3, 100 is fetched, PC + 4 = 1004]
Branch Delay – CC2
• During the second cycle, beq is decoded in the ID stage
• The next_1 instruction is fetched in the IF stage
[Figure: datapath in cycle 2; PC = 1004, next_1 is fetched, PC + 4 = 1008; IF/ID holds beq $1, $3, 100 and 1004]
Branch Delay – CC3
• During the third cycle, beq is executed in the EX stage
• The next_2 instruction is fetched in the IF stage
[Figure: datapath in cycle 3; PC = 1008, next_2 is fetched, PC + 4 = 1012; next_1 is in ID (IF/ID holds 1008); beq $1, $3, 100 is in EX (ID/EX holds 1004), Beq = 1]
Branch Delay – CC4
• During the fourth cycle, beq reaches MEM stage
• The next_3 instruction is fetched in the IF stage
[Figure: datapath in cycle 4; PC = 1012, next_3 is fetched, PC + 4 = 1016; next_2 is in ID, next_1 is in EX; beq $1, $3, 100 is in MEM with Zero = 1, Beq = 1, and the branch target 1404 in EX/MEM, selected by PCSrc]
Branch Delay – CC5
• During the fifth cycle, branch_target instruction is fetched
• Next_1 thru next_3 should be converted into NOPs
[Figure: datapath in cycle 5; PC = 1404, branch_target is fetched, PC + 4 = 1408; next_3 is in ID, next_2 in EX, next_1 in MEM]
3-Cycle Branch Delay
[Figure: pipeline diagram over clock cycles cc1 to cc7: beq $1,$3,100 is followed by Next_1, Next_2 and Next_3, each turned into a bubble, and then by Branch_Target]
• Next_1 thru Next_3 will be fetched anyway
• Pipeline should flush Next_1 thru Next_3 if branch is taken
• Otherwise, they can be executed normally
Reducing the Delay of Branches
• Branch delay can be reduced from 3 cycles to just 1 cycle
• Branch decision is moved from the 4th into the 2nd pipeline stage
  – Branches can be determined earlier, in the ID stage
  – The branch address calculation adder is moved to the ID stage
  – A comparator in the ID stage compares the two registers read from the register file
    • To determine the branch decision: whether the branch is taken or not
• Only one instruction that follows the branch will be fetched
• If the branch is taken, then only one instruction is flushed
• We need a control signal IF.Flush to zero the IF/ID register
  – This converts the fetched instruction into a NOP when we need to branch to the target address
Reducing the Delay of Branches
[Figure: pipelined datapath in which the ID stage contains the branch target adder (Shift left 2), a register comparator (=) and the IF.Flush signal to the IF/ID register, together with the hazard detection unit and the forwarding unit]
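Resolving a `beq` in ID can be sketched as a small function: the comparator decides the outcome, the adder forms the target, and IF.Flush is raised only when the branch is taken. This is a minimal model under stated assumptions: `imm` is already sign-extended, only `beq` is handled, and the register values in the example are hypothetical.

```python
def id_stage_branch(pc_plus_4: int, rs_value: int, rt_value: int, imm: int, is_beq: bool):
    """Resolve a beq in ID.

    Returns (next_pc_override, if_flush): when the branch is taken, the PC is
    redirected to the target and IF.Flush zeroes IF/ID, turning the one
    instruction fetched after the branch into a NOP.
    """
    taken = is_beq and rs_value == rt_value  # comparator on the two registers read in ID
    if not taken:
        return None, False                   # keep fetching sequentially, nothing flushed
    target = pc_plus_4 + (imm << 2)          # branch adder moved into ID
    return target, True


print(id_stage_branch(1004, 12, 12, 100, is_beq=True))   # taken: redirect and flush one slot
print(id_stage_branch(1004, 12, 34, 100, is_beq=True))   # not taken: no flush
```

Compared with the original pipeline, at most one fetched instruction is discarded per taken branch instead of three.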
Branch Hazard Alternatives
• Always stall the pipeline until the branch direction is known
  – The next instruction is always flushed (turned into a NOP)
• Predict Branch Not Taken
  – Fetch the successor instruction: PC+4 is already calculated
  – Almost half of MIPS branches are not taken on average
  – Flush instructions in the pipeline only if the branch is actually taken
• Predict Branch Taken
  – Can predict backward branches in loops, which are taken most of the time
  – However, the branch target address is determined in the ID stage, so we must reduce the branch delay from 1 cycle to 0. But how?
• Delayed Branch
  – Define the branch to take place AFTER the following instruction
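The cost of the first two alternatives can be compared by counting lost cycles over a sequence of branch outcomes. This is a sketch under simple assumptions: each lost cycle is one flushed or stalled fetch slot, and the outcome history below is hypothetical; predict-taken and delayed branches are not modelled.

```python
def lost_cycles(outcomes, policy, delay=1):
    """Cycles lost to branches for a sequence of outcomes (True = taken).

    "always_stall": every branch waits `delay` cycles for its outcome.
    "predict_not_taken": only taken branches pay `delay` cycles (flush).
    """
    outcomes = list(outcomes)
    if policy == "always_stall":
        return delay * len(outcomes)
    if policy == "predict_not_taken":
        return delay * sum(outcomes)
    raise ValueError(f"unknown policy: {policy}")


history = [True, False, False, True, True, False]  # hypothetical branch outcomes
for policy in ("always_stall", "predict_not_taken"):
    print(policy, lost_cycles(history, policy), lost_cycles(history, policy, delay=3))
```

Predict-not-taken never does worse than always stalling, and the gap grows with the fraction of not-taken branches and with the branch delay.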
Delayed Branch
• Define branch to take place after the next instruction
• For a 1-cycle branch delay, we have one delay slot:
  branch instruction
  branch delay slot (next instruction)
  . . .
  branch target (if branch taken)
• The compiler/assembler fills the branch delay slot by selecting a useful instruction that must be executed regardless of the branch outcome
branch instruction (taken) IF ID EX MEM WB
branch delay slot (next instruction) IF ID EX MEM WB
branch target IF ID EX MEM WB
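The semantics of one delay slot can be traced in a few lines: the instruction after the branch always executes, and a taken branch changes the PC only after it. The program encoding is hypothetical, and the sketch assumes no branch is placed in a delay slot.

```python
def run_with_delay_slot(program, max_steps=100):
    """Trace execution with one branch delay slot.

    `program` is a list; a branch is ("branch", taken, target_index) and any
    other entry is an instruction name. Assumes no branch sits in a delay slot.
    """
    trace, pc, pending_target = [], 0, None
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        redirect, pending_target = pending_target, None
        if isinstance(instr, tuple):
            _, taken, target = instr
            trace.append("branch")
            if taken:
                pending_target = target  # takes effect after the delay slot
        else:
            trace.append(instr)
        pc = redirect if redirect is not None else pc + 1
    return trace


print(run_with_delay_slot(["add", ("branch", True, 4), "slot", "skipped", "target"]))
```

With the branch taken, the trace is add, branch, slot, target; with it not taken, the fall-through instruction also runs after the slot.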
Scheduling the Branch Delay Slot
1. From an independent instruction before the branch
   Before: add $t2,$t3,$t4; beq $s1, $s0; Delay Slot
   After: beq $s1, $s0; add $t2,$t3,$t4
2. From a target instruction when branch is predicted taken
   Before: sub $t4,$t5,$t6 (at the branch target); beq $s1, $s0; Delay Slot
   After: beq $s1, $s0; sub $t4,$t5,$t6
3. From fall through when branch is predicted not taken
   Before: beq $s1, $s0; Delay Slot; sub $t4,$t5,$t6 (fall through)
   After: beq $s1, $s0; sub $t4,$t5,$t6
More on Delayed Branch
• Scheduling the delay slot with
  – An independent instruction is the best choice
    • However, it is not always possible to find an independent instruction
  – A target instruction is useful when the branch is predicted taken
    • Such as in a loop branch (e.g., a for loop)
    • Cancel the branch delay instruction if the branch is not taken
  – A fall-through instruction is useful when the branch is predicted not taken
    • Cancel the branch delay instruction if the branch is taken
• Disadvantages of delayed branch
  – Branch delay can increase to multiple cycles in deeper pipelines
  – Zero-delay branching + dynamic branch prediction are the most appropriate solution to the problem
Zero-Delayed Branch
• How can we achieve zero delay for a taken branch …
  – If the branch target address is computed in the IF stage?
• Solution
  – Check the PC to see if the instruction being fetched is a branch
  – Store the branch target address in a table in the IF stage
  – Such a table is called the branch target buffer
  – If the branch is predicted taken, then
    • Next PC = branch target fetched from the target buffer
  – Otherwise, if the branch is predicted not taken, then
    • Next PC = PC + 4
  – Zero delay is achieved because Next PC is determined in the IF stage
Branch Target and Prediction Buffer
• The branch target buffer is implemented as a small cache
  – That stores the branch target address of taken branches
• We also have a branch prediction buffer, which predicts whether the branch will be taken or not
  – To store the prediction bits for branch instructions
  – The prediction bits are dynamically determined by the hardware
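[Figure: the PC is used to look up both the Branch Target Buffer (entries: PC of branch, target address) and the Prediction Buffer; a mux selects the next PC from PC + 4 or the predicted branch target]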
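The IF-stage selection can be sketched with two dictionaries standing in for the hardware tables. This is an assumption-laden model: real buffers are small, tagged or indexed memories, while here both are plain Python dicts keyed by the branch PC, and the addresses are hypothetical.

```python
def next_fetch_pc(pc: int, btb: dict, predict_taken: dict) -> int:
    """Choose the next fetch address in the IF stage.

    btb maps the PC of a branch to its target address; predict_taken maps the
    PC of a branch to its current prediction.
    """
    if pc in btb and predict_taken.get(pc, False):
        return btb[pc]  # hit and predicted taken: fetch the target next cycle
    return pc + 4       # miss, or predicted not taken: fetch sequentially


btb = {1000: 1404}            # hypothetical branch at 1000 with target 1404
predict_taken = {1000: True}
print(next_fetch_pc(1000, btb, predict_taken), next_fetch_pc(1004, btb, predict_taken))
```

Because the lookup uses only the PC, the decision is available in IF, before the branch is even decoded.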
Dynamic Branch Prediction
• Prediction of branches at runtime using prediction bits
  – One or a few prediction bits are associated with a branch instruction
• The branch prediction buffer is a small memory
  – Indexed by the lower portion of the address of the branch instruction
• The simplest scheme is to have 1 prediction bit per branch
• We don’t know if the prediction bit is correct or not
  – It may have been set by another branch with the same lower address bits
• If correct prediction …
  – Continue normal execution – no wasted cycles
• If incorrect prediction (misprediction) …
  – Flush the instructions that were incorrectly fetched – wasted cycles
  – Update prediction bit and target address for future use
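A 1-bit prediction buffer indexed by low-order address bits can be sketched as a list of booleans. The table size and the choice to drop the two byte-offset bits of a word-aligned PC before indexing are assumptions; the example shows two hypothetical branches sharing one entry.

```python
class OneBitBuffer:
    """Branch prediction buffer with 1 bit per entry, indexed by low PC bits."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.bits = [False] * (1 << index_bits)  # False = predict not taken

    def _index(self, pc: int) -> int:
        # Drop the 2 byte-offset bits of a word-aligned PC, keep the low index bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # Storing the actual outcome changes the bit only after a misprediction.
        self.bits[self._index(pc)] = taken


buf = OneBitBuffer(index_bits=4)
buf.update(0x1000, True)    # branch A at 0x1000 is taken
print(buf.predict(0x1040))  # branch B at 0x1040 maps to the same entry
```

Branch B inherits A's "taken" bit, which is why the bit is only a hint.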
• Prediction is just a hint that is assumed to be correct
• If incorrect, the fetched instructions are flushed
• The 1-bit prediction scheme has a performance shortcoming
  – A loop branch is almost always taken, except for the last iteration
  – The 1-bit scheme will predict incorrectly twice, rather than once
  – On the last iteration, and on the first iteration of the next execution of the loop
• 2-bit prediction schemes are often used
  – A prediction must be wrong twice before it is changed
  – A loop branch is mispredicted only once, on the last iteration
2-bit Prediction Scheme
[Figure: state diagram with two "Predict Taken" states and two "Predict Not Taken" states; transitions are labeled Taken and Not Taken]
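The loop behaviour described above can be checked with a small simulation. This sketch assumes a 2-bit saturating counter, one common way to realize the 2-bit scheme (the exact transitions in the diagram may differ); it models a single loop branch with no aliasing, and both predictors start out predicting "not taken".

```python
def loop_outcomes(iterations):
    """Outcomes of a loop-closing branch: taken on every iteration except the last."""
    return [True] * (iterations - 1) + [False]


class OneBit:
    def __init__(self):
        self.bit = False  # start by predicting not taken

    def predict(self):
        return self.bit

    def update(self, taken):
        self.bit = taken  # remember only the last outcome


class TwoBit:
    def __init__(self):
        self.counter = 0  # 0, 1 = predict not taken; 2, 3 = predict taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Saturating counter: from a strong state it takes two wrong outcomes to flip.
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)


def mispredictions_per_run(predictor, iterations, runs):
    """Run the same loop `runs` times and count mispredictions in each run."""
    result = []
    for _ in range(runs):
        misses = 0
        for taken in loop_outcomes(iterations):
            if predictor.predict() != taken:
                misses += 1
            predictor.update(taken)
        result.append(misses)
    return result


print(mispredictions_per_run(OneBit(), 10, 3))
print(mispredictions_per_run(TwoBit(), 10, 3))
```

For a 10-iteration loop run three times, the 1-bit predictor misses twice per run, while the 2-bit counter, after warming up in the first run, misses only on the last iteration.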