Anshul Kumar, CSE IITD
CSL718 : Superscalar Processors
Speculative Execution
2nd Feb, 2006

Handling Control Dependence
• Simple pipeline
  – Branch prediction reduces stalls due to control dependence
• Wide-issue processor
  – Mere branch prediction is not sufficient
  – Instructions in the predicted path need to be fetched and EXECUTED (speculative execution)

What is required for speculation?
• Branch prediction, to choose which instructions to execute
• Execution of instructions before control dependences are resolved
• Ability to undo the effects of an incorrectly speculated sequence
• Preservation of correct behaviour under exceptions (an exception raised on a wrongly speculated path must not be taken)

Types of speculation
• Hardware-based speculation
  – done with dynamic branch prediction and dynamic scheduling
  – used in superscalar processors
• Compiler-based speculation
  – done with static branch prediction and static scheduling
  – used in VLIW processors

Extending Tomasulo’s scheme for speculative execution
• Introduce a re-order buffer (ROB)
• Add another stage – “commit”
  – Normal execution: Issue, Execute, Write result
  – Speculative execution: Issue, Execute, Write result, Commit

Extending Tomasulo’s scheme for speculative execution – contd.
• Write results into ROB in the “write result” stage
• Write results into register file or memory in the “commit” stage
• Dependent instructions can read operands from ROB
• A speculative instruction commits only if the prediction is determined to be correct
• Instructions may complete execution out-of-order, but they commit in-order
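
The in-order commit rule can be illustrated with a toy Python model (not from the slides; `RobEntry` and `run` are illustrative names). The ROB is treated as a FIFO filled in program order, results arrive in an arbitrary completion order, and an entry leaves the ROB only when it is at the head and its result is ready.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    name: str            # instruction label, used only for tracing
    ready: bool = False  # set in "write result"; commit requires it


def run(program, completion_order):
    """Issue `program` in order, complete in `completion_order`, commit in order.

    Returns a list of (instruction that just completed, instructions committed).
    """
    rob = deque(RobEntry(name) for name in program)  # head = left end
    by_name = {e.name: e for e in rob}
    trace = []
    for name in completion_order:
        by_name[name].ready = True        # completion may be out of order
        committed = []
        while rob and rob[0].ready:       # only the head entry may commit
            committed.append(rob.popleft().name)
        trace.append((name, committed))
    return trace


print(run(["i1", "i2", "i3"], ["i3", "i1", "i2"]))
# [('i3', []), ('i1', ['i1']), ('i2', ['i2', 'i3'])]
```

Although i3 finishes first, it is held in the ROB until both older instructions have committed.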

Recall Tomasulo’s scheme ...

Issue
• Get next instruction from instruction queue
• Check if there is a matching RS which is empty
  – no: structural hazard, instruction stalls
  – yes: issue the instruction to that RS
• For each operand, check if it is available in RF
  – yes: put the operand in the RS
  – no: keep track of FU that will produce it

Execute
• If one or more operands are not available, wait and monitor the CDB
• When an operand becomes available, it is placed in the RS
• When all operands are available, start execution
• A choice may need to be made if multiple instructions become ready at the same time
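
The slides leave the selection policy open. One common choice, assumed here purely for illustration, is oldest-first: among stations whose operands are all present (Qj = Qk = 0), start the ones issued earliest. The `age` field and the `select` function are hypothetical additions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    age: int     # issue order (smaller = older); an assumed field, not in the slides
    Qj: int = 0  # 0 = first operand present
    Qk: int = 0  # 0 = second operand present


def select(stations, n_units):
    """Return names of up to n_units ready stations, oldest first."""
    ready = [s for s in stations if s.Qj == 0 and s.Qk == 0]
    ready.sort(key=lambda s: s.age)
    return [s.name for s in ready[:n_units]]


# add1 still waits on a tag; add2 (older) wins over mul1 for a single unit.
print(select([Station("mul1", 2), Station("add1", 0, Qj=5), Station("add2", 1)], 1))
# ['add2']
```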

Write result
• When result is available
  – write it on CDB, and
  – from there into RF and relevant RSs
• Mark RS as available

More formal description ...
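
RS and RF fields
• RS fields: op, busy, Qj, Vj, Qk, Vk (Qj/Qk: RS that will produce the operand, 0 if the value is already in Vj/Vk)
• RF fields: val, Qi (Qi: RS that will write the register, 0 if val is current)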

Issue
• Get instruction <op, rd, rs, rt> from instruction queue
• Wait until ∃r: RS[r].busy = no
• if (RF[rs].Qi ≠ 0) {RS[r].Qj ← RF[rs].Qi}
  else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}
• similarly for rt
• RS[r].op ← op; RS[r].busy ← yes; RF[rd].Qi ← r

Execute
• Wait until RS[r].Qj = 0 and RS[r].Qk = 0
• Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk

Write result
• Wait until execution complete at r and CDB available
• ∀x: if (RF[x].Qi = r) {RF[x].val ← result; RF[x].Qi ← 0}
• ∀x: if (RS[x].Qj = r) {RS[x].Vj ← result; RS[x].Qj ← 0}
• similarly for Qk / Vk
• RS[r].busy ← no
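
The issue, execute and write-result steps above can be transcribed almost line by line into Python. This is a minimal sketch under stated assumptions: only integer add/sub/mul are modelled, RS index 0 is a reserved dummy so that a tag of 0 means "no producer", and the CDB broadcast is a direct call to `write_result`. The class and method names are illustrative; field names mirror the slides.

```python
import operator
from dataclasses import dataclass

# Operations supported by this toy model (an assumption; the slides are generic).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class RSEntry:
    op: str = ""
    busy: bool = False
    Qj: int = 0  # RS producing the first operand (0 = value already in Vj)
    Vj: int = 0
    Qk: int = 0  # RS producing the second operand (0 = value already in Vk)
    Vk: int = 0


@dataclass
class Reg:
    val: int = 0
    Qi: int = 0  # RS that will write this register (0 = val is current)


class Tomasulo:
    def __init__(self, n_rs, n_regs):
        # RS[0] is a dummy so that tag 0 can mean "no producer".
        self.RS = [RSEntry() for _ in range(n_rs + 1)]
        self.RF = [Reg() for _ in range(n_regs)]

    def _read(self, reg):
        """Return (Q, V): a producer tag to wait on, or the value itself."""
        if self.RF[reg].Qi != 0:
            return self.RF[reg].Qi, 0
        return 0, self.RF[reg].val

    def issue(self, op, rd, rs, rt):
        free = [r for r in range(1, len(self.RS)) if not self.RS[r].busy]
        if not free:
            return None  # structural hazard: the instruction stalls
        r = free[0]
        e = self.RS[r]
        e.Qj, e.Vj = self._read(rs)
        e.Qk, e.Vk = self._read(rt)
        e.op, e.busy = op, True
        self.RF[rd].Qi = r  # done last, so "add r1, r1, r2" reads the old r1
        return r

    def execute(self, r):
        e = self.RS[r]
        if e.Qj != 0 or e.Qk != 0:
            return None  # still waiting for an operand on the CDB
        return OPS[e.op](e.Vj, e.Vk)

    def write_result(self, r, result):
        # Broadcast (tag r, result) to the register file and all stations.
        for reg in self.RF:
            if reg.Qi == r:
                reg.val, reg.Qi = result, 0
        for e in self.RS:
            if e.Qj == r:
                e.Vj, e.Qj = result, 0
            if e.Qk == r:
                e.Vk, e.Qk = result, 0
        self.RS[r].busy = False


# Example: r3 <- r1 + r2 ; r0 <- r3 * r3 (the mul waits on the add's RS tag)
t = Tomasulo(n_rs=2, n_regs=4)
t.RF[1].val, t.RF[2].val = 3, 4
a = t.issue("add", 3, 1, 2)
b = t.issue("mul", 0, 3, 3)
print(t.execute(b))               # None: operand not yet on the CDB
t.write_result(a, t.execute(a))
t.write_result(b, t.execute(b))
print(t.RF[3].val, t.RF[0].val)   # 7 49
```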

Tomasulo’s scheme plus ROB ...

Issue
• Get next instruction from instruction queue
• Check if there is a matching RS which is empty and an empty slot in ROB
  – no: structural hazard, instruction stalls
  – yes: issue the instruction to that RS and mark the ROB slot; also put the ROB slot number in the RS
• For each operand, check if it is available in RF or ROB
  – yes: put the operand in the RS
  – no: keep track of the ROB entry that will produce it

Execute (no change)
• If one or more operands are not available, wait and monitor the CDB
• When an operand becomes available, it is placed in the RS
• When all operands are available, start execution
• A choice may need to be made if multiple instructions become ready at the same time

Write result
• When result is available
  – write it on CDB with ROB tag, and
  – from there into ROB and relevant RSs (the RF is written only at commit)
• Mark RS as available

Commit (non-branch instruction)
• Wait until instruction reaches head of ROB and its result is ready
• Update RF (or memory, for a store)
• Remove instruction from ROB

Commit (branch instruction)
• Wait until instruction reaches head of ROB and its outcome is known
• If branch is mispredicted,
  – flush ROB
  – restart execution at correct successor of the branch instruction
• else
  – remove instruction from ROB

More formal description ...

RS fields
op, busy, Qi, Qj, Vj, Qk, Vk (Qi: ROB entry that will receive this instruction's result; Qj/Qk now hold ROB tags)

RF fields
val, Qi, busy (Qi: ROB entry that will write the register; busy: some uncommitted instruction will write it)

ROB fields
inst, busy, rdy, val, dst (rdy: result available in val; dst: destination register)

Issue
• Get instruction <op, rd, rs, rt> from instruction queue
• Wait until ∃r: RS[r].busy = no and ROB[b].busy = no, where b = ROB tail
• if (RF[rs].busy) {h ← RF[rs].Qi;
      if (ROB[h].rdy) {RS[r].Vj ← ROB[h].val; RS[r].Qj ← 0}
      else {RS[r].Qj ← h}
  } else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}
• similarly for rt
• RS[r].op ← op; RS[r].busy ← yes; RS[r].Qi ← b
• RF[rd].Qi ← b; RF[rd].busy ← yes; ROB[b].busy ← yes
• ROB[b].inst ← op; ROB[b].dst ← rd; ROB[b].rdy ← no

Execute (no change)
• Wait until RS[r].Qj = 0 and RS[r].Qk = 0
• Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk

Write result
• Wait until execution complete at r and CDB available
• b ← RS[r].Qi; RS[r].busy ← no
• ∀x: if (RS[x].Qj = b) {RS[x].Vj ← result; RS[x].Qj ← 0}
• similarly for Qk / Vk
• ROB[b].rdy ← yes; ROB[b].val ← result
• No RF update here: with a ROB, registers are written only at commit
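
The same step as a Python sketch. Unlike the basic scheme, the broadcast tag is the ROB entry b, and only the stations and the ROB are updated. The list-based layout (stations indexed by r, ROB slot 0 unused) and the function name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RSEntry:
    busy: bool = False
    Qi: int = 0  # ROB entry this station's instruction will write
    Qj: int = 0  # ROB tag producing the first operand (0 = value in Vj)
    Vj: int = 0
    Qk: int = 0  # ROB tag producing the second operand (0 = value in Vk)
    Vk: int = 0


@dataclass
class ROBEntry:
    rdy: bool = False
    val: int = 0


def write_result(RS, ROB, r, result):
    """Broadcast the result of station r on the CDB, tagged with its ROB entry.

    The register file is deliberately not touched: it is written at commit.
    """
    b = RS[r].Qi
    RS[r].busy = False
    for e in RS:  # every station waiting on tag b picks up the value
        if e.Qj == b:
            e.Vj, e.Qj = result, 0
        if e.Qk == b:
            e.Vk, e.Qk = result, 0
    ROB[b].rdy, ROB[b].val = True, result


# Station 0 (ROB 1) finishes with 7; station 1 was waiting on ROB 1.
stations = [RSEntry(busy=True, Qi=1), RSEntry(busy=True, Qi=2, Qj=1, Vk=4)]
rob = [ROBEntry() for _ in range(3)]  # slot 0 unused
write_result(stations, rob, 0, 7)
print(stations[1].Vj, stations[1].Qj, rob[1].rdy, rob[1].val)  # 7 0 True 7
```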

Commit (non-branch instruction)
• Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes
• d ← ROB[h].dst
• RF[d].val ← ROB[h].val
• ROB[h].busy ← no
• if (RF[d].Qi = h) {RF[d].busy ← no}

Commit (branch instruction)
• Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes
• If branch is mispredicted,
  – clear ROB and RF[ ].Qi / RF[ ].busy, and free all RSs (every instruction still in flight is younger than the branch)
  – fetch from the correct successor of the branch
• else
  – ROB[h].busy ← no (a branch has no destination register, so the RF is unchanged)
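
Both commit cases can be combined into one step function, sketched below in Python. The slides' ROB fields do not record a branch's outcome, so the `mispredicted` and `correct_pc` fields are assumptions, as is recognising a branch by `inst == "branch"`; `CommitUnit` is an illustrative name. The tail pointer and the reservation stations are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    val: int = 0
    Qi: int = 0         # ROB entry that will write this register
    busy: bool = False  # some uncommitted instruction will write it


@dataclass
class ROBEntry:
    inst: str = ""
    busy: bool = False
    rdy: bool = False
    val: int = 0
    dst: int = 0
    # Assumed extras for branches; the slides' ROB fields do not include them.
    mispredicted: bool = False
    correct_pc: int = 0


class CommitUnit:
    def __init__(self, RF, ROB, head=1):
        # ROB slots are 1..len(ROB)-1; slot 0 is unused so tag 0 means "none".
        self.RF, self.ROB, self.head = RF, ROB, head

    def step(self):
        """Try to commit the head entry; return the refetch PC on a misprediction."""
        h = self.head
        e = self.ROB[h]
        if not (e.busy and e.rdy):
            return None                 # head not ready yet: wait
        if e.inst == "branch":
            if e.mispredicted:
                for slot in self.ROB:   # flush all speculative entries
                    slot.busy = False
                for reg in self.RF:     # drop every pending ROB tag
                    reg.Qi, reg.busy = 0, False
                # A fuller model would also reset the tail to h and free all RSs.
                return e.correct_pc
            e.busy = False              # correctly predicted: just retire it
        else:
            d = e.dst
            self.RF[d].val = e.val
            e.busy = False
            if self.RF[d].Qi == h:      # no younger writer of rd in flight
                self.RF[d].busy = False
        self.head = h % (len(self.ROB) - 1) + 1  # advance head circularly
        return None


# ROB 1 and ROB 2 both write r3; ROB 3 is a mispredicted branch.
rf = [Reg() for _ in range(4)]
rob = [ROBEntry() for _ in range(4)]
rf[3].Qi, rf[3].busy = 2, True          # latest pending writer of r3 is ROB 2
rob[1] = ROBEntry("add", busy=True, rdy=True, val=7, dst=3)
rob[2] = ROBEntry("sub", busy=True, rdy=False, dst=3)
rob[3] = ROBEntry("branch", busy=True, rdy=True, mispredicted=True, correct_pc=40)
unit = CommitUnit(rf, rob)
unit.step()            # commits ROB 1: r3 = 7, r3 stays busy (ROB 2 pending)
unit.step()            # ROB 2 not ready: nothing happens
rob[2].rdy, rob[2].val = True, 1
unit.step()            # commits ROB 2: r3 = 1, r3 no longer busy
print(unit.step())     # 40
```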