--dynamic scheduling (ooo) via tomasulo’s approach · §tomasulo’s approach ... tomasulo...
TRANSCRIPT
![Page 1: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/1.jpg)
Lecture 16: Instruction Level Parallelism-- Dynamic Scheduling (OOO) via Tomasulo’s
Approach
CSCE 513 Computer Architecture
Department of Computer Science and EngineeringYonghong Yan
[email protected]://passlab.github.io/CSCE513
![Page 2: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/2.jpg)
2
Topics for Instruction Level Parallelism§ 5-stage Pipeline Extension, ILP Introduction, Compiler
Techniques, and Branch Prediction– C.5, C.6– 3.1, 3.2– Branch Prediction, C.2, 3.3
§ Dynamic Scheduling (OOO)– 3.4, 3.5
§ Hardware Speculation and Static Superscalar/VLIW– 3.6, 3.7
§ Dynamic Superscalar, Advanced Techniques, ARM Cortex-A53, and Intel Core i7– 3.8, 3.9, 3.12
§ SMT: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput – 3.11
2
![Page 3: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/3.jpg)
3
Acknowledge and Copyright§ Slides adapted from
– UC Berkeley course “Computer Science 252: Graduate Computer Architecture” of David E. Culler Copyright(C) 2005 UCB
– UC Berkeley course Computer Science 252, Graduate Computer Architecture Spring 2012 of John KubiatowiczCopyright(C) 2012 UCB
– Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakisfrom UC Berkeley
§ https://passlab.github.io/CSCE513/copyrightack.html
![Page 4: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/4.jpg)
4
3.4 Overcoming Data Hazards With Dynamic Scheduling
§ Data Hazards§ Control Hazards
![Page 5: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/5.jpg)
5
Types of Data Hazards Considerexecutingasequenceof
rk <=ri oprjtypeofinstructions
Data-dependencer3 <=r1 opr2 Read-after-Writer5 <=r3 opr4 (RAW)hazard
Anti-dependencer3 <=r1 opr2 Write-after-Readr1 <=r4 opr5 (WAR)hazard
Output-dependencer3 <=r1 opr2 Write-after-Writer3 <=r6 opr7 (WAW)hazard
![Page 6: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/6.jpg)
6
Register vs. Memory DependenceData hazards due to register operands can be determined at the decode stage, but data hazards due to memory operands can be determined only after computing the effective address
Store: M[r1 + disp1] <= r2 Load: r3 <= M[r4 + disp2]
Does (r1 + disp1) = (r4 + disp2) ?
![Page 7: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/7.jpg)
7
Data Hazards: An ExampleI1 FDIV.D f6, f6, f4
I2 FLD f2, 45(x3)
I3 FMUL.D f0, f2, f4
I4 FDIV.D f8, f6, f2
I5 FSUB.D f10, f0, f6
I6 FADD.D f6, f8, f2
RAWHazardsWARHazardsWAWHazards
![Page 8: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/8.jpg)
8
Instruction Scheduling
I6
I2
I4
I1
I5
I3
Validorderings:in-order I1 I2 I3 I4 I5 I6out-of-order
out-of-order
I1 FDIV.D f6, f6, f4
I2 FLD f2, 45(x3)
I3 FMULT.D f0, f2, f4
I4 FDIV.D f8, f6, f2
I5 FSUB.D f10, f0, f6
I6 FADD.D f6, f8, f2
I2 I1 I3 I4 I5 I6I1 I2 I3 I5 I4 I6
![Page 9: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/9.jpg)
9
Out-of-order CompletionIn-order Issue
LatencyI1 FDIV.D f6, f6, f4 4
I2 FLD f2, 45(x3) 1
I3 FMULT.D f0, f2, f4 3
I4 FDIV.D f8, f6, f2 4
I5 FSUB.D f10, f0, f6 1
I6 FADD.D f6, f8, f2 1
in-ordercomp 12
out-of-ordercomp12
1 2 343 54 65 6
2 31 43 55 4 66
Underlines are completes
![Page 10: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/10.jpg)
10
Register Renaming to Eliminate WAR and WAW Hazards
§ Example:
DIV.D F0,F2,F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D F8,F10,F14MUL.D F6,F10,F8
+ name dependence with F6 and F8No real data sharing between two instructions.
Anti-dependence (WAR) on F8
Also on F6 between S.D and MUL.D
Output dependence (WAW) on F6
![Page 11: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/11.jpg)
11
Register Renaming§ Example:
DIV.D F0,F2,F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D T2,F10,F14MUL.D T1,F10,T2
§ Now only RAW hazards remain, which can be strictly ordered
DIV.D F0,F2,F4
ADD.D F6,F0,F8
S.D F6,0(R1)
SUB.D F8,F10,F14
MUL.D F6,F10,F8
![Page 12: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/12.jpg)
12
Dynamic Scheduling§ Rearrange order of instructions to reduce stalls while
maintaining data flow– Minimize impacts by RAW Hazards– Minimize WAW and WAR hazards via Register Renaming– Between registers and memory hazards
§ Advantages:– Compiler doesn’t need to have knowledge of
microarchitecture– Handles cases where dependencies are unknown at compile
time§ Disadvantages:
– Substantial increase in hardware complexity– Complicates exceptions
![Page 13: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/13.jpg)
13
Dynamic Scheduling§ Dynamic scheduling implies:–Out-of-order execution–Out-of-order completion
§ May create more WAR and WAW hazards– Since execution are out-of-order
§ Scoreboard: C.6– CDC6600 in 1963
§ Tomasulo’s Approach– Tracks when operands are available– Introduces register renaming in hardware
» Minimizes WAW and WAR hazards
![Page 14: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/14.jpg)
14
Scoreboard Architecture (CDC 6600)
Func
tion
al U
nits
Regist
ers
FP MultFP Mult
FP Divide
FP Add
Integer
MemorySCOREBOARD
![Page 15: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/15.jpg)
15
Tomasulo Algorithm§ For IBM 360/91 about 3 years after CDC 6600 (1966)§ Goal: High Performance without special compilers§ Differences between IBM 360 & CDC 6600 ISA
– IBM has only 2 register specifiers/instr vs. 3 in CDC 6600– IBM has 4 FP registers vs. 8 in CDC 6600– IBM has memory-register ops
§ Why Study? lead to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …, Today’s processors
![Page 16: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/16.jpg)
16
Organizations of Tomasulo’s Algorithm§ Load/Store buffer§ Reservation station§ Common data bus
v
![Page 17: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/17.jpg)
17
Tomasulo Algorithm vs. Scoreboard§ Control & buffers distributed with Function Units (FU)
vs. centralized in scoreboard; – FU buffers called “reservation stations”; have pending
operands§ Registers in instructions replaced by values or
pointers to reservation stations(RS); à registerrenaming ; – Avoids WAR, WAW hazards– More reservation stations than registers, so can do
optimizations compilers can’t§ Results to FU from RS, not through registers, over
Common Data Bus that broadcasts results to all FUs§ Load and Stores treated as FUs with RSs as well§ Integer instructions can go past branches, allowing
FP ops beyond basic block in FP queue
![Page 18: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/18.jpg)
18
Register Renaming§ Register renaming by reservation stations (RS)
– Each entry contains:» The instruction» Buffered operand values (when available)» Reservation station number of instruction providing the operand
values– RS fetches and buffers an operand as soon as it becomes
available (not necessarily involving register file)– Pending instructions designate the RS to which they will send
their output» Result values broadcast on the common data bus (CDB)
– Only the last output updates the register file– As instructions are issued, the register specifiers are
renamed with the reservation station– May be more reservation stations than registers
![Page 19: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/19.jpg)
19
Reservation Station ComponentsOp: Operation to perform in the unit (e.g., + or –)Vj, Vk: Value of Source operands– Store buffers has V field, result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)– Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready in Vj
or Vk– Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Qi: Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.
![Page 20: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/20.jpg)
20
Three Stages of Tomasulo Algorithm1. Issue—get instruction from FP Op Queue
If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execution—operate on operands (EX)When both operands ready then execute;if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)Write on Common Data Bus to all awaiting units; mark reservation station available
§ Normal data bus: data + destination (“go to” bus)§ Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source address– Write if matches expected Functional Unit (produces result)– Does the broadcast
![Page 21: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/21.jpg)
21
Tomasulo Organization for the Example
FP adders
Add1Add2Add3
FP multipliers
Mult1Mult2
From Mem FP Registers
Reservation Stations
Common Data Bus (CDB)
To Mem
FP OpQueue
Load Buffers
Store Buffers
Load1Load2Load3Load4Load5Load6
![Page 22: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/22.jpg)
22
Tomasulo Example
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F300 FU
![Page 23: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/23.jpg)
23
Tomasulo Example Cycle 1
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1
![Page 24: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/24.jpg)
24
Tomasulo Example Cycle 2
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1
Allow multiple outstanding loads
![Page 25: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/25.jpg)
25
Tomasulo Example Cycle 3
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 Yes MULTD R(F4) Load2Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1
• Note: registers names are removed (“renamed”) in Reservation Stations; MULT issued vs. scoreboard
• Load1 completing; what is waiting for Load1?
![Page 26: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/26.jpg)
26
Tomasulo Example Cycle 4
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 Yes SUBD M(A1) Load2Add2 NoAdd3 NoMult1 Yes MULTD R(F4) Load2Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(A1) Add1
• Load2 completing; what is waiting for Load2?
Waiting for data from memory by the instruction originally in Load1
![Page 27: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/27.jpg)
27
Tomasulo Example Cycle 5
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
2 Add1 Yes SUBD M(A1) M(A2)Add2 NoAdd3 No
10 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 M(A2) M(A1) Add1 Mult2
Waiting for data from memory by the instruction originally in Load2
![Page 28: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/28.jpg)
28
Tomasulo Example Cycle 6
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
1 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No
9 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(A2) Add2 Add1 Mult2
• Issue ADDD here vs. scoreboard?
![Page 29: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/29.jpg)
29
Tomasulo Example Cycle 7
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
0 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No
8 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(A2) Add2 Add1 Mult2
• Add1 completing; what is waiting for it?
![Page 30: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/30.jpg)
30
Tomasulo Example Cycle 8
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 No2 Add2 Yes ADDD (M-M) M(A2)
Add3 No7 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(A2) Add2 (M-M) Mult2
![Page 31: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/31.jpg)
31
Tomasulo Example Cycle 9
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 No1 Add2 Yes ADDD (M-M) M(A2)
Add3 No6 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(A2) Add2 (M-M) Mult2
![Page 32: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/32.jpg)
32
Tomasulo Example Cycle 10
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 No0 Add2 Yes ADDD (M-M) M(A2)
Add3 No5 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 M(A2) Add2 (M-M) Mult2
• Add2 completing; what is waiting for it?
![Page 33: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/33.jpg)
33
Tomasulo Example Cycle 11
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
4 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
• Write result of ADDD here vs. scoreboard?• All quick instructions complete in this cycle!
![Page 34: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/34.jpg)
34
Tomasulo Example Cycle 12
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
3 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 35: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/35.jpg)
35
Tomasulo Example Cycle 13
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
2 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 36: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/36.jpg)
36
Tomasulo Example Cycle 14
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
1 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 37: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/37.jpg)
37
Tomasulo Example Cycle 15
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
0 Mult1 Yes MULTDM(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 38: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/38.jpg)
38
Tomasulo Example Cycle 16
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
40 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
![Page 39: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/39.jpg)
39
Faster than light computation(skip a couple of cycles)
![Page 40: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/40.jpg)
40
Tomasulo Example Cycle 55
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
1 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3055 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
![Page 41: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/41.jpg)
41
Tomasulo Example Cycle 56
Instruction status: Exec WriteInstruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
0 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
• Mult2 is completing; what is waiting for it?
![Page 42: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/42.jpg)
42
Tomasulo Example Cycle 57Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56 57ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(A2) (M-M+M)(M-M) Result
• Once again: In-order issue, out-of-order execution and completion.
![Page 43: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/43.jpg)
43
Compare to Scoreboard Cycle 62
Instruction status: Read Exec Write Exec WriteInstruction j k Issue Oper Comp Result Issue ComplResultLD F6 34+ R2 1 2 3 4 1 3 4LD F2 45+ R3 5 6 7 8 2 4 5MULTD F0 F2 F4 6 9 19 20 3 15 16SUBD F8 F6 F2 7 9 11 12 4 7 8DIVD F10 F0 F6 8 21 61 62 5 56 57ADDD F6 F8 F2 13 14 16 22 6 10 11
• Why take longer on scoreboard/6600?• Structural Hazards• Lack of forwarding
![Page 44: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/44.jpg)
44
Tomasulo v. Scoreboard(IBM 360/91 v. CDC 6600)
Pipelined Functional Units Multiple Functional Units(6 load, 3 store, 3 +, 2 x/÷) (1 load/store, 1 + , 2 x, 1 ÷)
window size: ≤ 14 instructions ≤ 5 instructions
No issue on structural hazard sameWAR: renaming avoids stall completionWAW: renaming avoids stall issue
Broadcast results from FU Write/read registersControl: reservation stations central scoreboard
![Page 45: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/45.jpg)
45
SUMMARY
![Page 46: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/46.jpg)
46
Hardware Solution§ Dynamic Scheduling
– Out-of-order execution and completion§ Data Hazard via Register Renaming
– Dynamic RAW hazard detection and scheduling in data-flow fashion
– Register renaming for WRA and WRA hazard (name conflict)
§ Implementations– Scoreboard (CDC 6600 1963)
» Centralized register renaming– Tomasulo’s Approach (IBM 360/91, 1966)
» Distributed control and renaming via reservation station, load/store buffer and common data bus (data+source)
![Page 47: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/47.jpg)
47
Register Renaming Summary§ Purpose of Renaming: removing “Anti-dependencies”
– Get rid of WAR and WAW hazards, since these are not “real” dependencies§ Implicit Renaming: i.e. Tomasulo
– Registers changed into values or response tags– We call this “implicit” because space in register file may or may not be used
by results!§ Explicit Renaming: more physical registers than needed by ISA.
– Rename table: tracks current association between architectural registers and physical registers
– Uses a translation table to perform compiler-like transformation on the fly
§ With Explicit Renaming:– All registers concentrated in single register file– Can utilize bypass network that looks more like 5-stage pipeline– Introduces a register-allocation problem
» Need to handle branch misprediction and precise exceptions differently, but ultimately makes things simpler
![Page 48: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/48.jpg)
48
Class Lecture Stops Here
![Page 49: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/49.jpg)
49
TOMASULA LOOP EXAMPLE
![Page 50: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/50.jpg)
50
Tomasulo Loop ExampleLoop: LD F0 0 R1
MULTD F4 F0 F2SD F4 0 R1SUBI R1 R1 #8BNEZ R1 Loop
§ Assume Multiply takes 4 clocks§ Assume first load takes 8 clocks (cache miss), second load
takes 1 clock (hit)§ To be clear, will show clocks for SUBI, BNEZ§ Reality: integer instructions ahead
![Page 51: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/51.jpg)
51
Loop ExampleInstruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 Load1 No1 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
0 80 Fu
![Page 52: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/52.jpg)
52
Loop Example Cycle 1Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
1 80 Fu Load1
![Page 53: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/53.jpg)
53
Loop Example Cycle 2Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
2 80 Fu Load1 Mult1
![Page 54: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/54.jpg)
54
Loop Example Cycle 3
§ Implicit renaming sets up “DataFlow” graph
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
3 80 Fu Load1 Mult1
![Page 55: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/55.jpg)
55
Loop Example Cycle 4
§ Dispatching SUBI Instruction
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
4 80 Fu Load1 Mult1
![Page 56: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/56.jpg)
56
Loop Example Cycle 5
§ And, BNEZ instruction
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
5 72 Fu Load1 Mult1
![Page 57: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/57.jpg)
57
Loop Example Cycle 6
§ Notice that F0 never sees Load from location 80
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
6 72 Fu Load2 Mult1
![Page 58: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/58.jpg)
58
Loop Example Cycle 7
§ Register file completely detached from computation§ First and Second iteration completely overlapped
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
7 72 Fu Load2 Mult2
![Page 59: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/59.jpg)
59
Loop Example Cycle 8Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
8 72 Fu Load2 Mult2
![Page 60: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/60.jpg)
60
Loop Example Cycle 9
§ Load1 completing: who is waiting?§ Note: Dispatching SUBI
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
9 72 Fu Load2 Mult2
![Page 61: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/61.jpg)
61
Loop Example Cycle 10
§ Load2 completing: who is waiting?§ Note: Dispatching BNEZ
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 10 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
4 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
10 64 Fu Load2 Mult2
![Page 62: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/62.jpg)
62
Loop Example Cycle 11
§ Next load in sequence
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
3 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #84 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
11 64 Fu Load3 Mult2
![Page 63: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/63.jpg)
63
Loop Example Cycle 12
§ Why not issue third multiply?
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
2 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #83 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
12 64 Fu Load3 Mult2
![Page 64: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/64.jpg)
64
Loop Example Cycle 13Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
1 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #82 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
13 64 Fu Load3 Mult2
![Page 65: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/65.jpg)
65
Loop Example Cycle 14
§ Mult1 completing. Who is waiting?
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
0 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #81 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
14 64 Fu Load3 Mult2
![Page 66: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/66.jpg)
66
Loop Example Cycle 15
§ Mult2 completing. Who is waiting?
Instruction status: Exec WriteITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8
0 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
15 64 Fu Load3 Mult2
![Page 67: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/67.jpg)
67
Loop Example Cycle 16Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
16 64 Fu Load3 Mult1
![Page 68: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/68.jpg)
68
Loop Example Cycle 17Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
17 64 Fu Load3 Mult1
![Page 69: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/69.jpg)
69
Loop Example Cycle 18Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
18 64 Fu Load3 Mult1
![Page 70: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/70.jpg)
70
Loop Example Cycle 19Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 19 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
19 64 Fu Load3 Mult1
![Page 71: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/71.jpg)
71
Loop Example Cycle 20Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 No2 SD F4 0 R1 8 19 20 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30
20 64 Fu Load3 Mult1
![Page 72: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/72.jpg)
72
Why can Tomasulo overlap iterations of loops?
§ Register renaming– Multiple iterations use different physical destinations for
registers (dynamic loop unrolling).
§ Reservation stations – Permit instruction issue to advance past integer control flow
operations
§ Other idea: Tomasulo building dynamic “DataFlow” graph from instructions– Fits in with readings for Wednesday
![Page 73: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/73.jpg)
73
Summary§ Scoreboard: Track dependencies through reservations
– Simple scheme for out-of-order execution– WAW and WAR hazards force stalls – cannot handle multiple
instructions with same destination register§ Reservations stations: renaming to larger set of registers +
buffering source operands– Prevents registers as bottleneck– Avoids WAR, WAW hazards of Scoreboard– Allows loop unrolling in HW
§ Dynamic hardware schemes can unroll loops dynamically in hardware– Form of limited dataflow– Register renaming is essential
§ Lasting Contributions of Tomasulo Algorithm– Dynamic scheduling– Register renaming– Load/store disambiguation
§ 360/91 descendants are Pentium II; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264
![Page 74: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/74.jpg)
74
SCOREBOARD
![Page 75: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/75.jpg)
75
Scoreboard: a bookkeeping technique§ Out-of-order execution divides ID stage:
1.Issue—decode instructions, check for structural hazards2.Read operands—wait until no data hazards, then read
operands§ Scoreboards date to CDC6600 in 1963
– Readings for Monday include one on CDC6600§ Instructions execute whenever not dependent on
previous instructions and no hazards. § CDC 6600: In order issue, out-of-order execution, out-
of-order commit (or completion)– No forwarding!– Imprecise interrupt/exception model for now
![Page 76: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/76.jpg)
76
Scoreboard Architecture (CDC 6600)
Func
tion
al U
nits
Regist
ers
FP MultFP Mult
FP Divide
FP Add
Integer
MemorySCOREBOARD
![Page 77: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/77.jpg)
77
Scoreboard Implications§ Out-of-order completion => WAR, WAW hazards?§ Solutions for WAR:
– Stall writeback until registers have been read– Read registers only during Read Operands stage
§ Solution for WAW:– Detect hazard and stall issue of new instruction until other
instruction completes§ No register renaming § Need to have multiple instructions in execution phase
=> multiple execution units or pipelined execution units
§ Scoreboard keeps track of dependencies between instructions that have already issued
§ Scoreboard replaces ID, EX, WB with 4 stages
![Page 78: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/78.jpg)
78
Four Stages of Scoreboard Control§ Issue—decode instructions & check for structural
hazards (ID1)– Instructions issued in program order (for hazard checking)– Don’t issue if structural hazard– Don’t issue if instruction is output dependent on any
previously issued but uncompleted instruction (no WAW hazards)
§ Read operands—wait until no data hazards, then read operands (ID2)– All real dependencies (RAW hazards) resolved in this stage,
since we wait for instructions to write back data.– No forwarding of data in this model!
![Page 79: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/79.jpg)
79
Four Stages of Scoreboard Control§ Execution—operate on operands (EX)
– The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
§ Write result—finish execution (WB)– Stall until no WAR hazards with previous instructions:
Example: DIVD F0,F2,F4ADDD F10,F0,F8SUBD F8,F8,F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands
![Page 80: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/80.jpg)
80
Three Parts of the Scoreboard§ Instruction status:
Which of 4 steps the instruction is in
§ Functional unit status:—Indicates the state of the functional unit (FU). 9 fields for each functional unit
Busy: Indicates whether the unit is busy or notOp: Operation to perform in the unit (e.g., + or –)Fi: Destination registerFj,Fk: Source-register numbersQj,Qk: Functional units producing source registers Fj, FkRj,Rk: Flags indicating when Fj, Fk are ready
§ Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register
![Page 81: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/81.jpg)
81
Scoreboard Example
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
FU
![Page 82: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/82.jpg)
82
Detailed Scoreboard Pipeline Control
Read operandsExecution complete
Instruction status
Write result
Issue
Bookkeeping
Rj¬ No; Rk¬ No
"f(if Qj(f)=FU then Rj(f)¬ Yes);"f(if Qk(f)=FU then Rj(f)¬ Yes); Result(Fi(FU))¬ 0; Busy(FU)¬ No
Busy(FU)¬ yes; Op(FU)¬ op; Fi(FU)¬ `D’; Fj(FU)¬ `S1’;
Fk(FU)¬ `S2’; Qj¬ Result(‘S1’); Qk¬ Result(`S2’); Rj¬ not Qj;
Rk¬ not Qk; Result(‘D’)¬ FU;
Rj and Rk
Functional unit done
Wait until
"f((Fj(f)¹Fi(FU) or Rj(f)=No) &
(Fk(f)¹Fi(FU) or Rk( f )=No))
Not busy (FU) and not result(D)
![Page 83: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/83.jpg)
83
Scoreboard Example: Cycle 1
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer
![Page 84: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/84.jpg)
84
Scoreboard Example: Cycle 2
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer
• Issue 2nd LD?
![Page 85: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/85.jpg)
85
Scoreboard Example: Cycle 3
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer
• Issue MULT?
![Page 86: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/86.jpg)
86
Scoreboard Example: Cycle 4
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Integer
![Page 87: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/87.jpg)
87
Scoreboard Example: Cycle 5
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer
![Page 88: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/88.jpg)
88
Scoreboard Example: Cycle 6
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6MULTD F0 F2 F4 6SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 YesMult1 Yes Mult F0 F2 F4 Integer No YesMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 Integer
![Page 89: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/89.jpg)
89
Scoreboard Example: Cycle 7
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 NoMult1 Yes Mult F0 F2 F4 Integer No YesMult2 NoAdd Yes Sub F8 F6 F2 Integer Yes NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 Integer Add
• Read multiply operands?
![Page 90: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/90.jpg)
90
Scoreboard Example: Cycle 8a(First half of clock cycle)
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 NoMult1 Yes Mult F0 F2 F4 Integer No YesMult2 NoAdd Yes Sub F8 F6 F2 Integer Yes NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Integer Add Divide
![Page 91: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/91.jpg)
91
Scoreboard Example: Cycle 8b(Second half of clock cycle)
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 Yes Mult F0 F2 F4 Yes YesMult2 NoAdd Yes Sub F8 F6 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide
![Page 92: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/92.jpg)
92
Scoreboard Example: Cycle 9
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No10 Mult1 Yes Mult F0 F2 F4 Yes YesMult2 No
2 Add Yes Sub F8 F6 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 Add Divide
• Read operands for MULT & SUB? Issue ADDD?
Note Remaining
![Page 93: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/93.jpg)
93
Scoreboard Example: Cycle 10
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No9 Mult1 Yes Mult F0 F2 F4 No NoMult2 No
1 Add Yes Sub F8 F6 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 Add Divide
![Page 94: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/94.jpg)
94
Scoreboard Example: Cycle 11
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No8 Mult1 Yes Mult F0 F2 F4 No NoMult2 No
0 Add Yes Sub F8 F6 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 Add Divide
![Page 95: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/95.jpg)
95
Scoreboard Example: Cycle 12
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No7 Mult1 Yes Mult F0 F2 F4 No NoMult2 NoAdd NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 Divide
• Read operands for DIVD?
![Page 96: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/96.jpg)
96
Scoreboard Example: Cycle 13
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No6 Mult1 Yes Mult F0 F2 F4 No NoMult2 NoAdd Yes Add F6 F8 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 Add Divide
![Page 97: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/97.jpg)
97
Scoreboard Example: Cycle 14
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No5 Mult1 Yes Mult F0 F2 F4 No NoMult2 No
2 Add Yes Add F6 F8 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 Add Divide
![Page 98: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/98.jpg)
98
Scoreboard Example: Cycle 15
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No4 Mult1 Yes Mult F0 F2 F4 No NoMult2 No
1 Add Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 Add Divide
![Page 99: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/99.jpg)
99
Scoreboard Example: Cycle 16
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No3 Mult1 Yes Mult F0 F2 F4 No NoMult2 No
0 Add Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU Mult1 Add Divide
![Page 100: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/100.jpg)
100
Scoreboard Example: Cycle 17
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No2 Mult1 Yes Mult F0 F2 F4 No NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3017 FU Mult1 Add Divide
• Why not write result of ADD???
WAR Hazard!
![Page 101: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/101.jpg)
101
Scoreboard Example: Cycle 18
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No1 Mult1 Yes Mult F0 F2 F4 No NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3018 FU Mult1 Add Divide
![Page 102: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/102.jpg)
102
Scoreboard Example: Cycle 19
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No0 Mult1 Yes Mult F0 F2 F4 No NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3019 FU Mult1 Add Divide
![Page 103: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/103.jpg)
103
Scoreboard Example: Cycle 20
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Yes Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3020 FU Add Divide
![Page 104: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/104.jpg)
104
Scoreboard Example: Cycle 21
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Yes Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3021 FU Add Divide
• WAR Hazard is now gone...
![Page 105: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/105.jpg)
105
Scoreboard Example: Cycle 22
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd No
39 Divide Yes Div F10 F0 F6 No No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3022 FU Divide
![Page 106: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/106.jpg)
106
Faster than light computation(skip a couple of cycles)
![Page 107: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/107.jpg)
107
Scoreboard Example: Cycle 61
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21 61ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd No
0 Divide Yes Div F10 F0 F6 No No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3061 FU Divide
![Page 108: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/108.jpg)
108
Scoreboard Example: Cycle 62
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21 61 62ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3062 FU
![Page 109: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/109.jpg)
109
Review: Scoreboard Example: Cycle 62
Instruction status: Read Exec WriteInstruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21 61 62ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F3062 FU
• In-order issue; out-of-order execute & commit
![Page 110: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/110.jpg)
110
CDC 6600 Scoreboard§ Speedup 1.7 from compiler; 2.5 by hand
BUT slow memory (no cache) limits benefit§ Limitations of 6600 scoreboard:
– No forwarding hardware– Limited to instructions in basic block (small window)– Small number of functional units (structural hazards),
especially integer/load store units– Do not issue on structural hazards– Wait for WAR hazards– Prevent WAW hazards
![Page 111: --Dynamic Scheduling (OOO) via Tomasulo’s Approach · §Tomasulo’s Approach ... Tomasulo Example Cycle 3 Instruction status: Exec Write Instruction j k Issue Comp Result Busy](https://reader030.vdocument.in/reader030/viewer/2022040915/5e8d05e60989714e041cdfcd/html5/thumbnails/111.jpg)
111
A Useful Slide
§ Interesting Resource: http://bitsavers.org– Has digital versions of users manuals for old machines– Quite interesting!– I’ll link in some of them to your reading pages when it is
appropriate– Very limited bandwidth: use mirrors such as:
http://bitsavers.vt100.net