Out-of-Order Execution Structures Optimizations
![Page 1: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/1.jpg)
A. Moshovos © ECE1773 - Fall ‘07 ECE Toronto
Out-of-Order Execution Structures Optimizations
![Page 2: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/2.jpg)
Tag Elimination
![Page 3: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/3.jpg)
Conventional Schedulers are Overdesigned
• For a MIPS-like ISA
  – Two source tags
  – One destination tag
• Not all instructions use two source operands
  – E.g., addi $1, $2, 10
• Not all instructions produce a result that is interesting for scheduling
  – E.g., beq
• Some operands are ready when the instruction enters the scheduler
• Source: Efficient Dynamic Scheduling Through Tag Elimination, Dan Ernst and Todd Austin, ISCA 2002
![Page 4: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/4.jpg)
Some Operands are Ready when the Instruction Enters the Scheduler
![Page 5: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/5.jpg)
Window Specialization
• Have reservation stations with different source operand wait capabilities
![Page 6: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/6.jpg)
Window Specialization
• At rename, check how many source operands are not ready
• If there is an appropriate slot, proceed to schedule
• If not, stall at rename
• Advantages:
  – Destination bus only runs over reservation stations with comparators
  – Load on the destination bus is reduced
• Disadvantages:
  – Stalls due to unavailability of reservation stations
  – Complexity of reservation station assignment
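The rename-time check above can be sketched as follows. This is a minimal illustration, not the design from the cited paper: the class name `ReservationStations`, the default pool sizes, and the "cheapest sufficient entry first" policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationStations:
    # free[k] = number of free entries that have k tag comparators,
    # i.e. entries that can wait on up to k unready source operands.
    # The pool sizes are illustrative assumptions.
    free: dict = field(default_factory=lambda: {0: 2, 1: 4, 2: 2})

    def allocate(self, unready_sources: int):
        """Return the comparator count of the chosen entry, or None to stall at rename."""
        # Assumed policy: take the cheapest entry that can still wait on
        # every unready source, keeping two-comparator entries for later.
        for capability in sorted(self.free):
            if capability >= unready_sources and self.free[capability] > 0:
                self.free[capability] -= 1
                return capability
        return None  # no appropriate slot: stall at rename
```

An instruction with one unready operand is placed in a one-comparator entry when one is free, falls back to a two-comparator entry otherwise, and stalls when neither is available.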
![Page 7: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/7.jpg)
Window Specialization - Performance
Performance as IPC – Actual Clock Frequency not considered
![Page 8: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/8.jpg)
Window Specialization - Performance
Performance as IPC per ns
![Page 9: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/9.jpg)
Last Tag Prediction
• Observe:
  – An instruction becomes ready after the last tag it waits for appears
• Last tag prediction
  – Predict which of the two tags that will be
• Speculatively execute
  – Correct speculation: that was the last tag
  – Incorrect speculation:
    • Need to reschedule
    • Detection? The instruction tries to read a value that is not available
![Page 10: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/10.jpg)
GShare-Style Last Tag Prediction
Two-bit saturating counters
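A gshare-style table of two-bit saturating counters could look roughly like the sketch below. The class name `LastTagPredictor`, the table size, and the choice of history (a shift register of recent last-tag outcomes) are assumptions for illustration; the slide does not say which history the predictor hashes with the PC.

```python
class LastTagPredictor:
    """Predicts whether the left or right source tag will arrive last."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Two-bit saturating counters: 0-1 predict "left", 2-3 predict "right".
        self.counters = [1] * (1 << index_bits)
        # Assumption: global history of recent last-tag outcomes.
        self.history = 0

    def _index(self, pc):
        # gshare-style hash: XOR the PC with the global history.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return "right" if self.counters[self._index(pc)] >= 2 else "left"

    def update(self, pc, last_was_right):
        i = self._index(pc)
        c = self.counters[i]
        # Saturate at 0 and 3 so a single anomaly does not flip a strong prediction.
        self.counters[i] = min(3, c + 1) if last_was_right else max(0, c - 1)
        self.history = ((self.history << 1) | int(last_was_right)) & self.mask
```

With such a predictor, the scheduler entry only needs a comparator for the predicted last tag; a wrong prediction is caught later when the instruction tries to read an unavailable value.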
![Page 11: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/11.jpg)
Accuracy
• Over all instructions with two outstanding operands
![Page 12: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/12.jpg)
Window Specialization - Performance
Performance as IPC – Actual Clock Frequency not considered
![Page 13: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/13.jpg)
Window Specialization - Performance
Performance as IPC per ns
![Page 14: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/14.jpg)
Prescheduling
Data-flow prescheduling for large instruction windows in out-of-order processors
Pierre Michaud, André Seznec, HPCA 2001
![Page 15: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/15.jpg)
Prescheduling
• Predict latencies
• Put scheduled instructions into a FIFO
• Slide into a smaller window
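The first two steps can be illustrated with a small sketch. It assumes each instruction is a `(dest, sources, opcode)` tuple and that `latency` maps opcodes to predicted latencies; both representations, and the function name `preschedule`, are hypothetical and not taken from the cited paper.

```python
def preschedule(instructions, latency):
    """Return (predicted_issue_cycle, program_index) pairs in FIFO order.

    instructions: list of (dest, sources, opcode) in program order.
    latency: dict mapping opcode -> predicted execution latency in cycles.
    """
    ready_at = {}   # register -> predicted cycle at which its value is available
    predicted = []
    for index, (dest, sources, opcode) in enumerate(instructions):
        # An instruction can issue once its slowest source is predicted ready.
        issue = max((ready_at.get(src, 0) for src in sources), default=0)
        predicted.append((issue, index))
        if dest is not None:
            ready_at[dest] = issue + latency[opcode]
    # Ordering by predicted issue cycle (ties in program order) gives the
    # order in which instructions would slide into the small issue window.
    return sorted(predicted)
```

For a three-cycle load followed by a dependent add and an independent add, the independent add is ordered ahead of the dependent one.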
![Page 16: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/16.jpg)
Prescheduling Method
![Page 17: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/17.jpg)
Prescheduling Example
![Page 18: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/18.jpg)
Latency Prediction
![Page 19: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/19.jpg)
Latency Prediction Contd.
![Page 20: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/20.jpg)
Broadcast Free Scheduler
![Page 21: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/21.jpg)
Broadcast Free Scheduler
• Cyclone design
  – D. Ernst, A. Hamel, T. Austin
  – ISCA 2003
• Preschedule instructions
• Put them into a dual-strip cyclical FIFO
• Vertical paths allow for motion between the strips
![Page 22: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/22.jpg)
Cyclone Architecture
Will be ready in cycle + 6
![Page 23: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/23.jpg)
Cyclone Architecture – Cycle + 1
![Page 24: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/24.jpg)
Cyclone Architecture – Cycle + 2
![Page 25: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/25.jpg)
Cyclone Architecture – Cycle + 3
![Page 26: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/26.jpg)
Cyclone Architecture – Cycle + 4
![Page 27: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/27.jpg)
Cyclone Architecture – Cycle + 5
![Page 28: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/28.jpg)
Cyclone Architecture – Cycle + 6
![Page 29: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/29.jpg)
Cyclone Architecture – Cycle + 6
![Page 30: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/30.jpg)
Cyclone Architecture – Mis-scheduling
Estimate new latency
![Page 31: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/31.jpg)
Pre-scheduler
Insert instruction with predicted latency N at the front of the FIFO
Have it switch at N/2
Can only do two cascaded MAX calculations due to timing considerations
![Page 32: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/32.jpg)
Cyclone IPC Performance
![Page 33: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/33.jpg)
Cyclone True Performance and Area
![Page 34: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/34.jpg)
Matrix Schedulers
![Page 35: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/35.jpg)
Conventional Scheduler
WS requests
IW grants
![Page 36: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/36.jpg)
Conventional Scheduler Timing
Can't pipeline without introducing bubbles between dependent instructions
(Timing diagram; instruction labels: A2, B1, B3)
Source: A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors, Masahiro Goshima, Kengo Nishino, Yasuhiko Nakashima, Shin-ichiro Mori, Toshiaki Kitamura, Shinji Tomita, MICRO 2001
![Page 37: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/37.jpg)
Towards a Matrix Scheduler
• Observe:
  – In conventional scheduling, dependences are discovered twice:
    • Once at renaming
    • Once during scheduling
  – Why? Dependences are implicitly represented
    • Producer and consumer link via a name
    • This is indirect
• Matrix Scheduler idea:
  – Represent dependences explicitly
![Page 38: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/38.jpg)
Dependence Matrix
Who am I?
Who do I depend upon?
Left source, Right source
![Page 39: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/39.jpg)
Matrix Scheduler
wakeup
Write port
![Page 40: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/40.jpg)
Inserting an entry
wakeup
Write port
![Page 41: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/41.jpg)
Wakeup
wakeup
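The insert and wakeup steps of a dependence matrix can be modeled in software roughly as below. This is a simplified sketch under stated assumptions: the class `MatrixScheduler` and its methods are hypothetical names, each row is stored as an integer bit vector, and a producer's column is cleared at issue without modeling execution latency.

```python
class MatrixScheduler:
    def __init__(self, size):
        self.size = size
        # deps[i] is row i of the matrix: bit j set means entry i
        # still waits for the result of entry j.
        self.deps = [0] * size
        self.valid = [False] * size

    def insert(self, slot, producers):
        """Write one row: mark the window entries this instruction depends upon."""
        row = 0
        for p in producers:
            row |= 1 << p
        self.deps[slot] = row
        self.valid[slot] = True

    def ready(self):
        """Entries whose row is all zeros have no outstanding producers."""
        return [i for i in range(self.size) if self.valid[i] and self.deps[i] == 0]

    def issue(self, slot):
        """Issue an entry and clear its column, waking up its consumers."""
        self.valid[slot] = False
        clear = ~(1 << slot)
        for i in range(self.size):
            self.deps[i] &= clear
```

No tag comparison takes place at wakeup: the dependences were recorded once, at insertion, and wakeup only clears bits.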
![Page 42: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/42.jpg)
Misspeculation Recovery
• Do not clean up
• Use external logic to inhibit request signals
![Page 43: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/43.jpg)
Delay
Partial wakeup lines
0.18 µm, 1.8 V, 85 °C
![Page 44: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/44.jpg)
Delay measurement points
![Page 45: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/45.jpg)
Scheduling Priorities
![Page 46: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/46.jpg)
Conflict Resolution
• More instructions ready than available issue slots
  – Which get to go?
• Age vs. pseudo-random resolution
• Age is important
• Priority enforcer picks the oldest
  – Complex
Source: Matrix Scheduler Reloaded, ISCA 2007
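The two resolution policies can be contrasted with a short sketch. The function names and the representation of age as a sequence number (smaller means older) are assumptions for illustration only.

```python
import random


def select_oldest(ready, age, width):
    """Age-based resolution: grant the `width` oldest ready entries.

    age maps an entry to its sequence number; smaller means older (assumed).
    """
    return sorted(ready, key=lambda entry: age[entry])[:width]


def select_random(ready, width, rng):
    """Pseudo-random resolution: grant any `width` ready entries."""
    return rng.sample(ready, min(width, len(ready)))
```

The age-based version needs the relative age of every ready entry, which is where the hardware complexity noted above comes from; the pseudo-random version needs no age information at all.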
![Page 47: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/47.jpg)
Compacting Scheduler
• Implemented in the Alpha 21264
• Physical order within the scheduler corresponds to age
• Entry freed:
  – Shift up all younger entries
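A software analogy of the compaction step is sketched below, assuming the queue is a fixed-size list with the oldest entry at index 0 and `None` marking empty slots. The function names are hypothetical and the code does not model the Alpha 21264 circuitry.

```python
def compact(entries, freed):
    """Free the entry at index `freed` and shift all younger entries up."""
    size = len(entries)
    kept = [e for i, e in enumerate(entries) if i != freed and e is not None]
    # Pad with empty slots at the young end so the queue keeps its size.
    return kept + [None] * (size - len(kept))


def pick_oldest_ready(entries, is_ready):
    """Because position encodes age, the first ready entry is the oldest one."""
    for entry in entries:
        if entry is not None and is_ready(entry):
            return entry
    return None
```

Keeping physical order equal to age makes oldest-first selection a simple positional priority scan, at the cost of moving entries on every free.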
![Page 48: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/48.jpg)
Virtual Physical Registers
• Physical register names are used for two purposes
  – Scheduling
  – Communicating
• A physical register is allocated long before it is needed
  – We need the register only after the value is produced
• Decouple scheduling names from communication names
![Page 49: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/49.jpg)
Used vs. Allocated Registers
![Page 50: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/50.jpg)
Goal
![Page 51: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/51.jpg)
Virtual Physical Registers
![Page 52: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/52.jpg)
Deadlock
• An older instruction completes later than younger ones
  – No registers available
• Steal a register and re-execute
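How late allocation leads to this situation can be seen in a small model. The class `VirtualPhysicalFile` and its methods are hypothetical; the sketch assumes a virtual tag is handed out at rename and a physical register is claimed only at writeback. It shows the failure case only and does not model the steal-and-re-execute remedy.

```python
class VirtualPhysicalFile:
    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # free physical registers
        self.next_tag = 0
        self.mapping = {}  # virtual tag -> physical register

    def rename(self):
        """At rename, hand out only a virtual tag (used for scheduling)."""
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def writeback(self, tag):
        """At writeback, claim a physical register; None means none is free."""
        if not self.free:
            return None
        reg = self.free.pop(0)
        self.mapping[tag] = reg
        return reg

    def release(self, tag):
        """Return the physical register bound to `tag` to the free list."""
        self.free.append(self.mapping.pop(tag))
```

If a younger instruction writes back first and takes the last register, the older instruction's `writeback` returns `None`. Since the younger instruction cannot commit before the older one, nothing frees a register unless one is taken back.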
![Page 53: Out-of-Order Execution Structures Optimizations](https://reader036.vdocument.in/reader036/viewer/2022062810/56815dfa550346895dcc3581/html5/thumbnails/53.jpg)
Performance vs. Physical Registers