SAM: Optimizing Multithreaded Cores for Speculative Parallelism
Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez
PACT 2017
Executive Summary
Analyzes the interplay between hardware multithreading and speculative parallelism (e.g., Thread-Level Speculation and Transactional Memory)
Conventional multithreading causes performance pathologies on speculative workloads
• Increase in aborted work
• Inefficient use of speculation resources
Why? All threads are treated equally
Speculation-Aware Multithreading (SAM)
• Prioritize threads running tasks more likely to commit
SAM makes multithreading more useful
Outline
• Background on speculative parallelism
• Pitfalls of speculative parallelism with conventional multithreading
• SAM on in-order cores
• SAM on out-of-order cores
Background on Speculative Parallelism
Parallelize tasks when the dependences are not known in advance
Hardware executes all tasks in parallel, aborting upon conflicts
Which task to abort? Conflict resolution policy
Speculative Parallelism
• Ordered, e.g., Thread-Level Speculation (TLS): program order dictates the conflict resolution order
• Unordered, e.g., Hardware Transactional Memory: any execution order is valid, but high-performance conflict resolution policies define an order
Implicit order among all tasks in any speculative system
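As a rough illustration of what a conflict resolution policy decides, the sketch below aborts whichever of two conflicting tasks comes later in the implicit order. The `Task` struct, its `order` field, and `resolveConflict` are hypothetical names made up for this example; they are not part of any real TLS or HTM interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical task descriptor. `order` is the task's position in the
// implicit order (program order for TLS, a policy-defined order for HTM).
struct Task {
    int id;
    std::uint64_t order;  // smaller value = earlier = higher priority
};

// Returns the id of the task to abort when `a` and `b` conflict.
// Assumed policy: the earlier task wins and the later task is aborted.
inline int resolveConflict(const Task& a, const Task& b) {
    assert(a.order != b.order && "the implicit order is assumed to be total");
    return (a.order < b.order) ? b.id : a.id;
}
```

With tasks ordered 10 and 20, the task at position 20 is the one aborted, regardless of which one detected the conflict.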
Baseline System: Swarm [Jeffrey, MICRO’15]
Timestamped tasks
Tasks create children tasks (function ptr, timestamp, args)
Tasks appear to execute in timestamp order
Unordered execution via equal timestamps

// Discrete event simulation task: one gate input toggles at time ts.
void desTask(Timestamp ts, GateInput* input) {
    Gate* g = input->gate();
    bool toggledOutput = g->simulateToggle(input);
    if (toggledOutput) {
        // Enqueue a child task for every input driven by this gate's output.
        for (GateInput* i : g->connectedInputs()) {
            swarm::enqueue(desTask, ts + delay(g, i), i);
        }
    }
}
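To make "tasks appear to execute in timestamp order" concrete, here is a minimal sequential reference executor: running tasks one at a time from a min-priority queue gives the outcome that a speculative parallel execution has to match. `SequentialExecutor` and its methods are hypothetical names for this sketch and say nothing about how Swarm is implemented in hardware; task arguments are assumed to be captured inside the callable.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Timestamp = std::uint64_t;

// Hypothetical sequential executor, for illustration only.
class SequentialExecutor {
public:
    using TaskFn = std::function<void(SequentialExecutor&, Timestamp)>;

    // Registers a task to run at timestamp `ts`.
    void enqueue(TaskFn fn, Timestamp ts) {
        assert(fn && "task function must be callable");
        queue_.push(Entry{ts, nextSeq_++, std::move(fn)});
    }

    // Runs tasks in timestamp order until none remain.
    // Tasks may enqueue children while they run.
    void run() {
        while (!queue_.empty()) {
            Entry e = queue_.top();
            queue_.pop();
            e.fn(*this, e.ts);
        }
    }

private:
    struct Entry {
        Timestamp ts;
        std::uint64_t seq;  // insertion order, used only to break ties
        TaskFn fn;
    };
    // Orders the queue so the smallest (ts, seq) pair is on top.
    struct Later {
        bool operator()(const Entry& a, const Entry& b) const {
            return a.ts != b.ts ? a.ts > b.ts : a.seq > b.seq;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, Later> queue_;
    std::uint64_t nextSeq_ = 0;
};
```

Breaking ties by insertion order is an arbitrary choice made here; with equal timestamps any order would be acceptable.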
Swarm Microarchitecture
[Figure: 16-tile, 64-core CMP with Mem / IO blocks. Tile organization: four cores, each with L1I/D caches, plus an L2, an L3 slice, a router, and a task unit.]
Equal timestamps: global order via Virtual Time (VT)
Virtual Time = (Timestamp, Tiebreaker)
Tasks execute out of order, but commit in VT order
Commit queue: state of tasks waiting to commit
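The virtual-time ordering can be sketched as a lexicographic comparison on the (timestamp, tiebreaker) pair. The types and the `commitOrder` helper below are hypothetical; the sketch only models the order in which finished tasks would commit, not the hardware protocol that decides when a commit is safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Virtual time: a timestamp plus a tiebreaker that orders equal timestamps.
struct VirtualTime {
    std::uint64_t timestamp;
    std::uint64_t tiebreaker;
};

// Lexicographic order: timestamp first, then tiebreaker.
inline bool operator<(const VirtualTime& a, const VirtualTime& b) {
    return std::tie(a.timestamp, a.tiebreaker) <
           std::tie(b.timestamp, b.tiebreaker);
}

// Hypothetical commit-queue entry: a finished task waiting to commit.
struct FinishedTask {
    int id;
    VirtualTime vt;
};

// Returns task ids in the order they would commit (ascending virtual time).
inline std::vector<int> commitOrder(std::vector<FinishedTask> finished) {
    std::sort(finished.begin(), finished.end(),
              [](const FinishedTask& a, const FinishedTask& b) {
                  return a.vt < b.vt;
              });
    for (std::size_t i = 1; i < finished.size(); ++i) {
        assert(finished[i - 1].vt < finished[i].vt &&
               "virtual times are assumed to be unique");
    }
    std::vector<int> ids;
    for (const FinishedTask& t : finished) ids.push_back(t.id);
    return ids;
}
```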
Pitfalls of Speculation-Oblivious Multithreading
System configuration: 64-core SMT system, in-order cores with 2-wide issue, speculation-oblivious round-robin order
[Figure: issue-slot breakdown with four categories: micro-ops issued from committed tasks, micro-ops issued from aborted tasks, resource stalls, and no ready micro-ops to issue.]
Insights:
1. Multithreading can be highly beneficial
However, multithreading can also lead to:
2. Increased aborts
3. Inefficient use of speculation resources
Unlikely-to-commit tasks hurt the throughput of likely-to-commit ones
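The speculation-oblivious baseline named in the system configuration can be sketched as a plain round-robin pick among ready threads. `pickRoundRobin` and its parameters are hypothetical names for this example, and a real issue stage would fill more than one slot per cycle.

```cpp
#include <cassert>
#include <vector>

// Speculation-oblivious round-robin selection (illustrative sketch).
// ready[i] says whether thread i has a micro-op ready to issue.
// `last` is the thread that issued most recently.
// Returns the next ready thread after `last`, or -1 if none is ready.
inline int pickRoundRobin(const std::vector<bool>& ready, int last) {
    const int n = static_cast<int>(ready.size());
    assert(n > 0 && last >= 0 && last < n);
    for (int step = 1; step <= n; ++step) {
        const int candidate = (last + step) % n;
        if (ready[candidate]) return candidate;
    }
    return -1;  // no thread has a ready micro-op
}
```

The selection never looks at which task a thread is running, so a thread whose task is unlikely to commit receives the same share of issue slots as any other.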
Speculation-Aware Multithreading
Prioritize threads according to their conflict resolution priorities
• Reduce speculation resource stalls (tasks commit early)
• Reduce aborts (focus resources on tasks likely to commit)
SAM on in-order cores
[Figure: in-order SMT pipeline with Fetch, Decode, per-thread micro-op queues, SMT issue, register files, and two pipes (Pipe 0 and Pipe 1) feeding Int ALU, FP ALU, Int ALU, and Mem/DCache units. The task unit sends conflict resolution priority updates (virtual times) to the core. The virtual times are sorted into SAM issue priorities (higher is better), and the issued thread ID is the maximum-priority thread among those that are ready.]
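A minimal sketch of the issue selection described for in-order cores, assuming that an earlier virtual time means a higher conflict resolution priority. `ThreadState` and `pickSam` are hypothetical names; the slide's hardware sorts virtual times into priorities, whereas this sketch simply scans for the best ready thread.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-thread state visible to the issue stage.
struct ThreadState {
    std::uint64_t timestamp;   // virtual time of the running task, part 1
    std::uint64_t tiebreaker;  // virtual time of the running task, part 2
    bool ready;                // thread has a micro-op ready to issue
};

// Returns the index of the ready thread whose task has the earliest
// virtual time (assumed to be the highest priority), or -1 if no thread
// is ready.
inline int pickSam(const std::vector<ThreadState>& threads) {
    assert(!threads.empty() && "a core has at least one hardware thread");
    int best = -1;
    for (int i = 0; i < static_cast<int>(threads.size()); ++i) {
        const ThreadState& t = threads[i];
        if (!t.ready) continue;
        if (best < 0 ||
            std::tie(t.timestamp, t.tiebreaker) <
                std::tie(threads[best].timestamp, threads[best].tiebreaker)) {
            best = i;
        }
    }
    return best;
}
```

A thread that is not ready is skipped even if its task has the earliest virtual time, so issue slots are not left idle while a lower-priority thread could use them.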
Experimental Methodology
Baseline system
• Swarm + Wait-N-GoTM [Jafri et al. ASPLOS’13] conflict resolution techniques
• Cycle-accurate, event-driven, Pin-based simulator
• Models systems of up to 64 cores
• Cores: 2-wide issue, up to 8 threads per core
Benchmarks
• Ordered: Swarm [Jeffrey et al. MICRO’15, MICRO’16], 8 benchmarks
• Unordered: STAMP [Minh et al. IISWC’08], 8 benchmarks
SAM makes multithreading more effective
[Figure: performance on ordered and unordered benchmarks for three configurations: 1 thread, 8-thread round robin, and 8-thread SAM.]
8-threaded cores outperform single-threaded cores by 1.85x
With SAM, the benefit increases to 2.33x
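Taking the two averages above at face value, and assuming both are speedups over the same single-threaded baseline, the gain of SAM over 8-thread round robin follows by division. The ratio is worked out here for illustration and is not a figure quoted on the slide.

```latex
\frac{\text{speedup}_{\text{SAM}}}{\text{speedup}_{\text{RR}}}
  = \frac{2.33}{1.85} \approx 1.26
```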
Why does SAM help?
[Figure: issue-slot breakdown. Micro-ops issued: committed, aborted. Unused issue slots (reason): resource, not ready, other.]
• SAM matches round robin (RR) when there are no pathologies
• SAM reduces wasted work
• SAM reduces resource stalls
SAM on out-of-order cores
Unlike in-order cores, priorities affect pipeline efficiency
• A single thread can clog core resources
• Increased wrong-path execution
Despite these, prioritizing tasks is better
Need for aggressive prioritization affects core design
• Shared, not partitioned ROBs
[Figure: out-of-order SMT pipeline with Fetch, Decode, per-thread micro-op queues, SMT issue, issue buffer, physical register file, reorder buffer, Pipe 0, and Pipe 1. Conflict resolution priority updates arrive from the task unit. Example values for four threads:
  In-flight uops (for ICount): 3 9 4 2
  Conflict res. priorities:    2 3 2 1
  SAM priorities:              3 4 2 1]
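One reading of the example values in the figure is that SAM ranks threads by conflict resolution priority and falls back to ICount (fewer in-flight micro-ops first) to break ties. That tie-breaking rule is inferred from the numbers on the slide rather than stated there, and `samPriorities` is a hypothetical helper written to test the reading.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Ranks threads for issue. A higher returned value means higher priority.
// Assumed rule: order by conflict resolution priority; among equal
// priorities, the thread with fewer in-flight micro-ops ranks higher.
inline std::vector<int> samPriorities(const std::vector<int>& conflictPriority,
                                      const std::vector<int>& inflightUops) {
    assert(conflictPriority.size() == inflightUops.size());
    const int n = static_cast<int>(conflictPriority.size());

    // Thread ids, to be sorted from lowest to highest priority.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        if (conflictPriority[a] != conflictPriority[b]) {
            return conflictPriority[a] < conflictPriority[b];
        }
        return inflightUops[a] > inflightUops[b];  // more in flight = lower
    });

    // Convert sorted positions into ranks 1..n.
    std::vector<int> priority(n);
    for (int rank = 0; rank < n; ++rank) priority[order[rank]] = rank + 1;
    return priority;
}
```

Called with conflict resolution priorities {2, 3, 2, 1} and in-flight counts {3, 9, 4, 2}, this returns {3, 4, 2, 1}, which matches the SAM priorities shown in the figure.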
SAM tradeoffs with out-of-order cores
Baseline policy: ICount (IC)
[Figure: issue-slot breakdown for sssp with 8 threads. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (reason): resource, not ready, other.]
SAM is more beneficial with dynamically shared ROBs
• Reduces aborts + resource stalls
But reduced pipeline efficiency
• Increase in wrong-path issues + not-ready stalls
Adaptive SAM policy
Hardware counters track cycles in each issue-slot category (micro-ops issued: committed, aborted, wrong path; unused issue slots: resource, not ready, other)
Cycles lost to task-level speculation = Aborted + Resource
Cycles lost to pipeline inefficiencies = Not ready + Wrong path
If cycles lost to task-level speculation > cycles lost to pipeline inefficiencies: use SAM
Otherwise: use ICount
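The comparison on this slide can be sketched as a small decision function over the four counters. `IssueCounters`, `IssuePolicy`, and `choosePolicy` are hypothetical names, and how often the counters are sampled and reset is not shown on the slide, so that part is left out.

```cpp
#include <cassert>
#include <cstdint>

enum class IssuePolicy { Sam, ICount };

// Hypothetical per-core counters, one per issue-slot category of interest.
struct IssueCounters {
    std::uint64_t aborted;    // micro-ops issued from tasks that aborted
    std::uint64_t resource;   // slots unused because of resource stalls
    std::uint64_t notReady;   // slots unused because no micro-op was ready
    std::uint64_t wrongPath;  // micro-ops issued on the wrong path
};

// Picks SAM when task-level speculation costs more cycles than pipeline
// inefficiencies do; otherwise falls back to ICount.
inline IssuePolicy choosePolicy(const IssueCounters& c) {
    const std::uint64_t speculationLoss = c.aborted + c.resource;
    const std::uint64_t pipelineLoss = c.notReady + c.wrongPath;
    assert(speculationLoss >= c.aborted && pipelineLoss >= c.notReady &&
           "counter sums are assumed not to overflow");
    return speculationLoss > pipelineLoss ? IssuePolicy::Sam
                                          : IssuePolicy::ICount;
}
```

A tie selects ICount, because the slide's comparison is a strict greater-than.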
SAM on OoO cores (all benchmarks)
[Figure: average over all benchmarks. Issue-slot breakdown with micro-ops issued (committed, aborted, wrong path) and unused issue slots by reason (resource, not ready, other).]
At 8 threads/core:
• Multithreading improves performance over single-threaded cores by 1.1x
• With SAM, improvement rises to 1.5x
Adaptive policy slightly increases performance at 2 and 4 threads
Conclusion
Conventional multithreading causes performance pathologies on speculative workloads
• Increase in aborted work
• Inefficient use of speculation resources
Speculation-Aware Multithreading (SAM)
• Prioritize threads running tasks more likely to commit
SAM makes multithreading more useful
Questions?