Simultaneous Multithreading – p. 1
Simultaneous Multithreading
Esen VAROL
YEDİTEPE UNIVERSITY
Contents
- Advances in Technology
- Types of Parallelism
- Simultaneous Multithreading – the idea
- Comparison of Parallel Processors
- Simultaneous Multithreading Model
- Results
- Simultaneous Multithreading Issues
- Commercial Aspects
- References
- Conclusion
A Billion Transistors, Possibilities?
- Add more memory
  - Increase on-chip cache/primary memory
- Increase system integration
  - Add I/O controllers, graphics accelerators
- Enhance computational capability
  - Increase parallelism in all forms
Types of Parallelism
- Instruction level parallelism
  - Pipelining
  - Superscalar
  - Very Long Instruction Word
- Application level parallelism
  - Parallel programming
  - Multiple threads
  - Multiple processes
Superscalar
- Issue multiple instructions in each cycle
- Multiple issues are not due to pipelining
- Several functional units of the same type, e.g. ALUs
- Dispatcher reads instructions, decides which can run in parallel
- In VLIW, dispatcher complexity moved to compiler
Multithreaded Processors
- Multiple threads share functional units
- Independent hardware state of each thread duplicated
- Types of multithreading:
  - Fine grained
    - Switch between threads on each cycle
  - Coarse grained
    - Switch between threads only on costly stalls
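The two switching policies can be sketched as a toy scheduler that only records which thread owns the pipeline in each cycle. This is an illustrative model, not something taken from the slides: the function names and the `stall_cycles` input are assumptions, and the cost of a thread switch is ignored.

```python
def fine_grained_schedule(num_threads, cycles):
    """Fine grained: rotate to the next thread on every cycle."""
    return [cycle % num_threads for cycle in range(cycles)]


def coarse_grained_schedule(stall_cycles, num_threads, cycles):
    """Coarse grained: stay on one thread until it hits a costly stall.

    stall_cycles is a hypothetical set of (thread, cycle) pairs marking
    the cycles at which that thread would stall (e.g. on a cache miss).
    """
    schedule = []
    current = 0
    for cycle in range(cycles):
        if (current, cycle) in stall_cycles:
            # Switch only when the running thread stalls.
            current = (current + 1) % num_threads
        schedule.append(current)
    return schedule
```

For example, `fine_grained_schedule(2, 4)` alternates between the two threads every cycle, while `coarse_grained_schedule({(0, 2)}, 2, 5)` keeps thread 0 running until its stall at cycle 2 and then hands the pipeline to thread 1.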
Simultaneous Multithreading – the Idea
- Combine superscalar and multithreading
- From superscalar
  - Issue multiple instructions per cycle
- From multithreading
  - Hardware state for several programs/threads
- Result
  - Issue multiple instructions from multiple threads in each cycle
Comparison
SMT Model
- Minimal extension of an out-of-order superscalar
- Resources replicated
  - State for hardware contexts (registers, PCs)
  - Per thread mechanisms for
    - Pipeline flushing
    - Subroutine returns
  - Also, per thread identifiers for
    - Branch target buffer
    - Translation lookaside buffer
SMT Model (continued ..)
- Resources redesigned
  - Instruction fetch unit
  - Processor pipeline
- Instruction Scheduling
  - Does not require additional hardware
  - Register renaming (same as superscalar)
Block Diagram
Instruction Fetch Unit
- Takes advantage of inter-thread competition
  - Partitioning bandwidth
  - Fetching threads that give maximum local benefit
- 2.8 fetching
  - Select 2 threads each cycle and fetch up to 8 instructions from each
  - Decode 1 thread till branch/end of cache line, then jump to the other
- ICount feedback
  - Highest priority to threads with fewest instructions in the decode, renaming, and queue pipeline stages
  - Small hardware addition to track queue lengths
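The ICount selection rule can be sketched in a few lines. This is a minimal software illustration under stated assumptions: the function name, the dictionary of per-thread counts, and the tie-break by lowest thread id are all hypothetical, and real hardware would use small per-thread counters rather than a sort.

```python
def icount_pick(in_flight, num_to_pick=2):
    """Return the thread ids that get fetch priority this cycle.

    in_flight maps thread id -> number of that thread's instructions
    currently in the decode, rename, and queue stages. Threads with the
    fewest such instructions win; ties go to the lower thread id
    (an assumption made here for determinism).
    """
    ranked = sorted(in_flight, key=lambda tid: (in_flight[tid], tid))
    return ranked[:num_to_pick]
```

With counts `{0: 12, 1: 3, 2: 7, 3: 3}`, threads 1 and 3 are selected, since they are clogging the front end the least.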
Register File and Pipeline
- Each thread has 32 architectural registers
- Register file: 32 * #threads, plus rename registers
- Larger register file, longer access time
- To avoid increase in clock cycle time, SMT pipeline extended to allow 2 cycle register reads and writes
- 2 cycle reads/writes increase branch misprediction penalty
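The register file sizing rule is simple arithmetic, sketched here for concreteness. The function name is hypothetical, and the 8-thread, 100-rename-register configuration in the usage note is an example chosen for illustration, not a figure stated on the slide.

```python
ARCH_REGS_PER_THREAD = 32  # architectural registers per thread (from the slide)


def physical_registers(num_threads, rename_registers):
    """Total register file entries: 32 * #threads, plus rename registers."""
    return ARCH_REGS_PER_THREAD * num_threads + rename_registers
```

Assuming 100 rename registers, a single-threaded machine needs `physical_registers(1, 100)` = 132 entries, while an 8-thread SMT needs `physical_registers(8, 100)` = 356, which is why access time grows.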
Results
- ILP and TLP exploited simultaneously
- SMT vs. Superscalar
  - Superscalar unable to exploit TLP
- SMT vs. Fine-grained multithreading
  - F.G. eliminated only vertical waste
- SMT vs. Multiprocessors
  - Multiprocessors limited by static resource partitioning
- Hurrah! SMTs performed the best ..
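The comparison can be illustrated with a toy issue-slot model. Vertical waste means a cycle in which no instruction issues at all; horizontal waste means unused slots in a cycle where something does issue. Everything in this sketch is an assumption made for illustration: the function names, the `ready` trace (instructions each thread could issue per cycle), and the simplification that unissued instructions do not carry over to later cycles.

```python
def issued_superscalar(ready, width):
    """Only thread 0 runs, so both kinds of waste remain."""
    return sum(min(per_thread[0], width) for per_thread in ready)


def issued_fine_grained(ready, width):
    """One thread per cycle, rotating and skipping threads with no work.

    Removes empty cycles (vertical waste) when any thread has work, but
    still fills a cycle from a single thread (horizontal waste remains).
    """
    issued = 0
    for cycle, per_thread in enumerate(ready):
        n = len(per_thread)
        for offset in range(n):
            count = per_thread[(cycle + offset) % n]
            if count > 0:
                issued += min(count, width)
                break
    return issued


def issued_smt(ready, width):
    """All threads share the slots of every cycle."""
    return sum(min(sum(per_thread), width) for per_thread in ready)
```

For the hypothetical two-thread trace `[(2, 1), (0, 3), (1, 1), (0, 0)]` on a 4-wide machine (16 slots in total), the superscalar model issues 3 instructions, fine-grained multithreading 6, and SMT 8.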
SMT Issues – what to fetch
- Static
  - Round-robin
  - 8 instructions from one thread or
  - 4 instructions from two threads or
  - 2 instructions from four threads etc.
- Dynamic
  - Favour threads with minimal in-flight branches
  - Favour threads with minimal outstanding misses
  - Favour threads with minimal in-flight instructions
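The static options amount to splitting a fixed fetch bandwidth evenly and rotating which threads get it. A minimal sketch follows, assuming a total fetch width of 8 instructions per cycle; the function names and the rotation rule are hypothetical.

```python
def static_partitions(fetch_width=8):
    """List (threads per cycle, instructions per thread) even splits."""
    return [(threads, fetch_width // threads)
            for threads in range(1, fetch_width + 1)
            if fetch_width % threads == 0]


def round_robin_threads(cycle, num_threads, threads_per_cycle):
    """Pick which threads fetch in a given cycle, rotating in order."""
    start = (cycle * threads_per_cycle) % num_threads
    return [(start + i) % num_threads for i in range(threads_per_cycle)]
```

`static_partitions(8)` yields the splits named on the slide (8 from one thread, 4 from two, 2 from four) plus the 1-from-eight case. With four threads and two fetching per cycle, `round_robin_threads` alternates between threads 0 and 1 and threads 2 and 3.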
SMT Issues – what to issue
- Oldest first
- Cache hit speculated last
- Branch speculated first
- Branches first
Important result: Unlike superscalar, doesn’t matter much!
SMT Issues – Caching
- Same cache shared among threads
- No coherence issues
- But, cache conflicts increase
- Possibility of cache thrashing
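Cache thrashing between threads can be shown with a tiny direct-mapped cache model. This is a sketch under assumed parameters (4 sets, 16-byte lines, made-up addresses); a real SMT cache would typically be set-associative, which reduces but does not remove the effect.

```python
def count_misses(addresses, num_sets=4, line_size=16):
    """Count misses for an address trace in a direct-mapped cache."""
    tags = [None] * num_sets  # one resident line tag per set
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index, tag = line % num_sets, line // num_sets
        if tags[index] != tag:
            tags[index] = tag  # evict whatever was there
            misses += 1
    return misses
```

Here addresses 0 and 64 map to the same set. A thread that reads address 0 four times on its own misses once, and likewise for address 64. When two threads interleave those accesses in the shared cache (`[0, 64, 0, 64, 0, 64, 0, 64]`), every one of the 8 accesses misses.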
SMT Issues – Synchronization
- Spinlocks not useful (in fact, bad!)
- Synchronization mechanism needs to be fast, light, scalable
- Suggested Method (memory based)
  - acquire(lock):
    - blocks on failure
    - only completes execution on success
  - release(lock):
    - writes zero if no other thread blocking
    - else unblocks the other thread
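The acquire/release semantics can be mimicked in software to make the hand-off rule concrete. This is only an analogy built on Python's `threading.Condition`: the proposal on the slide is a hardware mechanism, and the class name, field names, and demo function here are all hypothetical.

```python
import threading


class BlockingLock:
    """Toy model: acquire blocks on failure, release hands off or writes zero."""

    def __init__(self):
        self._value = 0      # the lock word: 0 = free, 1 = held
        self._waiting = 0    # threads currently blocked in acquire
        self._grants = 0     # hand-offs issued but not yet consumed
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            if self._value == 0:
                self._value = 1          # success: completes immediately
                return
            self._waiting += 1           # failure: block instead of spinning
            while self._grants == 0:
                self._cond.wait()
            self._grants -= 1            # ownership was handed to us
            self._waiting -= 1

    def release(self):
        with self._cond:
            if self._waiting == 0:
                self._value = 0          # nobody blocked: write zero
            else:
                self._grants += 1        # else unblock one waiting thread
                self._cond.notify()


def count_with_lock(num_threads=4, increments=1000):
    """Demo: several threads increment a shared counter under the lock."""
    lock, total = BlockingLock(), [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The point of the design is that a blocked thread consumes no issue slots while it waits, whereas a spinning thread on an SMT would compete for slots with the thread that holds the lock.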
SMT Issues – Compiler optimizations
- Compiler should try to minimize cache interference by multiple threads in the same program
- Latency hiding techniques like speculation from uniprocessor environments need to be rethought
- Sharing optimization techniques from multiprocessors change, since data sharing is now good
Applications
- Biggest application: servers!
- E.g., server running Apache
- Used by Sun, IBM in high-end servers
Commercial SMTs
- Compaq Alpha 21464
  - Planned 4T processor
  - Axed in 2001
- Pentium IV Xeon
  - 2T processor
  - Hyperthreading = Intel buzzword for SMT
- Sun Ultrasparc IV
  - 2T processor
  - Also a CMP (chip multiprocessor)
Conclusion
- Simple design extension to existing processor technology
- Exploits ILP and TLP without sacrificing single thread performance
- Optimized compiler and operating system support would improve performance

Incidentally, Intel has announced plans for a multi-core SMT processor.
References
- Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., & Tullsen, D. Simultaneous Multithreading: A Platform for Next-generation Processors
- Lo, J., Eggers, S., Levy, H., Parekh, S., & Tullsen, D. Tuning Compiler Optimizations for Simultaneous Multithreading
- Tullsen, D., Lo, J., Eggers, S., & Levy, H. Supporting Fine-Grain Synchronization on a Simultaneous Multithreaded Processor
- Prof. Paolo Ienne, http://lapwww.epfl.ch/courses/advcomparch/
Questions?
Thank You!