Source: cse.yeditepe.edu.tr/~gkucuk/courses/cse533/pres3-smt.pdf


Simultaneous Multithreading – p. 1

Simultaneous Multithreading

Esen VAROL

YEDİTEPE UNIVERSITY

Contents

- Advances in Technology
- Types of Parallelism
- Simultaneous Multithreading: the Idea
- Comparison of Parallel Processors
- Simultaneous Multithreading Model
- Results
- Simultaneous Multithreading Issues
- Commercial Aspects
- References
- Conclusion

A Billion Transistors, Possibilities?

- Add more memory
  - Increase on-chip cache/primary memory
- Increase system integration
  - Add I/O controllers, graphics accelerators
- Enhance computational capability
  - Increase parallelism in all forms

Types of Parallelism

- Instruction level parallelism
  - Pipelining
  - Superscalar
  - Very Long Instruction Word
- Application level parallelism
  - Parallel programming
  - Multiple threads
  - Multiple processes

Superscalar

- Issue multiple instructions in each cycle
- Multiple issues are not due to pipelining
- Several functional units of the same type, e.g. ALUs
- Dispatcher reads instructions, decides which can run in parallel
- In VLIW, dispatcher complexity moved to compiler

Multithreaded Processors

- Multiple threads share functional units
- Independent hardware state of each thread duplicated
- Types of multithreading:
  - Fine grained
    - Switch between threads on each cycle
  - Coarse grained
    - Switch between threads only on costly stalls
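
The two switching policies can be sketched as schedules that record which thread owns the pipeline in each cycle. This is only an illustration: the function names and the `stall_cycles` input are hypothetical, and the model ignores any thread-switch penalty.

```python
def fine_grained(num_threads, cycles):
    """Round-robin: a different thread owns the pipeline on every cycle."""
    return [cycle % num_threads for cycle in range(cycles)]


def coarse_grained(stall_cycles, num_threads, cycles):
    """Stay on one thread and switch only after a costly stall.

    stall_cycles: set of cycle numbers in which the running thread hits
    a costly stall (an assumed input, e.g. a cache miss in a real design).
    """
    schedule = []
    current = 0
    for cycle in range(cycles):
        schedule.append(current)
        if cycle in stall_cycles:
            # The stalled thread gives up the pipeline from the next cycle on.
            current = (current + 1) % num_threads
    return schedule


print(fine_grained(2, 6))         # [0, 1, 0, 1, 0, 1]
print(coarse_grained({2}, 2, 6))  # [0, 0, 0, 1, 1, 1]
```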

Simultaneous Multithreading – the Idea

- Combine superscalar and multithreading
- From superscalar
  - Issue multiple instructions per cycle
- From multithreading
  - Hardware state for several programs/threads
- Result
  - Issue multiple instructions from multiple threads in each cycle

Comparison

SMT Model

- Minimal extension of an out-of-order superscalar
- Resources replicated
  - State for hardware contexts (registers, PCs)
  - Per thread mechanisms for
    - Pipeline flushing
    - Subroutine returns
  - Also, per thread identifiers for
    - Branch target buffer
    - Translation lookaside buffer
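
To see why shared structures need per thread identifiers, consider a toy TLB in which every entry is tagged with a hardware thread id. The class below is a hypothetical sketch (no capacity limit or replacement policy), not a description of any real design; it only shows that two threads can use the same virtual page number without seeing each other's translation.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with a hardware thread id."""

    def __init__(self):
        # Key: (thread_id, virtual_page) -> physical_frame
        self._entries = {}

    def insert(self, thread_id, virtual_page, physical_frame):
        self._entries[(thread_id, virtual_page)] = physical_frame

    def lookup(self, thread_id, virtual_page):
        """Return the physical frame, or None on a TLB miss."""
        return self._entries.get((thread_id, virtual_page))


tlb = TaggedTLB()
tlb.insert(thread_id=0, virtual_page=0x10, physical_frame=0xA0)
tlb.insert(thread_id=1, virtual_page=0x10, physical_frame=0xB7)
print(hex(tlb.lookup(0, 0x10)))  # 0xa0
print(hex(tlb.lookup(1, 0x10)))  # 0xb7
```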

SMT Model (continued)

- Resources redesigned
  - Instruction fetch unit
  - Processor pipeline
- Instruction Scheduling
  - Does not require additional hardware
  - Register renaming (same as superscalar)

Block Diagram

Instruction Fetch Unit

- Takes advantage of inter-thread competition
  - Partitioning bandwidth
  - Fetching threads that give maximum local benefit
- 2.8 fetching
  - Fetch 1 instruction per logical processor, for 2 threads
  - Decode 1 thread till branch/end of cache line, then jump to the other
- ICount feedback
  - Highest priority to threads with fewest instructions in the decode, renaming, and queue pipeline stages
  - Small hardware addition to track queue lengths
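
The ICount priority rule can be sketched as a simple selection over per-thread counters. The function name, the dictionary input, and the tie-break by lower thread id are assumptions made for illustration; real hardware would keep these counts in small per-thread counters.

```python
def icount_select(in_flight, num_to_fetch=2):
    """Pick the threads with the fewest front-end instructions.

    in_flight: dict mapping thread id -> number of instructions currently
    in the decode, renaming, and queue stages.
    Ties are broken by lower thread id (an assumption of this sketch).
    """
    ranked = sorted(in_flight, key=lambda tid: (in_flight[tid], tid))
    return ranked[:num_to_fetch]


# Threads 1 and 3 have the fewest in-flight instructions, so they fetch next.
print(icount_select({0: 12, 1: 3, 2: 7, 3: 3}))  # [1, 3]
```

A thread that clogs the queues accumulates a high count and is deprioritized, so fetch bandwidth drifts toward threads that are moving through the pipeline quickly.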

Register File and Pipeline

- Each thread has 32 architectural registers
- Register file: 32 * #threads, plus rename registers
- Larger register file, longer access time
- To avoid increase in clock cycle time, SMT pipeline extended to allow 2 cycle register reads and writes
- 2 cycle reads/writes increase branch misprediction penalty
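
The register file sizing rule above is plain arithmetic and can be written out as follows. The thread count and the number of rename registers in the example are illustrative values chosen here, not figures taken from the slides.

```python
ARCH_REGS_PER_THREAD = 32  # from the slide: 32 architectural registers per thread


def register_file_size(num_threads, rename_registers):
    """Total physical registers: 32 * #threads, plus rename registers."""
    return ARCH_REGS_PER_THREAD * num_threads + rename_registers


# Illustrative values only: 8 hardware contexts and 100 rename registers.
print(register_file_size(8, 100))  # 356
# A single-threaded machine with the same rename pool, for comparison.
print(register_file_size(1, 100))  # 132
```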

Results

- ILP and TLP exploited simultaneously
- SMT vs. Superscalar
  - Superscalar unable to exploit TLP
- SMT vs. Fine-grained multithreading
  - Fine-grained multithreading eliminated only vertical waste
- SMT vs. Multiprocessors
  - Multiprocessors limited by static resource partitioning
- Hurrah! SMT performed the best.
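
The difference between vertical waste (a whole cycle with nothing issued) and horizontal waste (unused slots within a cycle) can be illustrated with a toy issue-slot count. Everything here is a simplifying assumption: `ready[c][t]` is a made-up table giving how many instructions thread `t` could issue in cycle `c`, and the counts do not change in response to what was issued earlier.

```python
def superscalar_slots(ready, width):
    """Only thread 0 runs; unused slots are lost in every cycle."""
    return sum(min(counts[0], width) for counts in ready)


def fine_grained_slots(ready, width):
    """One thread per cycle, round-robin, skipping threads with nothing ready.

    Fully idle cycles are mostly avoided, but slots left over within a
    cycle still go unused.
    """
    total = 0
    for cycle, counts in enumerate(ready):
        n = len(counts)
        for offset in range(n):
            thread = (cycle + offset) % n
            if counts[thread] > 0:
                total += min(counts[thread], width)
                break
    return total


def smt_slots(ready, width):
    """Any mix of threads may fill the issue slots of a cycle."""
    return sum(min(sum(counts), width) for counts in ready)


# Hypothetical 4-wide machine, 2 threads, 4 cycles (16 issue slots in total).
ready = [(2, 1), (0, 3), (1, 2), (0, 0)]
print(superscalar_slots(ready, 4))   # 3
print(fine_grained_slots(ready, 4))  # 6
print(smt_slots(ready, 4))           # 9
```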

SMT Issues – what to fetch

- Static
  - Round-robin
  - 8 instructions from one thread or
  - 4 instructions from two threads or
  - 2 instructions from four threads etc.
- Dynamic
  - Favour threads with minimal in-flight branches
  - Favour threads with minimal outstanding misses
  - Favour threads with minimal in-flight instructions
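
The static round-robin options listed above differ only in how the fetch bandwidth is divided per cycle. A minimal sketch, assuming a fetch width of 8 and a hypothetical helper `static_fetch_plan` that rotates through the threads in order:

```python
def static_fetch_plan(cycle, num_threads, threads_per_cycle, fetch_width=8):
    """Return {thread_id: instructions_to_fetch} for one cycle.

    The fetch width is split evenly among `threads_per_cycle` threads,
    and the chosen threads rotate round-robin from cycle to cycle.
    """
    if threads_per_cycle > num_threads:
        raise ValueError("cannot fetch from more threads than exist")
    if fetch_width % threads_per_cycle != 0:
        raise ValueError("fetch width must divide evenly among threads")
    per_thread = fetch_width // threads_per_cycle
    start = (cycle * threads_per_cycle) % num_threads
    return {(start + i) % num_threads: per_thread
            for i in range(threads_per_cycle)}


print(static_fetch_plan(cycle=3, num_threads=4, threads_per_cycle=1))  # {3: 8}
print(static_fetch_plan(cycle=1, num_threads=4, threads_per_cycle=2))  # {2: 4, 3: 4}
print(static_fetch_plan(cycle=0, num_threads=4, threads_per_cycle=4))  # {0: 2, 1: 2, 2: 2, 3: 2}
```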

SMT Issues – what to issue

- Oldest first
- Cache hit speculated last
- Branch speculated first
- Branches first

Important result: unlike in a superscalar, the choice of issue policy does not matter much!

SMT Issues – Caching

- Same cache shared among threads
- No coherence issues
- But, cache conflicts increase
- Possibility of cache thrashing
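
Cache thrashing between threads can be reproduced with a toy direct-mapped cache. The model below is a sketch under stated assumptions (direct-mapped, 64 sets, 64-byte lines, hypothetical addresses); it shows two threads that each hit almost always when run alone, yet miss on every access once their references interleave on the same set.

```python
def count_misses(accesses, num_sets=64, line_size=64):
    """Count misses in a direct-mapped cache.

    accesses: byte addresses in the order the shared cache sees them.
    """
    tags = [None] * num_sets
    misses = 0
    for addr in accesses:
        line = addr // line_size
        index, tag = line % num_sets, line // num_sets
        if tags[index] != tag:
            misses += 1
            tags[index] = tag  # the new line evicts whatever was there
    return misses


thread_a = [0] * 4     # thread A keeps touching address 0
thread_b = [4096] * 4  # thread B touches an address mapping to the same set
interleaved = [addr for pair in zip(thread_a, thread_b) for addr in pair]

print(count_misses(thread_a))     # 1
print(count_misses(thread_b))     # 1
print(count_misses(interleaved))  # 8
```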

SMT Issues – Synchronization

- Spinlocks not useful (in fact, bad!)
- Synchronization mechanism needs to be fast, light, scalable
- Suggested Method (memory based)
  - acquire(lock):
    - blocks on failure
    - only completes execution on success
  - release(lock):
    - writes zero if no other thread blocking
    - else unblocks the other thread
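
The acquire/release semantics above can be modelled in software as a small state machine. This is a hypothetical model, not the hardware mechanism itself: the class name is invented, blocked threads are simply recorded in a queue, and waking them in FIFO order is an assumption of the sketch.

```python
from collections import deque


class BlockingLockModel:
    """Toy model of a memory-based lock with blocking acquire."""

    def __init__(self):
        self.held = False       # False models a lock word of zero (free)
        self.blocked = deque()  # hardware contexts waiting on the lock

    def acquire(self, thread_id):
        """Return True if the acquire completes; otherwise block the thread."""
        if not self.held:
            self.held = True
            return True
        # A blocked thread stops issuing instructions instead of spinning,
        # so it does not compete with other threads for issue slots.
        self.blocked.append(thread_id)
        return False

    def release(self):
        """Free the lock, or hand it straight to one blocked thread.

        Returns the id of the unblocked thread, or None if nobody waited.
        """
        if self.blocked:
            return self.blocked.popleft()  # lock stays held by the new owner
        self.held = False                  # "writes zero"
        return None


lock = BlockingLockModel()
print(lock.acquire(0))  # True
print(lock.acquire(1))  # False
print(lock.release())   # 1
print(lock.release())   # None
```

Handing the lock directly to a waiting thread avoids the spinning that would otherwise steal issue slots from the thread holding the lock.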

SMT Issues – Compiler optimizations

- Compiler should try to minimize cache interference by multiple threads in the same program
- Latency hiding techniques like speculation from uniprocessor environments need to be rethought
- Sharing optimization techniques from multiprocessors change, since data sharing is now good

Applications

- Biggest application: servers!
  - E.g., server running Apache
- Used by Sun, IBM in high-end servers

Commercial SMTs

- Compaq Alpha 21464
  - Planned 4T processor
  - Axed in 2001
- Pentium 4 Xeon
  - 2T processor
  - Hyper-Threading = Intel buzzword for SMT
- Sun UltraSPARC IV
  - 2T processor
  - Also a CMP (chip multiprocessor)

Conclusion

- Simple design extension to existing processor technology
- Exploits ILP and TLP without sacrificing single thread performance
- Optimized compiler and operating system support would improve performance

Incidentally, Intel has announced plans for a multi-core SMT processor.

References

- Simultaneous Multithreading: A Platform for Next-generation Processors
  Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., & Tullsen, D.
- Tuning Compiler Optimizations for Simultaneous Multithreading
  Lo, J., Eggers, S., Levy, H., Parekh, S., & Tullsen, D.
- Supporting Fine-Grain Synchronization on a Simultaneous Multithreaded Processor
  Tullsen, D., Lo, J., Eggers, S., & Levy, H.
- http://lapwww.epfl.ch/courses/advcomparch/
  Prof. Paolo Ienne

Questions?

Thank You!