Instruction Level Parallelism and Superscalar Processors
TRANSCRIPT
8/8/2019 Instruction Level Paralleism and Super Scalar Processors
http://slidepdf.com/reader/full/instruction-level-paralleism-and-super-scalar-processors 1/23
Instruction Level Parallelism and Superscalar Processors
CH01
TECH Computer Science
Decode and issue more than one instruction at a time
Executing more than one instruction at a time
More than one Execution Unit
What is Superscalar?
Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
Equally applicable to RISC & CISC
  In practice usually RISC
Why Superscalar?
Most operations are on scalar quantities (see RISC notes)
Improve these operations to get an overall improvement
General Superscalar Organization
Superpipelined
Many pipeline stages need less than half a clock cycle
Double internal clock speed gets two tasks per external clock cycle
Superscalar allows parallel fetch and execute
Superscalar v Superpipeline
Limitations
Instruction level parallelism
Compiler based optimisation
Hardware techniques
Limited by
  True data dependency
  Procedural dependency
  Resource conflicts
  Output dependency
  Antidependency
True Data Dependency
ADD r1, r2 (r1 := r1+r2;)
MOVE r3, r1 (r3 := r1;)
Can fetch and decode second instruction in parallel with first
Can NOT execute second instruction until first is finished
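
The check behind this rule can be sketched by comparing the registers each instruction reads and writes. This is a minimal illustration, assuming each instruction is described only by its register sets; the `Instr` tuple and the `true_dependency` helper are hypothetical names, not part of the slides.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str            # assembly text, kept for display only
    writes: frozenset    # registers the instruction writes
    reads: frozenset     # registers the instruction reads

def true_dependency(first: Instr, second: Instr) -> frozenset:
    """Registers that `second` reads and `first` writes (read-after-write)."""
    return first.writes & second.reads

# The two instructions from the slide.
add = Instr("ADD r1, r2", frozenset({"r1"}), frozenset({"r1", "r2"}))
move = Instr("MOVE r3, r1", frozenset({"r3"}), frozenset({"r1"}))

# A non-empty result means MOVE must wait for ADD to produce r1.
blocking = true_dependency(add, move)
print(sorted(blocking))
```

A non-empty set means the second instruction can still be fetched and decoded early, but its execution has to wait for the first.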
Procedural Dependency
Cannot execute instructions after a branch in parallel with instructions before a branch
Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
This prevents simultaneous fetches
Resource Conflict
Two or more instructions requiring access to the same resource at the same time
  e.g. two arithmetic instructions
Can duplicate resources
  e.g. have two arithmetic units
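
The effect of duplicating a resource can be sketched with a toy in-order issuer. This assumes every operation is independent and occupies its unit for exactly one cycle; the function name `cycles_to_issue` and the unit label `"alu"` are illustrative, not from the slides.

```python
def cycles_to_issue(ops, units):
    """Count cycles needed to issue `ops` in order.

    ops   -- list of unit-type names, one per instruction, in program order
    units -- dict mapping unit-type name to number of copies available
    """
    for op in ops:
        if units.get(op, 0) < 1:
            raise ValueError(f"no functional unit of type {op!r}")
    cycles = 0
    i = 0
    while i < len(ops):
        free = dict(units)   # every unit is free again at the start of a cycle
        cycles += 1
        # Issue in order until the next instruction finds its unit type busy.
        while i < len(ops) and free[ops[i]] > 0:
            free[ops[i]] -= 1
            i += 1
    return cycles

two_adds = ["alu", "alu"]
print(cycles_to_issue(two_adds, {"alu": 1}))  # 2: the second add waits for the unit
print(cycles_to_issue(two_adds, {"alu": 2}))  # 1: a duplicated unit removes the conflict
```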
Dependencies
Design Issues
Instruction level parallelism
  Instructions in a sequence are independent
  Execution can be overlapped
  Governed by data and procedural dependency
Machine Parallelism
  Ability to take advantage of instruction level parallelism
  Governed by number of parallel pipelines
Instruction Issue Policy
Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change registers and memory
In-Order Issue In-Order Completion
Issue instructions in the order they occur
Not very efficient
May fetch >1 instruction
Instructions must stall if necessary
In-Order Issue In-Order Completion, e.g.
In-Order Issue Out-of-Order Completion, e.g.
In-Order Issue Out-of-Order Completion
Output dependency
  R3 := R3 + R5; (I1)
  R4 := R3 + 1; (I2)
  R3 := R5 + 1; (I3)
I2 depends on result of I1 - data dependency
If I3 completes before I1, the value left in R3 will be the result of I1, which is wrong - output (write-write) dependency
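
The hazard can be reproduced with a toy model in which each instruction's result is written back in completion order. This is only a sketch: the initial register values are arbitrary, and the helper `final_r3` is a hypothetical name that models just the two writers of R3 (I1 and I3).

```python
def final_r3(completion_order):
    """Return the value left in R3 after I1 and I3 write back in the given order."""
    r3, r5 = 10, 2                 # arbitrary initial register values
    results = {
        "I1": r3 + r5,             # R3 := R3 + R5, operands read at issue
        "I3": r5 + 1,              # R3 := R5 + 1
    }
    for name in completion_order:  # the last write-back wins
        r3 = results[name]
    return r3

print(final_r3(["I1", "I3"]))  # 3: program order, R3 holds the result of I3
print(final_r3(["I3", "I1"]))  # 12: I1 completes last and overwrites I3's result
```

In the second call any later instruction reading R3 would see the result of I1 rather than I3, which is why the two writes must complete in program order unless the registers are renamed.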
Out-of-Order Issue Out-of-Order Completion
Decouple decode pipeline from execution pipeline
Can continue to fetch and decode until this pipeline is full
When a functional unit becomes available an instruction can be executed
Since instructions have been decoded, processor can look ahead
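
Looking ahead in a window of decoded instructions can be sketched as follows. This is a simplified model, not a description of any real processor: every instruction takes one cycle, an instruction may issue only if no older instruction still waiting at the start of the cycle conflicts with it on a register, and the function name `issue_schedule` and the example registers are illustrative assumptions.

```python
def issue_schedule(window, units):
    """Return one list of issued instruction names per cycle.

    window -- list of (name, dest, srcs) in program order; srcs is a tuple
    units  -- number of instructions that may issue per cycle (at least 1)
    """
    if units < 1:
        raise ValueError("need at least one functional unit")
    pending = list(window)          # decoded but not yet issued
    schedule = []
    while pending:
        issued = []
        for idx, (name, dest, srcs) in enumerate(pending):
            if len(issued) == units:
                break
            # Older instructions that had not completed when this cycle began.
            older = pending[:idx]
            blocked = any(
                d in srcs           # true data dependency on an older result
                or d == dest        # output dependency
                or dest in s        # antidependency
                for _, d, s in older
            )
            if not blocked:
                issued.append(pending[idx])
        pending = [p for p in pending if p not in issued]
        schedule.append([name for name, _, _ in issued])
    return schedule

window = [
    ("I1", "R1", ("R2", "R3")),   # R1 := R2 + R3
    ("I2", "R4", ("R1",)),        # R4 := R1 + 1, needs the result of I1
    ("I3", "R5", ("R6",)),        # R5 := R6 + 1, independent
    ("I4", "R7", ("R5",)),        # R7 := R5 + 1, needs the result of I3
]
print(issue_schedule(window, units=2))  # [['I1', 'I3'], ['I2', 'I4']]
```

With two units, I3 is issued ahead of I2 in the first cycle; a strictly in-order issuer would have held I3 behind the stalled I2.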
Out-of-Order Issue Out-of-Order Completion, e.g.
Antidependency
Read-write dependency
  R3 := R3 + R5; (I1)
  R4 := R3 + 1; (I2)
  R3 := R5 + 1; (I3)
  R7 := R3 + R4; (I4)
I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3
Machine Parallelism
Duplication of Resources
Out of order issue
Renaming
Not worth duplicating functions without register renaming
Need instruction window large enough (more than 8)
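
Register renaming can be sketched on the four-instruction sequence from the antidependency slide. This is a minimal model under stated assumptions: every write is given a fresh physical register, the names `P0`, `P1`, ... and the `rename` helper are hypothetical, and literal operands such as `1` are left out so that only registers appear.

```python
from itertools import count

def rename(program):
    """Map each (dest, srcs) pair onto fresh physical registers.

    Every write allocates a new physical register, so no two instructions
    share a destination and no write can disturb an earlier read.
    """
    fresh = count()
    latest = {}                       # architectural register -> current physical name

    def lookup(reg):
        if reg not in latest:         # first use: bind the initial value
            latest[reg] = f"P{next(fresh)}"
        return latest[reg]

    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(lookup(s) for s in srcs)  # read sources before renaming dest
        latest[dest] = f"P{next(fresh)}"
        renamed.append((latest[dest], new_srcs))
    return renamed

program = [
    ("R3", ("R3", "R5")),   # I1: R3 := R3 + R5
    ("R4", ("R3",)),        # I2: R4 := R3 + 1
    ("R3", ("R5",)),        # I3: R3 := R5 + 1
    ("R7", ("R3", "R4")),   # I4: R7 := R3 + R4
]
for dest, srcs in rename(program):
    print(dest, "<-", srcs)
```

After renaming, I1 and I3 write different physical registers and I2 reads the one written by I1, so only the true data dependencies (I1 to I2, and I3 and I2 to I4) remain.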
RISC - Delayed Branch
Calculate result of branch before unusable instructions pre-fetched
Always execute single instruction immediately following branch
Keeps pipeline full while fetching new instruction stream
Not as good for superscalar
  Multiple instructions need to execute in delay slot
  Instruction dependence problems
Revert to branch prediction
Superscalar Implementation
Simultaneously fetch multiple instructions
Logic to determine true dependencies involving register values
Mechanisms to communicate these values
Mechanisms to initiate multiple instructions in parallel
Resources for parallel execution of multiple instructions
Mechanisms for committing process state in correct order