cs 61c: great ideas in computer architecture lecture 13: pipeliningcs61c/fa17/lec/13/l13 pipelining...

CS61C:GreatIdeasinComputerArchitecture

Lecture13:Pipelining

KrsteAsanović &RandyKatz

http://inst.eecs.berkeley.edu/~cs61c/fa17

Agenda• RISC-V Pipeline• PipelineControl• Hazards

− Structural− Data

§ R-typeinstructions§ Load

− Control• SuperscalarprocessorsCS61c Lecture13:Pipelining 2

Recap:PipeliningwithRISC-V

CS61c 3

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3tcycle

instructionsequence

tinstruction

Single Cycle PipeliningTiming tstep =100…200ps tcycle =200ps

Register accessonly100ps AllcyclessamelengthInstruction time,tinstruction =tcycle =800ps 1000psClockrate,fs 1/800ps =1.25GHz 1/200 ps =5GHzRelativespeed 1x 4x

RISC-VPipelineaddt0,t1,t2

ort3,t4,t5

slt t6,t0,t3

tcycle=200ps

instructionsequence

tinstruction =1000ps

sw t0,4(t3)

lw t0,8(t3)

addi t2,t2,1

Resourceuseofinstructionovertime

Resourceuseinaparticulartimeslot

CS61c Lecture13:Pipelining 4

Single-CycleRISC-VRV32IDatapath

CS61c 5

IMEMALU

Imm.Gen

DMEMBranchComp.

AddrAAddrB

DataAAddrD

DataWDataR

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

Reg[rs1]

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

PipeliningRISC-VRV32IDatapath

CS61c 6

IMEMALU

Imm.Gen

DMEMBranchComp.

AddrAAddrB

DataAAddrD

DataWDataR

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

Reg[rs1]

imm[31:0]

Reg[rs2]

InstructionFetch(F)

InstructionDecode/RegisterRead

ALUExecute(X)

MemoryAccess(M)

WriteBack(W)

PipelinedRISC-VRV32IDatapath

CS61c7

DMEMBranchComp.

DataAAddrD

DataWDataR

+4pcDpcFpcX pcM

rs2MimmXImm.

RecalculatePC+4inMstagetoavoidsendingbothPCandPC+4downpipeline

instM instW

Mustpipelineinstructionalongwithdata,socontroloperatescorrectlyineachstage

Eachstageoperatesondifferentinstruction

CS61c8

DMEMBranchComp.

DataAAddrD

DataWDataR

+4pcDpcFpcX pcM

rs2MimmXImm.instM instW

addt0,t1,t2

ort3,t4,t5slt t6,t0,t3sw t0,4(t3)lw t0,8(t3)

Pipelineregistersseparatestages,holddataforeachinstructioninflight

PipelinedControl• Controlsignalsderivedfrominstruction

− Asinsingle-cycleimplementation− Informationisstoredinpipelineregistersforusebylaterstages

CS61c 10

HazardsAhead

StructuralHazard• Problem:Twoormoreinstructionsinthepipelinecompeteforaccesstoasinglephysicalresource• Solution1:Instructionstakeitinturnstouseresource,someinstructionshavetostall• Solution2:Addmorehardwaretomachine• Canalwayssolveastructuralhazardbyaddingmorehardware

Regfile StructuralHazards• Eachinstruction:

− canreaduptotwooperandsindecodestage− canwriteonevalueinwriteback stage

• Avoidstructuralhazardbyhavingseparate“ports”− twoindependentreadportsandoneindependentwriteport

• Threeaccessespercyclecanhappensimultaneously

StructuralHazard:MemoryAccess

addt0,t1,t2

ort3,t4,t5

slt t6,t0,t3

instructionsequence

sw t0,4(t3)

lw t0,8(t3)

• Instructionanddatamemoryusedsimultaneously

ü Usetwoseparatememories

InstructionandDataCaches

16CS61c Lecture13:Pipelining

Processor

Control

DatapathPC

RegistersArithmetic&LogicUnit

Memory(DRAM)

Program

InstructionCache

DataCache

Caches:smallandfast“buffer”memories

StructuralHazards– Summary• Conflictforuseofaresource• InRISC-Vpipelinewithasinglememory

− Load/storerequiresdataaccess− Withoutseparatememories,instructionfetchwouldhavetostallforthatcycle§ Allotheroperationsinpipelinewouldhavetowait

• Pipelineddatapaths requireseparateinstruction/datamemories− Orseparateinstruction/datacaches

• RISCISAs(includingRISC-V)designedtoavoidstructuralhazards− e.g.atmostonememoryaccess/instruction

DataHazard:RegisterAccess

addt0,t1,t2

ort3,t4,t5

slt t6,t0,t3

instructionsequence

sw t0,4(t3)

lw t0,8(t3)

• Separateports,butwhatifwritetosamevalueasread?• Doessw intheexamplefetchtheoldornewvalue?

RegisterAccessPolicy

addt0,t1,t2

ort3,t4,t5

slt t6,t0,t3

instructionsequence sw t0,4(t3)

lw t0,8(t3)

• Exploithighspeedofregisterfile(100ps)

1) WBupdatesvalue2) IDreadsnewvalue

• Indicatedindiagrambyshading

Mightnotalwaysbepossibletowritethenreadinsamecycle,especiallyinhigh-frequencydesigns.Checkassumptionsinanyquestion.

DataHazard:ALUResult

adds0,t0,t1

subt2,s0,t0

ort6,s0,t3

instructionsequence

xor t5,t1,s0

sw s0,8(t3)

5 5 5 5 5/9 9 9 9 9Valueofs0

Withoutsomefix,sub andor willcalculatewrongresult!CS61c Lecture13:Pipelining 21

DataHazard:ALUResult

adds0,t1,t2

subt2,s0,t5

ort6,s0,t3

instructionsequence

xor t5,t1,s0

sw s0,8(t3)

5Valueofs0

Withoutsomefix,sub andor willcalculatewrongresult!CS61c Lecture13:Pipelining 22

Solution1: Stalling• Problem:Instructiondependsonresultfrompreviousinstruction

− add s0,t0,t1sub t2,s0,t3

• Bubble:− effectivelyNOP:affectedpipelinestagesdo“nothing”

StallsandPerformance

• Stallsreduceperformance− Butstallsarerequiredtogetcorrectresults

• Compilercanarrangecodetoavoidhazardsandstalls− Requiresknowledgeofthepipelinestructure

CS61c 24

Solution2:Forwarding

addt0,t1,t2

ort3,t0,t5

subt6,t0,t3

instructionsequence

xor t5,t1,t0

sw t0,8(t3)

5 5 5 5 5/9 9 9 9 9Valueoft0

Forwarding:graboperandfrompipelinestage,ratherthanregisterfileCS61c 25

Forwarding(akaBypassing)• Useresultwhenitiscomputed

− Don’twaitforittobestoredinaregister− Requiresextraconnectionsinthedatapath

CS61c 26Lecture13:Pipelining

1)DetectNeedforForwarding(example)

addt0,t1,t2

ort3,t0,t5

subt6,t0,t3

X M WD

instX.rd

instD.rs1

CS61c 27

Comparedestinationofolderinstructionsinpipelinewithsourcesofnewinstructionindecodestage.Mustignorewritestox0!

ForwardingPath

CS61c28

DMEMBranchComp.

DataAAddrD

DataWDataR

+4pcDpcFpcX pcM

rs2MimmXImm.instM instW

ForwardingControlLogic

Administrivia

• Project1Part2duenextMonday• ProjectPartythisWednesday7-9pminCory293

• HW3willbereleasedbyFriday• Midterm1regradesduetonight• GuerrillaSessiontonight7-9pminCory293

LoadDataHazard

1cyclestallunavoidable

CS61c 31

forward

unaffected

StallPipeline

CS61c 32

repeatandinstructionandforward

lw DataHazard• Slotafteraloadiscalledaloaddelayslot

− Ifthatinstructionusestheresultoftheload,thenthehardwarewillstallforonecycle

− Equivalenttoinsertinganexplicitnop intheslot§ exceptthelatterusesmorecodespace

− Performanceloss• Idea:

− Putunrelatedinstructionintoloaddelayslot− Noperformanceloss!

CodeSchedulingtoAvoidStalls• Reordercodetoavoiduseofloadresultinthenextinstruction!• RISC-VcodeforD=A+B; E=A+C;

Original Order:lw t1, 0(t0)

lw t2, 4(t0)

add t3, t1, t2

sw t3, 12(t0)

lw t4, 8(t0)

add t5, t1, t4

sw t5, 16(t0)

Alternative:lw t1, 0(t0)

lw t2, 4(t0)

lw t4, 8(t0)

add t3, t1, t2

sw t3, 12(t0)

add t5, t1, t4

sw t5, 16(t0)

Stall!

13cycles11cyclesCS61c Lecture13:Pipelining

ControlHazards

beq t0,t1,label

subt2,s0,t5

ort6,s0,t3

xor t5,t1,s0

sw s0,8(t3)

executedregardlessofbranchoutcome!

executedregardlessofbranchoutcome!!!

PCupdatedreflectingbranchoutcome

Observation• Ifbranchnottaken,theninstructionsfetchedsequentiallyafterbrancharecorrect• Ifbranchorjumptaken,thenneedtoflushincorrectinstructionsfrompipelinebyconvertingtoNOPs

KillInstructionsafterBranchifTaken

beq t0,t1,label

subt2,s0,t5

ort6,s0,t3

label:xxxxxx PCupdatedreflectingbranchoutcome

Takenbranch

ConverttoNOP

ReducingBranchPenalties• Everytakenbranchinsimplepipelinecosts2deadcycles• Toimproveperformance,use“branchprediction”toguesswhichwaybranchwillgoearlierinpipeline• Onlyflushpipelineifbranchpredictionwasincorrect

BranchPrediction

beq t0,t1,label

label:…..

Takenbranch

GuessnextPC!

Checkguesscorrect

IncreasingProcessorPerformance1. Clockrate

− Limitedbytechnologyandpowerdissipation2. Pipelining

− “Overlap”instructionexecution− Deeperpipeline:5=>10=>15stages

§ Less work perstageà shorter clock cycle§ Butmorepotentialforhazards (CPI>1)

3. Multi-issue”super-scalar”processor− Multipleexecutionunits(ALUs)

§ Severalinstructionsexecutedsimultaneously§ CPI<1(ideally)

SuperscalarProcessor

P&Hp.340

Benchmark:CPIofIntelCorei7

P&Hp.350

InConclusion• Pipeliningincreasesthroughputbyoverlappingexecutionofmultipleinstructions• Allpipelinestageshavesameduration

− Choosepartitionthataccommodatesthisconstraint• Hazardspotentiallylimitperformance

− Maximizingperformancerequiresprogrammer/compilerassistance− E.g.LoadandBranchdelayslots

• Superscalarprocessorsusemultipleexecutionunitsforadditionalinstructionlevelparallelism− Performancebenefithighlycodedependent

ExtraSlides

PipeliningandISADesign• RISC-VISAdesignedforpipelining

− Allinstructionsare32-bits§ Easytofetchanddecodeinonecycle§ Versusx86:1- to15-byteinstructions

− Fewandregularinstructionformats§ Decodeandreadregistersinonestep

− Load/storeaddressing§ Calculateaddressin3rd stage,accessmemoryin4th stage

− Alignmentofmemoryoperands§ Memoryaccesstakesonlyonecycle

CS61c 47

SuperscalarProcessor• Multipleissue“superscalar”

− ReplicatepipelinestagesÞmultiplepipelines− Startmultipleinstructionsperclockcycle− CPI<1,souseInstructionsPerCycle(IPC)− E.g.,4GHz4-waymultiple-issue

§ 16BIPS,peakCPI=0.25,peakIPC=4− Dependenciesreducethisinpractice

• “Out-of-Order”execution− Reorderinstructionsdynamicallyinhardwaretoreduceimpactofhazards

• CS152discussesthesetechniques!CS61c Lecture13:Pipelining 48

cs 61c: great ideas in computer architecture lecture 13: pipeliningcs61c/fa17/lec/13/l13 pipelining...

Documents

cs 61c: great ideas in computer architecture functional...

cs 61c: great ideas in computer architecture lecture 11:...

cs 61c: great ideas in computer architecture...

cs 61c great ideas in computer architecture spring...

cs 61c: great ideas in computer architecture (machine...

cs 61c: great ideas in computer architecture lecture 12

cs 61c: great ideas in computer architecture...

cs 61c: great ideas in computer architecture (machine...

cs 61c: great ideas in computer...

cs 61c: great ideas in computer architecture (machine...

cs 61c: great ideas in...

cs 61c: great ideas in computer architecture lecture 13:...

cs 61c: great ideas in computer architecture (a.k.a

cs 61c l25 vm iii (1) - university of california,...

ece/cs 552: pipelining (part...

cs 61c: great ideas in computer architecture pipelining &...

cs 61c: great ideas in computer architecture (machine...

instructor: justin hsia cs 61c: great ideas in computer...

cs 61c: great ideas in computer architecture (machine...

cs 61c: great ideas in computer architecture running a