8/30/16 CS152, Fall 2016
CS152 Computer Architecture and Engineering
Lecture 2 - Simple Machine Implementations
John Wawrzynek
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~johnw
http://inst.eecs.berkeley.edu/~cs152
Last Time in Lecture 1
§ Computer Architecture >> ISAs and RTL
– CS152 is about interaction of hardware and software, and design of appropriate abstraction layers
§ Technology and Applications shape Computer Architecture
– History provides lessons for the future
§ First 130 years of CompArch, from Babbage to IBM 360
– Move from calculators (no conditionals) to fully programmable machines
– Rapid change started in WWII (mid-1940s), move from electro-mechanical to pure electronic processors
§ Cost of software development becomes a large constraint on architecture (need compatibility)
§ IBM 360 introduces notion of “family of machines” running same ISA but very different implementations
– Six different machines released on same day (April 7, 1964)
– “Future-proofing” for subsequent generations of machine
IBM 360: Initial Implementations
               Model 30           ...   Model 70
Memory         8K - 64 KB               256K - 512 KB
Datapath       8-bit                    64-bit
Circuit Delay  30 nsec/level            5 nsec/level
Local Store    Main Store               Transistor Registers
Control Store  Read only 1 µsec         Conventional circuits
IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as portable hardware-software interface!
IBM 360 Survives Today: z12 Mainframe Processor
[From IBM HotChips 24 presentation, August 28, 2012]
6 Cores @ 5.5 GHz
Special-purpose coprocessors on each core
32nm SOI Technology
2.75 billion transistors
23.7mm x 25.2mm
15 layers of metal
7.68 miles of wiring!
10,000 power pins (!)
1,071 I/O pins
48MB of Level-3 cache on chip
Instruction Set Architecture (ISA)
§ The contract between software and hardware
§ Typically described by giving all the programmer-visible state (registers + memory) plus the semantics of the instructions that operate on that state
§ IBM 360 was first line of machines to separate ISA from implementation (a.k.a. microarchitecture)
§ Many implementations possible for a given ISA
– E.g., the Soviets built code-compatible clones of the IBM 360, as did Amdahl after he left IBM.
– E.g. 2: today you can buy AMD or Intel processors that run the x86-64 ISA.
– E.g. 3: many cellphones use the ARM ISA with implementations from many different companies including TI, Qualcomm, Samsung, Marvell, etc.
ISA to Microarchitecture Mapping
§ ISA often designed with particular microarchitectural style in mind, e.g.,
– Accumulator ⇒ hardwired, unpipelined
– CISC ⇒ microcoded
– RISC ⇒ hardwired, pipelined
– VLIW ⇒ fixed-latency in-order parallel pipelines
– JVM ⇒ software interpretation
§ But can be implemented with any microarchitectural style
– Intel Ivy Bridge: hardwired pipelined CISC (x86) machine (with some microcode support)
– Simics: Software-interpreted SPARC RISC machine
– ARM Jazelle: A hardware JVM processor
– This lecture: a microcoded RISC-V machine
Today, Microprogramming
§ To show how to build very small processors with complex ISAs
§ To help you understand where CISC* machines came from
§ Because it is still used in common machines (IBM 360, x86, PowerPC)
§ As a gentle introduction into machine structures
§ To help understand how technology drove the move to RISC*
* “CISC”/“RISC” names much newer than style of machines they refer to.
Microarchitecture: Implementation of an ISA
[Figure: a Controller and a Datapath, connected by Control Points and Status lines]
Structure: How components are connected. Static
Behavior: How data moves between components. Dynamic
Microcontrol Unit (Maurice Wilkes, 1954)
Embed the control logic state table in a memory array
First used in EDSAC-2, completed 1958
[Figure: Wilkes microcontrol unit. Labels: Memory, Decoder, Matrix A, Matrix B, µaddress, next state, op, conditional code flip-flop, control lines to ALU, MUXs, Registers]
Microcoded Microarchitecture
[Figure: µcontroller (ROM), Datapath, and Memory (RAM). Labels: Addr, Data, opcode, zero?, busy?, enMem, MemWrt]
– µcontroller (ROM): holds fixed microcode instructions
– Memory (RAM): holds user program written in macrocode instructions (e.g., x86, RISC-V, etc.)
RISC-V ISA
§ New RISC design from UC Berkeley
§ Realistic & complete ISA, but open & small
§ Not over-architected for a certain implementation style
§ Both 32-bit and 64-bit address space variants
– RV32 and RV64
§ Designed for multiprocessing
§ Efficient instruction encoding
§ Easy to subset/extend for education/research
§ Tech. report with RISC-V spec available on class website
§ We’ll be using 32-bit RISC-V this semester in lectures and labs, very similar to MIPS you saw in CS61C
RV32 Processor State
Program counter (pc)
32 x 32-bit integer registers (x0-x31)
• x0 always contains a 0
32 floating-point (FP) registers (f0-f31)
• each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP)
FP status register (fsr), used for FP rounding mode & exception reporting
RISC-V Instruction Encoding
§ Can support variable-length instructions.
§ Base instruction set (RV32) always has fixed 32-bit instructions, lowest two bits = $11_2$
§ All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits)
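The two encoding rules above can be sketched in a few lines of Python. This is only an illustration: the helper names are made up for this example, and the sample instruction word is chosen purely because its low two bits are 11.

```python
def has_base_32bit_marker(instr: int) -> bool:
    """True if the lowest two bits are 11 (binary), as in every base RV32 instruction."""
    return (instr & 0b11) == 0b11


def is_valid_target(addr: int) -> bool:
    """Branch/jump targets have 16-bit granularity, i.e. they are 2-byte aligned."""
    return addr % 2 == 0


# Sample values (hypothetical, picked only to exercise both outcomes).
print(has_base_32bit_marker(0x00000013))  # True
print(has_base_32bit_marker(0x00000010))  # False
print(is_valid_target(0x1002))            # True
print(is_valid_target(0x1003))            # False
```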
RISC-V Instruction Formats
[Figure: instruction format fields: Destination Reg., Reg. Source 1, Reg. Source 2, additional opcode bits/immediate, and a 7-bit opcode field (but low 2 bits = $11_2$)]
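R-Type/I-Type/R4-Type Formats
[Figure: R-type, I-type, and R4-type field layouts; labels include a 12-bit signed immediate and Reg. Source 3]
– Reg-Reg ALU operations
– Reg-Imm ALU operations; load instructions, (rs1 + immediate) addressing
– Reg. Source 3: only used for floating-point fused multiply-add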
B-Type
12-bit signed immediate split across two fields
Branches, compare two registers, PC + (immediate << 1) target
(Branches do not have delay slot)
Store instructions, (rs1 + immediate) addressing, rs2 data
L-Type
Writes 20-bit immediate to top of destination register.
Used to build large immediates.
12-bit immediates are signed, so have to account for sign when building 32-bit immediates in 2-instruction sequence (LUI high-20b, ADDI low-12b)
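The sign adjustment mentioned above can be made concrete with a short sketch. It assumes the usual model where LUI places a 20-bit value in the upper bits and ADDI adds a sign-extended 12-bit value; the function names are illustrative and not taken from the lecture.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_imm32(value: int):
    """Split a 32-bit constant into (hi20, lo12) for a LUI + ADDI pair.

    lo12 is the signed amount ADDI will add. When it is negative, hi20 ends up
    one larger than the plain upper 20 bits, so (hi20 << 12) + lo12 == value.
    """
    value &= MASK32
    lo12 = sign_extend(value, 12)
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def rebuild(hi20: int, lo12: int) -> int:
    """Model LUI followed by ADDI on a 32-bit register."""
    return ((hi20 << 12) + lo12) & MASK32


hi, lo = split_imm32(0x12345FFF)
print(hex(hi), lo)            # 0x12346 -1
print(hex(rebuild(hi, lo)))   # 0x12345fff
```

Note that the upper part becomes 0x12346 rather than 0x12345, because the low 12 bits (0xFFF) are treated as -1 by the add.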
J-Type
“J” Unconditional jump, PC + offset target
“JAL” Jump and link, also writes PC+4 to x1
Offset scaled by 1-bit left shift – can jump to 16-bit instruction boundary (same for branches)
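A minimal sketch of the PC-relative target computation used by both jumps and branches follows. It assumes `pc` is the address of the jump or branch instruction itself and that the immediate is a signed field of `imm_bits` bits; both the function name and the parameterization by field width are choices made for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pc_relative_target(pc: int, imm: int, imm_bits: int) -> int:
    """Target = PC + (sign-extended immediate << 1), wrapped to 32 bits."""
    offset = sign_extend(imm, imm_bits) << 1  # 1-bit shift gives 16-bit granularity
    return (pc + offset) & MASK32


print(hex(pc_relative_target(0x1000, 3, 12)))      # 0x1006
print(hex(pc_relative_target(0x1000, 0xFFF, 12)))  # 0xffe (immediate is -1)
```

Because the offset is always shifted left by one, every computed target is 2-byte aligned whenever the PC is.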
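A Bus-based Datapath for RISC-V
[Figure: bus-based datapath on a 32-bit Bus. Components and signals: IR (ldIR, Opcode) with Immed Select (ImmSel, enImm); register file of 32 GPRs + PC, 32-bit registers (RegSel choosing among rs1, rs2, rd, 32 (PC), 1 (RA); RegWrt, enReg; addr, data); A and B registers (ldA, ldB) feeding the ALU (ALUOp, enALU, zero?); MA register (ldMA) and Memory (addr, data, MemWrt, enMem, busy)]
Microinstruction: register to register transfer (17 control signals + clock)
MA ← PC means RegSel = PC; enReg = yes; ldMA = yes
B ← Reg[rs2] means RegSel = rs2; enReg = yes; ldB = yes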
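One way to picture a microinstruction is as a set of control-signal values. The sketch below encodes the two transfers spelled out above using the signal names from the datapath figure. The rule that exactly one enable signal may drive the shared bus per transfer is an assumption of this sketch (it is how shared buses normally work), and the dictionary layout and function name are purely illustrative.

```python
# Signals that put a value onto the shared bus (names from the datapath figure).
BUS_DRIVERS = ("enReg", "enALU", "enMem", "enImm")

# The two register transfers given in the text, as control-signal settings.
MICRO_OPS = {
    "MA <- PC":      {"RegSel": "PC",  "enReg": 1, "ldMA": 1},
    "B <- Reg[rs2]": {"RegSel": "rs2", "enReg": 1, "ldB": 1},
}


def has_single_bus_driver(signals: dict) -> bool:
    """Assumed sanity rule: exactly one source drives the bus in a transfer."""
    active = [name for name in BUS_DRIVERS if signals.get(name)]
    return len(active) == 1


for name, signals in MICRO_OPS.items():
    print(name, has_single_bus_driver(signals))
```

A setting that asserted both enReg and enALU would fail this check, since two sources would contend for the bus.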
Memory Module
Assumption: Memory operates independently and is slow as compared to Reg-to-Reg transfers (multiple CPU clock cycles per access)
[Figure: RAM block with signals Enable, Write(1)/Read(0) (we), addr, din, dout, busy, and a connection to the bus]
Instruction Execution
Execution of a RISC-V instruction involves:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)
+ the computation of the next instruction address
Microprogram Fragments
instr fetch:  MA, A ← PC
              PC ← A + 4
              IR ← Memory
              dispatch on Opcode
(instruction fetch can be treated as a macro)
ALU:          A ← Reg[rs1]
              B ← Reg[rs2]
              Reg[rd] ← func(A,B)
              do instruction fetch
ALUi:         A ← Reg[rs1]
              B ← Imm                  sign extension
              Reg[rd] ← Opcode(A,B)
              do instruction fetch
Microprogram Fragments (cont.)
LW:           A ← Reg[rs1]
              B ← Imm
              MA ← A + B
              Reg[rd] ← Memory
              do instruction fetch
J:            A ← A - 4                Get original PC back in A
              B ← IR
              PC ← JumpTarg(A,B)       JumpTarg(A,B) = {A + (B[31:7] << 1)}
              do instruction fetch
beq:          A ← Reg[rs1]
              B ← Reg[rs2]
              If A == B then go to bz-taken
              do instruction fetch
bz-taken:     A ← PC
              A ← A - 4                Get original PC back in A
              B ← BImm << 1            BImm = IR[31:27, 16:10]
              PC ← A + B
              do instruction fetch
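RISC-V Microcontroller: first attempt, pure ROM implementation
[Figure: uProgram ROM. The address is formed from the uPC (state, s bits), the Opcode (7 bits), zero?, and Busy (memory); the data supplies the Control Signals (17) and the next state (s bits)]
How big is “s”?
ROM size? = $2^{(\text{opcode}+\text{status}+s)}$ words
Word size? = control + s bits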
Microprogram in the ROM worksheet
State   Op   zero?  busy  Control points         next-state
ALU0    ALU  *      *     A ← Reg[rs1]           ALU1
ALU1    ALU  *      *     B ← Reg[rs2]           ALU2
ALU2    ALU  *      *     Reg[rd] ← func(A,B)    fetch0
fetch0  ALU  *      *     MA, A ← PC             fetch1
fetch1  ALU  *      yes   ....                   fetch1
fetch1  ALU  *      no    IR ← Memory            fetch2
fetch2  ALU  *      *     PC ← A + 4             ?
Next instruction sequence
“*” denotes all combinations present
Microprogram in the ROM (cont.)
State   Op   zero?  busy  Control points         next-state
ALUi0   ALU  *      *     A ← Reg[rs1]           ALUi1
ALUi1   ALU  *      *     B ← Imm                ALUi2
ALUi2   ALU  *      *     Reg[rd] ← Op(A,B)      fetch0
...
J0      J    *      *     A ← A - 4              J1
J1      J    *      *     B ← IR                 J2
J2      J    *      *     PC ← JumpTarg(A,B)     fetch0
...
beq0    beq  *      *     A ← Reg[rs1]           beq1
beq1    beq  *      *     B ← Reg[rs2]           beq2
beq2    beq  yes    *     A ← PC                 beq3
beq2    beq  no     *     ....                   fetch0
beq3    beq  *      *     A ← A - 4              beq4
beq4    beq  *      *     B ← BImm               beq5
beq5    beq  *      *     PC ← A + B             fetch0
...
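Size of Control Store
[Figure: Control ROM. The address is formed from status & opcode (w bits) and the uPC (s bits); the data supplies the control signals (c bits) and the next uPC (s bits)]
size = $2^{(w+s)} \times (c+s)$ bits
RISC-V: w = 5+2, c = 17, s = ?
no. of steps per opcode = ~5 + fetch-sequence (3)
no. of states ≈ (8 steps per opcode) x (# of opcodes) x (4 status combos) = 8 x $2^5$ x 4 = 1024 states ⇒ s = (10 – 7) ⇒ width is 20 bits
Control ROM size = 1024 x 20 bits ≈ 20 Kbits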
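The sizing arithmetic on this slide can be checked with a small calculation. The sketch below simply evaluates size = 2^(w+s) x (c+s) with the slide's numbers; the function and variable names are made up for the example.

```python
def control_store_bits(w: int, s: int, c: int) -> int:
    """Total ROM bits: 2**(w+s) words, each (c+s) bits wide."""
    words = 2 ** (w + s)
    width = c + s
    return words * width


w = 5 + 2            # opcode bits + status bits, as on the slide
c = 17               # control signals
entries = 8 * 2**5 * 4
address_bits = entries.bit_length() - 1   # 1024 entries -> 10 address bits
s = address_bits - w                      # 10 - 7 = 3

print(entries, s, c + s)                  # 1024 3 20
print(control_store_bits(w, s, c))        # 20480, i.e. about 20 Kbits
```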
Reducing Control Store Size
Control store has to be fast ⇒ expensive
• Reduce the ROM height (= address bits)
– reduce inputs by extra external logic
    each input bit doubles the size of the control store
– reduce states by grouping opcodes
    find common sequences of actions
– condense input status bits
    combine all exceptions into one, i.e., exception/no-exception
• Reduce the ROM width
– restrict the next-state encoding
    Next, Wait for memory, ...
– encode control signals (vertical microcode)
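RISC-V Controller V2
[Figure: the uPC (state) addresses the Control ROM; the ROM data supplies the Control Signals (17) and uJumpType. Jump logic, with inputs zero and busy, produces uPCSrc, which selects the next uPC from uPC, uPC+1 (via a +1 block), absolute, or op-group (derived from the Opcode through a CL block)]
uJumpType = next | spin | fetch | dispatch | ftrue | ffalse
Input encoding reduces ROM height
Next-state encoding reduces ROM width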
Jump Logic
uPCSrc = Case uJumpTypes
  next     ⇒ uPC+1
  spin     ⇒ if (busy) then uPC else uPC+1
  fetch    ⇒ absolute
  dispatch ⇒ op-group
  ftrue    ⇒ if (zero) then absolute else uPC+1
  ffalse   ⇒ if (zero) then uPC+1 else absolute
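The case table above translates directly into a small function. This is a sketch only: the jump types are represented as strings, and `absolute` and `op_group` are passed in as already-computed micro-addresses, which is a simplification made for the example.

```python
def next_upc(jump_type: str, upc: int, busy: bool, zero: bool,
             absolute: int, op_group: int) -> int:
    """Select the next micro-PC according to the uJumpType field."""
    if jump_type == "next":
        return upc + 1
    if jump_type == "spin":
        return upc if busy else upc + 1      # wait in place while memory is busy
    if jump_type == "fetch":
        return absolute                      # jump to the fetch sequence
    if jump_type == "dispatch":
        return op_group                      # jump to the opcode's microroutine
    if jump_type == "ftrue":
        return absolute if zero else upc + 1
    if jump_type == "ffalse":
        return upc + 1 if zero else absolute
    raise ValueError(f"unknown uJumpType: {jump_type}")


# Hypothetical addresses: fetch sequence at 0, some op-group entry at 20.
print(next_upc("spin", 5, busy=True, zero=False, absolute=0, op_group=20))     # 5
print(next_upc("spin", 5, busy=False, zero=False, absolute=0, op_group=20))    # 6
print(next_upc("ffalse", 12, busy=False, zero=False, absolute=0, op_group=20)) # 0
```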
Instruction Fetch & ALU: RISC-V-Controller-2
State   Control points         next-state
fetch0  MA, A ← PC             next
fetch1  IR ← Memory            spin
fetch2  PC ← A + 4             dispatch
...
ALU0    A ← Reg[rs1]           next
ALU1    B ← Reg[rs2]           next
ALU2    Reg[rd] ← func(A,B)    fetch
ALUi0   A ← Reg[rs1]           next
ALUi1   B ← Imm                next
ALUi2   Reg[rd] ← Op(A,B)      fetch
Load & Store: RISC-V-Controller-2
State   Control points         next-state
LW0     A ← Reg[rs1]           next
LW1     B ← Imm                next
LW2     MA ← A + B             next
LW3     Reg[rd] ← Memory       spin
LW4                            fetch
SW0     A ← Reg[rs1]           next
SW1     B ← BImm               next
SW2     MA ← A + B             next
SW3     Memory ← Reg[rs2]      spin
SW4                            fetch
Branches: RISC-V-Controller-2
State   Control points         next-state
beq0    A ← Reg[rs1]           next
beq1    B ← Reg[rs2]           next
beq2    A ← PC                 ffalse
beq3    A ← A - 4              next
beq4    B ← BImm << 1          next
beq5    PC ← A + B             fetch
Jumps: RISC-V-Controller-2
State   Control points         next-state
J0      A ← A - 4              next
J1      B ← IR                 next
J2      PC ← JumpTarg(A,B)     fetch
JR0     A ← Reg[rs1]           next
JR1     PC ← A                 fetch
JAL0    A ← PC                 next
JAL1    Reg[1] ← A             next
JAL2    A ← A - 4              next
JAL3    B ← IR                 next
JAL4    PC ← JumpTarg(A,B)     fetch
Implementing Complex Instructions
[Figure: the bus-based datapath for RISC-V, repeated: IR with Immed Select, register file of 32 GPRs + PC, A and B registers with the ALU, and MA register with Memory, all on a 32-bit Bus]
rd ← M[(rs1)] op (rs2)             Reg-Memory-src ALU op
M[(rd)] ← (rs1) op (rs2)           Reg-Memory-dst ALU op
M[(rd)] ← M[(rs1)] op M[(rs2)]     Mem-Mem ALU op
Mem-Mem ALU Instructions: RISC-V-Controller-2
Mem-Mem ALU op    M[(rd)] ← M[(rs1)] op M[(rs2)]
ALUMM0  MA ← Reg[rs1]          next
ALUMM1  A ← Memory             spin
ALUMM2  MA ← Reg[rs2]          next
ALUMM3  B ← Memory             spin
ALUMM4  MA ← Reg[rd]           next
ALUMM5  Memory ← func(A,B)     spin
ALUMM6                         fetch
Complex instructions usually do not require datapath modifications in a microprogrammed implementation -- only extra space for the control program
Implementing these instructions using a hardwired controller might require datapath modifications
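To get a feel for the cost of such an instruction, the sketch below counts micro-cycles for the ALUMM sequence. It relies on a deliberately simple memory model that is an assumption of this example, not something the lecture specifies: every spin state sees busy asserted for the same number of extra cycles before it can advance. The instruction-fetch microcycles are not included.

```python
# Next-state field of each microinstruction in the ALUMM0..ALUMM6 sequence.
ALUMM = ["next", "spin", "next", "spin", "next", "spin", "fetch"]


def count_cycles(jump_types, busy_cycles: int) -> int:
    """Count cycles, assuming each spin state waits busy_cycles extra cycles."""
    cycles = 0
    for jump_type in jump_types:
        cycles += 1                  # every microinstruction takes one cycle
        if jump_type == "spin":
            cycles += busy_cycles    # re-executed while memory stays busy
    return cycles


print(count_cycles(ALUMM, 0))  # 7
print(count_cycles(ALUMM, 2))  # 13
```

With three memory accesses, each extra cycle of memory latency adds three cycles to the instruction under this model.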
Performance Issues
Microprogrammed control ⇒ multiple cycles per instruction
Cycle time? $t_C > t_{\text{alu-regfile}} + t_{\mu\text{ROM}}$
Good performance, relative to a single-cycle hardwired implementation, can be achieved:
• Total execution time (number of cycles) tailored per instruction
• Each uop fast: small ROM, simple transfers
Horizontal vs Vertical µCode
§ Horizontal µcode has wider µinstructions
– Multiple parallel operations per µinstruction
– Fewer microcode steps per macroinstruction
– Sparser encoding ⇒ more bits
§ Vertical µcode has narrower µinstructions
– Typically a single datapath operation per µinstruction
– More microcode steps per macroinstruction
– More compact ⇒ fewer bits
§ Nanocoding
– Tries to combine best of horizontal and vertical µcode
[Figure: uCode ROM with dimensions # µInstructions and Bits per µInstruction]
Nanocoding
Exploits recurring control signal patterns in µcode, e.g.,
ALU0   A ← Reg[rs1]
...
ALUi0  A ← Reg[rs1]
...
[Figure: the uPC (state) supplies the µaddress to the µcode ROM, which outputs the µcode next-state and a nanoaddress; the nanoaddress indexes the nanoinstruction ROM (data output)]
§ MC68000 had 17-bit µcode containing either 10-bit µjump or 9-bit nanoinstruction pointer
– Nanoinstructions were 68 bits wide, decoded to give 196 control signals
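The storage trade-off behind nanocoding can be sketched with a rough bit count. The word widths (17-bit µcode, 68-bit nanoinstructions, 10-bit µjump) come from the slide; the entry counts below are hypothetical values picked only for illustration, and the "flat" alternative (every µinstruction carrying the full 68 control bits plus a 10-bit next address) is an assumed baseline, not a description of any real design.

```python
def flat_rom_bits(n_micro: int, control_bits: int, next_bits: int) -> int:
    """One wide ROM: every microinstruction stores all control bits."""
    return n_micro * (control_bits + next_bits)


def nanocoded_rom_bits(n_micro: int, micro_bits: int,
                       n_nano: int, nano_bits: int) -> int:
    """Narrow µcode ROM pointing into a small ROM of unique wide patterns."""
    return n_micro * micro_bits + n_nano * nano_bits


# Hypothetical entry counts (NOT MC68000 data): 512 µinstructions sharing
# 128 distinct control patterns.
N_MICRO, N_NANO = 512, 128

print(flat_rom_bits(N_MICRO, 68, 10))               # 39936
print(nanocoded_rom_bits(N_MICRO, 17, N_NANO, 68))  # 17408
```

The saving depends entirely on how often control patterns recur: if every µinstruction needed its own nanoinstruction, the two-level scheme would cost more bits than the flat one.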
Microprogramming in IBM 360
                          M30     M40     M50     M65
Datapath width (bits)     8       16      32      64
µinst width (bits)        50      52      85      87
µcode size (K µinsts)     4       4       2.75    2.75
µstore technology         CCROS   TCROS   BCROS   BCROS
µstore cycle (ns)         750     625     500     200
memory cycle (ns)         1500    2500    2000    750
Rental fee ($K/month)     4       7       15      35
Only the fastest models (75 and 95) were hardwired
IBM Card Capacitor Read-Only Storage
[IBM Journal, January 1961]
[Figure: punched card with metal film; fixed sensing plates]
Microprogramming thrived in the Seventies
§ Significantly faster ROMs than DRAMs were available
§ For complex instruction sets, datapath and controller were cheaper and simpler
§ New instructions, e.g., floating point, could be supported without datapath modifications
§ Fixing bugs in the controller was easier
§ ISA compatibility across various models could be achieved easily and cheaply
Except for the cheapest and fastest machines, all computers were microprogrammed
Writable Control Store (WCS)
§ Implement control store in RAM not ROM
– MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)
– Bug-free microprograms difficult to write
§ User-WCS provided as option on several minicomputers
– Allowed users to change microcode for each processor
§ User-WCS failed
– Little or no programming tools support
– Difficult to fit software into small space
– Microcode control tailored to original ISA, less useful for others
– Large WCS part of processor state - expensive context switches
– Protection difficult if user can change microcode
– Virtual memory required restartable microcode
Microprogramming is far from extinct
§ Played a crucial role in micros of the Eighties
– DEC uVAX, Motorola 68K series, Intel 286/386
§ Plays an assisting role in most modern micros
– e.g., AMD Bulldozer, Intel Ivy Bridge, Intel Atom, IBM PowerPC, …
– Most instructions executed directly, i.e., with hard-wired control
– Infrequently-used and/or complicated instructions invoke microcode
§ Patchable microcode common for post-fabrication bug fixes, e.g. Intel processors load µcode patches at bootup