Post on 16-Mar-2020
CS110 Computer Architecture
Review for Midterm I
Instructor: Sören Schwertfeger
http://shtech.org/courses/ca/
School of Information Science and Technology SIST
ShanghaiTech University
Slides based on UC Berkeley's CS61C
Midterm I
• Date: Tuesday, Apr. 11
• Time: 10:15-12:15 (normal lecture slot)
• Venue: Teaching Center 201 + 203
• One empty seat between students
• Closed book:
– You can bring one A4 page with notes (both sides; English preferred; Chinese is OK): write your Chinese and Pinyin name on the top!
– You will be provided with the MIPS "green sheet"
– No other material allowed!
Midterm I
• Switch cell phones off! (not silent mode – off!)
– Put them in your bags.
• Bags under the table. Nothing except paper, pen, 1 drink, 1 snack on the table!
• No other electronic devices are allowed!
– No earplugs, music, smartwatch…
• Anybody touching any electronic device will FAIL the course!
• Anybody found cheating (copying your neighbor's answers, additional material, ...) will FAIL the course!
Midterm I
• Ask questions today!
• Discussion is Q&A session
– Suggest topics for review in Piazza!
– Next week: example questions.
• This review session does not/cannot cover all possible topics!
• No Lab next week… No HW next week…
Content
• Main topics
– Number representation
– C
– MIPS
• Plus general "Computer Architecture" knowledge
• Everything till lecture 8 CALL – including lecture 8
First, finish last week's lecture…
Hyperthreading
• Duplicate all elements that hold the state (registers)
• Use the same CL blocks
• Use muxes to select which state to use every clock cycle
• => run 2 totally independent threads (same memory -> shared memory!)
• Speedup?
– No obvious speedup
– Make use of CL blocks in case of unavailable resources (e.g. wait for memory)
[Figure: five-stage pipelined datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back) with two PCs and two register files sharing the same instruction memory, ALU and data memory]
Intel Nehalem i7
• Hyperthreading:
– About 5% die area
– Up to 30% speed gain (BUT also <0% possible)
• Pipeline: 20-24 stages!
• Out-of-order execution
1. Instruction fetch.
2. Instruction dispatch to an instruction queue.
3. Instruction waits in queue until input operands are available => instruction can leave queue before earlier, older instructions.
4. The instruction is issued to the appropriate functional unit and executed by that unit.
5. The results are queued.
6. Write to register only after all older instructions have their results written.
Old-School Machine Structures
[Figure: layers from Application (ex: browser) through Compiler, Assembler and Operating System (Mac OS X) (Software), the Instruction Set Architecture, down to Processor, Memory and I/O system, Datapath & Control, Digital Design, Circuit Design and transistors (Hardware)]
New-School Machine Structures (It's a bit more complicated!)
• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates functioning in parallel at same time
[Figure: software/hardware levels from warehouse-scale computer and smartphone down through computer, cores, memory (cache), input/output, instruction and functional units (A0+B0 … A3+B3), main memory and logic gates, under the banner "Harness Parallelism & Achieve High Performance", annotated with Projects 1, 2 and 3]
6 Great Ideas in Computer Architecture
1. Abstraction (Layers of Representation/Interpretation)
2. Moore's Law (Designing through trends)
3. Principle of Locality (Memory Hierarchy)
4. Parallelism
5. Performance Measurement & Improvement
6. Dependability via Redundancy
#2: Moore's Law
Gordon Moore, Intel cofounder
Predicts: 2X transistors/chip every 2 years
Great Idea #3: Principle of Locality/Memory Hierarchy
3/30/17
Great Idea #4: Parallelism
Great Idea #5: Performance Measurement and Improvement
• Tuning application to underlying hardware to exploit:
– Locality
– Parallelism
– Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency
– How long to set the problem up
– How much faster does it execute once it gets going
– It is all about time to finish
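Great Idea #6: Dependability via Redundancy
• Redundancy so that a failing piece doesn't make the whole system fail
[Figure: three units each compute 1+1; two answer 2 and one FAILs with 1; since 2 of 3 agree, the voted result is 1+1=2]
• Increasing transistor density reduces the cost of redundancy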
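The 2-of-3 vote in the figure can be sketched in C as a bitwise majority function, one common way to implement triple modular redundancy (TMR) voting. The name `majority3` and the choice of voting on whole 32-bit words are illustrative assumptions, not something the slides specify.

```c
#include <stdint.h>

/* Bitwise 2-of-3 majority vote over three redundant results:
   each output bit is 1 exactly when at least two inputs have it set,
   so any single faulty unit is outvoted. */
uint32_t majority3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

For the figure's case, `majority3(2, 2, 1)` returns 2, so the one failing unit does not change the result.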
Key Concepts
• Inside computers, everything is a number
• But numbers are usually stored with a fixed size
– 8-bit bytes, 16-bit halfwords, 32-bit words, 64-bit doublewords, …
• Integer and floating-point operations can lead to results too big/small to store within their representations: overflow/underflow
Number Representation
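Number Representation
• Value of i-th digit is $d \times \text{Base}^i$, where $i$ starts at 0 and increases from right to left:
• $123_{10} = 1_{10} \times 10_{10}^2 + 2_{10} \times 10_{10}^1 + 3_{10} \times 10_{10}^0 = 1 \times 100_{10} + 2 \times 10_{10} + 3 \times 1_{10} = 100_{10} + 20_{10} + 3_{10} = 123_{10}$
• Binary (Base 2), Hexadecimal (Base 16), Decimal (Base 10) are different ways to represent an integer
– We use $1_{\text{two}}$, $5_{\text{ten}}$, $10_{\text{hex}}$ to be clearer (vs. $1_2$, $4_8$, $5_{10}$, $10_{16}$)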
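Number Representation
• Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
• $FFF_{\text{hex}} = 15_{\text{ten}} \times 16_{\text{ten}}^2 + 15_{\text{ten}} \times 16_{\text{ten}}^1 + 15_{\text{ten}} \times 16_{\text{ten}}^0 = 3840_{\text{ten}} + 240_{\text{ten}} + 15_{\text{ten}} = 4095_{\text{ten}}$
• $1111\ 1111\ 1111_{\text{two}} = FFF_{\text{hex}} = 4095_{\text{ten}}$
• May put blanks every group of binary, octal, or hexadecimal digits to make it easier to parse, like commas in decimal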
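A minimal sketch of the positional rule ($d \times \text{Base}^i$ summed over all digits), using Horner's rule so no powers have to be computed. The function name `parse_in_base` is hypothetical; it accepts bases 2 to 16, skips blanks used for grouping, and does not guard against overflowing `long` on very long inputs.

```c
#include <ctype.h>

/* Evaluate a digit string in the given base (2..16).
   Horner's rule: value = ((d_{n-1} * base + d_{n-2}) * base + ...) + d_0,
   which equals the sum of d_i * base^i.
   Returns -1 for an invalid base or digit. */
long parse_in_base(const char *digits, int base)
{
    long value = 0;
    if (base < 2 || base > 16)
        return -1;
    for (const char *p = digits; *p != '\0'; p++) {
        int d;
        if (*p == ' ')
            continue;                       /* blanks only group digits */
        if (isdigit((unsigned char)*p))
            d = *p - '0';
        else if (isxdigit((unsigned char)*p))
            d = toupper((unsigned char)*p) - 'A' + 10;
        else
            return -1;
        if (d >= base)
            return -1;                      /* digit not valid in this base */
        value = value * base + d;
    }
    return value;
}
```

`parse_in_base("FFF", 16)` and `parse_in_base("1111 1111 1111", 2)` both evaluate to 4095, matching the slide.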
Signed Integers and Two's-Complement Representation
• Signed integers in C; want ½ numbers <0, want ½ numbers >0, and want one 0
• Two's complement treats 0 as positive, so a 32-bit word represents $2^{32}$ integers from $-2^{31}$ (–2,147,483,648) to $2^{31}-1$ (2,147,483,647)
– Note: one negative number with no positive version
– Book lists some other options, all of which are worse
– Every computer uses two's complement today
• Most-significant bit (leftmost) is the sign bit, since 0 means positive (including 0), 1 means negative
– Bit 31 is most significant, bit 0 is least significant
Two's-Complement Integers (the leftmost bit is the sign bit)
0000 0000 0000 0000 0000 0000 0000 0000 two = 0 ten
0000 0000 0000 0000 0000 0000 0000 0001 two = 1 ten
0000 0000 0000 0000 0000 0000 0000 0010 two = 2 ten
... ...
0111 1111 1111 1111 1111 1111 1111 1101 two = 2,147,483,645 ten
0111 1111 1111 1111 1111 1111 1111 1110 two = 2,147,483,646 ten
0111 1111 1111 1111 1111 1111 1111 1111 two = 2,147,483,647 ten
1000 0000 0000 0000 0000 0000 0000 0000 two = –2,147,483,648 ten
1000 0000 0000 0000 0000 0000 0000 0001 two = –2,147,483,647 ten
1000 0000 0000 0000 0000 0000 0000 0010 two = –2,147,483,646 ten
... ...
1111 1111 1111 1111 1111 1111 1111 1101 two = –3 ten
1111 1111 1111 1111 1111 1111 1111 1110 two = –2 ten
1111 1111 1111 1111 1111 1111 1111 1111 two = –1 ten
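The table can be reproduced with a small helper that applies the definition directly: a 32-bit pattern whose sign bit is set stands for its unsigned value minus $2^{32}$. The name `twos_complement_value` is hypothetical; doing the arithmetic in 64 bits avoids depending on how a particular C compiler converts out-of-range unsigned values to signed types.

```c
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's-complement integer:
   sign bit (bit 31) clear -> the unsigned value itself,
   sign bit set            -> the unsigned value minus 2^32. */
int64_t twos_complement_value(uint32_t bits)
{
    int64_t u = (int64_t)bits;              /* 0 .. 2^32 - 1, always exact */
    return (bits & 0x80000000u) ? u - 4294967296LL : u;
}
```

For example, `0x80000000` maps to –2,147,483,648 and `0xFFFFFFFF` maps to –1, as in the last rows of the table.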
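Ways to Make Two's Complement
• For N-bit word, complement to $2^N$ (in decimal)
– For 4-bit number $3_{\text{ten}} = 0011_{\text{two}}$, two's complement (i.e. $-3_{\text{ten}}$) would be $16_{\text{ten}} - 3_{\text{ten}} = 13_{\text{ten}}$ or $10000_{\text{two}} - 0011_{\text{two}} = 1101_{\text{two}}$
• Here is an easier way:
– Invert all bits and add 1
– Computers actually do it like this, too
[Figure: $3_{\text{ten}} = 0011_{\text{two}}$; bitwise complement $1100_{\text{two}}$; $+\ 1_{\text{two}}$; result $1101_{\text{two}} = -3_{\text{ten}}$]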
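Both recipes can be checked against each other for every 4-bit pattern. This sketch assumes values are kept in the low 4 bits of an `unsigned`; the function names are hypothetical.

```c
/* Negate a 4-bit two's-complement pattern (held in the low 4 bits). */

/* Method 1: complement to 2^N, here 2^4 = 16. */
unsigned neg4_by_subtraction(unsigned x)
{
    return (16u - (x & 0xFu)) & 0xFu;
}

/* Method 2: invert all bits and add 1, keeping only 4 bits. */
unsigned neg4_by_invert_add1(unsigned x)
{
    return (~x + 1u) & 0xFu;
}
```

For x = 3 (0011two) both functions return 13 (1101two), the 4-bit pattern for –3; they also agree on the other 15 patterns, including 1000two, which negates to itself.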
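Two's-Complement Examples
• Assume for simplicity 4-bit width, –8 to +7 represented
– 3 + 2: 0011 + 0010 = 0101 = 5
– 3 + (–2): 0011 + 1110 = 1 0001; dropping the carry out gives 0001 = 1
– 7 + 1: 0111 + 0001 = 1000 = –8. Overflow!
– –3 + (–2): 1101 + 1110 = 1 1011; dropping the carry out gives 1011 = –5
– –8 + (–1): 1000 + 1111 = 1 0111; dropping the carry out gives 0111 = +7. Overflow!
• No overflow when carry into MSB = carry out of MSB; overflow when carry into MSB ≠ carry out of MSB
• Overflow when magnitude of result is too big or too small to fit into result representation
• Carry in = carry from less significant bits; carry out = carry to more significant bits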
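A sketch of the carry-based overflow test for 4-bit addition, assuming both operands are passed as bit patterns in the low 4 bits; `add4` is a hypothetical name. The carry into the MSB comes from adding the low three bits, the carry out of the MSB from the full 5-bit sum.

```c
#include <stdbool.h>

/* Add two 4-bit two's-complement patterns and report overflow:
   overflow iff carry into the MSB (bit 3) != carry out of the MSB. */
unsigned add4(unsigned a, unsigned b, bool *overflow)
{
    a &= 0xFu;
    b &= 0xFu;
    unsigned low3 = (a & 0x7u) + (b & 0x7u);    /* sum of bits 0..2 */
    unsigned carry_into_msb = (low3 >> 3) & 1u;
    unsigned full = a + b;                      /* up to 5 bits */
    unsigned carry_out_msb = (full >> 4) & 1u;
    *overflow = (carry_into_msb != carry_out_msb);
    return full & 0xFu;                         /* carry out is dropped */
}
```

With `bool ov;`, `add4(0x7, 0x1, &ov)` returns 0x8 (–8) with `ov` set, matching the 7 + 1 example, while `add4(0xD, 0xE, &ov)` returns 0xB (–5) with `ov` clear.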
Suppose we had a 5-bit word. What integers can be represented in two's complement?
☐ 0 to +31
☐ –16 to +15
☐ –32 to +31
Components of a Computer
[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)) connected through the processor-memory interface (Enable?, Read/Write, Address, Write Data, Read Data) to Memory (bytes holding Program and Data); Input and Output attached through I/O-memory interfaces]
C Programming
Quiz: Pointers
void foo(int *x, int *y)
{   int t;
    if ( *x > *y ) { t = *y; *y = *x; *x = t; }
}
int a=3, b=2, c=1;
foo(&a, &b);
foo(&b, &c);
foo(&a, &b);
printf("a=%d b=%d c=%d\n", a, b, c);
Result is:
A: a=3 b=2 c=1
B: a=1 b=2 c=3
C: a=1 b=3 c=2
D: a=3 b=3 c=3
E: a=1 b=1 c=1
Arrays and Pointers
int foo(int array[],
        unsigned int size)
{
    …
    printf("%zu\n", sizeof(array));
}

int main(void)
{
    int a[10], b[5];
    int c[] = {1, 3, 2, 5, 6};
    … foo(a, 10) … foo(c, 5) …
    printf("%zu\n", sizeof(c));
}
What does the printf in foo print? 8
What does the printf in main print? 20 (5 ints × 4 bytes)
...because array is really a pointer (and the size of a pointer is architecture dependent, but likely to be 8 bytes on modern machines!)
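A runnable sketch of the same effect, split into two functions so the two sizes can be compared directly; the function names are hypothetical. The numbers in the comments assume a typical 64-bit platform with 4-byte `int` and 8-byte pointers. Many compilers warn when `sizeof` is applied to an array parameter, precisely because the result is rarely what was intended.

```c
#include <stddef.h>

/* The parameter "int array[]" is adjusted to "int *array",
   so sizeof gives the size of a pointer (typically 8). */
size_t size_inside_callee(int array[], unsigned int size)
{
    (void)size;                 /* unused; kept to mirror the slide */
    return sizeof(array);
}

/* Where the array itself is declared, sizeof sees all 5 elements
   (typically 5 * 4 = 20 bytes). */
size_t size_in_caller(void)
{
    int c[] = {1, 3, 2, 5, 6};
    return sizeof(c);
}
```

Passing the element count explicitly, as `foo(c, 5)` does on the slide, is the usual way around this.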
Quiz:
int x[] = { 2, 4, 6, 8, 10 };
int *p = x;
int **pp = &p;
(*pp)++;
(*(*pp))++;
printf("%d\n", *p);
Result is:
A: 2
B: 3
C: 4
D: 5
E: None of the above
C Memory Management
• Program's address space contains 4 regions:
– stack: local variables inside functions, grows downward
– heap: space requested for dynamic data via malloc(); resizes dynamically, grows upward
– static data: variables declared outside functions, does not grow or shrink. Loaded when program starts, can be modified.
– code: loaded when program starts, does not change
[Figure: memory address space (32 bits assumed here) from ~FFFFFFFF hex at the top down to ~00000000 hex: stack, then free space, heap, static data, code]
The Stack
• Every time a function is called, a new frame is allocated on the stack
• Stack frame includes:
– Return address (who called me?)
– Arguments
– Space for local variables
• Stack frames are contiguous blocks of memory; the stack pointer indicates the start of the stack frame
• When the function ends, its stack frame is tossed off the stack; this frees memory for future stack frames
• We'll cover details later for MIPS processor
fooA() { fooB(); }
fooB() { fooC(); }
fooC() { fooD(); }
[Figure: frames for fooA, fooB, fooC and fooD stacked downward in that order; the stack pointer points at the fooD frame]
Question!
int x = 2;
int result;

int foo(int n)
{   int y;
    if (n <= 0) { printf("End case!\n"); return 0; }
    else { y = n + foo(n-x);
           return y; }
}
result = foo(10);
Right after the printf executes but before the return 0, how many copies of x and y are there allocated in memory?
A: #x=1, #y=1
B: #x=1, #y=5
C: #x=5, #y=1
D: #x=1, #y=6
E: #x=6, #y=6
Faulty Heap Management
• What is wrong with this code?
• Memory leak!
int foo() {
    int *value = malloc(sizeof(int));
    *value = 42;
    return *value;
}
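One way to fix the leak, sketched with the hypothetical name `foo_fixed`: copy the value out, then `free` the block before returning. Returning -1 when `malloc` fails is an assumed convention for this sketch. For a single `int`, an ordinary local variable would avoid the heap altogether.

```c
#include <stdlib.h>

/* Leak-free variant of foo(): the heap block is released before
   returning, so only a copy of the int leaves the function. */
int foo_fixed(void)
{
    int *value = malloc(sizeof(int));
    if (value == NULL)
        return -1;              /* assumed error value for this sketch */
    *value = 42;
    int result = *value;        /* copy out before freeing */
    free(value);
    return result;
}
```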
Using Memory You Don't Own
• What is wrong with this code?
int* init_array(int *ptr, int new_size) {
    ptr = realloc(ptr, new_size*sizeof(int));
    memset(ptr, 0, new_size*sizeof(int));
    return ptr;
}

int* fill_fibonacci(int *fib, int size) {
    int i;
    init_array(fib, size);
    /* fib[0] = 0; */ fib[1] = 1;
    for (i=2; i<size; i++)
        fib[i] = fib[i-1] + fib[i-2];
    return fib;
}
Using Memory You Don't Own
• Improper matched usage of mem handles
int* init_array(int *ptr, int new_size) {
    ptr = realloc(ptr, new_size*sizeof(int));
    memset(ptr, 0, new_size*sizeof(int));
    return ptr;
}

int* fill_fibonacci(int *fib, int size) {
    int i;
    /* oops, forgot: fib = */ init_array(fib, size);
    /* fib[0] = 0; */ fib[1] = 1;
    for (i=2; i<size; i++)
        fib[i] = fib[i-1] + fib[i-2];
    return fib;
}
What if array is moved to new location?
Remember: realloc may move entire block
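A corrected sketch keeps the slide's function names but assigns the pointer returned by `init_array`, since `realloc` may move the block. It also adds a `NULL` check for `realloc` failure, which the slide omits, and assumes `size >= 2`. If `fill_fibonacci` returns `NULL`, the caller's original pointer is still valid and still owned by the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Resize to new_size ints and zero them. On failure, returns NULL
   and leaves the original block untouched (realloc semantics). */
int *init_array(int *ptr, int new_size)
{
    int *p = realloc(ptr, (size_t)new_size * sizeof(int));
    if (p == NULL)
        return NULL;
    memset(p, 0, (size_t)new_size * sizeof(int));
    return p;
}

/* Assumes size >= 2. */
int *fill_fibonacci(int *fib, int size)
{
    int i;
    fib = init_array(fib, size);    /* the fix: use the (possibly moved) block */
    if (fib == NULL)
        return NULL;
    /* fib[0] is already 0 because of memset */
    fib[1] = 1;
    for (i = 2; i < size; i++)
        fib[i] = fib[i - 1] + fib[i - 2];
    return fib;
}
```

Usage: `int *f = fill_fibonacci(NULL, 10);` (passing `NULL` makes `realloc` behave like `malloc`) yields 0, 1, 1, 2, …, 34 in `f[0]`…`f[9]`; the caller must eventually `free(f)`.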
And In Conclusion, …
• Pointers are an abstraction of machine memory addresses
• Pointer variables are held in memory, and pointer values are just numbers that can be manipulated by software
• In C, close relationship between array names and pointers
• Pointers know the type of the object they point to (except void *)
• Pointers are powerful but potentially dangerous
And In Conclusion, …
• C has three main memory segments in which to allocate data:
– Static Data: variables outside functions
– Stack: variables local to function
– Heap: objects explicitly malloc-ed/free-d
• Heap data is the biggest source of bugs in C code
In the News… Intel Hyper-Scale
Intel's Moore's Law interpretation: cost per transistor halves every 2 years
Hyperscaling
Multiple dies on one carrier
MIPS
Addition and Subtraction of Integers Example 1
• How to do the following C statement?
a = b + c + d - e;   (evaluated as a = ((b + c) + d) - e;)
b → $s1; c → $s2; d → $s3; e → $s4; a → $s0
• Break into multiple instructions
add $t0, $s1, $s2   # temp = b + c
add $t0, $t0, $s3   # temp = temp + d
sub $s0, $t0, $s4   # a = temp - e
• A single line of C may break up into several lines of MIPS.
• Notice the use of temporary registers – don't want to modify the variable registers $s
• Everything after the hash mark on each line is ignored (comments)
Overflow handling in MIPS
• Some languages detect overflow (Ada), some don't (most C implementations)
• MIPS solution is 2 kinds of arithmetic instructions:
– These cause overflow to be detected
• add (add)
• add immediate (addi)
• subtract (sub)
– These do not cause overflow detection
• add unsigned (addu)
• add immediate unsigned (addiu)
• subtract unsigned (subu)
• Compiler selects appropriate arithmetic
– MIPS C compilers produce addu, addiu, subu
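The difference between the two instruction groups can be modelled in C with `uint32_t`, whose arithmetic wraps modulo $2^{32}$ just like `addu`. The names `mips_addu` and `mips_add_overflows` are hypothetical; real `add` hardware raises an exception on overflow rather than returning a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* addu: 32-bit wraparound, never signals overflow
   (unsigned arithmetic in C wraps modulo 2^32). */
uint32_t mips_addu(uint32_t rs, uint32_t rt)
{
    return rs + rt;
}

/* add: same result bits, but signed overflow is detected.
   Overflow happens iff both operands have the same sign bit
   and the sum's sign bit differs from it. */
bool mips_add_overflows(uint32_t rs, uint32_t rt)
{
    uint32_t sum = rs + rt;
    return ((~(rs ^ rt)) & (rs ^ sum) & 0x80000000u) != 0;
}
```

For example, 0x7FFFFFFF + 1 wraps to 0x80000000 under `addu`, and the same pair is flagged as overflow for `add`.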
Question: We want to translate *x = *y + 1 into MIPS (x, y int pointers stored in: $s0 $s1)
A: addi $s0,$s1,1
B: lw $s0,1($s1)
   sw $s1,0($s0)
C: lw $t0,0($s1)
   addi $t0,$t0,1
   sw $t0,0($s0)
D: sw $t0,0($s1)
   addi $t0,$t0,1
   lw $t0,0($s0)
E: lw $s0,1($t0)
   sw $s1,0($t0)
Executing a Program
[Figure: processor (Control; Datapath with PC, Registers, ALU) sends the instruction address to memory (bytes holding Program and Data) and reads back the instruction bits]
• The PC (program counter) is an internal register inside the processor holding the byte address of the next instruction to be executed.
• The instruction is fetched from memory, then the control unit executes the instruction using the datapath and memory system, and updates the program counter (default is to add 4 bytes to the PC, to move to the next sequential instruction)
Question!
       addi $s0,$zero,0
Start: slt  $t0,$s0,$s1
       beq  $t0,$zero,Exit
       sll  $t1,$s0,2
       addu $t1,$t1,$s5
       lw   $t1,0($t1)
       add  $s4,$s4,$t1
       addi $s0,$s0,1
       j    Start
Exit:
What is the code above?
A: while loop
B: do…while loop
C: for loop
D: A or C
E: Not a loop
MIPS Function Call Conventions
• Registers are faster than memory, so use them
• $a0–$a3: four argument registers to pass parameters ($4–$7)
• $v0, $v1: two value registers to return values ($2, $3)
• $ra: one return address register to return to the point of origin ($31)
Instruction Support for Functions (1/4)
C:
... sum(a,b); ... /* a,b: $s0,$s1 */
}
int sum(int x, int y) {
    return x+y;
}
MIPS (addresses shown in decimal):
1000
1004
1008
1012
1016
…
2000
2004
In MIPS, all instructions are 4 bytes, and stored in memory just like data. So here we show the addresses of where the programs are stored.
Instruction Support for Functions (2/4)
C:
... sum(a,b); ... /* a,b: $s0,$s1 */
}
int sum(int x, int y) {
    return x+y;
}
MIPS (addresses shown in decimal):
1000       add  $a0,$s0,$zero   # x = a
1004       add  $a1,$s1,$zero   # y = b
1008       addi $ra,$zero,1016  # $ra=1016
1012       j    sum             # jump to sum
1016       …                    # next instruction
…
2000 sum:  add  $v0,$a0,$a1
2004       jr   $ra             # new instr. "jump register"
Instruction Support for Functions (3/4)
C:
... sum(a,b); ... /* a,b: $s0,$s1 */
}
int sum(int x, int y) {
    return x+y;
}
MIPS:
2000 sum:  add  $v0,$a0,$a1
2004       jr   $ra             # new instr. "jump register"
• Question: Why use jr here? Why not use j?
• Answer: sum might be called by many places, so we can't return to a fixed place. The calling proc to sum must be able to say "return here" somehow.
Instruction Support for Functions (4/4)
• Single instruction to jump and save return address: jump and link (jal)
• Before:
1008 addi $ra,$zero,1016  # $ra=1016
1012 j sum                # goto sum
• After:
1008 jal sum              # $ra=1012, goto sum
• Why have a jal?
– Make the common case fast: function calls are very common.
– Don't have to know where code is in memory with jal!
Question
• Which statement is FALSE?
A: MIPS uses jal to invoke a function and jr to return from a function
B: jal saves PC+1 in $ra
C: The callee can use temporary registers ($ti) without saving and restoring them
D: The caller can rely on save registers ($si) without fear of callee changing them
Stack Before, During, After Call
Basic Structure of a Function
Prologue:
entry_label:
    addi $sp,$sp, -framesize
    sw   $ra, framesize-4($sp)   # save $ra
    save other regs if need be
Body (call other functions…):
    ...
Epilogue:
    restore other regs if need be
    lw   $ra, framesize-4($sp)   # restore $ra
    addi $sp,$sp, framesize
    jr   $ra
[Figure: $ra saved into the function's stack frame in memory]
Instruction Formats
• I-format: used for instructions with immediates, lw and sw (since offset counts as an immediate), and branches (beq and bne)
– (but not the shift instructions; later)
• J-format: used for j and jal
• R-format: used for all other instructions
• It will soon become clear why the instructions have been partitioned in this way
R-Format Instructions (1/5)
• Define "fields" of the following number of bits each: 6 + 5 + 5 + 5 + 5 + 6 = 32
• For simplicity, each field has a name:
| opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) |
• Important: On these slides and in the book, each field is viewed as a 5- or 6-bit unsigned integer, not as part of a 32-bit integer
– Consequence: 5-bit fields can represent any number 0-31, while 6-bit fields can represent any number 0-63
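Packing the six fields into one 32-bit word is just masking and shifting. In this sketch (`encode_r` is a hypothetical name), the example assumes `add $8,$9,$10`: opcode 0, rs = 9, rt = 10, rd = 8, shamt = 0 and funct = 0x20, the green-sheet funct code for `add`.

```c
#include <stdint.h>

/* Pack R-format fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
   Each field is masked to its width, then shifted into position. */
uint32_t encode_r(unsigned opcode, unsigned rs, unsigned rt,
                  unsigned rd, unsigned shamt, unsigned funct)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26) |
           ((uint32_t)(rs     & 0x1Fu) << 21) |
           ((uint32_t)(rt     & 0x1Fu) << 16) |
           ((uint32_t)(rd     & 0x1Fu) << 11) |
           ((uint32_t)(shamt  & 0x1Fu) <<  6) |
            (uint32_t)(funct  & 0x3Fu);
}
```

`encode_r(0, 9, 10, 8, 0, 0x20)` yields 0x012A4020, i.e. 000000 01001 01010 01000 00000 100000 in binary.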
I-Format Instructions (2/4)
• Define "fields" of the following number of bits each: 6 + 5 + 5 + 16 = 32 bits
– Again, each field has a name:
| opcode (6) | rs (5) | rt (5) | immediate (16) |
– Key Concept: Only one field is inconsistent with R-format. Most importantly, opcode is still in the same location.
I-Format Example (2/2)
• MIPS Instruction:
addi $21,$22,-50
Decimal/field representation: 8 | 22 | 21 | -50
Binary/field representation: 001000 | 10110 | 10101 | 1111111111001110
Hexadecimal representation: 22D5 FFCE hex
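The same masking approach reproduces the hexadecimal encoding above; `encode_i` is a hypothetical helper. Converting the negative immediate to `uint32_t` and keeping the low 16 bits gives its 16-bit two's-complement form (–50 becomes 0xFFCE).

```c
#include <stdint.h>

/* Pack I-format fields: opcode(6) rs(5) rt(5) immediate(16).
   A negative immediate is stored as its 16-bit two's complement. */
uint32_t encode_i(unsigned opcode, unsigned rs, unsigned rt, int immediate)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26) |
           ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) |
           ((uint32_t)immediate & 0xFFFFu);   /* conversion is modulo 2^32 */
}
```

`encode_i(8, 22, 21, -50)` returns 0x22D5FFCE, matching the slide.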
Branch Example (1/2)
• MIPS Code:
Loop: beq   $9,$0,End
      addu  $8,$8,$10
      addiu $9,$9,-1
      j     Loop
End:
• I-Format fields:
opcode = 4 (look up on Green Sheet)
rs = 9 (first operand)
rt = 0 (second operand)
immediate = ???
• Start counting from the instruction AFTER the branch: addu (1), addiu (2), j (3), so End is 3 instructions away and the immediate is 3
Branch Example (2/2)
• MIPS Code:
Loop: beq   $9,$0,End
      addu  $8,$8,$10
      addiu $9,$9,-1
      j     Loop
End:
Field representation (decimal): 4 | 9 | 0 | 3
Field representation (binary): 000100 | 01001 | 00000 | 0000000000000011
(bit 31 on the left, bit 0 on the right)
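A sketch of the PC-relative calculation: the immediate is the distance in instructions from PC + 4 to the target. The load address 0x00400000 used below is an assumption chosen for illustration; the offset does not depend on it. The function names are hypothetical.

```c
#include <stdint.h>

/* Branch immediate = (target - (PC + 4)) / 4: the number of
   instructions from the one after the branch to the target. */
int32_t branch_offset(uint32_t branch_pc, uint32_t target_pc)
{
    int64_t diff = (int64_t)target_pc - ((int64_t)branch_pc + 4);
    return (int32_t)(diff / 4);
}

/* beq rs, rt, target: opcode 4, then rs, rt and the 16-bit offset
   (negative offsets are stored in two's complement). */
uint32_t encode_beq(unsigned rs, unsigned rt,
                    uint32_t branch_pc, uint32_t target_pc)
{
    int32_t off = branch_offset(branch_pc, target_pc);
    return (4u << 26) |
           ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) |
           ((uint32_t)off & 0xFFFFu);
}
```

With Loop assumed at 0x00400000, End is at 0x00400010 (four instructions later), so `branch_offset(0x00400000, 0x00400010)` is 3 and `encode_beq(9, 0, 0x00400000, 0x00400010)` is 0x11200003, the hex form of the binary fields above. A backward branch yields a negative offset, e.g. –4 from 0x0040000C back to 0x00400000.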
J-Format Instructions (2/4)
• Define two "fields" of these bit widths: 6 | 26 (bit 31 on the left, bit 0 on the right)
• As usual, each field has a name:
| opcode (6) | target address (26) |
• Key Concepts:
– Keep opcode field identical to R-Format and I-Format for consistency
– Collapse all other fields to make room for large target address
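A sketch of how the 26-bit field is produced and later turned back into an address. Because instructions are word-aligned, the two low address bits are dropped; on MIPS the missing upper 4 bits are taken from PC + 4, so a jump can only reach targets in the same 256 MB region. The helper names and the address 0x00400000 are illustrative assumptions; opcode 2 is `j` (3 would be `jal`).

```c
#include <stdint.h>

/* j/jal: opcode(6) | target(26), where target holds address
   bits 27..2 (bits 1..0 are always 0 for word-aligned code). */
uint32_t encode_j(unsigned opcode, uint32_t target_addr)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26) |
           ((target_addr >> 2) & 0x03FFFFFFu);
}

/* Destination of a jump: upper 4 bits from PC + 4,
   the rest from the 26-bit field shifted left by 2. */
uint32_t jump_destination(uint32_t pc, uint32_t instr)
{
    return ((pc + 4u) & 0xF0000000u) | ((instr & 0x03FFFFFFu) << 2);
}
```

`encode_j(2, 0x00400000)` gives 0x08100000, and `jump_destination(0x0040000C, 0x08100000)` recovers 0x00400000.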
Summary
• I-Format: instructions with immediates, lw/sw (offset is immediate), and beq/bne
– But not the shift instructions
– Branches use PC-relative addressing
• J-Format: j and jal (but not jr)
– Jumps use absolute addressing (the 26-bit field supplies address bits 27..2; the upper 4 bits come from the PC)
• R-Format: all other instructions
I: | opcode | rs | rt | immediate |
J: | opcode | target address |
R: | opcode | rs | rt | rd | shamt | funct |
Assembler Pseudo-Instructions
• Certain C statements are implemented unintuitively in MIPS
– e.g. assignment (a=b) via add $zero
• MIPS has a set of "pseudo-instructions" to make programming easier
– More intuitive to read, but get translated into actual instructions later
• Example:
move dst,src
translated into
addi dst,src,0
Multiply and Divide
• Example pseudo-instruction:
mul $rd,$rs,$rt
– Consists of mult, which stores the output in special hi and lo registers, and a move from these registers to $rd:
mult $rs,$rt
mflo $rd
• mult and div have nothing important in the rd field since the destination registers are hi and lo
• mfhi and mflo have nothing important in the rs and rt fields since the source is determined by the instruction (see COD)
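A C model of what `mult` leaves in `hi` and `lo`: the full 64-bit signed product, split into halves. `HiLo` and `mips_mult` are hypothetical names; `mflo` corresponds to reading the `lo` member, which is all the `mul` pseudo-instruction keeps.

```c
#include <stdint.h>

/* The two special result registers written by mult. */
typedef struct {
    uint32_t hi;    /* upper 32 bits of the product */
    uint32_t lo;    /* lower 32 bits of the product */
} HiLo;

/* mult $rs,$rt: 32 x 32 -> 64-bit signed product, split into hi/lo. */
HiLo mips_mult(int32_t rs, int32_t rt)
{
    int64_t product = (int64_t)rs * (int64_t)rt;   /* cannot overflow */
    uint64_t bits = (uint64_t)product;             /* two's-complement bits */
    HiLo r = { (uint32_t)(bits >> 32), (uint32_t)bits };
    return r;
}
```

`mips_mult(6, 7)` gives hi = 0, lo = 42, while `mips_mult(0x10000, 0x10000)` gives hi = 1, lo = 0: the product $2^{32}$ no longer fits in `lo` alone.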
Question
Which of the following place the address of LOOP in $v0?
1) la $t1, LOOP
   lw $v0, 0($t1)
2) jal LOOP
   LOOP: addu $v0, $ra, $zero
3) la $v0, LOOP
    1  2  3
A) T, T, T
B) T, T, F
C) F, T, T
D) F, T, F
E) F, F, T
Steps in compiling a C program
• Compiler converts a single HLL file into a single assembly language file.
• Assembler removes pseudo-instructions, converts what it can to machine language, and creates a checklist for the linker (relocation table). A .s file becomes a .o file.
– Does 2 passes to resolve addresses, handling internal forward references
• Linker combines several .o files and resolves absolute addresses.
– Enables separate compilation, libraries that need not be compiled, and resolves remaining addresses
• Loader loads executable into memory and begins execution.
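Why two passes? A toy sketch: pass 1 assigns an address to every label, pass 2 turns branch targets into PC-relative offsets. A forward reference such as `beq $9,$0,End` cannot be resolved in one pass because `End` has not been seen yet. The `ToyInstr` representation, the `MAX_LABELS` limit and the function name are hypothetical simplifications; a real assembler also handles jumps, pseudo-instruction expansion and relocation entries.

```c
#include <stddef.h>
#include <string.h>

/* Toy instruction: a label defined on this line (or NULL) and the
   label a branch refers to (or NULL if it is not a branch). */
typedef struct {
    const char *label;
    const char *target;
} ToyInstr;

#define MAX_LABELS 32

/* Pass 1 records every label's byte address; pass 2 converts each
   branch target into a word offset counted from PC + 4.
   Returns 0 on success, -1 if a target label is never defined. */
int assemble_two_pass(const ToyInstr *prog, int n, unsigned base,
                      int offsets[])
{
    const char *names[MAX_LABELS];
    unsigned addrs[MAX_LABELS];
    int nlabels = 0;

    /* Pass 1: build the symbol table (label -> address). */
    for (int i = 0; i < n; i++) {
        if (prog[i].label != NULL && nlabels < MAX_LABELS) {
            names[nlabels] = prog[i].label;
            addrs[nlabels] = base + 4u * (unsigned)i;
            nlabels++;
        }
    }

    /* Pass 2: every label, including forward ones, is now known. */
    for (int i = 0; i < n; i++) {
        offsets[i] = 0;
        if (prog[i].target == NULL)
            continue;
        int found = -1;
        for (int k = 0; k < nlabels && found < 0; k++)
            if (strcmp(names[k], prog[i].target) == 0)
                found = k;
        if (found < 0)
            return -1;                          /* undefined label */
        long long pc = (long long)base + 4LL * i;
        offsets[i] = (int)(((long long)addrs[found] - (pc + 4)) / 4);
    }
    return 0;
}
```

For the branch example program (the `j Loop` entry is given no PC-relative target, since jumps are not PC-relative), the offset computed for the `beq` is 3.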
Pseudo-instruction Replacement
• Assembler treats convenient variations of machine language instructions as if they were real instructions
Pseudo:                 Real:
subu $sp,$sp,32         addiu $sp,$sp,-32
sd $a0, 32($sp)         sw $a0, 32($sp)
                        sw $a1, 36($sp)
mul $t7,$t6,$t5         mult $t6,$t5
                        mflo $t7
addu $t0,$t6,1          addiu $t0,$t6,1
ble $t0,100,loop        slti $at,$t0,101
                        bne $at,$0,loop
la $a0, str             lui $at,left(str)
                        ori $a0,$at,right(str)
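The `la` expansion splits a 32-bit address into two 16-bit halves. Assuming `left()` and `right()` denote the upper and lower halves, as the table suggests, this sketch shows why `lui` followed by `ori` rebuilds the address exactly: `ori` zero-extends its immediate, so no carry correction is needed. The function names are hypothetical.

```c
#include <stdint.h>

/* left(str): upper 16 bits; right(str): lower 16 bits. */
uint16_t left_half(uint32_t addr)  { return (uint16_t)(addr >> 16); }
uint16_t right_half(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* Model of: lui $at,left(str) ; ori $a0,$at,right(str) */
uint32_t simulate_la(uint32_t addr)
{
    uint32_t at = (uint32_t)left_half(addr) << 16;  /* lui clears the low 16 bits */
    return at | right_half(addr);                    /* ori zero-extends */
}
```

With an assumed address 0x10010020 for `str`, `left_half` gives 0x1001 and `right_half` gives 0x0020.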
Question
At what point in the process are all the machine code bits generated for the following assembly instructions:
1) addu $6, $7, $8
2) jal fprintf
A: 1) & 2) After compilation
B: 1) After compilation, 2) After assembly
C: 1) After assembly, 2) After linking
D: 1) After assembly, 2) After loading
E: 1) After compilation, 2) After linking