l15 caches2 - university of california, berkeleycs61c/fa17/lec/15/l15... · 2017. 10. 17. · –4...

Post on 26-Jan-2021


  • 10/16/17

    CS61C: Great Ideas in Computer Architecture (Machine Structures)

    Caches Part 2

    Instructors: Krste Asanović & Randy H. Katz
    http://inst.eecs.berkeley.edu/~cs61c/

    10/16/17 Fall 2017 - Lecture #15

    Outline

    • Cache Organization and Principles
    • Write Back vs. Write Through
    • Cache Performance
    • Cache Design Tradeoffs
    • And in Conclusion…

    Typical Memory Hierarchy

    [Figure: on-chip components (control, datapath, register file, instruction cache, data cache), second-level cache (SRAM), third-level cache (SRAM), main memory (DRAM), and secondary memory (disk or flash).]

    From the register file out to secondary memory:
    Speed (cycles): ½’s, 1’s, 10’s, 100’s-1000, 1,000,000’s
    Size (bytes): 100’s, 10K’s, M’s, G’s, T’s
    Cost/bit: highest to lowest

    • Principle of locality + memory hierarchy presents programmer with ≈ as much memory as is available in the cheapest technology at the ≈ speed offered by the fastest technology

    Adding Cache to Computer

    [Figure: processor (control; datapath with PC, registers, and arithmetic & logic unit (ALU)) connected through a cache to memory (program, data, bytes). The processor-memory interface carries Enable?, Read/Write, Address, Write Data, and Read Data; input and output attach through I/O-memory interfaces.]

    • Processor organized around words and bytes
    • Memory (including cache) organized around blocks, which are typically multiple words

    Key Cache Concepts

    • Principle of Locality
      – Temporal Locality and Spatial Locality
    • Hierarchy of Memories (speed/size/cost per bit) to exploit locality
    • Cache – copy of data in lower level of memory hierarchy
    • Direct Mapped to find block in cache using Tag field and Valid bit for Hit
    • Cache Design Organization Choices:
      – Fully Associative, Set-Associative, Direct-Mapped

    Cache Organizations

    • “Fully Associative”: Block placed anywhere in cache
      – First design last lecture
      – Note: No Index field, but one comparator/block
    • “Direct Mapped”: Block goes only one place in cache
      – Note: Only one comparator
      – Number of sets = number of blocks
    • “N-way Set Associative”: N places for block in cache
      – Number of sets = Number of Blocks / N
      – N comparators
      – Fully Associative: N = number of blocks
      – Direct Mapped: N = 1

    Memory Block vs. Word Addressing

    [Figure: the same 5-bit byte addresses (00000 through 11111) shown three ways: as byte addresses, as word addresses (2 LSBs are 0), and as 8-Byte block addresses (3 LSBs are 0). Each address splits into a Block # (0 to 3) and a Byte offset in block (0 to 7).]

    Memory Block Number Aliasing

    12-bit memory addresses, 16 Byte blocks

    Address        Block #   Block # mod 8   Block # mod 2
    010100100000   82        2               0
    010100110000   83        3               1
    010101000000   84        4               0
    010101010000   85        5               1
    010101100000   86        6               0
    010101110000   87        7               1
    010110000000   88        0               0
    010110010000   89        1               1
    010110100000   90        2               0
    010110110000   91        3               1
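    The aliasing table can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `block_number` is made up for illustration, and the only inputs taken from the slide are the 12-bit addresses and the 16-byte block size.

```python
BLOCK_SIZE_BYTES = 16  # from the slide: 16-byte blocks


def block_number(address, block_size=BLOCK_SIZE_BYTES):
    """Drop the byte-offset bits to get the memory block number."""
    return address // block_size


# The ten consecutive block addresses shown on the slide (12-bit addresses).
addresses = [0b010100100000 + i * BLOCK_SIZE_BYTES for i in range(10)]

for addr in addresses:
    blk = block_number(addr)
    # Blocks whose numbers agree modulo the set count alias to the same set.
    print(f"{addr:012b}  block {blk}  mod 8 = {blk % 8}  mod 2 = {blk % 2}")
```

    With 8 sets, blocks 82 and 90 land in the same set (both are 2 mod 8); with only 2 sets, every other block collides.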

    Processor Address Fields Used by Cache Controller

    • Block Offset: Byte address within block
    • Set Index: Selects which set
    • Tag: Remaining portion of processor address

    • Size of Index = log2(number of sets)
    • Size of Tag = Address size – Size of Index – log2(number of bytes/block)

    Processor Address (32 bits total): Tag | Set Index | Block offset
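    The field sizes above translate directly into shifts and masks. The sketch below assumes the number of sets and the block size are powers of two; the names `field_widths`, `split_address`, and `AddressFields` are illustrative, not from the lecture.

```python
from typing import NamedTuple


class AddressFields(NamedTuple):
    tag: int
    index: int
    offset: int


def field_widths(address_bits, num_sets, block_bytes):
    """Return (tag_bits, index_bits, offset_bits) for power-of-two sizes."""
    index_bits = (num_sets - 1).bit_length()      # log2(number of sets)
    offset_bits = (block_bytes - 1).bit_length()  # log2(bytes per block)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, num_sets, block_bytes):
    """Split a processor address into tag, set index, and block offset."""
    index_bits = (num_sets - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    offset = address & (block_bytes - 1)               # lowest bits
    index = (address >> offset_bits) & (num_sets - 1)  # middle bits
    tag = address >> (offset_bits + index_bits)        # remaining high bits
    return AddressFields(tag, index, offset)


# 32-bit address, 1024 sets, 4-byte blocks.
print(field_widths(32, 1024, 4))
# 6-bit address 110110 with 4 sets and 4-byte blocks.
print(split_address(0b110110, num_sets=4, block_bytes=4))
```

    A fully associative cache is the `num_sets=1` case: the index is zero bits wide and everything above the block offset becomes tag.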

    What Limits Number of Sets?

    • For a given total number of blocks, we save comparators if we have more than two sets
    • Limit: As Many Sets as Cache Blocks => only one block per set – only needs one comparator!
    • Called “Direct-Mapped” Design

    Address fields: Tag | Index | Block offset

    Direct Mapped Cache Example: Mapping a 6-bit Memory Address

    • In example, block size is 4 bytes / 1 word
    • Memory and cache blocks always the same size, unit of transfer between memory and cache
    • # Memory blocks >> # Cache blocks
      – 16 Memory blocks = 16 words = 64 bytes => 6 bits to address all bytes
      – 4 Cache blocks, 4 bytes (1 word) per block
      – 4 Memory blocks map to each cache block
    • Memory block to cache block, aka index: middle two bits
    • Which memory block is in a given cache block, aka tag: top two bits

    Address bits   Field         Meaning
    5-4            Tag           Mem Block Within $ Block
    3-2            Index         Block Within $
    1-0            Byte Offset   Byte Within Block

    One More Detail: Valid Bit

    • When starting a new program, the cache does not have valid information for this program
    • Need an indicator of whether this tag entry is valid for this program
    • Add a “valid bit” to the cache tag entry
      – 0 => cache miss, even if by chance address = tag
      – 1 => cache hit, if processor address = tag

    Cache Organization: Simple First Example

    [Figure: a 4-entry cache (Index 00, 01, 10, 11; each entry holds Valid, Tag, Data) next to a 16-word main memory with addresses 0000xx through 1111xx.]

    • One word blocks. Two low-order bits (xx) define the byte in the block (32b words)
    • Q: Where in the cache is the mem block? Use next 2 low-order memory address bits – the index – to determine which cache block (i.e., modulo the number of blocks in the cache)
    • Q: Is the memory block in cache? Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache (provided valid bit is set)

    Example: Alternatives in an 8 Block Cache

    • Direct Mapped: 8 blocks, 1 way, 1 tag comparator, 8 sets
    • Fully Associative: 8 blocks, 8 ways, 8 tag comparators, 1 set
    • 2 Way Set Associative: 8 blocks, 2 ways, 2 tag comparators, 4 sets
    • 4 Way Set Associative: 8 blocks, 4 ways, 4 tag comparators, 2 sets

    [Figure: the same 8 blocks (0 to 7) drawn four ways: DM, 8 sets of 1 way; FA, 1 set of 8 ways; 2 Way SA, 4 sets (Set 0 to Set 3); 4 Way SA, 2 sets (Set 0, Set 1).]
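    The four alternatives follow from one relation: number of sets = number of blocks / ways, with one tag comparator per way. A minimal sketch (the function name `organization` is hypothetical):

```python
def organization(num_blocks, ways):
    """Describe a cache layout for a fixed number of blocks."""
    if num_blocks % ways != 0:
        raise ValueError("ways must divide the number of blocks")
    return {
        "blocks": num_blocks,
        "ways": ways,
        "sets": num_blocks // ways,  # sets = blocks / ways
        "comparators": ways,         # one tag comparator per way
    }


# Direct mapped (1 way) through fully associative (8 ways) for 8 blocks.
for ways in (1, 2, 4, 8):
    print(organization(8, ways))
```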

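    Direct-Mapped Cache

    • One word blocks, cache size = 1K words (or 4 KB)

    [Figure: 32-bit address split into a 20-bit Tag (bits 31-12), a 10-bit Index (bits 11-2), and a Byte offset (bits 1-0). The Index selects one of 1024 entries (0 to 1023), each holding Valid, Tag, and Data. A comparator checks the stored 20-bit Tag against the address Tag to produce Hit, and the entry supplies 32 bits of Data.]

    • Valid bit ensures something useful in cache for this index
    • Compare Tag with upper part of Address to see if a Hit
    • Read data from cache instead of memory if a Hit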

    Peer Instruction

    • For a cache with constant total capacity, if we increase the number of ways by a factor of two, which statement is false:
      A: The number of sets could be doubled
      B: The tag width could decrease
      C: The block size could stay the same
      D: The block size could be halved

    Break!


    Handling Stores with Write-Through

    • Store instructions write to memory, changing values
    • Need to make sure cache and memory have same values on writes: two policies

    1) Write-Through Policy: write cache and write through the cache to memory
      – Every write eventually gets to memory
      – Too slow, so include Write Buffer to allow processor to continue once data in Buffer
      – Buffer updates memory in parallel to processor

    Write-Through Cache

    • Write both values in cache and in memory
    • Write buffer stops CPU from stalling if memory cannot keep up
    • Write buffer may have multiple entries to absorb bursts of writes
    • What if store misses in cache?

    [Figure: processor, cache, and memory connected by 32-bit Address and 32-bit Data paths, with a Write Buffer of Addr/Data entries on the path to memory.]

    Handling Stores with Write-Back

    2) Write-Back Policy: write only to cache and then write cache block back to memory when evict block from cache
      – Writes collected in cache, only single write to memory per block
      – Include bit to see if wrote to block or not, and then only write back if bit is set
        • Called “Dirty” bit (writing makes it “dirty”)

    Write-Back Cache

    • Store/cache hit, write data in cache only and set dirty bit
      – Memory has stale value
    • Store/cache miss, read data from memory, then update and set dirty bit
      – “Write-allocate” policy
    • Load/cache hit, use value from cache
    • On any miss, write back evicted block, only if dirty. Update cache with new block and clear dirty bit

    [Figure: processor, cache, and memory connected by 32-bit Address and 32-bit Data paths; each cache entry carries a Dirty bit (D).]

    Write-Through vs. Write-Back

    • Write-Through:
      – Simpler control logic
      – More predictable timing simplifies processor control logic
      – Easier to make reliable, since memory always has copy of data (big idea: Redundancy!)
    • Write-Back:
      – More complex control logic
      – More variable timing (0, 1, 2 memory accesses per cache access)
      – Usually reduces write traffic
      – Harder to make reliable, sometimes cache has only copy of data
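    The "usually reduces write traffic" point can be illustrated by counting how many writes reach memory under each policy. This is a rough sketch, not the lecture's model: it assumes a direct-mapped cache of one-word blocks, write-allocate on a store miss, and a made-up function name `count_memory_writes`.

```python
def count_memory_writes(stores, num_blocks, policy):
    """Count writes that reach memory for a sequence of stored block numbers.

    Assumes a direct-mapped cache with one-word blocks; the write-back
    variant uses write-allocate and a dirty bit per cache block.
    """
    tags = [None] * num_blocks   # block number currently held by each line
    dirty = [False] * num_blocks
    writes = 0
    for block in stores:
        line = block % num_blocks
        if policy == "write-through":
            writes += 1                  # every store also goes to memory
        elif policy == "write-back":
            if tags[line] != block:      # store miss: replace the old block
                if dirty[line]:
                    writes += 1          # write back the evicted block only if dirty
                tags[line] = block
            dirty[line] = True           # the store makes the block dirty
        else:
            raise ValueError(f"unknown policy: {policy}")
    return writes


stores = [5, 5, 5, 5, 9, 9]  # blocks 5 and 9 map to the same line when num_blocks=4
print(count_memory_writes(stores, 4, "write-through"))
print(count_memory_writes(stores, 4, "write-back"))
```

    For this store sequence write-through sends all six stores to memory, while write-back has performed a single write so far (the eviction of dirty block 5); block 9 is still dirty in the cache and would cost one more write when it is eventually evicted.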

    Administrivia

    • Midterm #2 2 weeks away! October 31!
      – In class! 8-9:30 AM
      – Synchronous digital design and Project 2 (processor design) included
      – Pipelines and Caches
      – ONE Double sided Crib sheet
      – Review Session: Saturday, Oct 28 (Location TBA)
    • 5-10 open drop-in seats for these tutoring sessions:
      – M 1-2 Soda 611
      – Th 3-4 Soda 380
      – F 5-6 Soda 651
    • Guerrilla Session tonight 7-9 pm in Cory 293
    • Project 2-1 Party tomorrow 7-9 pm Cory 293
    • If you would like to change your partnership for Project 2, email your lab TA
      – We will send out a Google form to track all Project 2 partnerships


    Cache (Performance) Terms

    • Hit rate: fraction of accesses that hit in the cache
    • Miss rate: 1 – Hit rate
    • Miss penalty: time to replace a block from lower level in memory hierarchy to cache
    • Hit time: time to access cache memory (including tag comparison)
    • Abbreviation: “$” = cache (a Berkeley innovation!)

    Average Memory Access Time (AMAT)

    • Average Memory Access Time (AMAT) is the average time to access memory considering both hits and misses in the cache

    AMAT = Time for a hit + Miss rate × Miss penalty

    Peer Instruction

    AMAT = Time for a hit + Miss rate × Miss penalty

    • Given a 200 psec clock, a miss penalty of 50 clock cycles, a miss rate of 0.02 misses per instruction and a cache hit time of 1 clock cycle, what is AMAT?
      A: ≤ 200 psec
      B: 400 psec
      C: 600 psec
      D: 800 psec
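    The question above can be checked by plugging its numbers into the AMAT formula. In this sketch the function name `amat` is illustrative, and the 0.02 figure is treated as the miss rate per access, which is an assumption about how the slide intends it.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


CLOCK_PSEC = 200  # clock period from the question

cycles = amat(hit_time=1, miss_rate=0.02, miss_penalty=50)
print(f"AMAT = {cycles:.2f} cycles = {cycles * CLOCK_PSEC:.0f} psec")
```

    The miss term contributes 0.02 × 50 = 1 cycle on top of the 1-cycle hit time, so AMAT is 2 cycles, or 400 psec at a 200 psec clock.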

    Ping Pong Cache Example: Direct-Mapped Cache w/ 4 Single-Word Blocks, Worst-Case Reference String

    • Consider the main memory address reference string of word numbers: 0 4 0 4 0 4 0 4
    • Start with an empty cache - all blocks initially marked as not valid

    Access:   0     4     0     4     0     4     0     4
    Result:   miss  miss  miss  miss  miss  miss  miss  miss

    [Figure: after each access, cache block 0 alternates between tag 00 holding Mem(0) and tag 01 holding Mem(4).]

    • 8 requests, 8 misses
    • Ping-pong effect due to conflict misses - two memory locations that map into the same cache block
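    The ping-pong behavior is easy to reproduce in a few lines. This is a minimal sketch of a direct-mapped cache with single-word blocks that tracks only valid bits and tags; the function name `simulate_direct_mapped` is made up for illustration.

```python
def simulate_direct_mapped(word_addresses, num_blocks):
    """Return 'hit' or 'miss' for each word address, in order."""
    valid = [False] * num_blocks  # empty cache: all blocks not valid
    tags = [0] * num_blocks
    results = []
    for word in word_addresses:
        index = word % num_blocks   # low-order bits select the cache block
        tag = word // num_blocks    # remaining high-order bits
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            valid[index] = True     # fill the block, replacing the old tag
            tags[index] = tag
    return results


results = simulate_direct_mapped([0, 4, 0, 4, 0, 4, 0, 4], num_blocks=4)
print(results)
print(f"{len(results)} requests, {results.count('miss')} misses")
```

    Words 0 and 4 both map to block 0 (0 mod 4 = 4 mod 4), so each access evicts the block the next one needs.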


    Example: 2-Way Set Associative $ (4 words = 2 sets x 2 ways per set)

    [Figure: a cache with 2 sets (0, 1) of 2 ways (0, 1), each entry holding V, Tag, and Data, next to a 16-word main memory with addresses 0000xx through 1111xx.]

    • One word blocks. Two low-order bits define the byte in the word (32b words)
    • Q: How do we find it? Use next 1 low-order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache)
    • Q: Is it there? Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache

    Ping Pong Cache Example: 4-Word 2-Way SA $, Same Reference String

    • Consider the main memory word reference string 0 4 0 4 0 4 0 4
    • Start with an empty cache - all blocks initially marked as not valid

    Access:   0     4     0     4
    Result:   miss  miss  hit   hit

    [Figure: after the first two accesses, set 0 holds tag 000 with Mem(0) in one way and tag 010 with Mem(4) in the other; both stay resident.]

    • 8 requests, 2 misses
    • Solves the ping-pong effect in a direct-mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist!
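    The same reference string can be run through a small set-associative model. This sketch assumes single-word blocks and LRU replacement within each set; the function name `simulate_set_associative` is hypothetical.

```python
def simulate_set_associative(word_addresses, num_sets, ways):
    """Return 'hit' or 'miss' per access for an N-way cache with LRU."""
    # Each set is a list of tags ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    results = []
    for word in word_addresses:
        index = word % num_sets   # low-order bits select the set
        tag = word // num_sets    # remaining high-order bits
        lines = sets[index]
        if tag in lines:
            results.append("hit")
            lines.remove(tag)     # re-appended below as most recently used
        else:
            results.append("miss")
            if len(lines) == ways:
                lines.pop(0)      # set is full: evict the least recently used
        lines.append(tag)
    return results


refs = [0, 4, 0, 4, 0, 4, 0, 4]
two_way = simulate_set_associative(refs, num_sets=2, ways=2)
print(two_way)
print(f"{len(two_way)} requests, {two_way.count('miss')} misses")
```

    Calling the same function with `num_sets=4, ways=1` models the direct-mapped cache of the same capacity and misses on all eight accesses.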

    Four-Way Set-Associative Cache

    • 2^8 = 256 sets each with four ways (each with one block)

    [Figure: 32-bit address (bits 31 to 0) split into a 22-bit Tag, an 8-bit Index, and a Byte offset. The Index selects one of 256 entries (0 to 255) in each of Way 0, Way 1, Way 2, and Way 3; each entry holds V, Tag, and Data. The four tag comparisons produce Hit and drive a 4x1 select that outputs 32 bits of Data.]

    Break!


    Range of Set-Associative Caches

    • For a fixed-size cache and fixed block size, each increase by a factor of two in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets – decreases the size of the index by 1 bit and increases the size of the tag by 1 bit

    Address fields: Tag | Index | Word offset | Byte offset
      – Tag: used for tag compare
      – Index: selects the set
      – Word offset: selects the word in the block

    • Decreasing associativity: fewer ways, more sets
      – Direct mapped (only one way): smaller tags, only a single comparator
    • Increasing associativity: more ways, fewer sets
      – Fully associative (only one set): tag is all the bits except block and byte offset

    Total Cache Capacity = Associativity × # of sets × block_size
    Bytes = blocks/set × sets × Bytes/block

    C = N × S × B

    Address fields: Tag | Index | Byte Offset

    address_size = tag_size + index_size + offset_size
                 = tag_size + log2(S) + log2(B)

    • Double the Associativity: Number of sets? tag_size? index_size? # comparators?
    • Double the Sets: Associativity? tag_size? index_size? # comparators?
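    The relations C = N × S × B and address_size = tag_size + log2(S) + log2(B) can be explored numerically. The sketch below assumes power-of-two sizes and a 32-bit address; the helper names `log2_exact` and `cache_fields` are made up for illustration.

```python
def log2_exact(n):
    """log2 of a power of two, rejecting anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_fields(capacity_bytes, ways, block_bytes, address_bits=32):
    """Derive set count and address field widths from C, N, and B."""
    num_sets = capacity_bytes // (ways * block_bytes)  # S = C / (N x B)
    index_bits = log2_exact(num_sets)
    offset_bits = log2_exact(block_bytes)
    return {
        "sets": num_sets,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
        "comparators": ways,  # one comparator per way
    }


# Fixed 4 KB capacity and 4-byte blocks, doubling the associativity each step.
for ways in (1, 2, 4):
    print(ways, cache_fields(4096, ways, 4))
```

    At fixed capacity and block size, each doubling of associativity halves the number of sets, removes one index bit, adds one tag bit, and doubles the comparators.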

    Your Turn

    • For a cache of 64 blocks, each block four bytes in size:
      1. The capacity of the cache is: ____ bytes.
      2. Given a 2-way Set Associative organization, there are ___ sets, each of __ blocks, and __ places a block from memory could be placed.
      3. Given a 4-way Set Associative organization, there are ____ sets each of __ blocks and __ places a block from memory could be placed.
      4. Given an 8-way Set Associative organization, there are ____ sets each of __ blocks and ___ places a block from memory could be placed.

    Peer Instruction

    • For S sets, N ways, B blocks, which statements hold?
      (i) The cache has B tags
      (ii) The cache needs N comparators
      (iii) B = N x S
      (iv) Size of Index = Log2(S)

      A: (i) only
      B: (i) and (ii) only
      C: (i), (ii), (iii) only
      D: All four statements are true


    Costs of Set-Associative Caches

    • N-way set-associative cache costs
      – N comparators (delay and area)
      – MUX delay (set selection) before data is available
      – Data available after set selection (and Hit/Miss decision). DM $: block is available before the Hit/Miss decision
        • In Set-Associative, not possible to just assume a hit and continue and recover later if it was a miss
    • When miss occurs, which way’s block selected for replacement?
      – Least Recently Used (LRU): one that has been unused the longest (principle of temporal locality)
        • Must track when each way’s block was used relative to other blocks in the set
        • For 2-way SA $, one bit per set → set to 1 when a block is referenced; reset the other way’s bit (i.e., “last used”)

    Cache Replacement Policies

    • Random Replacement
      – Hardware randomly selects a cache entry to evict
    • Least-Recently Used
      – Hardware keeps track of access history
      – Replace the entry that has not been used for the longest time
      – For 2-way set-associative cache, need one bit for LRU replacement
    • Example of a Simple “Pseudo” LRU Implementation
      – Assume 64 Fully Associative entries
      – Hardware replacement pointer points to one cache entry
      – Whenever access is made to the entry the pointer points to:
        • Move the pointer to the next entry
      – Otherwise: do not move the pointer
      – (example of “not-most-recently used” replacement policy)

    [Figure: Entry 0, Entry 1, ..., Entry 63, with the Replacement Pointer pointing at one entry.]
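    The pointer-based scheme above can be sketched as a small class. The class and method names are illustrative, and the wrap-around from the last entry back to entry 0 is an assumption, since the slide only says the pointer moves to the next entry.

```python
class NotMostRecentlyUsed:
    """Replacement pointer that never rests on the entry just accessed."""

    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.pointer = 0  # entry that would be replaced on the next miss

    def access(self, entry):
        """Record an access; move the pointer only if it points at this entry."""
        if entry == self.pointer:
            # Assumed: wrap from the last entry back to entry 0.
            self.pointer = (self.pointer + 1) % self.num_entries

    def victim(self):
        """Entry to replace on a miss."""
        return self.pointer


policy = NotMostRecentlyUsed(num_entries=4)
for entry in (0, 3, 1):
    policy.access(entry)
print(policy.victim())
```

    Accessing entry 0 moves the pointer to 1, the access to 3 leaves it alone, and the access to 1 moves it to 2, so entry 2 is the next victim. The pointer only guarantees the victim is not the most recently used entry, which is far cheaper than tracking full LRU order across 64 entries.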

    Benefits of Set-Associative Caches

    • Choice of DM $ versus SA $ depends on the cost of a miss versus the cost of implementation
    • Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)


    ChipPhotos

    And in Conclusion…

    • Name of the Game: Reduce AMAT
      – Reduce Hit Time
      – Reduce Miss Rate
      – Reduce Miss Penalty
    • Balance cache parameters (Capacity, associativity, block size)