cache - swarthmore collegecpu chip cache recall: how memory read works (1)cpu places address a on...

64
Cache 10/27/16

Upload: others

Post on 02-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Cache10/27/16

Page 2: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

TheMemoryHierarchy

Localsecondarystorage(disk)

LargerSlowerCheaperperbyte

Remotesecondarystorage(tapes,thecloud)

~100Mcyclestoaccess

OnChip

Storage

SmallerFasterCostlierperbyte

Mainmemory(DRAM)

~100cyclestoaccess

CPUinstrscan

directlyaccess

evenslowerthandisk

Registers1cycletoaccess

Cache(s)(SRAM)

~10’sofcyclestoaccess

FlashSSD/Localnetwork

Page 3: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

0.0

0.1

1.0

10.0

100.0

1,000.0

10,000.0

100,000.0

1,000,000.0

10,000,000.0

100,000,000.0

1980 1985 1990 1995 2000 2003 2005 2010

ns (1

0-9 s

ec)

Year

Disk seek timeFlash SSD access timeDRAM access timeSRAM access timeCPU cycle timeEffective CPU cycle time

3

DataAccessTimeoverYearsOvertime,gapwidensbetweenDRAM,disk,andCPUspeeds.

Disk

DRAM

CPU

SSD

SRAM

multicore

Reallywanttoavoidgoingtodiskfordata

WanttoavoidgoingtoMainMemoryfordata

Page 4: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Recall

• Acacheisasmaller,fastermemory,thatholdsasubsetofalarger(slower)memory

• Wetakeadvantageoflocality tokeepdataincacheasoftenaswecan!

• Whenaccessingmemory,wecheckcachetoseeifithasthedatawe’relookingfor.

Page 5: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Whycachemissesoccur

• Compulsory(cold-start)miss:• Firsttimeweusedata,loaditintocache.

• Capacitymiss:• Cacheistoosmalltostoreallthedatawe’reusing.

• Conflictmiss:• Tobringinnewdatatothecache,weevictedotherdatathatwe’restillusing.

Page 6: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Cachedesign

Questions:• Whatdatashouldbebroughtintothecache?• Whereinthecacheshoulditgo?• Whatdatashouldbeevictedfromthecache?

Goals:• Maximizehitrate.• Takeadvantageoftemporalandspatiallocality.• Minimizehardwarecomplexity.

Page 7: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

CachingTerminology• Block:thesizeofasinglecachedatastorageunit

• Datagetstransferredintocacheinentireblocks(nopartialblocks).• Lowerlevelsmayhavelargerblocksizes.

• Line:asinglecacheentry:• data(block)+identifyinginformation+otherstate

• Hit:thesoughtdataarefoundinthecache.• L1:typically~95%hitrate

• Miss:thesoughtdataarenotfoundinthecache.• Fetchfromlowerlevels.

• Replacement:Movingavalueoutofacachetomakeroomforanewvalueinitsplace

7

Blockissome#ofbytes

(fromcontiguousmem.addrs)

Page 8: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

CachebasicsLine metadata addressinfo datablock

0

1

2

3

… …

1021

1022

1023

Eachlinestoressomedata,plusinformationaboutwhatmemoryaddressthedatacamefrom.

Page 9: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

SupposetheCPUasksfordata,it’snotincache.Weneedtomoveinintocachefrommemory.Whereinthecacheshoulditbeallowedtogo?

A. Inexactlyoneplace.

B. Inafewplaces.

C. Inmostplaces,butnotall.

D. Anywhereinthecache.

ALURegs

Cache

MainMemory

MemoryBus

CPU

? ?

?

Page 10: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

A. Inexactlyoneplace.(“Direct-mapped”)• Everylocationinmemoryisdirectlymappedtooneplace

inthecache.Easytofinddata.

B. Inafewplaces.(“Setassociative”)• Amemorylocationcanbemappedto(2,4,8)locationsin

thecache.Middleground.

C. Inmostplaces,butnotall.

D. Anywhereinthecache.(“Fullyassociative”)• Norestrictionsonwherememorycanbeplacedinthe

cache.Fewerconflictmisses,moresearching.

Page 11: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Alargerblocksize(cachingmemoryinlargerchunks)islikelytoexhibit…A. Bettertemporallocality

B. Betterspatiallocality

C. Fewermisses(betterhitrate)

D. Moremisses(worsehitrate)

E. Morethanoneoftheabove.(Which?)

Page 12: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

BlockSizeImplications• Smallblocks

• Roomformoreblocks• Fewerconflictmisses

• Largeblocks• Fewertripstomemory• Longertransfertime• Fewercold-startmisses

MainMemory MainMemory

Cache Cache

ALURegs ALURegs

Page 13: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Trade-offs

• Thereisnosinglebestdesignforallpurposes!

• Commonsystemsquestion:whichpointinthedesignspaceshouldwechoose?

• Givenaparticularscenario:• Analyzeneeds• Choosedesignthatfitsthebill

Page 14: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

RealCPUs• Goals:generalpurposeprocessing

• balanceneedsofmanyusecases• middleoftheroad:jackofalltrades,masterofnone

• Someassociativity• 8-wayassociative(memoryinoneofeightplaces)

• Mediumsizeblocks• 16or32-byteblocks

Page 15: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Whatshouldweusetodeterminewhetherornotdataisinthecache?A. Thememoryaddressofthedata.

B. Thevalueofthedata.

C. Thesizeofthedata.

D. Someotheraspectofthedata.

Page 16: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Recall:HowMemoryReadWorks

(1) CPUplacesaddressAonthememorybus.

ALU

Registerfile

Businterface

0

Ax

MainmemoryI/Obridge

%eax

Loadoperation: movl (A), %eaxCPUchip

Cache

Page 17: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Recall:HowMemoryReadWorks

(1) CPUplacesaddressAonthememorybus.(2)Memorysendsbackthevalue

ALU

Registerfile

Businterface

0

Ax

MainmemoryI/Obridge

%eax

Loadoperation: movl (A), %eaxCPUchip

Cache

Page 18: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

MemoryAddressTellsUs…

• Istheblockcontainingthebyte(s)youwantalreadyinthecache?

• Ifnot,whereshouldweputthatblock?• Doweneedtokickout(“evict”)anotherblock?

• Whichbyte(s)withintheblockdoyouwant?

Page 19: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

MemoryAddresses

• Likeeverythingelse:seriesofbits(32or64)

• Keepinmind:• Nbitsgivesus2N uniquevalues.

• 32-bitaddress:• 10110001011100101101010001010110

Divideintoregions,eachwithdistinctmeaning.

Page 20: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

FirstDirect-Mapped

• Oneplacedatacanbe.

• Example:let’sassumesomeparameters:• 1024cachelocations(everyblockmappedtoone)• Blocksizeof8bytes

Page 21: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-MappedLine V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020

1021

1022

1023

Metadata

Page 22: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

CacheMetadata

• Validbit:istheentryvalid?• Ifset:dataiscorrect,useitifwe‘hit’incache• Ifnot set:ignore‘hits’,thedataisgarbage

• Dirtybit:hasthedatabeenwritten?• Usedbywrite-backcaches• Ifset,needtoupdatememorybeforeeviction

Page 23: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision:

• Identifybyteinblock• Howmanybits?

• Identifywhichrow(line)• Howmanybits?

Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020

1021

1022

1023

Page 24: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision:

• Identifybyteinblock• Howmanybits?3

• Identifywhichrow(line)• Howmanybits?10

Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020

1021

1022

1023

Page 25: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020

1021

1022

1023

Index:Whichline(row)shouldwecheck?Wherecoulddatabe?

Tag(19bits) Index(10bits) Byteoffset (3bits)

Page 26: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020

1021

1022

1023

Index:Whichline(row)shouldwecheck?Wherecoulddatabe?

Tag(19bits) Index(10bits) Byteoffset (3bits)

4

Page 27: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4 1 4217

… …

1020

1021

1022

1023

Inparallel,check:

Tag:Doesthecacheholdthedatawe’relookingfor,orsomeotherblock?

Validbit:Ifentryisnotvalid,don’ttrustgarbageinthatline(row).

Tag(19bits) Index(10bits) Byteoffset (3bits)

4217 4

Iftagdoesn’tmatch,orlineisinvalid,it’samiss!

Page 28: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4 1 4217

… …

1020

1021

1022

1023

Byteoffsettellsuswhichsubsetofblocktoretrieve.

Tag(19bits) Index(10bits) Byteoffset (3bits)

4217 4

0 1 2 3 4 5 6 7

Page 29: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4 1 4217

… …

1020

1021

1022

1023

Byteoffsettellsuswhichsubsetofblocktoretrieve.

Tag(19bits) Index(10bits) Byteoffset (3bits)

4217 4 2

0 1 2 3 4 5 6 7

Page 30: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

V D Tag Data

=

Tag Index Byteoffset

0:miss1:hit

SelectByte(s)

Data

Input:MemoryAddress

Page 31: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-MappedExample• Supposeouraddressesare16bitslong.

• Ourcachehas16entries,blocksizeof16bytes• 4bitsinaddressfortheindex• 4bitsinaddressforbyteoffset• Remainingbits(8):tag

Page 32: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 0110101100110100

• Step1:• Partitionaddressintotag,index,offset

Line V D Tag Data(16Bytes)

0

1

2

3

4

5

15

Page 33: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 0110101100110100

• Step1:• Partitionaddressintotag,index,offset

Line V D Tag Data(16Bytes)

0

1

2

3

4

5

15

Page 34: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 011010110011 0100

• Step2:• Useindextofindline(row)

• 0011->3

Line V D Tag Data(16Bytes)

0

1

2

3

4

5

15

Page 35: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Line V D Tag Data(16Bytes)

0

1

2

3

4

5

15

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 011010110011 0100

• Step2:• Useindextofindline(row)

• 0011->3

Page 36: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Line V D Tag Data(16Bytes)

0

1

2

3

4

5

15

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 011010110011 0100

• Note:• ANYaddresswith0011(3)asthemiddlefourindexbitswillmaptothiscacheline.

• e.g.111111110011 0000

So,whichdataishere?

Datafromaddress0110101100110100OR1111111100110000?

Usetagtostorehigh-orderbits.Let’susdeterminewhichdataishere!(manyaddressesmaphere)

Page 37: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Line V D Tag Data(16Bytes)

0

1

2

3 01101011

4

5

15

Direct-MappedExample

• Let’ssayweaccessmemoryataddress:

• 011010110011 0100

• Step3:• Checkthetag• Isit01101011(hit)?• Somethingelse(miss)?• (Mustalsoensurevalid)

Page 38: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Eviction

• Ifwedon’tfindwhatwe’relookingfor(miss),weneedtobringinthedatafrommemory.

• Makeroombykickingsomethingout.• Iflinetobeevictedisdirty,writeittomemoryfirst.

• Anotherimportantsystemsdistinction:• Mechanism:Anabilityorfeatureofthesystem.Whatyoucan do.

• Policy:Governsthedecisionsmakingforusingthemechanism.Whatyoushould do.

Page 39: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Evictionfordirect-mappedcache

• Mechanism:overwritebitsincacheline,updating• Validbit• Tag• Data

• Policy:notmanyoptionsfordirect-mapped• Overwriteattheonlylocationitcouldbe!

Page 40: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Eviction:Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020 1 0 1323 57883

1021

1022

1023

Findline:

Tagdoesn’tmatch,bringinfrommemory.

Ifdirty,writebackfirst!

Tag(19bits) Index(10bits) Byteoffset (3bits)

3941 1020

Page 41: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Eviction:Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020 1 0 1323 57883

1021

1022

1023

Tag(19bits) Index(10bits) Byteoffset (3bits)

3941 1020

MainMemory

1.Sendaddresstoreadmainmemory.

Page 42: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Eviction:Direct-Mapped• Addressdivision: Line V D Tag Data(8Bytes)

0

1

2

3

4

… …

1020 1 0 3941 92

1021

1022

1023

Tag(19bits) Index(10bits) Byteoffset (3bits)

3941 1020

MainMemory

1.Sendaddresstoreadmainmemory.

2.Copydatafrommemory.Updatetag.

Page 43: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Supposewehad8-bitaddresses,acachewith8lines,andablocksizeof4bytes.

• Howmanybitswouldweusefor:• Tag?• Index?• Offset?

Page 44: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Howmanyoftheseoperationschangethecache?Howmanyaccessmemory?

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 9

20 0 101 15

31 1 001 8

41 0 011 4

50 0 111 6

60 0 101 32

71 0 110 3

A. 1B. 2C. 3

D. 4E. 5

Page 45: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Steppingthrough…

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 010 9 5

20 0 101 15

31 1 001 8

41 0 011 4

50 0 111 6

60 0 101 32

71 0 110 3

Page 46: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Steppingthrough…

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 010 9 5

20 0 101 15

31 1 001 8

41 0 011 4

50 0 111 6

60 0 101 32

71 0 110 3Nochangenecessary.

Page 47: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Steppingthrough…

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 010 9 5

20 0 101 15

31 1 001 8

41 0

1011 4 7

50 0 111 6

60 0 101 32

71 0 110 3

Page 48: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Steppingthrough…

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 010 9 5

201

0 101 101 15 12

31 1 001 8

41 0

1011 4 7

50 0 111 6

60 0 101 32

71 0 110 3

Note:taghappenedtomatch,butlinewasinvalid.

Page 49: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Steppingthrough…

Read01000100(Value:5)Read11100010(Value:17)Write01110000(Value:7)Read10101010(Value:12)Write01101100(Value:2)

Line V D Tag Data(4Bytes)

01 0 111 17

11 0 011 010 9 5

201

0 101 101 15 12

31 1

1001 011 8 2

41 0

1011 4 7

50 0 111 6

60 0 101 32

71 0 110 3

1. Writedirtylinetomemory.2. Loadnewvalue,setitto2,

markitdirty(write).

Page 50: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Question…

Whenmightdirect-mappedcachebeabadidea?

Whentwoblocksweusealothavethesameindex.

Page 51: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Theotherextreme:fullyassociative

+Anyblockcangoinanycacheline.+Reducescachemisses.

- Havetocheckeverylineformatchingaddress.- Needtostoremorebitsoftheaddress.- Evictiondecisionsareharder.

Page 52: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Compromise:setassociative

• EachlinecanholdNblocks.• Addressesaremappedtoaline,butcangoinanyofthatline’sNblocks.

Page 53: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Comparison:1024Lines(Forthesamecachesize,inbytesofdata.)

Direct-mapped

1024indices(10bits)

2-waysetassociative

512sets(9bits)Tagis1bitlarger.V D Tag Data(8Bytes)

Set # V D Tag Data(8Bytes)

0

1

2

3

4

… …

508

509

510

511

Page 54: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

2-WaySetAssociative

V D Tag Data(8Bytes)

1 0 3941

Set # V D Tag Data(8Bytes)

0

1

2

3

4 1 1 4063

… …

508

509

510

511

Tag(20bits) Set(9bits) Byteoffset (3bits)

3941 4Samecapacityaspreviousexample:1024rowswith1entryvs.512rowswith2entries

Page 55: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

2-WaySetAssociative

V D Tag Data(8Bytes)

1 0 3941

Set # V D Tag Data(8Bytes)

0

1

2

3

4 1 1 4063

… …

508

509

510

511

Tag(20bits) Set(9bits) Byteoffset (3bits)

3941 4

Checkalllocationsintheset,inparallel.

Page 56: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

2-WaySetAssociative

V D Tag Data(8Bytes)

1 0 3941

Set # V D Tag Data(8Bytes)

0

1

2

3

4 1 1 4063

… …

508

509

510

511

Tag(20bits) Set(9bits) Byteoffset (3bits)

3941 4

0 1 2 3 4 5 6 70 1 2 3 4 5 6 7

Multiplexer Selectcorrectvalue.

Page 57: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

4-WaySetAssociativeCache

Clearly,morecomplexityhere!

Page 58: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Eviction• Mechanismisthesame…

• Overwritebitsincacheline:updatetag,valid,data

• Policy:choosewhichlineinthesettoevict• Option1:Pickarandomlineinset• Option2:Chooseaninvalidlinefirst• Option3:Choosetheleastrecentlyusedblock

• Hasexhibitedtheleastlocality,kickitout!• Option4:first2then3

Page 59: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

LeastRecentlyUsed(LRU)

• Intuition:ifithasn’tbeenusedinawhile,wehavenoreasontobelieveitwillbeusedsoon.

• NeedextrastatetokeeptrackofLRUinfo.

V D Tag Data(8Bytes)

1 0 3941

Set # LRU V D Tag Data(8Bytes)

0 0

1 1

2 1

3 0

4 1 1 1 4063

… …

Page 60: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

LeastRecentlyUsed(LRU)• Intuition:ifithasn’tbeenusedinawhile,wehavenoreasontobelieveitwillbeusedsoon.

• NeedextrastatetokeeptrackofLRUinfo.

• ForperfectLRUinfo:• 2-way:1bit• 4-way:8bits• N-way:N*log2 Nbits

Anotherreasonwhyassociativityoftenmaxesoutat8or16.

Thesearemetadatabits,not“useful”programdatastorage.

(Approximationsmakeitnotquiteasbad.)

Page 61: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Howwouldthecachechangeifweperformedthefollowingmemoryoperations?(2-wayset)Read01000100(Value:5)Read11100010(Value:17)Write01100100(Value:7)Read01000110(Value:5)Write01100000(Value:2)

V D Tag Data(4Bytes)

1 0 001 17

1 0 010 5

… …

Set # LRU V D Tag Data(4Bytes)

0 1 0 0 111 4

1 0 1 1 111 9

2 … …

3

4

5

6

7

LRUof0meanstheleftlineinthesetwasleastrecentlyused.1meanstherightlinewasusedleastrecently.

Page 62: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

CacheConsciousProgramming• Knowingaboutcachinganddesigningcodearounditcansignificantlyeffectperformance(ex)2Darrayaccesses

Algorithmically,bothO(N*M).

Isonefasterthantheother?

for(i=0; i < N; i++) {for(j=0; j< M; j++) {

sum += arr[i][j];}}

for(j=0; j < M; j++) {for(i=0; i< N; i++) {

sum += arr[i][j];}}

A.isfaster. B.isfaster.

C.Bothwouldexhibitroughlyequalperformance.

Page 63: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

CacheConsciousProgrammingThefirstnestedloopismoreefficientifthecacheblocksizeislargerthanasinglearraybucket(forarraysofbasicCtypes,itwillbe).

(ex)1missevery4bucketsvs.1misseverybucket

for(i=0; i < N; i++) {for(j=0; j< M; j++) {

sum += arr[i][j];}}

for(j=0; j < M; j++) {for(i=0; i< N; i++) {

sum += arr[i][j];}}

1 2 3 4 5 6 7 8 9 10

11

12

13

14

15

16

...

.

.

.

1 . ..

2

3

4

.

.

.

Page 64: Cache - Swarthmore CollegeCPU chip Cache Recall: How Memory Read Works (1)CPU places address A on the memory bus. (2)Memory sends back the value ALU Register file Bus interface 0 x

Acaveat:Amdahl’sLawIdea:anoptimizationcanimprovetotalruntimeatmostbythefractionitcontributestototalruntime

Ifprogramtakes100secs torun,andyouoptimizeaportionofthecodethataccountsfor2%oftheruntime,thebestyouroptimizationcandoisimprovetheruntimeby2secs.

Amdahl’sLawtellsustofocusouroptimizationeffortsonthecodethatmatters:

Speed-upwhatisaccountingforthelargestportionofruntime togetthelargestbenefit.And,don’twastetimeonthesmallstuff.

“Prematureoptimizationistherootofallevil.”–DonaldKnuth