EECS 470 Lecture 15
Lecture 13 Slide 1 EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar
EECS 470 Lecture 15: Basic Caches
Winter 2022
Prof. Ronald Dreslinski
http://www.eecs.umich.edu/courses/eecs470
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Readings
For Today:
❒ H&P 2.1
For Thursday:
❒ H&P 2.2, 2.3, B.3
❒ N. Jouppi. Improving direct-mapped cache performance…
Announcements
Midterm grades released.
If you are more than 2 Std. Dev. from the mean, please email me to set up a time to chat.
Look for HW4 to be released tomorrow sometime.
Staff Midterm Outcome
Lots of small suggestions; here is a list of actionable ones we will try to address:
1) Fix the website/calendar
2) More GSIs (less actionable this semester)
3) Grades back sooner
4) Office hours queues long
Wide Fetch - Non-sequential
Two related questions
❒ How many branches predicted per cycle?
❒ Can we fetch from multiple taken branches per cycle?
Simplest, most common organization: “1” and “No”
❒ One prediction, discard post-branch insns if prediction is “Taken”
– Lowers effective fetch width and IPC
❒ Average number of instructions per taken branch?
❒ Assume: 20% branches, 50% taken → ~10 instructions
❒ Consider a 10-instruction loop body with an 8-issue processor
❒ Without smarter fetch, ILP is limited to 5 (not 8)
Compiler can help
❒ Unroll loops, reduce taken branch frequency
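The arithmetic on this slide can be checked with a short sketch. It assumes (this is not stated on the slide) that each fetch group starts at the top of the loop body and that fetch stops at the taken branch that closes the loop; the function names are illustrative only.

```python
import math

def insns_per_taken_branch(branch_frac, taken_frac):
    """Average number of instructions between taken branches."""
    return 1.0 / (branch_frac * taken_frac)

def loop_fetch_rate(loop_len, width):
    """Instructions fetched per cycle for a loop that ends in a taken branch,
    assuming fetch cannot continue past a taken branch in the same cycle."""
    cycles = math.ceil(loop_len / width)   # cycles needed to fetch one iteration
    return loop_len / cycles

print(insns_per_taken_branch(0.20, 0.50))  # about 10 instructions
print(loop_fetch_rate(10, 8))              # 10 insns in 2 cycles -> 5.0 per cycle
```

The 10-instruction loop needs two fetch cycles (8 + 2), so the 8-wide machine sustains only 5 instructions per cycle under these assumptions.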
Multiple Branch Predictions
Issues with multiple branch predictions:
❒ Latency resulting from sequential predictions
❒ Later predictions based on stale/speculative history
❒ Don’t forget, 0.95 x 0.95 x 0.95 ≈ 0.86
[Figure: Fetch address, three BTBs, and fetch Blocks 1, 2, and 3]
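A minimal sketch of why accuracy compounds when several branches are predicted per cycle. It assumes each prediction is correct independently with the same probability, which is a simplification; the function name is illustrative.

```python
def all_correct_probability(per_branch_accuracy, n_branches):
    """Probability that n independent predictions are all correct."""
    return per_branch_accuracy ** n_branches

for n in (1, 2, 3):
    print(n, round(all_correct_probability(0.95, n), 4))
```

With 95% per-branch accuracy, a fetch group spanning three predicted branches is entirely correct only about 86% of the time.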
Examples of Multi-Branch Predictors
[Figure: BHR (bits bn ... b0), PHT, and predictions p0, p1, p2]
How do you update this thing after a branch resolves?
Examples of Multi-Branch Predictors
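[Figure: multi-branch predictor. Labels: BHR (bits bn ... b0); history slices bn:2, bn-1:1, bn-2:0; selector pairs b1 b0, b0 p0, p0 p1; predictions p0, p1, p2; PHT with 2^(n-2) x 4 entries]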
Multiple Predicted Taken Branches
Issues with multiple taken branches:
❒ Long latency with multiple sequential I-cache accesses
❒ or, multi-ported I-cache with slower access latency
❒ or, multi-banked I-cache to approximate multi-port
[Figure: Multi-ported I-cache; fetch addresses (FA) for Blocks 1, 2, and 3 in, Block 1, 2, and 3 instructions out]
Instruction Alignment and Collapsing
Issues with alignment and collapsing:
❒ Misalignment between fetch group and cache line.
❒ Packing of variable-sized blocks into fetch buffer.
[Figure: I-cache Ports 1, 2, and 3 and the fetch buffer]
Memory Systems: Basic Caches
Memory Systems
Basic caches
❒ introduction
❒ fundamental questions
❒ cache size, block size, associativity
Advanced caches
Main memory
Virtual memory
Start today
Motivation
Want memory to appear:
❒ as fast as CPU
❒ as large as required by all of the running applications
[Figure: Performance of Processor vs. Memory, 1985 to 2010, log scale from 1 to 10000]
Memory Hierarchy
Make common case fast:
❒ common: temporal & spatial locality
❒ fast: smaller, more expensive memory
[Figure: hierarchy of Registers, Caches, Memory, Disk (MEMS?), with “Larger” and “Faster” arrows]
Storage Hierarchies
Storages are layered by hierarchies in order of
❒ increasing latency (t_i): t_i < t_{i+1}
❒ increasing size (s_i) ⇒ decreasing unit cost (c_i): s_i < s_{i+1}, c_i > c_{i+1}
❒ decreasing bandwidth (b_i): b_i > b_{i+1}
❒ increasing xfer unit (x_i): x_i < x_{i+1}
Level 0: Registers
Level 1: (n levels of) Caches
Level 2: Main Memory (Primary Storage)
Level 3: Disks (Secondary Storage)
Level 4: Tape Backup (Tertiary Storage)
[Figure labels: ISA feature; Memory Abstractions]
Level 2.5: Flash?
Level 1.5: NVRAM?
Processor/Memory Boundaries
[Figure: block diagram. Labels: Processor; I-Unit; E-Unit; Regs; I-TLB; D-TLB; L1 I-Cache; L1 D-Cache; L2 Cache (SRAM on-chip); L3 Cache (SRAM off-chip); Main Memory (DRAM)]
Caches
An automatically managed hierarchy
“A hiding place, esp. of goods, treasure, etc.” -- OED
Keep recently accessed block
❒ temporal locality
Break memory into blocks (several bytes) and transfer data to/from cache in blocks
❒ spatial locality
A lot of architectures opt for software-managed scratch-pad memory instead, e.g. Cray-1, embedded processors. Why??
[Figure: CPU, $ (cache), Memory]
Cache (Abstractly)
Keep recently accessed block in “block frame”
❒ state (e.g., valid)
❒ address tag
❒ data
[Figure: a block frame with address and state fields (bookkeeping overhead) and a data field (multiple bytes per block frame to amortize overhead)]
Cache (Abstractly)
On memory read
if incoming address corresponds to one of the stored address tags then
❍ HIT
❍ return data
else
❍ MISS
❍ choose & displace a current block in use
❍ fetch new (referenced) block from memory into frame
❍ return data
- Where and how to look for a block? (Block placement)
- Which block is replaced on a miss? (Block replacement)
- What happens on a write? Write strategy (Later)
- What is kept? (Bookkeeping, data)
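The read procedure above can be sketched as a toy model. Everything beyond the hit/miss steps is an assumption made for illustration: the cache is fully associative, the victim is an invalid frame if one exists and otherwise a random frame, and memory is a plain `bytes` object. The class and method names are hypothetical.

```python
import random

class AbstractCache:
    """Toy read-only cache: any block may occupy any block frame."""

    def __init__(self, num_frames, block_size, memory):
        self.block_size = block_size
        self.memory = memory                 # backing store (bytes-like)
        self.frames = [None] * num_frames    # None = invalid, else (tag, data)
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        tag, offset = divmod(addr, self.block_size)
        # HIT: the incoming address matches a stored tag in a valid frame.
        for frame in self.frames:
            if frame is not None and frame[0] == tag:
                self.hits += 1
                return frame[1][offset]
        # MISS: choose a victim frame (invalid first, else random).
        self.misses += 1
        if None in self.frames:
            victim = self.frames.index(None)
        else:
            victim = random.randrange(len(self.frames))
        # Fetch the referenced block from memory into the frame, return data.
        start = tag * self.block_size
        data = self.memory[start:start + self.block_size]
        self.frames[victim] = (tag, data)
        return data[offset]

memory = bytes(range(64))
cache = AbstractCache(num_frames=2, block_size=4, memory=memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 4, 0)]
print(values, cache.hits, cache.misses)
```

In this trace the first access to each block misses and the remaining accesses hit, which is the spatial and temporal locality the slide describes.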
Terminology
block (cache line): minimum unit that may be present
hit: block is found in the cache
miss: block is not found in the cache
miss ratio: fraction of references that miss
hit time: time to access the cache
miss penalty
❒ time to replace block in the cache + deliver to upper level
❒ access time: time to get first word
❒ transfer time: time for remaining words
Cache Performance
Assume
❒ Cache access time is equal to 1 cycle
❒ Cache miss ratio is 0.01
❒ Cache miss penalty is 20 cycles
Mean access time
= Cache access time + miss ratio * miss penalty
= 1 + 0.01 * 20 = 1.2 cycles
Typically
❒ level-1 is 16K-64K, level-2 is 512K-4M, memory is 128M-4G
❒ level-1 as fast as the processor (increasingly 2-cycles)
❒ level-1 is 1/10000 capacity but contains 98% of references
Memoization & amortization
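The mean access time formula above, written as a small helper so the sensitivity to miss ratio is easy to explore. The function name is illustrative; the inputs are the slide's numbers.

```python
def mean_access_time(hit_time, miss_ratio, miss_penalty):
    """Mean access time = cache access time + miss ratio * miss penalty."""
    return hit_time + miss_ratio * miss_penalty

# Slide's example: 1-cycle hit, 1% miss ratio, 20-cycle penalty.
print(round(mean_access_time(1, 0.01, 20), 3))
# Doubling the miss ratio adds another 0.2 cycles per access.
print(round(mean_access_time(1, 0.02, 20), 3))
```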
Fundamental Cache Parameters That Affect Miss Rate
Cache size (C)
Block size (b)
Cache associativity (a)
Cache Size
Cache size is the total data (not including tag) capacity
❒ bigger can exploit temporal locality better
❒ not ALWAYS better
Too large a cache
❒ smaller is faster => bigger is slower
❒ access time may degrade critical path
Too small a cache
❒ don’t exploit temporal locality well
❒ useful data constantly replaced
[Figure: hit rate vs. C, holding b and a constant, with the “working set” size marked]
Block Size
Block size is the data that is
❒ associated with an address tag
❒ not necessarily the unit of transfer between hierarchies (sub-blocking)
Too small blocks
❒ don’t exploit spatial locality well
❒ have inordinate tag overhead
Too large blocks
❒ useless data transferred
❒ useful data permanently replaced (too few total # blocks)
[Figure: plot vs. b, holding C and a constant]
Associativity
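Fully-associative: block goes in any frame (think all frames in 1 set)
Direct-mapped: block goes in exactly one frame (think 1 frame per set)
Set-associative: a block goes in any frame in exactly one set (frames grouped into sets)
Where does block 12 (b’1100) go?
[Figure: an 8-frame cache drawn three ways: fully-associative (blocks 0-7), direct-mapped (blocks 0-7), and set-associative (sets 0-3, blocks 0-1 within each set)]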
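A sketch that answers the block-12 question for the three organizations. It assumes an 8-frame cache as in the figure, modulo (bit-selection) placement, and frames of a set numbered consecutively; none of these are stated explicitly in the transcript, and the function name is illustrative.

```python
def candidate_frames(block_addr, num_frames, ways):
    """Return (set index, frames the block may occupy) under modulo placement."""
    num_sets = num_frames // ways
    set_index = block_addr % num_sets
    first = set_index * ways
    return set_index, list(range(first, first + ways))

for name, ways in (("direct-mapped", 1),
                   ("2-way set-associative", 2),
                   ("fully-associative", 8)):
    print(name, candidate_frames(12, 8, ways))
```

Under these assumptions block 12 maps to frame 4 when direct-mapped (12 mod 8), to either frame of set 0 when 2-way set-associative (12 mod 4), and to any of the 8 frames when fully-associative.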
Impact of Associativity
Typical values for associativity
❒ 1, 2-, 4-, 8-way associative
Larger associativity
❒ lower miss rate, less variation among programs
❒ only important for small “C/b”
Smaller associativity
❒ lower cost, faster hit time
[Figure: hit rate vs. a, holding C and b constant, with ~5 marked on the a axis]
Direct Mapped Caches
[Figure: direct-mapped cache datapath. Labels: address fields tag, idx, b.o.; decoder; tag match (=) producing hit?; multiplexor; tag index; block index]
Don’t forget to check the valid/state bits
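A minimal sketch of a direct-mapped lookup that splits the address into tag, idx, and b.o. and checks the valid bit before trusting a tag match. The field widths, class names, and the `fill` helper are assumptions for illustration, not taken from the slide.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    valid: bool = False
    tag: int = 0
    data: bytes = b""

class DirectMappedCache:
    def __init__(self, index_bits, offset_bits):
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        self.frames = [Frame() for _ in range(1 << index_bits)]

    def split(self, addr):
        """Split an address into (tag, index, block offset)."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def lookup(self, addr):
        """Return the byte on a hit, or None on a miss."""
        tag, index, offset = self.split(addr)
        frame = self.frames[index]          # the index selects exactly one frame
        if frame.valid and frame.tag == tag:
            return frame.data[offset]       # the block offset selects the byte
        return None

    def fill(self, addr, block):
        """Install a block in the one frame it maps to."""
        tag, index, _ = self.split(addr)
        self.frames[index] = Frame(True, tag, bytes(block))

cache = DirectMappedCache(index_bits=3, offset_bits=2)
print(cache.lookup(0x00))   # None: tag 0 matches the reset tag, but valid is False
cache.fill(0x48, b"abcd")
print(cache.lookup(0x49))   # hit in the block just filled
print(cache.lookup(0x08))   # None: same index as 0x48, different tag
```

The first lookup shows why the valid bit matters: a freshly reset frame holds tag 0, which would falsely match address 0x00 if only the tags were compared.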
Fully Associative Cache
[Figure: fully associative cache. Labels: address fields tag, blk. offset; one tag comparator (=) per frame; Associative Search; Multiplexor; Tag]
N-Way Set Associative Cache
[Figure: N-way set-associative cache. Labels: address fields tag, idx, b.o.; a decoder and tag match (=) per way; Multiplexor; a set; a way (bank)]
Cache Size = N x 2^(B+b)
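The size formula can be evaluated directly. The transcript does not define B and b on this slide; the sketch assumes B is the number of index bits and b is the number of block-offset bits, so that each way holds 2^B blocks of 2^b bytes.

```python
def cache_size_bytes(ways, index_bits, offset_bits):
    """Cache Size = N x 2^(B+b), with N ways, B index bits, b offset bits (assumed)."""
    return ways * 2 ** (index_bits + offset_bits)

print(cache_size_bytes(ways=2, index_bits=7, offset_bits=2))  # 2 x 2^9 bytes
print(cache_size_bytes(ways=1, index_bits=8, offset_bits=2))  # same capacity, direct-mapped
```

Holding capacity fixed, doubling the associativity removes one index bit, which in turn lengthens the tag by one bit.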
Associative Block Replacement
Which block in a set to replace on a miss?
Ideally: Belady’s algorithm, replace the block that “will” be accessed the furthest in the future
❒ How do you implement it?
Approximations:
Least recently used: LRU
❒ optimized (assume) for temporal locality (expensive for more than 2-way)
Not most recently used: NMRU
❒ track MRU, random select from others, good compromise
Random
❒ nearly as good as LRU, simpler (usually pseudo-random)
How much can block replacement policy matter?
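One way to get a feel for how much the policy can matter is to count misses for a single set under LRU and under Belady's algorithm on the same trace. This is only a sketch: the trace is a hypothetical worst case for LRU (a cyclic pattern one block larger than the set), and Belady's algorithm is computed offline with full knowledge of the future, which real hardware does not have.

```python
def lru_misses(trace, ways):
    """Count misses in one set under least-recently-used replacement."""
    resident = []                     # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in resident:
            resident.remove(block)    # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(resident) == ways:
                resident.pop(0)       # evict the least recently used block
        resident.append(block)
    return misses

def belady_misses(trace, ways):
    """Count misses when evicting the block whose next use is furthest away."""
    resident = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in resident:
            continue
        misses += 1
        if len(resident) == ways:
            future = trace[i + 1:]

            def next_use(b):
                # Blocks never used again sort as "furthest in the future".
                return future.index(b) if b in future else len(future)

            resident.remove(max(resident, key=next_use))
        resident.add(block)
    return misses

trace = ["A", "B", "C"] * 3           # cyclic pattern over 3 blocks
print(lru_misses(trace, ways=2), belady_misses(trace, ways=2))
```

On this trace LRU misses on every access, while the oracle policy keeps one block resident across iterations and misses less often.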
Example: a=2, C=1kB, b=4B, word-size=2B
Basic Solution
[Figure: 2-way set-associative datapath. Per way: data array (128 lines x 4 bytes), tag array (128 lines x 23 bits), valid array (128 lines x 1 bit). Address fields: tag = PA[31:9], idx = PA[8:2], b.o. = PA[1], and PA[0]. The 7-bit idx reads every array; a 23-bit tag compare (=) per way produces hit0 and hit1; 2-1 muxes select by b.o. and by hit0/hit1; outputs are HIT and 16-bit DATA]
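The field widths in this example follow from the parameters. The sketch below assumes 32-bit physical addresses (as the PA[31:9] label suggests) and power-of-two sizes; the helper names are illustrative.

```python
def address_fields(cache_bytes, ways, block_bytes, addr_bits=32):
    """Derive set count and field widths for a set-associative cache."""
    num_sets = cache_bytes // (ways * block_bytes)
    offset_bits = (block_bytes - 1).bit_length()   # log2 for powers of two
    index_bits = (num_sets - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    return {"sets": num_sets, "offset_bits": offset_bits,
            "index_bits": index_bits, "tag_bits": tag_bits}

def split_physical_address(pa):
    """Split a PA using the example's field boundaries."""
    tag = pa >> 9              # PA[31:9]
    idx = (pa >> 2) & 0x7F     # PA[8:2]
    word = (pa >> 1) & 0x1     # PA[1] selects one of two 2B words in the block
    return tag, idx, word

print(address_fields(cache_bytes=1024, ways=2, block_bytes=4))
print(split_physical_address(0x206))
```

With C=1kB, a=2, and b=4B there are 128 sets, so the index is 7 bits, the block offset is 2 bits, and the remaining 23 bits form the tag, matching the array sizes in the figure.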
Write Policies
Writes are more interesting
❒ on reads, data can be accessed in parallel with tag compare
❒ on writes, needs two steps
❒ is turn-around time important for writes? cache optimizations often defer writes for reads
Choices of Write Policies
❒ On write hits, update memory?
❍ Yes: write-through (+ no coherence issue, + immediate observability, - more bandwidth)
❍ No: write-back
❒ On write misses, allocate a cache block frame?
❍ Yes: write-allocate
❍ No: no-write-allocate
Write Policies (Cont.)
Write-through
❒ update memory on each write
❒ keeps memory up-to-date
❒ traffic/reference = f_writes, e.g. 0.20, independent of cache performance (miss rate)
Write-back
❒ update memory only on block replacement
❒ many cache lines are only read and never written to
❒ add “dirty” bit to status word
❍ originally cleared after replacement
❍ set when a block frame is written to
❍ only write back a dirty block, and “drop” clean blocks w/o memory update
❒ traffic/reference = f_dirty x miss x B
❍ e.g., traffic/reference = 1/2 x 0.05 x 4 = 0.1
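The two traffic estimates can be compared side by side. The sketch uses the slide's numbers; it assumes B is the block size measured in the same transfer units as a single write (4 units per block in the example), and the function names are illustrative.

```python
def write_through_traffic(f_writes):
    """Memory writes per reference: every write goes to memory."""
    return f_writes

def write_back_traffic(f_dirty, miss_ratio, block_units):
    """Memory writes per reference: only dirty victims are written back."""
    return f_dirty * miss_ratio * block_units

wt = write_through_traffic(0.20)
wb = write_back_traffic(f_dirty=0.5, miss_ratio=0.05, block_units=4)
print(round(wt, 3), round(wb, 3))
```

With these numbers write-back generates about half the write traffic of write-through, and unlike write-through its traffic shrinks further as the miss ratio improves.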
Store Buffers
Buffer CPU writes
❒ allows reads to proceed
❒ stall only when full
❒ data dependence?
❍ What happens on dependent loads/stores?
[Figure: store buffer between CPU and $]
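One common answer to the dependence question is to have loads search the store buffer and take the youngest matching value (store-to-load forwarding). The sketch below illustrates that idea only; it assumes whole-word, exactly matching addresses, models the cache as a dict, and uses hypothetical class and method names.

```python
from collections import deque

class StoreBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()            # oldest store on the left

    def store(self, addr, value):
        """Buffer a store; return False if the CPU must stall (buffer full)."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((addr, value))
        return True

    def load(self, addr, cache):
        """Forward from the youngest matching store, else read the cache."""
        for a, v in reversed(self.entries):
            if a == addr:
                return v
        return cache.get(addr)

    def drain_one(self, cache):
        """Retire the oldest buffered store into the cache."""
        if self.entries:
            addr, value = self.entries.popleft()
            cache[addr] = value

cache = {0x10: 1}
sb = StoreBuffer(capacity=2)
sb.store(0x10, 7)
print(sb.load(0x10, cache))   # 7: forwarded from the buffer, not the stale cache value
sb.drain_one(cache)
print(cache[0x10])            # 7: the store has now reached the cache
```

Without the buffer search, the dependent load would read the stale value 1 from the cache while the store is still waiting to drain.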
Writeback Buffers
Between write-back cache and next level
1. Move replaced, dirty blocks to buffer
2. Read new line
3. Move replaced data to memory
Usually only need 1 or 2 write-back buffer entries
[Figure: writeback buffer between $ and $$/Memory]
“Harvard” vs. “Princeton”
Unified (sometimes known as Princeton)
❒ less costly, dynamic response, handles writes to instructions
Split I and D (sometimes known as Harvard)
❒ most of the time code and data don’t mix
❒ 2x bandwidth, place close to I/D ports
❒ can customize size (I-footprint generally smaller than D-footprint), no interference between I/D
❒ self-modifying code can cause “coherence” problems
Caches should be split for frequent simultaneous I & D access
❒ no longer a question in “high-performance” on-chip L-1 caches