Uppsala Architecture Research Team
Mommy, mommy! I want a hardware cache with few conflicts and low power consumption that is easy to implement!
But... That’s three wishes in one!!!
Refinement and Evaluation of the Elbow Cache
or
The Little Cache that could
Mathias Spjuth
2-way Set Associative Cache
[Figure: animation of a 2-way set associative cache. Labels: Address Space, Cache, Sets. Memory References: A-B-C-D-E-F-G-H]
Conflicts (cont.)
The traditional way of reducing conflicts is to use set associative caches.
++ Lower miss rate (than direct-mapped)
-- Slower access
-- More complexity (uses more chip area)
-- Higher power consumption
2-way Skewed Associative Cache
[Figure: animation of a 2-way skewed associative cache. Labels: Address Space, Cache Bank 1, Cache Bank 2. Memory References: A-B-C-D-E-F-G-H]
2-way Skewed Associative Cache
[Figure: final frame of the 2-way skewed associative cache animation after Memory References A-B-C-D-E-F-G-H. Labels: Address Space, Cache Bank 1, Cache Bank 2]
No Conflicts!
Skewed associative caches
Uses different hashing (skewing) functions for indexing each cache bank
++ Lower miss rate (than set-assoc.)
++ More predictable
-- Slightly slower (hashing)
-- ”Cannot” use LRU replacement
-- ”Cannot” use VI-PT (virtually indexed, physically tagged)
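As a rough illustration of per-bank skewing, the following Python sketch indexes two banks with different functions. The XOR-based `skew_index` function, the bank size and the block addresses are assumptions made up for this example; they are not the hashing functions used in the presentation.

```python
NUM_SETS = 8      # sets per bank (assumed, must be a power of two)
INDEX_BITS = 3    # log2(NUM_SETS)


def set_assoc_index(block_addr: int) -> int:
    """Conventional indexing: the low-order bits of the block address."""
    return block_addr % NUM_SETS


def skew_index(block_addr: int, bank: int) -> int:
    """Hypothetical skewing function: bank 0 uses the low bits,
    bank 1 XORs the low bits with the next-higher bits."""
    low = block_addr & (NUM_SETS - 1)
    high = (block_addr >> INDEX_BITS) & (NUM_SETS - 1)
    return low if bank == 0 else low ^ high


# Three blocks that all map to set 0 with conventional indexing.
blocks = [0, 8, 16]
conventional = [set_assoc_index(b) for b in blocks]
bank0 = [skew_index(b, 0) for b in blocks]
bank1 = [skew_index(b, 1) for b in blocks]
print(conventional, bank0, bank1)
```

In a 2-way set associative cache these three blocks would compete for the two ways of one set; with the skewed index they land in three different sets of the second bank.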
Elbow Cache
Improve the performance of a skewed associative cache by reallocating blocks within the cache.
By doing so we get a broader choice of which block to choose as the victim.
Use timestamps as replacement metric.
Finding the victim
Two methods:
1. Look-ahead
Consider all possible placements before the first reallocation is made.
2. Feedback
Only consider the immediate placements, then iterate.
2-way Elbow Lookahead Cache
[Figure: animation of a 2-way elbow cache using look-ahead. Labels: Address Space, Cache Bank 1, Cache Bank 2. Memory References: A-B-C-D-E-F-G-H-X]
Replacement paths:
F-B-A
E-D-H
2-way Elbow Feedback Cache
[Figure: animation of a 2-way elbow cache using feedback. Labels: Address Space, Cache Bank 1, Cache Bank 2, Temp. Register. Memory References: A-B-C-D-E-F-G-H-X]
Finding the victim (cont.)
Look-ahead:
++ Most optimal
-- Difficult to implement (>1 transformation)
Feedback:
++ Easy to implement (feed victim back to write buffer)
-- Needs extra space in the write buffer
Replacement Metrics
Enhanced-Not-Recently-Used (NRUE):
The best policy for skewed caches known so far.
Each block contains two extra bits, a recently-used and very-recently-used bit, that are set on access to the block.
These bits are regularly cleared. The very-recently-used bit is cleared more often.
First, try to find a victim with no bit set.
Then one with only the recently-used bit set.
Then use random replacement.
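The three-step victim search described above can be sketched as follows. The `Block` class, the field names and the helper `touch` are hypothetical names introduced for illustration, and the periodic clearing of the bits is left out because the slide does not say what triggers it.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    tag: str
    recently_used: bool = False
    very_recently_used: bool = False


def touch(block: Block) -> None:
    """Both bits are set on every access to the block."""
    block.recently_used = True
    block.very_recently_used = True


def nrue_victim(candidates, rng=random):
    """Pick a victim among the candidate blocks in NRUE order."""
    # 1. Prefer a block with no bit set.
    for block in candidates:
        if not block.recently_used and not block.very_recently_used:
            return block
    # 2. Otherwise a block with only the recently-used bit set.
    for block in candidates:
        if block.recently_used and not block.very_recently_used:
            return block
    # 3. Otherwise fall back to random replacement.
    return rng.choice(candidates)


a = Block("A")
touch(a)                              # both bits set
b = Block("B", recently_used=True)    # only the recently-used bit set
c = Block("C")                        # no bit set
first_choice = nrue_victim([a, b, c])
second_choice = nrue_victim([a, b])
print(first_choice.tag, second_choice.tag)
```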
Timestamps
[Figure: each cache block (A, B) stores its Data together with a Timestamp (TA, TB); a Counter holds the current time Tcurr.]
Increase counter on every cache allocation.
\[
\mathrm{Dist}(A) =
\begin{cases}
T_{curr} - T_A & \text{if } T_{curr} \ge T_A \\
T_{max} - T_A + T_{curr} & \text{if } T_{curr} < T_A
\end{cases}
\]
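A minimal sketch of the timestamp comparison, assuming the distance is the number of ticks elapsed since the block's timestamp, with wrap-around of the counter taken into account. The 5-bit width matches the test configurations listed later; treating the wrap-around period `T_MAX` as 2**5 and the helper names `dist` and `oldest` are assumptions of this example.

```python
TIMESTAMP_BITS = 5
T_MAX = 1 << TIMESTAMP_BITS   # assumed wrap-around period of the counter


def dist(t_curr: int, t_block: int, t_max: int = T_MAX) -> int:
    """Ticks elapsed since the block was stamped, allowing for wrap-around."""
    if t_curr >= t_block:
        return t_curr - t_block
    return t_max - t_block + t_curr


def oldest(timestamps: dict, t_curr: int) -> str:
    """Return the name of the block with the largest distance (the oldest)."""
    return max(timestamps, key=lambda name: dist(t_curr, timestamps[name]))


# The counter has wrapped: A was stamped at 30, B at 1, and the counter is now 2.
stamps = {"A": 30, "B": 1}
t_curr = 2
print(dist(t_curr, stamps["A"]), dist(t_curr, stamps["B"]), oldest(stamps, t_curr))
```

Both branches are equivalent to `(t_curr - t_block) % t_max`, which is how a fixed-width subtraction would behave.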
Timestamps
[Figure: timeline of Timestamp [ticks] from 0 to Tmax, marking Tcurr, TA and TB in two cases.]
Dist(A) > Dist(B): A older than B
Dist(A) < Dist(B): B older than A
Implementation
Lookahead:
At most one transformation (4 possible victims) each replacement.
Do the transformation and load the new data at the same time.
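One way to picture the lookahead choice among up to four victims is sketched below. This is an interpretation under stated assumptions, not the presented hardware design: the `index` skewing function, the `banks[b][i] = (addr, timestamp)` layout, the preference for empty slots and the tie-breaking toward shorter paths are all hypothetical.

```python
NUM_SETS = 8   # sets per bank (assumed)
T_MAX = 32     # wrap-around period of a 5-bit timestamp (assumed)


def index(addr, bank):
    """Hypothetical skewing function, one per bank."""
    low, high = addr % NUM_SETS, (addr // NUM_SETS) % NUM_SETS
    return low if bank == 0 else low ^ high


def dist(t_curr, t_block):
    return (t_curr - t_block) % T_MAX


def lookahead_insert(banks, addr, t_curr):
    """Insert addr, evicting the oldest of up to 4 candidates.
    Returns the evicted address, or None if an empty slot was used."""
    paths = []   # each path is a list of slots; the last slot holds the victim
    for b in (0, 1):
        slot = (b, index(addr, b))
        paths.append([slot])                      # direct victim
        occupant = banks[b][slot[1]]
        if occupant is not None:                  # victim after one transformation
            other = 1 - b
            paths.append([slot, (other, index(occupant[0], other))])

    def key(path):
        occupant = banks[path[-1][0]][path[-1][1]]
        age = T_MAX if occupant is None else dist(t_curr, occupant[1])
        return (age, -len(path))                  # oldest first, then shortest path

    best = max(paths, key=key)
    (fb, fi), (vb, vi) = best[0], best[-1]
    victim = banks[vb][vi]
    if len(best) == 2:
        banks[vb][vi] = banks[fb][fi]             # the single transformation
    banks[fb][fi] = (addr, t_curr)
    return None if victim is None else victim[0]


banks = [[None] * NUM_SETS for _ in range(2)]
banks[0][0], banks[0][1] = (8, 9), (17, 5)
banks[1][0], banks[1][1] = (9, 8), (19, 2)        # block 19 is the oldest
evicted = lookahead_insert(banks, 64, t_curr=10)
print(evicted, banks[0][0], banks[1][1])
```

In this made-up example the new block 64 takes the slot of block 8, block 8 is moved to its alternative slot in the other bank, and the oldest block (19) is the one that leaves the cache.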
Implementation
Feedback:
Up to 7 transformations (max. 8 possible victims) each replacement.
Temporary victims are moved to the write buffer, before reallocation.
Extra control field in write buffer.
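The feedback iteration might look roughly like the loop below. It is a sketch of one possible reading of the slides: the rule that a temporary victim displaces the occupant of its alternative slot only if that occupant is older, the `index` skewing function, the data layout and the use of a plain variable `temp` in place of the write buffer are all assumptions.

```python
NUM_SETS = 8   # sets per bank (assumed)
T_MAX = 32     # wrap-around period of a 5-bit timestamp (assumed)


def index(addr, bank):
    """Hypothetical skewing function, one per bank."""
    low, high = addr % NUM_SETS, (addr // NUM_SETS) % NUM_SETS
    return low if bank == 0 else low ^ high


def dist(t_curr, t_block):
    return (t_curr - t_block) % T_MAX


def feedback_insert(banks, addr, t_curr, max_steps=7):
    """Insert addr; returns (evicted address or None, transformations used)."""
    def age(slot):
        occupant = banks[slot[0]][slot[1]]
        return T_MAX if occupant is None else dist(t_curr, occupant[1])

    # Immediate placement: take the older of the two directly reachable slots.
    bank, idx = max([(b, index(addr, b)) for b in (0, 1)], key=age)
    temp = banks[bank][idx]            # temporary victim (held in the write buffer)
    banks[bank][idx] = (addr, t_curr)

    steps = 0
    while temp is not None and steps < max_steps:
        other = 1 - bank
        j = index(temp[0], other)
        occupant = banks[other][j]
        if occupant is not None and dist(t_curr, occupant[1]) <= dist(t_curr, temp[1]):
            break                      # temp is the older one: it is the final victim
        banks[other][j] = temp         # reallocate the temporary victim
        temp, bank = occupant, other   # the displaced block is fed back
        steps += 1
    evicted = None if temp is None else temp[0]
    return evicted, steps


banks = [[None] * NUM_SETS for _ in range(2)]
banks[0][0], banks[0][1] = (8, 9), (17, 5)
banks[1][0], banks[1][1] = (9, 8), (19, 2)
evicted, steps = feedback_insert(banks, 64, t_curr=10)
print(evicted, steps)
```

In this made-up example two reallocations end in an empty slot, so the new block is inserted without evicting anything.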
Feedback
[Figure: block diagram of the feedback datapath with Bank I, Bank II and the Write Buffer. Write buffer entries hold Data+Tag and TmSt fields plus the control fields Xid1, Xid2b and Step; signals include Read, Write, readmem and writemem.]
Test Configurations
Set associative: 2-way, 4-way, 8-way, 16-way
Fully associative cache
Skewed associative, LRU
Skewed associative, NRUE
Skewed associative, 5-bit timestamp
Elbow cache, 1-step lookahead, 5-bit timestamp
Elbow cache, 7-step feedback, 5-bit timestamp
Test Configurations (2)
General configuration:
8 KB, 16 KB, 32 KB cache size
L1 data cache with 32 byte block size
Write Back – No Allocate on Write & infinite write buffer (all writes ignored)
Miss Rate Reduction (MRR):
\[
\mathrm{MRR} = \frac{MR_{ref} - MR}{MR_{ref}}
\]
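As a small worked example of the MRR definition, the function below computes the reduction for two miss rates. The numbers are invented for illustration and are not results from the presentation; the function name is hypothetical.

```python
def miss_rate_reduction(mr_ref: float, mr: float) -> float:
    """MRR = (MR_ref - MR) / MR_ref, relative to the reference miss rate."""
    if mr_ref <= 0:
        raise ValueError("reference miss rate must be positive")
    return (mr_ref - mr) / mr_ref


# Illustrative values only: a reference miss rate of 10% and a measured 8%.
example_mrr = miss_rate_reduction(0.10, 0.08)
print(f"{example_mrr:.1%}")
```

A negative MRR means the evaluated cache misses more often than the reference.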
Average miss rate reduction
[Figure: bar chart. x-axis: Cache size (8KB, 16KB, 32KB); y-axis: Miss rate reduction (0% to 25%). Series: 2-w, 4-w, 8-w, 16-w, Fully Assoc., Skewed LRU, Skewed NRUE, Skewed TS 5-bit, Elbow LA 5-bit, Elbow FB 5-bit-7-step]
16 KB Cache size
[Figure: bar chart. x-axis: Benchmark (Red. SPEC 2000): AMMP, EQUAKE, MCF, PARSER, VCF_PLACE, VCF_ROUTE; y-axis: Miss Rate Reduction (-5% to 30%). Series: 2-w, 4-w, 8-w, 16-w, Fully Assoc., Skewed LRU, Skewed NRUE, Skewed TS 5-bit, Elbow LA 5-bit, Elbow FB 5-bit-7-step]
Conclusions
I. For a 2-way skewed cache, timestamp replacement gives almost the same performance as LRU.
II. Timestamps are useful.
III. A 2-way elbow cache has roughly the same performance as an 8-way set associative cache of the same size.
Conclusions (2)
IV. The lookahead design is slightly better than the feedback.
V. There are drawbacks with all skewed caches (skewing delays, VI-PT).
VI. If the problems can be solved, the elbow cache is a good alternative to set associative caches.
Future Work
Power awareness:
How does an elbow cache stand up against traditional set associative caches when power consumption is considered?
Links
UART web:
www.it.uu.se/research/group/uart/
?