Cooling the Hot Sets: Improved Space Utilization in Large Caches via Dynamic Set Balancing
Mainak Chaudhuri, IIT Kanpur

Balanced $ (IIT, Kanpur)

Talk in one slide
• Closed-addressed hashing is used in traditional cache designs, with a fixed collision chain length (known as associativity)
• Clustering of physical addresses into a few hot sets is a well-known phenomenon
• Non-uniform set utilization leads to a high volume of conflict misses
• First proposal of a fully dynamic scheme to rebalance sets by migrating blocks from “hot regions” to “cooler regions”
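
As a concrete illustration of the clustering point, here is a minimal Python sketch. It assumes conventional modulo set indexing with 64-byte blocks and 2048 sets, which are illustrative values and do not come from the talk. Block addresses that differ by a multiple of NUM_SETS × BLOCK_BYTES all map to the same set, so a strided access pattern can keep one set hot while the rest stay cold.

```python
from collections import Counter

BLOCK_BYTES = 64    # assumed block size (illustrative)
NUM_SETS = 2048     # assumed number of sets (illustrative)

def set_index(addr: int) -> int:
    """Conventional modulo indexing: the address bits just above the block offset."""
    return (addr // BLOCK_BYTES) % NUM_SETS

def set_histogram(addresses):
    """Count how many of the given addresses map to each set."""
    return Counter(set_index(a) for a in addresses)

# 32 blocks with a stride of NUM_SETS * BLOCK_BYTES bytes: all land in set 0.
strided = [i * NUM_SETS * BLOCK_BYTES for i in range(32)]
# 32 consecutive blocks: each lands in a different set.
sequential = [i * BLOCK_BYTES for i in range(32)]

print(len(set_histogram(strided)))      # 1
print(len(set_histogram(sequential)))   # 32
```

With a 16-way cache, the strided pattern above overflows set 0 after 16 blocks and produces conflict misses even though almost every other set is empty.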

Sketch
• Observations
• Design detail
  – Destination of migration
  – Locating migrated blocks
  – Hit/Miss critical path
  – Selective migration
  – Throttling migration
  – Retaining migrated blocks
• Scaling to CMPs
• Simulation results
• Summary

Observation #1

Observation #2, 3

Design detail
• Overview
  – The basic idea is to migrate evicted blocks to sets with a smaller fill count
• The design involves the following sub-problems
  – Identify a good receiver set quickly
  – Locate migrated blocks efficiently
  – Offer dynamic control of the hit/miss critical path
• Optimizations worth exploring
  – Selective migration (not all blocks are important)
  – Bound the number of migrations from a particular set
  – Retain migrated blocks (the difficult part)

Destination of migration
• Associate a saturating counter C(s) with each set s, and a global counter G
  – Increment C(s) on a refill into s
  – When C(s) reaches a value equal to the associativity, increment G
  – When G reaches a value equal to the number of sets, reset G and C(s) for all s
  – Size C(s) so that it can count up to k times the associativity (we set k to 4)
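
The counter rules above can be modeled in a few lines of Python. This is a sketch, not the hardware design. The class name `FillCounters` is hypothetical. The sketch follows the slide literally: it bumps G once per set, at the moment C(s) reaches the associativity A, so an epoch ends once every set has received at least A refills.

```python
class FillCounters:
    """Hypothetical model of the per-set counters C(s) and the global counter G."""

    def __init__(self, num_sets: int, assoc: int, k: int = 4):
        self.num_sets = num_sets
        self.assoc = assoc
        self.c_max = k * assoc           # C(s) saturates at k times the associativity
        self.C = [0] * num_sets
        self.G = 0

    def on_refill(self, s: int) -> None:
        """Called on every refill into set s."""
        if self.C[s] < self.c_max:       # saturating increment of C(s)
            self.C[s] += 1
            if self.C[s] == self.assoc:  # C(s) just reached the associativity
                self.G += 1
        if self.G == self.num_sets:      # G reached the number of sets: reset all
            self.G = 0
            self.C = [0] * self.num_sets


fc = FillCounters(num_sets=4, assoc=2)
for _ in range(10):
    fc.on_refill(0)
print(fc.C, fc.G)   # [8, 0, 0, 0] 1  (C(0) saturated at k * A = 8)
for s in (1, 2, 3):
    fc.on_refill(s)
    fc.on_refill(s)
print(fc.C, fc.G)   # [0, 0, 0, 0] 0  (every set reached A, so everything reset)
```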

Destination of migration
• Divide the sets into clusters and associate a saturating counter D(u) with each cluster u
  – Increment D(u) whenever C(s) is incremented for some s in u
  – Reset D(u) when all C(s) are reset
  – Use a comparator tree to compute the minimum among all D(u) whenever an increment takes place (scalable?)
  – Use a second comparator tree to compute the minimum among all C(s) within the cluster u found by the first tree; the set t with this minimum is the migration target, provided C(s) > C(t) for the source set s
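
The two comparator trees can be modeled in software as two successive minimum searches. The name `pick_target` is hypothetical. The sketch also assumes that clusters are contiguous groups of sets and that ties go to the lowest index. Hardware would evaluate each `min()` as a tree of comparators rather than sequentially.

```python
def pick_target(C, D, cluster_size, source):
    """Return the migration target for a block evicted from `source`, or None.

    C: per-set fill counters C(s); D: per-cluster counters D(u).
    Sets are assumed to be grouped into contiguous clusters of `cluster_size`.
    Each min() stands in for one comparator tree; ties go to the lowest index.
    """
    # First comparator tree: the cluster u with the smallest D(u).
    u = min(range(len(D)), key=lambda i: D[i])
    # Second comparator tree: the set t with the smallest C(t) inside cluster u.
    first = u * cluster_size
    t = min(range(first, first + cluster_size), key=lambda i: C[i])
    # Migrate only if the target is strictly cooler than the source.
    return t if C[source] > C[t] else None


C = [5, 7, 1, 3, 2, 2, 6, 0]
D = [sum(C[0:4]), sum(C[4:8])]            # D(u) accumulates the increments of its C(s)
print(D, pick_target(C, D, 4, source=1))  # [16, 10] 7
```

Because the search is two-level, it returns the coolest set of the coolest cluster, which need not be the globally coolest set. For example, with C = [0, 9, 9, 9, 3, 3, 3, 3], the first tree picks cluster 1 and the second returns set 4, even though set 0 is cooler.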

Locating migrated blocks
• The migrated tags are duplicated in a migration tag cache (MTC)
  – The MTC is organized as a direct-mapped table
  – Each entry has a tag, a target set index, a forward pointer to an MTC entry, a backward pointer to an MTC entry, a head bit, and a tail bit
  – Starting at an index of the MTC, one can follow the forward pointers in a linked list until the tail bit is encountered
  – One tag list in the MTC corresponds to the migrated tags from a particular parent set in the main cache

Locating migrated blocks
• Tag lookup protocol
  – With each set s in the main cache, a head pointer H(s) into the MTC is maintained; H(s) points to the MTC index where the list of migrated tags belonging to set s begins
  – The main cache is looked up first, as usual
  – On a miss, H(s) is read out and an MTC walk is initiated at index H(s)
  – Note that on reset, the MTC is organized as a free list; a new migration from set s allocates an MTC entry, links it at the head of the list starting at H(s), and updates H(s)
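
The MTC organization and lookup protocol above can be sketched as follows. The entry fields reuse the names from the talk (MT, TS, PS, FPTR, BPTR, head/tail bits). The class names, the use of `None` for an invalid pointer, and threading the free list through FPTR are assumptions made for illustration. The main-cache lookup is omitted; `walk` models only what happens after a miss in set s.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MTCEntry:
    MT: int = 0                    # migrated tag
    TS: int = 0                    # target set index
    PS: int = 0                    # parent set index
    FPTR: Optional[int] = None     # forward pointer (also threads the free list)
    BPTR: Optional[int] = None     # backward pointer
    head: bool = False
    tail: bool = False

class MTC:
    def __init__(self, num_entries: int, num_sets: int):
        self.e = [MTCEntry() for _ in range(num_entries)]
        # On reset, every entry sits on a free list threaded through FPTR.
        for i in range(num_entries - 1):
            self.e[i].FPTR = i + 1
        self.free = 0 if num_entries else None
        self.H = [None] * num_sets          # H(s); None plays the role of !VALID(H(s))

    def migrate(self, s: int, tag: int, target: int) -> Optional[int]:
        """Record a block migrated from parent set s; return its MTC index."""
        m = self.free
        if m is None:
            return None                     # MTC full: the migration is rejected
        self.free = self.e[m].FPTR
        old = self.H[s]
        self.e[m] = MTCEntry(MT=tag, TS=target, PS=s, FPTR=old, BPTR=None,
                             head=True, tail=(old is None))
        if old is not None:                 # link in front of the old head
            self.e[old].head = False
            self.e[old].BPTR = m
        self.H[s] = m
        return m

    def walk(self, s: int, tag: int) -> Optional[int]:
        """After a main-cache miss in set s, walk the list starting at H(s).

        Returns the target set holding `tag`, or None on an MTC miss."""
        i = self.H[s]
        while i is not None:
            if self.e[i].MT == tag:
                return self.e[i].TS
            if self.e[i].tail:              # the tail bit ends the walk
                break
            i = self.e[i].FPTR
        return None


mtc = MTC(num_entries=8, num_sets=4)
mtc.migrate(s=2, tag=0x1A, target=3)
mtc.migrate(s=2, tag=0x2B, target=0)
print(mtc.walk(2, 0x1A), mtc.walk(2, 0x99), mtc.walk(1, 0x1A))  # 3 None None
```

Allocation at the head keeps insertion O(1). The cost is that a lookup may have to traverse the whole list, which is what the hit/miss critical-path optimizations address.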

Locating migrated blocks
• Tag lookup protocol
  – On an MTC hit, the block is swapped with the LRU block in the parent set to improve future hit latency (the MTC behaves like a folded victim cache)
  – It is necessary to avoid false hits
  – The same set may now contain the same tag multiple times
  – Each tag is extended by log2(A) bits, where A is the associativity; the target way of a migrated tag is stored along with the tag

Locating migrated blocks
• Replacement of migrated blocks
  – A migrated block may get replaced due to primary or secondary replacements
  – A migrated block evicted by a primary replacement is migrated again to a different target set; this case is easy to handle because it requires only an MTC entry modification
  – But to reach the MTC entry, one needs to maintain a direct MTC entry pointer MEP(t) with each migrated tag t in the main cache

Locating migrated blocks
• Replacement of migrated blocks
  – A secondary migrated block replacement evicts the block from the cache
  – This requires delinking the tag from its list
  – Efficient delinking is possible only in doubly-linked lists, which is why each MTC entry needs a backward pointer
  – This may also require updating the H(s) field of the parent set s
  – To reach the parent set, each MTC entry must store the parent set index

Locating migrated blocks
• Summary of structures added so far
  – Per set s: one saturating counter C(s), one head pointer H(s), and VALID(H(s))
  – Per tag t: MTC entry pointer MEP(t), VALID(MEP(t)), and extra way bits W(t)
  – Per MTC entry m: migrated tag MT(m) including the extra way bits, target set index TS(m), parent set index PS(m), forward pointer FPTR(m), backward pointer BPTR(m), and head/tail bits HT(m)
  – Per set cluster u: saturating counter D(u)
  – A global saturating counter G
  – Two comparator trees

Hit/Miss critical path
• Reducing the MTC walk latency
  – Proposal #1: Make the MTC dual-ported so that a list can be walked from both ends at once (a win-win situation); this roughly halves the walk on both the hit and the miss paths
  – Add a tail pointer T(s) to each set (along with H(s)) so that the tail of a list can be accessed directly
  – Proposal #2: Maintain a summary of the migrated tags from a set s in a small filter F(s) attached to s
  – Query F(s) before walking the MTC; a negative response from F(s) means the tag is definitely not in the MTC; this optimizes the miss path only
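
Here is a rough cycle-count model of Proposal #1. It assumes one MTC read per port per cycle. It also assumes that T(s) gives direct access to the tail, so the second port can follow BPTR backward. `walk_cycles` is a hypothetical helper that takes the list of migrated tags in head-to-tail order.

```python
def walk_cycles(tags, target, dual_ported):
    """Return (hit, cycles) for looking up `target` in a migrated-tag list.

    tags are ordered head to tail; one port reads from the head end and,
    if dual_ported, a second port reads from the tail end in the same cycle."""
    lo, hi = 0, len(tags) - 1
    cycles = 0
    while lo <= hi:
        cycles += 1
        if tags[lo] == target:          # port 1: next entry from the head end
            return True, cycles
        if dual_ported and hi != lo:    # port 2: next entry from the tail end
            if tags[hi] == target:
                return True, cycles
            hi -= 1
        lo += 1
    return False, cycles


tags = [10, 11, 12, 13, 14, 15, 16]
print(walk_cycles(tags, 99, dual_ported=False))  # (False, 7)
print(walk_cycles(tags, 99, dual_ported=True))   # (False, 4)
```

A miss costs the full list length n with one port, but only ceil(n/2) cycles with two, which is where the halving comes from.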

Hit/Miss critical path
• Reducing the MTC walk latency
  – We experimented with a simple 60-bit design of F(s), with great success
  – Divide the 60 bits into nine segments: each of the lower eight segments is seven bits wide and the upper segment is four bits wide (8 × 7 + 4 = 60)
  – When a tag t is queried, the lower three bits of t, t[2:0], identify one of the lower eight segments of F(s)
  – Let the contents of the identified segment be f[6:0] and the contents of the upper segment be g[3:0]

Hit/Miss critical path
• Reducing the MTC walk latency
  – The filter says “yes” if and only if (f[6:0] AND t[9:3]) == t[9:3] and (g[3:0] AND t[13:10]) == t[13:10]
  – A newly migrated tag t is hashed into F(s) by ORing t[9:3] into the identified segment and ORing t[13:10] into the upper segment
  – F(s) is not updated when a migrated tag is removed (its bits cannot be cleared safely, because other tags may share them)
  – On a false positive from F(s), all the migrated tags of set s have to be visited anyway; at this time F(s) is cleared and rebuilt
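
The 60-bit filter F(s) can be written directly as bit manipulation on an integer. This sketch assumes that lower segment i occupies bits 7i+6 down to 7i and that the upper segment occupies bits 59 down to 56. The talk gives the segment widths but not their exact placement. The function names are hypothetical.

```python
SEG_BITS = 7
UPPER_SHIFT = 8 * SEG_BITS         # upper 4-bit segment occupies bits 59..56

def _fields(t: int):
    seg = t & 0x7                  # t[2:0] selects one of the eight lower segments
    lo = (t >> 3) & 0x7F           # t[9:3], 7 bits
    hi = (t >> 10) & 0xF           # t[13:10], 4 bits
    return seg, lo, hi

def filter_insert(F: int, t: int) -> int:
    """Hash a newly migrated tag t into the 60-bit filter F(s)."""
    seg, lo, hi = _fields(t)
    F |= lo << (seg * SEG_BITS)    # OR t[9:3] into the selected segment
    F |= hi << UPPER_SHIFT         # OR t[13:10] into the upper segment
    return F

def filter_query(F: int, t: int) -> bool:
    """True means 'maybe migrated' (walk the MTC); False means 'definitely not'."""
    seg, lo, hi = _fields(t)
    f = (F >> (seg * SEG_BITS)) & 0x7F
    g = (F >> UPPER_SHIFT) & 0xF
    return (f & lo) == lo and (g & hi) == hi

def filter_rebuild(tags) -> int:
    """Clear F(s) and rebuild it from the tags still in the MTC list."""
    F = 0
    for t in tags:
        F = filter_insert(F, t)
    return F


F = filter_insert(0, 0x1234)
print(filter_query(F, 0x1234), filter_query(F, 0x1235), filter_query(F, 20))
# True False True
```

Tag 20 was never inserted, yet the query answers yes because its bits are a subset of those already set. This is the false-positive case that triggers a full walk and a rebuild of F(s). The converse cannot happen: an inserted tag always passes, so a "no" is always safe.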

Selective migration
• Not all blocks are important
  – Unnecessary migrations waste energy and may hurt performance by using up MTC space
  – Ideally, we want to migrate the most frequently missing blocks
  – Usually, these blocks are associated with the hot sets
  – The idea, therefore, is to identify the hot sets and migrate only the blocks evicted from them

Selective migration
• Identifying hot sets
  – Associate a saturating counter R(s) with each set s to count the number of external refills to the set
  – Whenever some R(s) reaches its maximum value, all R(s) are reset (leader-decides rule)
  – Maintain the total refill count across all sets in a register TRC and the maximum refill count across all sets in another register MaxRC; the average refill count is then ARC = TRC >> log2(|S|), where |S| is the number of sets
  – Definition: a set s is hot if and only if R(s) > ARC + ((MaxRC - ARC) >> delta)
  – Delta is dynamically incremented
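
The hot-set test can be modeled as below. `HotSetDetector` is a hypothetical name. The sketch assumes that |S| is a power of two and that TRC and MaxRC are cleared together with the R(s) counters, which the slide implies but does not state. The explicit parentheses around the shifted term reflect the intended reading ARC + ((MaxRC - ARC) >> delta). Under C-like precedence, the unparenthesized form would shift the whole sum.

```python
class HotSetDetector:
    """Refill counters R(s) with the leader-decides reset and the hot-set test."""

    def __init__(self, num_sets: int, r_max: int, delta: int = 1):
        assert num_sets & (num_sets - 1) == 0, "|S| must be a power of two"
        self.log_sets = num_sets.bit_length() - 1   # log2(|S|)
        self.r_max = r_max                           # saturation value of R(s)
        self.delta = delta
        self.R = [0] * num_sets
        self.TRC = 0                                 # total refill count
        self.MaxRC = 0                               # maximum R(s)

    def on_external_refill(self, s: int) -> None:
        self.R[s] += 1
        self.TRC += 1
        self.MaxRC = max(self.MaxRC, self.R[s])
        if self.R[s] == self.r_max:                  # leader-decides: reset everything
            self.R = [0] * len(self.R)
            self.TRC = 0
            self.MaxRC = 0

    def is_hot(self, s: int) -> bool:
        arc = self.TRC >> self.log_sets              # ARC = TRC >> log2(|S|)
        return self.R[s] > arc + ((self.MaxRC - arc) >> self.delta)


d = HotSetDetector(num_sets=4, r_max=16, delta=1)
for s, n in [(0, 8), (1, 2), (2, 1), (3, 1)]:
    for _ in range(n):
        d.on_external_refill(s)
print(d.is_hot(0), d.is_hot(1))   # True False  (threshold = 3 + (5 >> 1) = 5)
```

Raising delta shrinks the margin above the average refill count, so more sets qualify as hot.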

Throttling migration
• If a set becomes very hot, it may start migrating a large number of blocks
  – While this may appear desirable, the monotonically increasing expected MTC walk cost soon outweighs the benefits
  – We impose a limit on the length of the migrated tag list belonging to a particular set
  – However, a static limit may not work, so the limit is dynamically increased by monitoring the volume of migrations rejected because the length limit is too short
  – Each set s now maintains a list length register LLR(s)
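
The slide does not give the rule for raising the limit, so the following is a hypothetical stand-in. Every `reject_threshold` rejected migrations raise a single global limit by one, up to `max_limit`. LLR(s) tracks the list length of each parent set, as in the talk. All other names and parameter values are assumptions.

```python
class MigrationThrottle:
    """Per-set list-length cap with a dynamically raised limit (hypothetical rule)."""

    def __init__(self, num_sets: int, initial_limit: int = 2,
                 reject_threshold: int = 4, max_limit: int = 16):
        self.LLR = [0] * num_sets            # LLR(s): migrated-tag list length of set s
        self.limit = initial_limit           # current global length limit
        self.rejects = 0                     # rejections since the last raise
        self.reject_threshold = reject_threshold
        self.max_limit = max_limit

    def try_migrate(self, s: int) -> bool:
        """Return True if a block evicted from set s may be migrated."""
        if self.LLR[s] < self.limit:
            self.LLR[s] += 1
            return True
        self.rejects += 1                    # rejected: list already at the limit
        if self.rejects >= self.reject_threshold and self.limit < self.max_limit:
            self.limit += 1                  # too many rejections: relax the limit
            self.rejects = 0
        return False

    def on_delink(self, s: int) -> None:
        """A migrated tag of set s left the MTC."""
        if self.LLR[s] > 0:
            self.LLR[s] -= 1


t = MigrationThrottle(num_sets=2)
print([t.try_migrate(0) for _ in range(7)])
# [True, True, False, False, False, False, True]; the limit rose from 2 to 3
```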

Retaining migrated blocks
• The number of misses between two misses to the same block is often very high
  – This points to the danger of losing the migrated blocks before they get reused
  – We need to design a replacement policy that gives lower replacement priority to the migrated blocks, because these are the blocks we really want to retain
  – Classify the sets into high-hit and low-hit sets
  – For high-hit sets, continue with the baseline policy (LRU in our case)
  – For low-hit sets, consider the non-migrated blocks for replacement before the migrated ones
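
Here is one plausible reading of the retention rule, sketched in Python. In a low-hit set, scan from the LRU end and evict the first non-migrated block. Fall back to plain LRU only if every way holds a migrated block. The function name and the representation of the LRU stack are assumptions.

```python
from typing import Sequence

def choose_victim(lru_order: Sequence[int], migrated: Sequence[bool],
                  low_hit: bool) -> int:
    """Return the way to evict.

    lru_order: way numbers ordered from LRU to MRU.
    migrated[w]: True if way w currently holds a migrated block.
    """
    if low_hit:
        # Low-hit set: consider non-migrated blocks first, starting from LRU.
        for w in lru_order:
            if not migrated[w]:
                return w
    # High-hit set (baseline LRU), or every way holds a migrated block.
    return lru_order[0]


order = [2, 0, 3, 1]                       # way 2 is LRU, way 1 is MRU
mig = [False, False, True, False]          # only way 2 holds a migrated block
print(choose_victim(order, mig, low_hit=False))  # 2
print(choose_victim(order, mig, low_hit=True))   # 0
```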

Retaining migrated blocks
• Associate a hit counter HC(s) with each set s
  – Reset HC(s) when the refill counter is reset
  – Count a hit on a migrated block as a hit in the parent set
  – Classify a set as low-hit if and only if HC(s) ≤ h × R(s) and R(s) > r, for some constants h > 1 and r < associativity
  – We fix h to 4 and r to one-eighth of the associativity
• More research is needed on better retention schemes
  – This is going to play a big role
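
The low-hit classification is a simple predicate. This sketch uses h = 4 and r = A/8, as in the talk. It assumes integer division when A is not a multiple of 8. `is_low_hit` is a hypothetical name.

```python
def is_low_hit(HC: int, R: int, assoc: int, h: int = 4) -> bool:
    """A set is low-hit iff HC(s) <= h * R(s) and R(s) > r, with r = assoc / 8."""
    r = assoc // 8                 # r is fixed to one-eighth of the associativity
    return HC <= h * R and R > r


# 16-way cache, so r = 2
print(is_low_hit(HC=8, R=3, assoc=16))    # True: 8 <= 12 and 3 > 2
print(is_low_hit(HC=13, R=3, assoc=16))   # False: 13 > 4 * 3
print(is_low_hit(HC=0, R=2, assoc=16))    # False: R(s) does not exceed r
```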

Scaling to CMPs
• Assume that CMP caches will be banked
  – All the policies can be applied independently to each bank or to a subset of nearby banks
  – No cross-bank (or cross-switch) migration
  – Use cross-bank migration only for proximity enhancement (more detail in the second talk)
  – The entire design scales seamlessly to larger caches
• In our simulations, we assume that a pair of banks shares a switch on a ring, and cross-bank migration is allowed only within a pair

Simulation results
• Single-threaded and multi-threaded applications
• Single-threaded runs use a 2 MB 16-way L2 cache
• Multi-threaded runs use 8 cores sharing a 4 MB 16-way L2 cache
  – Each core has private L1 caches
• The MTC is sized to hold half as many tags as the main cache
• Space overhead of about 56 KB per 1 MB bank

[Simulation result figures (seven slides) not recoverable from the transcript]

Summary
• Huge potential for improving performance and saving energy with slightly over 5% extra storage (about 56 KB per 1 MB bank)
• Logic simplifications need to be explored further

THANK YOU!