Cache Coherence Protocols in Shared Memory Multiprocessors
Mehmet Şenvar

Outline
• Introduction
• Background Information
  • The cache coherence problem
  • Cache Enforcement Strategies
  • Consistency models
• Simple Solutions
• Hardware Protocols
  • Snooping protocols
  • Directory-based protocols
• Compiler and Software protocols
• Future work and conclusions

The Cache Coherence Problem
• Caches allow greater performance by storing frequently used data in faster memory
• Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time
• If one processor updates the data item without informing the other processor, inconsistencies may result and cause incorrect executions
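
The problem described above can be reproduced with a toy model. This is a minimal sketch, assuming two private write-through caches over one shared memory and no coherence protocol at all; the `Cache` class and the address name `"x"` are made up for illustration.

```python
# Toy model: two private caches over one shared memory, with NO coherence protocol.
memory = {"x": 0}

class Cache:
    def __init__(self):
        self.lines = {}  # address -> cached value

    def read(self, addr):
        # Miss: fetch from memory. Hit: return the (possibly stale) cached copy.
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write through to memory, but nobody tells the other caches.
        self.lines[addr] = value
        memory[addr] = value

p0, p1 = Cache(), Cache()
p0.read("x")             # P0 caches x = 0
p1.read("x")             # P1 caches x = 0
p1.write("x", 42)        # P1 updates x; P0 is not informed
stale = p0.read("x")     # P0 hits on its old copy
print(stale, memory["x"])
```

P0 keeps reading its old copy of `x` even though memory already holds the new value, which is exactly the inconsistency a coherence protocol has to prevent.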

Cache Coherence Problem (figure)

Cache Coherence (cont.)
• For correct execution, coherence must be enforced between the caches
• Two major factors are:
  • performance
  • implementation cost
• Four primary design issues are:
  • coherence detection strategy
  • coherence enforcement strategy
  • precision of block-sharing information
  • cache block size

Cache Enforcement Strategies
• A cache enforcement strategy is the mechanism which makes caches consistent
  • write-update (WU)
  • write-invalidate (WI)
  • hybrid protocols, competitive-update (CU)
• Performance of WU and WI varies depending on the application and the number of writes
• Hybrid protocols switch between WU and WI based on the number of writes to a block
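
The trade-off between the three strategies can be sketched by counting traffic for a single block. This is an illustrative model, not a real protocol: the `simulate` function, the trace format, and the `threshold` of 2 unused updates for CU are all assumptions chosen for the example.

```python
def simulate(trace, policy, threshold=2):
    """Count coherence traffic for one block.

    trace  : list of (cpu, op) with op in {"r", "w"}
    policy : "WU", "WI" or "CU"
    Returns (coherence_messages, misses).
    """
    sharers = {}   # cpu -> remote updates received since its last local access
    messages = misses = 0
    for cpu, op in trace:
        if cpu not in sharers:
            misses += 1          # block has to be fetched
        sharers[cpu] = 0         # a local access (re)sets the counter
        if op == "w":
            for other in [c for c in sharers if c != cpu]:
                messages += 1    # one invalidate or update per remote copy
                if policy == "WI":
                    del sharers[other]
                elif policy == "CU":
                    sharers[other] += 1
                    if sharers[other] >= threshold:
                        del sharers[other]   # too many unused updates: drop the copy
                # WU: the remote copy is updated and kept
    return messages, misses

producer_consumer = [(0, "w"), (1, "r")] * 4    # every write is read remotely
write_run = [(1, "r")] + [(0, "w")] * 6         # one reader, then a long run of writes

for name, trace in [("producer/consumer", producer_consumer), ("write run", write_run)]:
    for policy in ("WU", "WI", "CU"):
        print(name, policy, simulate(trace, policy))
```

In this model WU avoids the repeated misses of the producer/consumer pattern, WI avoids the stream of useless updates in the write run, and CU stays close to the better of the two in both cases.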

Consistency Models
• A consistency model defines how the consistency of data values is maintained
• Some consistency models are:
  • sequential consistency
  • weak consistency
  • release consistency
• Weak consistency models are more efficient to implement and require fewer coherence messages

Shared Caches (1)
Processors share a single cache, essentially punting the problem.
• Useful for very small machines.
• E.g., DPC in the Encore, Alliant FX/8.
• Problems are limited cache bandwidth and cache interference
• Benefits are fine-grain sharing and prefetch effects

Non-cacheable Items (2)
• Make shared data non-cacheable
• One of the simplest software solutions
• Also possible in hardware: make the cache locations unreachable

Broadcast Writes (3)
• Every cache write request is sent to all other caches
• First, each cache must check whether it holds this data
• Other copies are either updated or invalidated
• Significant additional memory transactions occur

Hardware Protocols
• Snoop Bus Mechanism
• Directory Based Methods
  • Full Directory
  • Limited Directory
  • Chained Directory

Snoop Bus Protocol
• Snooping protocols rely on a shared bus between the processors for coherence
• On a processor write, the write is passed through the cache to main memory on the bus
• Any processor caching the address may update or invalidate its cache entry as appropriate
• Snooping protocols do not scale well beyond 32 processors because of the shared bus
• The choice between WU, WI, and CU is especially important to reduce communication
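
The snooping behaviour listed above can be sketched as follows. This is a simplified write-through model, assuming a single shared bus object that every cache registers with; the class names `Bus` and `SnoopingCache` and the `policy` parameter are hypothetical.

```python
class Bus:
    """Shared bus: every write is visible to all attached caches."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, source, addr, value):
        self.memory[addr] = value            # write passes through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr, value)     # every other cache observes the write

class SnoopingCache:
    def __init__(self, bus, policy="invalidate"):
        self.bus, self.policy, self.lines = bus, policy, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop(self, addr, value):
        if addr in self.lines:               # only act if we cache this address
            if self.policy == "update":
                self.lines[addr] = value     # write-update
            else:
                del self.lines[addr]         # write-invalidate

bus = Bus()
c0, c1 = SnoopingCache(bus), SnoopingCache(bus)
c0.read(0x10)                # c0 caches address 0x10 (value 0)
c1.write(0x10, 7)            # c0 snoops the write and invalidates its copy
value_seen = c0.read(0x10)   # miss, refetched from memory
print(value_seen)
```

Because every write crosses the one bus, the bus becomes the bottleneck as processors are added, which is the scaling limit noted on the slide.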

MESI (4-state) Invalidation Protocol
• Each line in the cache can be in one of 4 states
  • Modified (exclusive): only in 1 cache, modified
  • Exclusive (unmodified): only in 1 cache, unmodified
  • Shared (unmodified)
  • Invalid
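
The four states can be exercised with a small model of one cache line shared by several caches. This sketch follows the commonly taught MESI transitions and is not necessarily identical to the deck's own state diagram; the function names `read` and `write` and the list-of-states representation are assumptions, and write-backs are only noted in comments.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def read(states, cpu):
    """Processor `cpu` reads the line; `states` holds one State per cache."""
    if states[cpu] is not State.I:
        return                          # read hit in M, E or S: state unchanged
    others_have_copy = False
    for other in range(len(states)):
        if other != cpu and states[other] is not State.I:
            others_have_copy = True
            states[other] = State.S     # M (after a write-back) or E drops to S
    states[cpu] = State.S if others_have_copy else State.E

def write(states, cpu):
    """Processor `cpu` writes the line: all other copies are invalidated."""
    for other in range(len(states)):
        if other != cpu:
            states[other] = State.I
    states[cpu] = State.M

states = [State.I] * 3
read(states, 0)     # only copy in the system
after_first_read = list(states)
read(states, 1)     # second reader: both copies become Shared
after_second_read = list(states)
write(states, 1)    # writer takes the line Modified, others Invalid
after_write = list(states)
read(states, 2)     # reader forces the Modified copy down to Shared
after_third_read = list(states)
print([s.name for s in states])
```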

MESI State Transition Diagram (figure)

MESI Example (figure)

Directory-Based Protocols
• Directory-based protocols do not rely on a shared bus to exchange coherence information (use point-to-point connections)
  • more scalable (can have hundreds of processors)
  • each processor can have its own memory
  • implement weak consistency for efficiency

Directory-Based Protocols (cont.)
• Each node maintains a directory storing cache information and memory information
• A processor communicates with the directory to access memory
  • If a processor requests a non-local memory page, the directory uses its information to find the page
  • Then, it uses messages to retrieve the page and ensure all other processors have consistent info.
  • Since the directory tracks which processors are caching the page, it only needs to send messages to those processors
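
The key point, that messages go only to the recorded sharers, can be sketched like this. It is a minimal model assuming one directory object per home node and a simple message log; the `Directory` class, its method names, and the message tuples are hypothetical.

```python
class Directory:
    """Home-node directory: block -> set of nodes caching it (illustrative layout)."""
    def __init__(self):
        self.sharers = {}
        self.sent = []      # log of (message, destination node, block)

    def read_request(self, node, block):
        # Record the new sharer and reply with the data.
        self.sharers.setdefault(block, set()).add(node)
        self.sent.append(("data", node, block))

    def write_request(self, node, block):
        # Invalidate only the nodes known to hold a copy, then grant ownership.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.sent.append(("invalidate", other, block))
        self.sharers[block] = {node}
        self.sent.append(("grant", node, block))

d = Directory()
d.read_request(1, "A")
d.read_request(5, "A")
d.read_request(2, "B")      # node 2 never touches block A
d.write_request(7, "A")
invalidations = [m for m in d.sent if m[0] == "invalidate"]
print(invalidations)
```

Node 2 receives nothing when block A is written, whereas on a snooping bus every cache would have observed the transaction.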

Directory-Based Protocols (cont.)
• Designing a directory requires defining:
  • cache block granularity
  • cache controller design
  • directory structure
• Cache block granularity is the size of the cache and the size of a cache line
  • CC-NUMA machines have a cache that is separate from, and smaller than, main memory
  • COMA machines use a node's entire memory as cache for remote pages
  • Block size affects performance (false sharing)
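
The effect of block size on false sharing can be shown with a short calculation. This sketch assumes a write-invalidate protocol and counts one invalidation whenever a block changes writer; the addresses `0x1000` and `0x1008` and the function name `invalidations` are made up for the example.

```python
def invalidations(writes, block_size):
    """Count invalidations under write-invalidate for a list of (cpu, byte_address)."""
    owner = {}     # block number -> cpu that wrote it last
    count = 0
    for cpu, addr in writes:
        block = addr // block_size          # which coherence block the byte falls in
        if block in owner and owner[block] != cpu:
            count += 1                      # the other cpu's copy must be invalidated
        owner[block] = cpu
    return count

# Two processors update two DIFFERENT variables that sit 8 bytes apart.
writes = [(0, 0x1000), (1, 0x1008)] * 4

large_blocks = invalidations(writes, 64)    # both variables share one 64-byte block
small_blocks = invalidations(writes, 8)     # each variable has its own 8-byte block
print(large_blocks, small_blocks)
```

With 64-byte blocks the block ping-pongs between the two caches although no data is actually shared; with 8-byte blocks the same writes cause no coherence traffic.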

Directory-Based Protocols (cont.)
• Cache controller is hardware that maintains the directory and processes memory requests
  • custom hardware
  • programmable protocol processor
• The directory structure is how the cache and memory information is organized
  • p+1-bit full directory
  • linked-list directories
  • tagged directories

Directory Models
• Full Directory
  • Links to all caches for all shared locations
• Limited Directory
  • Links to only some (n) of the N caches holding shared data, n < N
• Chained (linked) Directory
  • Link to one cache, and from this cache on to the others (singly or doubly linked)
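
The storage cost of the three models can be compared with a rough per-block estimate. The full-directory figure follows the p+1-bit layout named earlier (one presence bit per cache plus one dirty bit); the extra dirty bit in the limited and chained cases, and the function names, are assumptions made for this sketch.

```python
from math import ceil, log2

def full_directory_bits(num_caches):
    # One presence bit per cache plus one dirty bit.
    return num_caches + 1

def limited_directory_bits(num_caches, num_pointers):
    # n pointers, each wide enough to name one of N caches, plus an assumed dirty bit.
    return num_pointers * ceil(log2(num_caches)) + 1

def chained_directory_bits(num_caches):
    # Memory holds only the head pointer (plus an assumed dirty bit);
    # the rest of the list is stored in the caches themselves.
    return ceil(log2(num_caches)) + 1

N, n = 256, 4
print(full_directory_bits(N), limited_directory_bits(N, n), chained_directory_bits(N))
```

For 256 caches the full directory needs hundreds of bits per memory block, while a four-pointer limited directory and a chained directory need only a few dozen or fewer, at the price of overflow handling or list traversal.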

Directory Sample (full) (figure)

Lock-Based Protocols
• New work that promises to be more scalable than directory protocols
• Implements scope consistency, which is similar to lazy release consistency
• Coherence information is exchanged by reading and writing notices from the lock which protects the shared memory
• Currently implemented in software similar to DSM, but may move to hardware if performance gains can be realized
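
The idea of carrying coherence information on the lock can be sketched as follows. This is a heavily simplified model, assuming write notices are just a set of addresses stored with the lock and that a releasing node flushes its writes to shared memory; `NoticeLock`, `Node`, and their methods are hypothetical names, and mutual exclusion itself is not modelled.

```python
class NoticeLock:
    """Lock that also carries write notices for the data it protects."""
    def __init__(self):
        self.write_notices = set()

class Node:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}
        self.dirty = set()

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)

    def acquire(self, lock):
        # Read the notices: drop local copies that another node has written.
        for addr in lock.write_notices:
            if addr not in self.dirty:
                self.cache.pop(addr, None)

    def release(self, lock):
        # Flush our writes and leave notices for the next acquirer.
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        lock.write_notices |= self.dirty
        self.dirty.clear()

memory = {"x": 0}
lock = NoticeLock()
writer, reader = Node(memory), Node(memory)
reader.read("x")                 # reader caches x = 0
writer.acquire(lock)
writer.write("x", 5)
writer.release(lock)
outside_lock = reader.read("x")  # no acquire yet: old copy is still used
reader.acquire(lock)
inside_lock = reader.read("x")   # notice invalidated the copy, so x is refetched
print(outside_lock, inside_lock)
```

Consistency is only enforced at acquire time, which is what keeps the message count low compared with invalidating every sharer on every write.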

Software Protocols
• Software protocols enforce consistency with limited hardware support by relying either on the compiler or specialized software handlers
• Similar to distributed shared memory (DSM) systems but at a lower level
  • sharing usually in blocks not pages
  • needs to be more efficient for better performance
  • architecture support for sharing

Classification of Software Protocols
• Several criteria distinguish software protocols:
  • dynamism - compile-time or run-time analysis
  • selectivity - level of coherence actions
  • restrictiveness - conservative or as-needed consistency enforcement
  • adaptivity - can protocol adapt to access patterns
  • granularity - size and structure of coherence data
  • blocking - program block on which coherence is enforced
  • positioning - position of coherence instructions
  • updating - how memory is updated after a write
  • checking - how incoherence is detected

Software Coherence with Limited Hardware Support
• Compiler must generate consistent code as no hardware coherence is provided
• Hardware maintains time tags which are updated on every write
• On a read, compiler generates coherence reads which check time tags to ensure data is consistent
• Relies on the compiler to detect reads which may be inconsistent, and the hardware must maintain these time tags
• Using tags, it is also possible to perform dynamic self-invalidation of blocks
• Many techniques are based on using these time tags
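
The time-tag check can be sketched like this. It is a generic illustration, assuming one tag per address that is incremented on every write and compared on a compiler-inserted coherence read; it does not reproduce any specific published time-tag scheme, and `TaggedMemory`, `Cache`, `plain_read`, and `coherence_read` are hypothetical names.

```python
class TaggedMemory:
    def __init__(self):
        self.values, self.tags = {}, {}

    def write(self, addr, value):
        self.values[addr] = value
        self.tags[addr] = self.tags.get(addr, 0) + 1   # tag updated on every write

class Cache:
    def __init__(self, mem):
        self.mem = mem
        self.lines = {}      # addr -> (value, tag at fill time)

    def _fill(self, addr):
        self.lines[addr] = (self.mem.values[addr], self.mem.tags[addr])

    def plain_read(self, addr):
        # Ordinary read: used where the compiler proved the data cannot be stale.
        if addr not in self.lines:
            self._fill(addr)
        return self.lines[addr][0]

    def coherence_read(self, addr):
        # Coherence read: compare tags and self-invalidate a stale copy.
        if addr not in self.lines or self.lines[addr][1] != self.mem.tags[addr]:
            self._fill(addr)
        return self.lines[addr][0]

mem = TaggedMemory()
mem.write("a", 1)
cache = Cache(mem)
first = cache.plain_read("a")         # fills the line with tag 1
mem.write("a", 2)                     # another processor writes; tag becomes 2
unchecked = cache.plain_read("a")     # stale copy returned
checked = cache.coherence_read("a")   # tag mismatch detected, line refetched
print(first, unchecked, checked)
```

The compiler's job in such a scheme is to decide which reads need the checked form, since checking every read would give up most of the benefit of caching.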

Software Coherence with Limited Hardware Support (cont.)
• If hardware has no time tags, Petersen and Li developed an algorithm which uses only page translation hardware and page status tables
• Sharing information is maintained by a software handler at the page level
• On a page access or fault, the software handler checks the sharing information, updates page tables, and performs coherence actions
• Slower than hardware as software handlers involve the OS and are on the critical memory access path

Enforcing Coherence by Restricting Parallelism
• Compilers can also guarantee coherence by structuring the language to limit parallelism
  • easier to enforce coherence
  • limits the programmer and potential parallelism
  • simplifies compiler design
  • good performance can be achieved with no hardware support
• Parallel language restrictions include:
  • doall parallel loops
  • master/slave processes

Optimizing Compilers
• Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer
  • rely on detecting data dependencies
  • may use synchronization variables (locks, barriers)
  • can provide the hardware with hints
  • can detect when coherence is not needed
  • may have problems with dynamic sharing
  • offer good performance, but are hard to design

Future Work
• Hardware protocols are well defined, and the directory structure is near optimal
• Cost improvements can be obtained by mass producing cache controller chips
• Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)
• Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment

Conclusions
• Hardware protocols offer the best performance but require high hardware costs
• Software protocols can be used when there is no hardware support with a slight performance penalty
• Optimizing compilers can enforce coherence or provide hints to the hardware
• A combination of hardware and compiler optimizations is the best