Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
Norman P. Jouppi
Presenter: Shrinivas Narayani
Contents
• Cache Basics
• Types of Cache Misses
• Cost of Cache Misses
• How to Remove Cache Misses
– Larger Block Size
– Adding Associativity (Reducing Conflict Misses)
• Miss Cache
• Victim Cache (an improvement over the miss cache)
– Removing Capacity and Compulsory Misses
• Prefetch Techniques
• Stream Buffers
• Conclusion
• Mapping:
(Block address) modulo (Number of cache blocks in the cache)
• The cache is indexed using the lower-order bits of the address.
e.g., memory addresses 00001 and 11101 map to cache locations 001 and 101, respectively.
• Data is identified using the tag (the higher-order bits of the address).
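The modulo mapping above can be sketched in a few lines. This is an illustrative example, not from the paper; the 8-block cache size and function names are assumptions.

```python
NUM_BLOCKS = 8  # assumed direct-mapped cache size: 8 blocks, 3 index bits

def cache_index(block_address):
    # (Block address) modulo (number of cache blocks in the cache)
    return block_address % NUM_BLOCKS

def cache_tag(block_address):
    # the remaining higher-order bits identify which block occupies the slot
    return block_address // NUM_BLOCKS

# Addresses 0b00101 (5) and 0b01101 (13) share index 0b101 but differ in tag,
# so they conflict in a direct-mapped cache.
print(cache_index(0b00101), cache_tag(0b00101))
print(cache_index(0b01101), cache_tag(0b01101))
```

Two addresses with the same index but different tags illustrate exactly the conflict misses discussed later.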
[Figure: a direct-mapped cache with eight slots (indices 000–111); memory block addresses 00001, 00101, 01001, 01101, and 10001 map to the slots given by their low-order three bits.]
Cache Terminology
• Cache Hit
• Cache Miss
• Miss Penalty: the time to replace a block in the upper level with the corresponding block from the lower level.
In a direct-mapped cache, there is only one place a newly requested item can go, and hence only one choice of what to replace.
Types of Misses
– Compulsory: the first access to a block cannot be in the cache, so the block must be brought in. Also called cold-start misses or first-reference misses. (Misses even in an infinite cache.)
– Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved. (Misses due to cache size.)
– Conflict: if the block-placement strategy is set-associative or direct-mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way associative cache.)
– Coherence: the result of invalidations performed to preserve multiprocessor cache consistency.
Conflict misses account for between 20% and 40% of all direct-mapped cache misses.
Cost of Cache Misses
• Cycle time has been decreasing much faster than memory access time.
• The average number of machine cycles per instruction has also been decreasing dramatically.
• These two effects multiply the relative cost of a cache miss.
• E.g., a cache miss on the VAX 11/780 cost only 60% of an average instruction's execution time, so even if every instruction took a cache miss, performance would drop by only 60%.
How to Reduce Cache Misses
• Increase block size
• Increase associativity
• Use a victim cache
• Use a pseudo-associative cache
• Hardware prefetching
• Compiler-controlled prefetching
• Compiler optimizations
Increasing Block Size
• One way to reduce the miss rate is to increase the block size.
– Reduces compulsory misses. Why? It takes advantage of spatial locality.
• However, larger blocks have disadvantages:
– May increase the miss penalty (more data to fetch).
– May increase hit time (more data to read from the cache, and a larger mux).
– May increase conflict and capacity misses (fewer blocks fit in the cache).
Adding Associativity
[Figure: a fully-associative miss cache, with entries ordered from MRU to LRU, each holding a tag, a comparator, and one cache line of data; it sits between the processor, the direct-mapped cache, and the next lower cache.]
• When a miss occurs, the data is returned to both the direct-mapped cache and the miss cache.
• On every access, the upper (direct-mapped) cache and the miss cache are probed in parallel.
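The miss-cache behavior on the slide can be sketched as a small simulator. This is a minimal sketch, not the paper's hardware: class name, sizes, and the dict-based tag store are illustrative assumptions.

```python
from collections import OrderedDict

class MissCacheSystem:
    """Direct-mapped cache backed by a small fully-associative miss cache."""

    def __init__(self, dm_blocks=8, miss_entries=4):  # sizes are assumptions
        self.dm = {}                      # index -> tag (direct-mapped cache)
        self.dm_blocks = dm_blocks
        self.miss_cache = OrderedDict()   # fully associative, LRU order
        self.miss_entries = miss_entries

    def access(self, block_addr):
        idx, tag = block_addr % self.dm_blocks, block_addr // self.dm_blocks
        if self.dm.get(idx) == tag:
            return "dm_hit"
        if block_addr in self.miss_cache:
            # short one-cycle on-chip miss: refill the DM slot from here
            self.miss_cache.move_to_end(block_addr)
            self.dm[idx] = tag
            return "miss_cache_hit"
        # long off-chip miss: line is returned to BOTH caches
        self.dm[idx] = tag
        self.miss_cache[block_addr] = True
        if len(self.miss_cache) > self.miss_entries:
            self.miss_cache.popitem(last=False)   # evict the LRU entry
        return "full_miss"
```

Two addresses that conflict on the same index (e.g., 5 and 13 with 8 blocks) miss off-chip once each; the next reference to either hits in the miss cache instead of going off-chip again.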
Performance of the Miss Cache
• Replaces a long off-chip miss penalty with a short one-cycle on-chip miss.
• Removes more data conflict misses than instruction conflict misses.
Disadvantage of the Miss Cache
Storage space in the miss cache is wasted, because every line it holds is duplicated in the direct-mapped cache.
Victim Cache
• An improvement over the miss cache.
• Loads the victim (evicted) line instead of the requested line, so no data is duplicated.
• On a miss that hits in the victim cache, the contents of the direct-mapped cache line and the matching victim-cache line are swapped.
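The swap behavior can be sketched the same way as the miss cache. Again a minimal, illustrative simulator; names and sizes are assumptions, and tags stand in for whole lines.

```python
from collections import OrderedDict

class VictimCacheSystem:
    """Direct-mapped cache whose evicted lines go to a small victim cache."""

    def __init__(self, dm_blocks=8, victim_entries=4):  # sizes are assumptions
        self.dm = {}                     # index -> resident block address
        self.dm_blocks = dm_blocks
        self.victim = OrderedDict()      # fully associative, LRU order
        self.victim_entries = victim_entries

    def access(self, block_addr):
        idx = block_addr % self.dm_blocks
        resident = self.dm.get(idx)
        if resident == block_addr:
            return "dm_hit"
        if block_addr in self.victim:
            # swap: requested line enters the DM cache, the displaced
            # resident becomes the new victim
            del self.victim[block_addr]
            if resident is not None:
                self.victim[resident] = True
            self.dm[idx] = block_addr
            return "victim_hit"
        # full miss: save the evicted victim instead of the requested line
        if resident is not None:
            self.victim[resident] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)   # evict the LRU victim
        self.dm[idx] = block_addr
        return "full_miss"
```

Unlike the miss cache, two conflicting lines now ping-pong between the DM slot and the victim cache with no duplication, so every victim-cache entry adds useful capacity.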
The Effect of DM Cache Size on Victim Cache Performance
• As the direct-mapped cache size increases, the likelihood that a given conflict miss can be removed by the victim cache decreases.
Reducing Capacity and Compulsory Misses
Use a prefetch technique:
1. Prefetch always
2. Prefetch on miss
3. Tagged prefetch
• Prefetch always: prefetch after every reference.
• Prefetch on miss: on every miss, also fetch the next line.
• Tagged prefetch: each block has a tag bit associated with it.
– When a block is prefetched, its tag bit is set to zero; the bit is set to one when the block is used.
– On this zero-to-one transition, the next sequential block is prefetched.
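Tagged prefetch can be sketched with a dict mapping block addresses to their tag bit. A minimal, illustrative model, not the paper's hardware; the class and counter names are assumptions.

```python
class TaggedPrefetchCache:
    """Tagged prefetch: a 0 -> 1 tag-bit transition triggers the next fetch."""

    def __init__(self):
        self.blocks = {}      # block address -> tag bit (True = already used)
        self.prefetches = 0   # count of prefetch requests issued

    def _prefetch(self, addr):
        if addr not in self.blocks:
            self.blocks[addr] = False   # tag bit zero: prefetched, not yet used
            self.prefetches += 1

    def access(self, addr):
        if addr not in self.blocks:
            self.blocks[addr] = True    # demand fetch, used immediately
            self._prefetch(addr + 1)    # a miss also fetches the next line
            return "miss"
        if self.blocks[addr] is False:
            self.blocks[addr] = True    # tag bit transitions zero -> one ...
            self._prefetch(addr + 1)    # ... which triggers the next prefetch
        return "hit"
```

On a sequential walk (100, 101, 102, ...) only the first reference misses: each use of a prefetched block pulls the next one in, keeping the fetch chain one block ahead.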
Stream Buffers
• Start the prefetch before the tag transition (i.e., before a demand reference would trigger it).
• A stream buffer consists of a series of entries, each with a tag, an available bit, and a data line.
• On a miss, it begins fetching successive lines starting at the miss target.
• Lines after the requested line are placed in the buffer, not the cache, which avoids polluting the cache with data that may never be needed.
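The FIFO behavior described above can be sketched as follows. A minimal model under stated assumptions: the 4-entry depth is illustrative, addresses stand in for tagged lines, and available bits are not modeled.

```python
from collections import deque

class StreamBuffer:
    """Single stream buffer: prefetches lines sequentially after a miss."""

    def __init__(self, depth=4):     # 4-entry depth is an assumption
        self.depth = depth
        self.fifo = deque()          # prefetched block addresses, in order
        self.next_addr = None

    def start(self, miss_addr):
        # on a cache miss, begin fetching the lines after the miss target
        self.fifo.clear()
        self.next_addr = miss_addr + 1
        self._refill()

    def _refill(self):
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_addr)
            self.next_addr += 1

    def probe(self, addr):
        # compare against the head entry; on a hit the line moves into the
        # cache and the buffer fetches one more line behind it
        if self.fifo and self.fifo[0] == addr:
            self.fifo.popleft()
            self._refill()
            return True
        return False
```

Sequential references after a miss stream out of the buffer head, while a non-sequential reference simply misses the buffer without disturbing the cache.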
Multi-Way Stream Buffers
• A single data stream buffer removes only about 25% of data cache misses, because data references interleave streams from different sources.
• Four stream buffers are used in parallel.
• Instruction stream performance is unchanged.
• Roughly twice the performance of a single data stream buffer.
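The multi-way arrangement can be sketched by probing several FIFOs in parallel and reallocating the least recently used one on a miss. Illustrative only; the LRU reallocation policy shown is an assumption about how a buffer is chosen for a new stream.

```python
from collections import deque

class MultiWayStreamBuffers:
    """Several stream buffers probed in parallel; LRU buffer takes new streams."""

    def __init__(self, ways=4, depth=4):   # four ways, as on the slide
        # each way: FIFO of prefetched addresses plus its next fetch address;
        # the list is kept in LRU order (least recently used first)
        self.ways = [{"fifo": deque(), "next": None} for _ in range(ways)]
        self.depth = depth

    def _refill(self, way):
        while len(way["fifo"]) < self.depth:
            way["fifo"].append(way["next"])
            way["next"] += 1

    def access(self, addr):
        # probe the head of every buffer in parallel
        for i, way in enumerate(self.ways):
            if way["fifo"] and way["fifo"][0] == addr:
                way["fifo"].popleft()
                self._refill(way)
                self.ways.append(self.ways.pop(i))   # mark most recently used
                return "stream_hit"
        # miss in every way: reallocate the LRU buffer to the new stream
        lru = self.ways.pop(0)
        lru["fifo"].clear()
        lru["next"] = addr + 1
        self._refill(lru)
        self.ways.append(lru)
        return "miss"
```

Two interleaved sequential streams (e.g., 100, 200, 101, 201, ...) each keep their own buffer, so both stream along after one initial miss apiece; a single buffer would be restarted on every switch.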
Stream Buffers vs. Prefetch
• More feasible to implement.
• Lower latency.
• The extra hardware required by stream buffers is comparable to the additional tag storage required by tagged prefetch.
Stream Buffer Performance vs. Cache Size
• Only data stream buffer performance improves as the cache size increases.
• A larger cache can hold data for reference patterns that access several sets of data.
Conclusion
• Miss caches are beneficial in removing data cache misses, particularly conflict misses.
• The victim cache improves on the miss cache by saving the victim of a cache miss instead of the target, avoiding duplication.
• Stream buffers reduce capacity and compulsory misses.
• Multi-way stream buffers are sets of stream buffers that can prefetch down several streams concurrently.
References
• Jouppi, Norman P. "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers."
• Patterson, D. and Hennessy, J. Computer Organization and Design.