Improving Direct-Mapped Cache Performance by the Addition
of a Small Fully-Associative Cache and Prefetch Buffers
By Sreemukha Kandlakunta and Phani Shashank Nagari
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Baseline Design
Baseline Design Contd..
• Size of on-chip caches usually varies
• High-speed technologies result in smaller on-chip caches
• L1 caches are assumed to be direct-mapped
• L1 cache line sizes: 16-32 B
• L2 cache line sizes: 128-256 B
Parameters assumed
• Processor speed: 1000 MIPS
• L1 instruction and data caches
Size: 4 KB
Line size: 16 B
• L2 instruction and data caches
Size: 1 MB
Line size: 128 B
Parameters assumed Contd..
Miss penalty
L1: 24 instruction times
L2: 320 instruction times
Test Program Characteristics
Baseline System L1 Cache Miss Rates
Baseline Design Performance
Inferences
• The potential performance loss lies in the memory hierarchy
• Focus on improving the performance of the memory hierarchy rather than CPU performance
• Hardware techniques are used to improve the performance of the baseline memory hierarchy
How Direct-Mapped Cache Works
[Figure: main memory mapping into a direct-mapped cache with 8 blocks; each cache entry holds a tag and a data block, and the block number is taken from the low address bits]
How to identify a hit?
• Match the tag
• Tag 01 in block 001 means address 01001 is there
How to search?
• 00101, 01101, 10101, 11101 all map to block 101
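The address split described above can be sketched in Python (an illustrative sketch; `split_address` is a hypothetical helper, not from the slides). With 8 blocks, the low 3 address bits select the block and the remaining high bits are stored as the tag.

```python
NUM_BLOCKS = 8   # direct-mapped cache with 8 blocks
INDEX_BITS = 3   # log2(8) low-order address bits select the block

def split_address(addr):
    """Split a block address into (tag, index) for the direct-mapped cache."""
    index = addr & (NUM_BLOCKS - 1)   # low 3 bits pick the cache block
    tag = addr >> INDEX_BITS          # remaining high bits are stored as the tag
    return tag, index

# Tag 01 in block 001 means address 01001 is cached:
print(split_address(0b01001))  # (1, 1), i.e. tag 0b01, block 0b001

# 00101, 01101, 10101, 11101 all map to block 101:
for addr in (0b00101, 0b01101, 0b10101, 0b11101):
    assert split_address(addr)[1] == 0b101
```

Because the index is fixed by the address, each of those four addresses can only ever occupy block 101, which is what sets up the conflict misses discussed later.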
How Fully-Associative Cache Works
[Figure: main memory mapping into a fully-associative cache with 8 blocks; a line can be placed in any block, so each entry stores the full block address as its tag]
Where to search?
• Every block in the cache
• Very expensive
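A fully-associative lookup, by contrast, must compare the tag in every block. A minimal sketch (the `FullyAssociativeCache` class is hypothetical, for illustration only):

```python
class FullyAssociativeCache:
    """A line can live in any of the 8 blocks, so a lookup must compare
    the tag in every block -- one comparator per block in hardware,
    which is what makes large fully-associative caches expensive."""
    def __init__(self, num_blocks=8):
        self.lines = [None] * num_blocks   # each entry: (tag, data) or None

    def lookup(self, tag):
        for line in self.lines:            # search every block
            if line is not None and line[0] == tag:
                return line[1]             # hit
        return None                        # miss

cache = FullyAssociativeCache()
cache.lines[5] = (0b01001, "payload")      # the slot number is irrelevant
print(cache.lookup(0b01001))  # payload
```

The trade-off is exactly what the slides exploit later: full associativity removes conflicts but only stays affordable when the cache is very small.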
Cache Misses
• Three Kinds
- Instruction read miss: causes the most delay; the CPU must wait until the instruction is fetched from DRAM
- Data read miss: causes less delay; instructions not dependent on the missing data can continue executing until the data is returned from DRAM
- Data write miss: causes the least delay; the write can be queued and the CPU can continue until the queue is full
Types of Misses
• Conflict Misses
Reduced by caching: Miss caches and Victim caches
• Compulsory Misses
• Capacity Misses
Both are reduced by prefetching:
Stream Buffers
Multi-way Stream Buffers
Conflict Miss
• Conflict misses are the misses that would not occur if the cache were fully associative with LRU replacement
• If an item has been evicted from the cache and a later miss is to that same item, that miss is a conflict miss
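A conflict miss can be demonstrated with a toy trace (illustrative sketch; `direct_mapped_misses` is a hypothetical helper): two addresses that share an index keep evicting each other even though the rest of the cache is empty.

```python
def direct_mapped_misses(trace, index_bits=3):
    """Count misses for a direct-mapped cache with one line per index."""
    lines = {}        # index -> tag currently resident
    misses = 0
    for addr in trace:
        index = addr & ((1 << index_bits) - 1)
        tag = addr >> index_bits
        if lines.get(index) != tag:
            misses += 1
            lines[index] = tag   # the new line evicts the old one
    return misses

# 0b00101 and 0b01101 share index 0b101, so alternating between them
# misses every time: 2 compulsory misses plus 6 conflict misses.
trace = [0b00101, 0b01101] * 4
print(direct_mapped_misses(trace))  # 8
```

A fully-associative cache with at least two lines and LRU would take only the 2 compulsory misses on the same trace, which is what makes the other 6 conflict misses by the definition above.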
Conflict Misses Contd..
• Conflict misses account for
– 20-40% of overall D-M cache misses
– 39% of L1 D$ misses
– 29% of L1 I$ misses
Conflict Misses, 4 KB I$ and D$
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Miss Caching
• A small, fully-associative on-chip cache
• On a miss, data is returned to both
- the direct-mapped cache
- the small miss cache (where it replaces the LRU item)
• The processor probes the D-M cache and the miss cache in parallel
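A minimal sketch of this behavior (illustrative; the `MissCache` class and its sizes are hypothetical, not the paper's implementation): on a true miss the fetched line is placed in both caches, and the fully-associative side uses LRU replacement.

```python
from collections import OrderedDict

class MissCache:
    """Small fully-associative LRU buffer probed alongside the D-M cache.
    A D-M miss that hits here is serviced without going off-chip; on a
    true miss the fetched line is placed in BOTH caches."""
    def __init__(self, dm_index_bits=3, entries=2):
        self.dm = {}                      # index -> tag (direct-mapped L1)
        self.mc = OrderedDict()           # block address, in LRU order
        self.entries = entries
        self.bits = dm_index_bits
        self.mask = (1 << dm_index_bits) - 1

    def access(self, addr):
        """Return 'dm_hit', 'mc_hit', or 'miss'."""
        index, tag = addr & self.mask, addr >> self.bits
        if self.dm.get(index) == tag:
            return 'dm_hit'
        if addr in self.mc:
            self.mc.move_to_end(addr)     # refresh LRU position
            self.dm[index] = tag          # reload the D-M cache line
            return 'mc_hit'
        self.dm[index] = tag              # true miss: fill both caches
        self.mc[addr] = True
        if len(self.mc) > self.entries:
            self.mc.popitem(last=False)   # evict the LRU entry
        return 'miss'

c = MissCache(dm_index_bits=3, entries=2)
# 0b00101 and 0b01101 conflict in the D-M cache but coexist in the miss cache:
results = [c.access(a) for a in (0b00101, 0b01101, 0b00101, 0b01101)]
print(results)  # ['miss', 'miss', 'mc_hit', 'mc_hit']
```

After the two compulsory misses, every repeat of the conflicting pair hits in the miss cache instead of paying the off-chip penalty.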
Miss cache Organization
Observations
• Eliminates the long off-chip miss penalty
• More data conflict misses are removed than instruction conflict misses
- Instructions within a procedure do not conflict as long as the procedure is smaller than the cache
- If a procedure calls another procedure that is mapped elsewhere, an instruction conflict arises
Miss Cache Performance
• For a 4 KB D$
- a miss cache of 2 entries can remove 25% of D$ conflict misses, i.e. 13% of overall D$ misses
- a miss cache of 4 entries can remove 36% of D$ conflict misses, i.e. 18% of overall D$ misses
• Beyond 4 entries the improvement is minor
Conflict Misses removed by Miss caching
Overall Cache Misses removed by Miss Caching
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Victim Caching
• In a miss cache, duplicating data that is already in the D-M cache wastes storage space
• Victim caching instead loads the F-A cache with the victim line evicted from the D-M cache
• When a reference misses in the D-M cache but hits in the victim cache, the contents of the two lines are swapped
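The swap behavior can be sketched as follows (illustrative; the `VictimCache` class is hypothetical). Note that the same conflicting pair that needed two miss-cache entries hits here with a single victim entry:

```python
from collections import OrderedDict

class VictimCache:
    """Instead of duplicating fetched lines, the F-A buffer holds only
    VICTIMS evicted from the D-M cache; on a victim hit, the two lines swap."""
    def __init__(self, dm_index_bits=3, entries=1):
        self.dm = {}                     # index -> tag
        self.vc = OrderedDict()          # victim block address, LRU order
        self.entries = entries
        self.bits = dm_index_bits
        self.mask = (1 << dm_index_bits) - 1

    def _evict_to_vc(self, index):
        if index in self.dm:             # current occupant becomes a victim
            victim_addr = (self.dm[index] << self.bits) | index
            self.vc[victim_addr] = True
            if len(self.vc) > self.entries:
                self.vc.popitem(last=False)

    def access(self, addr):
        index, tag = addr & self.mask, addr >> self.bits
        if self.dm.get(index) == tag:
            return 'dm_hit'
        if addr in self.vc:              # swap: victim line returns to D-M,
            del self.vc[addr]            # the displaced D-M line becomes
            self._evict_to_vc(index)     # the new victim
            self.dm[index] = tag
            return 'vc_hit'
        self._evict_to_vc(index)         # true miss: only victims enter the
        self.dm[index] = tag             # buffer, never duplicates
        return 'miss'

c = VictimCache(entries=1)
results = [c.access(a) for a in (0b00101, 0b01101, 0b00101, 0b01101)]
print(results)  # ['miss', 'miss', 'vc_hit', 'vc_hit']
```

Because no line is ever stored in both structures, every victim-cache entry adds usable associativity, which is why one victim line can beat two miss-cache lines.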
Victim Cache Organization
Victim Cache Performance
• A victim cache of just one line performs better than a miss cache of 2 lines
• Significant improvement in the performance of all the benchmark programs
Conflict Misses removed by Victim Caching
Overall Cache Misses removed by Victim Caching
Comparison of Miss cache and Victim cache performances
Effect of D-M cache size on Victim cache performance
• Smaller D-M caches benefit most from the addition of a victim cache
• As the D-M cache size increases, the likelihood that a miss is a conflict miss removable by the victim cache decreases
• As the percentage of conflict misses decreases, so does the percentage of overall misses the victim cache can remove
Victim cache: vary direct-map cache size
Effect of Line Size on Victim Cache Performance
• As the line size increases, the number of conflict misses increases
• As a result, the percentage of misses removed by the victim cache increases
Victim cache: vary data cache line size
Victim caches and L2 Caches
• Victim caches are also useful for L2 caches, due to their large line sizes
• Using an L1 victim cache can also reduce the number of L2 conflict misses
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Reducing Capacity and Compulsory Misses
• Compulsory Misses
First reference to a piece of data
• Capacity Misses
Due to insufficient cache size
Prefetching Algorithms
• Prefetch always: an access to line i triggers a prefetch of line i+1
• Prefetch on miss: a reference to block i causes a prefetch of block i+1 only if the reference to block i missed
• Tagged prefetch: a tag bit is set to 0 when a block is prefetched and set to 1 when the block is first used; that first use triggers a prefetch of the next block
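The three policies can be sketched as one dispatch function (illustrative; `prefetch_target` and its arguments are hypothetical names, not from the paper):

```python
def prefetch_target(line, hit, pending_tags, policy):
    """Return the line to prefetch after an access to `line`, or None.
    pending_tags holds lines whose tag bit is still 0 (prefetched but
    not yet used); it is only consulted by the tagged policy."""
    if policy == 'always':        # an access to i always prefetches i+1
        return line + 1
    if policy == 'on_miss':       # prefetch i+1 only when i missed
        return line + 1 if not hit else None
    if policy == 'tagged':
        # a miss, or the first use of a prefetched block (tag bit 0 -> 1),
        # triggers a prefetch of the next sequential block
        if not hit or line in pending_tags:
            pending_tags.discard(line)    # tag bit flips to 1
            pending_tags.add(line + 1)    # i+1 is now prefetched, tag bit 0
            return line + 1
        return None

pending = set()
print(prefetch_target(10, hit=False, pending_tags=pending, policy='tagged'))  # 11
print(prefetch_target(11, hit=True,  pending_tags=pending, policy='tagged'))  # 12
```

The second call shows why tagged prefetch keeps a sequential stream running: using prefetched line 11 immediately requests line 12, whereas prefetch-on-miss would have stalled until the next actual miss.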
Limited Time For Prefetch
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Stream Buffers
• Prefetched lines are placed in a buffer to avoid polluting the cache
• Each entry consists of a tag, an available bit, and a data line
• If a reference misses in the cache but hits in the buffer, the cache can be reloaded from the buffer
• When a line is moved from the stream buffer (SB), the remaining entries shift up and the next successive line is fetched
Stream Buffer Mechanism
Stream Buffer Mechanism Contd..
• On a miss
- prefetch successive lines
- enter the tag for each address into the SB
- set the available bit to false
• On return of the prefetched data
- place the data in the entry with its tag
- set the available bit to true
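The mechanism can be sketched in Python (illustrative; `StreamBuffer` is a hypothetical class, and fetch latency plus the available bit are omitted, so every prefetch completes instantly):

```python
from collections import deque

class StreamBuffer:
    """FIFO of prefetched lines; only the head has a tag comparator."""
    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()

    def allocate(self, miss_line):
        """On a cache miss, start fetching the successors of the miss line."""
        self.fifo = deque(miss_line + 1 + i for i in range(self.depth))

    def probe(self, line):
        """Hit only if `line` matches the head entry; then the remaining
        entries shift up and the next successive line is prefetched."""
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()
            self.fifo.append(line + self.depth)  # keep the buffer full
            return True
        return False

sb = StreamBuffer(depth=4)
sb.allocate(100)   # miss on line 100 -> buffer holds lines 101..104
print(sb.probe(101), sb.probe(102), sb.probe(104))  # True True False
```

The final probe of line 104 fails even though the line is in the buffer, because only the head (line 103 at that point) has a comparator; this is exactly the sequential-only limitation discussed on the next slide.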
Stream Buffer Performance
• Most instruction references break the purely sequential access pattern by the time the 6th successive line is fetched
• Data reference streams end even sooner
• As a result, stream buffers perform better at removing I$ misses than D$ misses
Sequential SB performance
Limitations of Stream Buffers
• The stream buffers considered are FIFO queues
• Only the head of the queue has a tag comparator
• Elements must therefore be removed strictly in sequence
• Works only for sequential line misses
• Fails on a non-sequential line miss
Outline
Baseline Design
Reducing Conflict Misses
Miss Caching
Victim Caching
Reducing Capacity and Compulsory Misses
Stream Buffers
Multi-way Stream Buffers
Multi-way stream buffers
• Single stream buffers could remove 72% of I$ misses but only 25% of D$ misses
• A multi-way SB was simulated to improve SB performance for data references
• It consists of 4 stream buffers in parallel
• On a miss in all buffers, the least recently hit SB is cleared and begins fetching from the miss address
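A sketch of the four-way arrangement (illustrative; `MultiWayStreamBuffer` is a hypothetical class, with fetch latency again omitted): each buffer tracks one stream, and a miss in all four reallocates the least recently hit one.

```python
from collections import deque

class MultiWayStreamBuffer:
    """Four stream buffers in parallel; on a miss in all of them,
    the least recently HIT buffer is cleared and reallocated."""
    def __init__(self, ways=4, depth=4):
        self.depth = depth
        self.bufs = [deque() for _ in range(ways)]
        self.lru = list(range(ways))     # least recently hit buffer first

    def probe(self, line):
        for i, buf in enumerate(self.bufs):
            if buf and buf[0] == line:   # one head comparator per buffer
                buf.popleft()
                buf.append(line + self.depth)  # keep prefetching this stream
                self.lru.remove(i)
                self.lru.append(i)       # mark buffer i most recently hit
                return True
        victim = self.lru.pop(0)         # miss everywhere: reallocate LRU buffer
        self.bufs[victim] = deque(line + 1 + k for k in range(self.depth))
        self.lru.append(victim)
        return False

mw = MultiWayStreamBuffer()
mw.probe(100)                 # miss: one buffer starts prefetching 101..104
mw.probe(200)                 # a second, interleaved stream gets its own buffer
print(mw.probe(101), mw.probe(201))  # True True -- both streams stay live
```

Interleaving two data streams would destroy a single stream buffer, but here each stream keeps its own FIFO, which is why the multi-way version helps data references in particular.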
Multi-way stream Buffer Design
Observations
• Performance on the instruction stream remains virtually unchanged
• Significant improvement in performance on the data stream
• Removes 43% of the misses for the test programs, i.e. almost twice the performance of a single SB
Four-way SB performance
SB Performance Vs Cache size
SB Performance Vs Line size
Performance Evaluation
• Over the set of 6 benchmarks, on average only 2.5% of the 4 KB D-M D$ misses that hit in a 4-entry victim cache also hit in a 4-way SB, so the two techniques are largely complementary
• The combination of stream buffers and victim caches reduces the L1 miss rate to less than half that of the baseline system
• This results in an average 143% improvement in system performance for the 6 benchmarks
Improved System Performance
Future Enhancements
• This study concentrated on applying these hardware techniques to L1 caches
• Applying these techniques to L2 caches is an interesting area for future work
• The performance of victim caching and stream buffers can also be investigated for OS design and multiprogramming workloads
Conclusions
• Miss caches remove tight conflicts where several addresses map to the same cache line
• Victim caches improve on miss caching by saving the victim of the cache miss rather than duplicating the requested line
• Stream buffers prefetch the cache lines that follow a missed cache line
• Multi-way stream buffers are a set of stream buffers that can prefetch down several streams concurrently