
Departamento de Engenharia Informática

UNIVERSIDADE TÉCNICA DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Architectures for Embedded Computing

MEIC-A, MEIC-T, MERC

Lecture Slides

Version 3.0 - English

Lecture 13

Title: Memory System - Memory Hierarchy and Cache Memories

Summary: Memory systems; Program access patterns; Cache memories (operation principles, internal organization and cache management policies).

2010/2011

[email protected]

Memory System: Memory Hierarchy and Cache Memories

Prof. Nuno Roma ACE 2010/11 - DEI-IST 1 / 44

Architectures for Embedded Computing

Previous Class

Memory System

Program Access Patterns

Cache Memories

Cache Organization


In the previous class...

• Synchronization and Multi-Processor Systems;

• SIMD Architectures (examples):

◮ Cell (STI - Sony, Toshiba, IBM);

◮ GPUs (NVidia, ATI).

Road Map


Summary


Today:

• Memory systems;

• Program access patterns;

• Cache memories:

◮ Operation principles;

◮ Internal organization;

◮ Cache management policies.

Bibliography:

• Computer Architecture: A Quantitative Approach, Sections 5.1, C.1 and C.2

Memory System


Connection Between the Processor and Memory


[Figure: the µP connected to the Memory through the Data Bus, the Address Bus and the Control Bus; the bus widths are labelled n, m and c.]

Data and Program Storage


• Two distinct approaches:

◮ Harvard architecture: program and data memories are physically separated and connected to the CPU by independent buses;

◮ Von Neumann architecture: a single memory stores both program and data.

Harvard Architectures


• Program and data memories are physically separated and connected to the CPU by independent buses;

• Program and data memories may have distinct characteristics: word size, timing, technology, address space structure, etc.;

• Program memory is usually larger than data memory (but the opposite may also happen).


• Since the program and data buses are independent, the processor can access the program and data memories simultaneously:

◮ Processing is potentially faster;

• Applications:

◮ Digital Signal Processors (DSPs);

◮ Microcontrollers (PIC, AVR, etc.).

Von Neumann Architecture


• Data and program share the same memory;

• Data and programs are treated in a similar way: in particular, a program can even modify its own code!


• Having a single bus frequently causes structural hazards in memory accesses.

SOLUTION: use CACHES!

• Applications:

◮ Most current General Purpose Processors (GPPs).

Memory Write Cycle


Memory Read Cycle


Ideal Memory


• Desired memory characteristics:

◮ Cheap;

◮ Great capacity;

◮ Fast (reduced access time);

◮ Large bandwidth.

Evolution of CPU Performance vs Memory


Memory: access time decreases by about 7% per year.

Processor: performance increases by about 35% per year until 1986, and by about 55% per year after 1986.
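To get a feeling for how quickly these two rates diverge, the compounding can be sketched in a few lines of Python. This is only an illustration: the function name `relative_gap` is hypothetical, and reading "7% per year" as a 7% yearly gain in memory speed is a simplifying assumption.

```python
def relative_gap(years, cpu_rate=0.55, mem_rate=0.07):
    """Ratio between the CPU and memory speed-ups after `years`.

    Assumption: both rates compound yearly, starting from equal speed.
    """
    cpu_speedup = (1.0 + cpu_rate) ** years
    mem_speedup = (1.0 + mem_rate) ** years
    return cpu_speedup / mem_speedup


for years in (1, 5, 10):
    print(f"after {years:2d} years the gap is x{relative_gap(years):.1f}")
```

Whatever the exact rates, the ratio grows geometrically, which is the motivation for the memory hierarchy introduced next.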

Memory Hierarchy


[Figure: memory hierarchy levels, from the closest to the processor to the farthest: µP Registers, Cache, Memory, Hard Disk.]

• Each level stores a subset of the data that is also stored in the next level.


Memories Adopted in Different Application Domains


• Servers:

◮ More context switches:

⇒ Greater bandwidth;

◮ Greater importance given to secure access to the stored data.

• Embedded Systems:

◮ More attention paid to the worst case than to the typical case;

◮ Caches consume a lot of energy;

◮ Security is not usually a concern.

Characteristics of the Different Memory Levels


Level                   1            2                3                  4

Name                    registers    cache            memory             disk

Capacity                < 1 kB       < 16 MB          < 16 GB            > 100 GB

Technology              CMOS         CMOS SRAM        CMOS DRAM          magnetic disk

Access time (ns)        0.25-0.5     0.5-25           80-250             5,000,000

Transfer rate (MB/s)    20k-100k     5k-10k           1,000-5,000        20-150

Managed by              compiler     hardware         operating system   OS/manager

Backed up by            cache        primary memory   disk               CD / DVD

Program Access Patterns


The characterization of a program access pattern results from an analysis of its execution traces.

Type of Access    Address

...

2 (fetch)         408ed4

0 (read)          10019d94

2                 408ed8

1 (write)         10019d88

2                 408edc

0                 10013220

2                 408ee0

2                 408ee4

...


90/10 Rule: a program typically spends 90% of its execution time executing about 10% of its instructions.

Fraction of the instructions responsible for 80% and 90% of the execution time:

Program    80%      90%

GCC        < 5%     ≈ 13%

Spice      < 4%     < 10%

TeX        ≈ 3%     ≈ 9%

Locality Principle


90/10 Rule ⇒ Locality Principle

Temporal Locality: if a given address is accessed, it is likely to be accessed again in the near future.

Spatial Locality: if a given address is accessed, there is a high probability that adjacent addresses will also be accessed in the near future.

How does the program evolve?

Locality Interval: time interval during which the program exhibits some addressing stability.
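Both kinds of locality can be estimated directly from an address trace. The sketch below is one possible way to do it, not a standard metric: the function name `locality_fractions`, the 4-access window and the 16-byte block size are all assumptions chosen for illustration.

```python
def locality_fractions(trace, window=4, block_size=16):
    """Return the (temporal, spatial) fractions of a list of addresses.

    temporal: the same address appeared among the previous `window` accesses.
    spatial:  a different address of the same `block_size`-byte block
              appeared among the previous `window` accesses.
    """
    temporal = spatial = 0
    for i, addr in enumerate(trace):
        recent = trace[max(0, i - window):i]
        if addr in recent:
            temporal += 1
        elif any(r // block_size == addr // block_size for r in recent):
            spatial += 1
    return temporal / len(trace), spatial / len(trace)


# Instruction-fetch addresses from the example trace of this lecture.
fetches = [0x408ED4, 0x408ED8, 0x408EDC, 0x408EE0, 0x408EE4]
print(locality_fractions(fetches))
```

Sequential instruction fetches show no temporal reuse in such a short excerpt, but most of them fall in a block that was touched just before, which is exactly what spatial locality predicts.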

Cache Memories


• Cache memories intercept the processor's accesses to memory and try to serve its requests faster.

• Access cycle example:

T1: the processor places the address and the read/write command on the bus;

T2: the processor waits for the answer;

Tw: optional wait states, inserted when the answer is not yet available.

Access Diagram


Cache access: $t_{hit} = 2T$

Memory access: $t_{mem} = 6T$

Cycle                     1  2  3  4  5  6  7

Without cache (6T)

P→M                       1  w  w  w  w  2

With cache, hit (2T)

P→C                       1  2

C→M                       (no access)

With cache, miss (7T)

P→C                       1  w  w  w  w  w  2

C→M                          1  w  w  w  w  2

Cache Statistical Indicators


Hit: the address to be accessed is already in cache, which immediately provides the corresponding data.

$t_h$: cache access time upon a hit

$p_h$: fraction of cache accesses that hit (hit rate)

Miss: the address to be accessed is not in cache; the corresponding data must be read from primary memory.

$t_m$: cache access time upon a miss

$p_m$: fraction of cache accesses that miss (miss rate), $p_m = 1 - p_h$

$t_p$: miss penalty, $t_p = t_m - t_h$

Computation of Mean Access Times


Mean Access Time:

$t_{access} = p_h \times t_h + p_m \times t_m = t_h + p_m \times t_p$

Alternative measure:

Mean Number of Misses per Instruction: miss rate times the mean number of memory accesses per instruction.
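As a quick numerical check, the two forms of the mean access time can be evaluated side by side. The sketch below uses the timings of the access diagram (hit = 2T, miss = 7T); the 95% hit rate, the 1.5 accesses per instruction and the function names are illustrative assumptions.

```python
def mean_access_time(hit_rate, t_hit, t_miss):
    """t_access = p_h * t_h + p_m * t_m, with p_m = 1 - p_h."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * t_hit + miss_rate * t_miss


def mean_access_time_penalty(hit_rate, t_hit, t_miss):
    """Equivalent form: t_access = t_h + p_m * t_p, with t_p = t_m - t_h."""
    miss_rate = 1.0 - hit_rate
    return t_hit + miss_rate * (t_miss - t_hit)


def misses_per_instruction(miss_rate, accesses_per_instruction):
    """Miss rate times the mean number of memory accesses per instruction."""
    return miss_rate * accesses_per_instruction


# Times are expressed in clock periods T.
t_access = mean_access_time(0.95, t_hit=2, t_miss=7)
print(f"mean access time: {t_access:.2f} T (6 T without a cache)")
print(f"misses per instruction: {misses_per_instruction(0.05, 1.5):.3f}")
```

Even though a miss (7T) costs more than an access without a cache (6T), a high hit rate keeps the mean access time close to the hit time.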

Cache Organization


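Fully Associative Cache: each address can be stored in any cache position.

[Figure: fully associative cache. Address fields: Tag, Offset. Blocks: Tags and Data arrays, Decoder, comparator (=). Outputs: Data, Hit.]

Associative memory is expensive and slow! It is only used in very small caches.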


Direct Mapped Cache: each address can only be stored in one specific cache position.

[Figure: direct mapped cache. Address fields: Tag, Index, Offset. Blocks: Tags and Data arrays, Decoders, comparator (=). Outputs: Data, Hit.]

Simpler cache, but with a greater number of conflicts.


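n-way set associative cache: each address can be stored in one of n possible positions, one in each associative set (way) of the cache.

[Figure: 2-way set associative cache. Address fields: Tag, Index, Offset. Blocks: two pairs of Tags and Data arrays, Decoders, two comparators (=), Multiplexer and Decision Logic. Outputs: Data, Hit.]

Intermediate solution.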

Cache Block


• Temporal Locality:

◮ Keep the most recently accessed addresses in cache;

• But... how can we take advantage of spatial locality?

◮ Also load into the cache a set of memory positions contiguous to the accessed address;

◮ Instead of a single memory position, each cache row corresponds to a whole block of memory positions.


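Selection of the desired word within the cache block.

[Figure: direct mapped cache with multi-word blocks. Address fields: Tag, Index, Offset. The Offset field selects the desired word within the block read from the Data array. Outputs: Data, Hit.]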

Cache Management Policies


• Cache Management Policies provide answers to the following questions:

◮ Where should a block be stored in cache?

◮ How to find a given block in cache?

◮ Which block should be removed from cache?

◮ What happens in a write operation?

◮ How should a block be loaded into cache?

Block Position


Where should a block be stored in cache?

• Fully Associative Cache:

◮ The block can be stored in any cache position;

• Direct Mapped Cache:

◮ Each block can be stored in a single, specific cache position, defined by the index bits;

Address: Tag | Index | Offset

• n-Way Set Associative Cache:

◮ The block has n possible positions, one within each associative set (way); its position within each of them is defined by the index bits.
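The Tag / Index / Offset decomposition amounts to a few shifts and masks. The following sketch assumes power-of-two sizes; the function name `split_address` and the cache geometry in the example (16-byte blocks, 256 rows) are hypothetical.

```python
def split_address(addr, block_size, num_rows):
    """Split an address into (tag, index, offset).

    Assumes block_size and num_rows are powers of two.
    num_rows = 1 models a fully associative cache (no index bits).
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_rows.bit_length() - 1
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_rows - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = split_address(0x10019D94, block_size=16, num_rows=256)
print(hex(tag), hex(index), hex(offset))
```

With these parameters the low 4 bits select the word or byte inside the block, the next 8 bits select the row, and the remaining bits form the tag that is stored and compared.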

Block Identification


How to find a given block in cache?

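[Figure: direct mapped cache. The Index field of the address selects one row of the Tags and Data arrays; the stored tag is compared (=) with the Tag field of the address to produce the Hit signal. Address fields: Tag, Index, Offset.]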
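The lookup can be mimicked in software: the index selects a row, and the stored tag (together with a valid bit) decides between hit and miss. The class below is a toy model that tracks tags only; its name, its 4-row geometry and the address sequence are assumptions for illustration.

```python
class DirectMappedCache:
    """Toy direct mapped cache that tracks tags only (no data payload)."""

    def __init__(self, num_rows=4, block_size=16):
        self.num_rows = num_rows
        self.block_size = block_size
        self.valid = [False] * num_rows
        self.tags = [0] * num_rows

    def access(self, addr):
        """Return True on a hit; on a miss, load the block and return False."""
        block = addr // self.block_size
        index = block % self.num_rows    # selects the row
        tag = block // self.num_rows     # identifies the block in that row
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)]
print(results)
```

Addresses 0x00 and 0x40 map to the same row, so they keep evicting each other: this is the kind of conflict that makes direct mapped caches miss more often.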

Substitution Policy


Which block should be removed from cache?

⇒ LRU (Least Recently Used): remove the block that has been unused for the longest time.

⇒ FIFO (First-In First-Out): remove the block that has been in cache for the longest time.

⇒ Random

Example: miss-rate variation:

            2 Sets                4 Sets                8 Sets

Capacity    LRU    RND    FIFO    LRU    RND    FIFO    LRU    RND    FIFO

16 kB       11.4   11.7   11.6    11.2   11.5   11.3    10.9   11.2   11.0

64 kB       10.3   10.4   10.4    10.2   10.2   10.3    10.0   10.1   10.0

256 kB       9.2    9.2    9.3     9.2    9.2    9.3     9.2    9.2    9.3
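The difference between LRU and FIFO is easiest to see on a single group of candidate positions. The sketch below counts misses for a sequence of block identifiers; the function names and the access sequence are made up for illustration and are unrelated to the measurements in the table.

```python
from collections import OrderedDict, deque


def count_misses_lru(blocks, ways):
    resident = OrderedDict()                 # least recently used first
    misses = 0
    for b in blocks:
        if b in resident:
            resident.move_to_end(b)          # a hit refreshes the block
        else:
            misses += 1
            if len(resident) == ways:
                resident.popitem(last=False) # evict the least recently used
            resident[b] = True
    return misses


def count_misses_fifo(blocks, ways):
    resident = deque()                       # oldest loaded block first
    misses = 0
    for b in blocks:
        if b not in resident:
            misses += 1
            if len(resident) == ways:
                resident.popleft()           # evict the oldest loaded block
            resident.append(b)
    return misses


sequence = ["A", "B", "A", "C", "A", "B"]
print(count_misses_lru(sequence, ways=2), count_misses_fifo(sequence, ways=2))
```

In this sequence block A is reused often: LRU keeps it resident, while FIFO evicts it simply because it was loaded first.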

Writing Policies


What happens in a write operation?

• Execution statistics: Loads ≈ 37% and Stores ≈ 10% of the executed instructions

◮ Writes correspond to about 21% of data accesses

• When instruction fetches are also counted, about 7% of all memory accesses are writes

⇒ Optimize reads! But do not ignore the write operations.

• Contrary to reads, a write operation can only be started after we know whether we have a hit or a miss.
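The 21% and 7% figures follow from the load and store frequencies with a little arithmetic. The sketch below assumes exactly one instruction fetch per executed instruction, which is a simplification.

```python
loads, stores = 0.37, 0.10      # fractions of the executed instructions
instruction_fetches = 1.0       # assumption: one fetch per instruction

# Data accesses are loads plus stores.
writes_among_data = stores / (loads + stores)

# All memory accesses also include the instruction fetches.
writes_among_all = stores / (instruction_fetches + loads + stores)

print(f"{writes_among_data:.1%} of data accesses are writes")
print(f"{writes_among_all:.1%} of all memory accesses are writes")
```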


Write Through: the write operation is performed both in cache and in primary memory:

• Easier to implement;

• Cache and memory are always consistent;

• A read miss never causes a memory write;

• Can be optimized using a write buffer.

Write Back: the write operation is performed only in cache (primary memory is updated later, when the block is replaced):

• Writes complete at cache speed;

• Memory traffic is reduced.

Allocation Policy


What should be done after a write miss?

Write Allocate: the block is allocated and copied into cache.

No-Write Allocate: the block is not loaded into cache on a write miss; the cache is only updated if the block was already allocated there.

Both alternatives may be used together with any of the writing policies. However, the following combinations are more frequent:

Write Back, Write Allocate: possible future write operations to the same address are done in cache.

Write Through, No-Write Allocate: the reasoning is that even if there are subsequent writes to that block, the writes must still go to the lower-level memory, so there is little to gain in keeping it in cache.
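The effect of the writing policy on memory traffic can be estimated with a rough count of memory write operations. This is a deliberately simplified sketch: the function names are hypothetical, the write-back model assumes that no dirty block is evicted during the run and that every dirty block is written once at the end, and it ignores that write back transfers whole blocks rather than single words.

```python
def memory_writes_write_through(store_addresses):
    """Write through: every store also goes to primary memory."""
    return len(store_addresses)


def memory_writes_write_back(store_addresses, block_size=16):
    """Write back + write allocate: each dirty block is written back once."""
    dirty_blocks = {addr // block_size for addr in store_addresses}
    return len(dirty_blocks)


stores = [0x100, 0x104, 0x108, 0x100, 0x200]
print(memory_writes_write_through(stores), memory_writes_write_back(stores))
```

Repeated stores to the same block are absorbed by a write-back cache, which is why it is usually paired with write allocate.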

Loading Policies


How should a block be loaded into cache?

Blocking: the requested word is only sent to the processor after the whole block has been loaded into cache:

• Simpler to implement;

• According to spatial locality, the next access will likely be to the same block.

Non Blocking:

Early Restart: fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the processor and let the processor continue execution;

Critical Word First: request the missed word first from memory and send it to the processor as soon as it arrives; let the processor continue execution while filling the rest of the words in the block.

• These techniques have a greater impact in caches where loading a block requires several memory accesses.
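The two non-blocking strategies differ only in the order in which the words of the block arrive. The sketch below returns that order together with the number of word transfers the processor waits for; the function names are hypothetical, and the wrap-around fill order used for critical word first is an assumption (one common choice).

```python
def early_restart(block_words, requested):
    """Words arrive in normal order; the processor restarts when the
    requested word arrives. Returns (arrival order, transfers waited)."""
    order = list(range(block_words))
    return order, order.index(requested) + 1


def critical_word_first(block_words, requested):
    """The requested word arrives first; the remaining words follow,
    wrapping around the block. Returns (arrival order, transfers waited)."""
    order = [(requested + i) % block_words for i in range(block_words)]
    return order, 1


print(early_restart(4, requested=2))
print(critical_word_first(4, requested=2))
```

With a blocking policy the processor would wait for all four transfers in both cases.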

Control Bits


• Valid: indicates whether the value associated with a given tag is valid:

◮ Used, for example, to invalidate all cache positions when the system is (re-)initialized, or to invalidate positions that were changed directly in primary memory by other agents;

• Dirty: indicates whether the value stored in cache is more recent than the value stored in primary memory:

◮ Only used in write-back caches, to indicate that primary memory must be updated before the block is replaced;

• R/W: indicates whether that memory position can be written or is read-only.


• LRU: indicates which associative set (way) was the last one to be accessed:

◮ Used in 2-way set associative caches to implement an LRU substitution policy;

• Accessed: indicates that this associative set (way) has been accessed since the last time this bit was reset to zero:

◮ Approximates an LRU policy: the bit is reset to zero at regular time intervals and set to 1 whenever the associative set is accessed.
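For a 2-way cache, a single bit per row is enough to implement exact LRU: it records which of the two positions was used last, and the other one becomes the victim. A minimal sketch, with a hypothetical class name and tags only:

```python
class TwoWayRow:
    """One row of a 2-way set associative cache with a single LRU bit."""

    def __init__(self):
        self.tags = [None, None]
        self.last_used = 0               # LRU bit: position accessed last

    def access(self, tag):
        """Return True on a hit; on a miss, replace the other position."""
        for way in (0, 1):
            if self.tags[way] == tag:
                self.last_used = way
                return True
        victim = 1 - self.last_used      # the position NOT used last
        self.tags[victim] = tag
        self.last_used = victim
        return False


row = TwoWayRow()
results = [row.access(t) for t in ("A", "B", "A", "C", "B")]
print(results, row.tags)
```

With more than two positions a single bit no longer identifies the least recently used one, which is where approximations such as the Accessed bit come in.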

Next Class


• Miss Penalty Reduction:

◮ Multi-level caches;

◮ Greater priority to reads than to writes;

◮ Victim caches;

• Miss Rate Reduction:

◮ Analysis of the misses;

◮ Increase the block size;

◮ Increase the cache capacity;

◮ Increase the associativity level;

◮ Way prediction.