Departamento de Engenharia Informática

UNIVERSIDADE TÉCNICA DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Architectures for Embedded Computing

MEIC-A, MEIC-T, MERC

Lecture Slides

Version 3.0 - English

Lecture 13

Title: Memory System - Memory Hierarchy and Cache Memories

Summary: Memory systems; Program access patterns; Cache memories (operation principles, internal organization and cache management policies).

2010/2011

[email protected]

Memory System: Memory Hierarchy and Cache Memories

Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class

Memory System

Program Access Patterns

Cache Memories

Cache Organization


In the previous class...

• Synchronization and Multi-Processor Systems;

• SIMD Architectures (examples):

◮ Cell (STI - Sony, Toshiba, IBM);

◮ GPUs (NVidia, ATI).

Road Map

Memory System


Cache Memories

Cache Organization


Summary

Today:

• Memory systems;

• Program access patterns;

• Cache memories:

◮ Operation principles;

◮ Internal organization;

◮ Cache management policies.

Bibliography:

• Computer Architecture: A Quantitative Approach, Sections 5.1, C.1 and C.2

Memory System

Connection Between the Processor and Memory

[Figure: the processor (µP) connected to the memory by a data bus, an address bus and a control bus; the bus widths are labelled m, n and c.]

Data and Program Storage

• Two distinct approaches:

◮ Harvard architecture: program and data memories are physically separated and connected to the CPU by independent buses;

◮ Von Neumann architecture: a single memory stores both program and data.

Harvard Architectures

• Program and data memories are physically separated and connected to the CPU by independent buses;

• Program and data memories may have distinct characteristics: word size, timing, technology, address space structure, etc.;

• Program memory is usually larger than data memory (but the opposite may also happen).

• Since the program and data buses are independent, the processor may access the program and data memories simultaneously:

◮ Processing is potentially faster;

• Applications:

◮ Digital Signal Processors (DSPs);

◮ Microcontrollers (PIC, AVR, etc.).

Von Neumann Architecture

• Data and program share the same memory;

• Data and programs are treated in a similar way: in particular, a program can even modify its own instructions!

• The existence of a single bus frequently causes structural hazards in memory accesses.

SOLUTION: use of CACHES!

• Applications:

◮ Most current General Purpose Processors (GPPs).

Memory Write Cycle

Memory Read Cycle

Ideal Memory

• Desired memory characteristics:

◮ Cheap;

◮ Large capacity;

◮ Fast (short access time);

◮ High bandwidth.

Evolution of CPU Performance vs Memory

Memory: access time decreases by about 7% per year.

Processor: performance increased by about 35% per year until 1986, and by about 55% per year after 1986.
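As a rough illustration of how these two rates compound, the following Python sketch estimates how much the processor-memory gap widens over a number of years. It assumes constant yearly rates and takes memory speed as the inverse of the access time; the function name `performance_gap` and its parameters are hypothetical, not part of the lecture.

```python
def performance_gap(years, cpu_growth=0.55, mem_time_reduction=0.07):
    """Ratio between the processor and memory speed-ups after `years` years.

    Assumes constant yearly rates (a simplification): processor performance
    grows by `cpu_growth` per year, while the memory access time shrinks by
    `mem_time_reduction` per year (so memory speed grows by 1 / (1 - r)).
    """
    cpu_speedup = (1.0 + cpu_growth) ** years
    mem_speedup = (1.0 / (1.0 - mem_time_reduction)) ** years
    return cpu_speedup / mem_speedup


for n in (1, 5, 10):
    print(f"after {n:2d} years the gap grows by a factor of {performance_gap(n):.1f}")
```

Under these assumptions the gap grows geometrically, which is the motivation for the memory hierarchy.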

Memory Hierarchy

[Figure: memory hierarchy, from the processor (µP) outwards: registers, cache, memory, hard disk.]

• Each level stores a subset of the data that is also stored in the next level.

Memories Adopted in Different Application Domains

• Servers:

◮ More context switches:

⇒ Greater bandwidth;

◮ Greater importance given to secure access to the stored data.

• Embedded Systems:

◮ Greater attention to the worst case than to the typical case;

◮ Caches consume a lot of energy;

◮ Security is not usually a concern.

Characteristics of the Different Memory Levels

Level                1            2                 3                   4
Name                 registers    cache             memory              disk
Capacity             < 1 kB       < 16 MB           < 16 GB             > 100 GB
Technology           CMOS         CMOS SRAM         CMOS DRAM           magnetic disk
Access time (ns)     0.25-0.5     0.5-25            80-250              5,000,000
Transfer (MB/s)      20k-100k     5k-10k            1,000-5,000         20-150
Managed by           compiler     hardware          operating system    OS/manager
Backed by            cache        primary memory    disk                CD / DVD

Program Access Patterns

The characterization of a program's access pattern results from an analysis of its execution traces.

Type of Access    Address
...
2 (fetch)         408ed4
0 (read)          10019d94
2 (fetch)         408ed8
1 (write)         10019d88
2                 408edc
0                 10013220
2                 408ee0
2                 408ee4
...
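A minimal sketch of how such a trace could be analysed, assuming each record is a line of the form `<type> <hex address>` with the type codes shown in the trace excerpt above (0 = read, 1 = write, 2 = fetch). The helper name `parse_trace` is hypothetical.

```python
from collections import Counter

# Access-type codes as they appear in the trace excerpt.
ACCESS_TYPES = {0: "read", 1: "write", 2: "fetch"}


def parse_trace(lines):
    """Yield (access_type, address) pairs from '<type> <hex address>' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines (e.g. "...")
        code, address = fields
        yield ACCESS_TYPES[int(code)], int(address, 16)


trace = """
2 408ed4
0 10019d94
2 408ed8
1 10019d88
2 408edc
0 10013220
2 408ee0
2 408ee4
""".splitlines()

# Count how many accesses of each type the trace contains.
counts = Counter(kind for kind, _ in parse_trace(trace))
print(counts)
```

Note that the fetch addresses advance by 4 bytes at a time, while the data addresses fall in a different region of the address space.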


90/10 Rule: a program typically spends 90% of its execution time executing about 10% of its instructions.

Fraction of the instructions responsible for 80% and 90% of the execution:

Program    80% of execution    90% of execution
GCC        < 5%                ≈ 13%
Spice      < 4%                < 10%
TeX        ≈ 3%                ≈ 9%

Locality Principle

90/10 Rule ⇒ Locality Principle

Temporal Locality: if a given address is accessed, it is likely to be accessed again in the near future.

Spatial Locality: if a given address is accessed, there is a high probability that adjacent addresses will also be accessed in the near future.

How does the program evolve?

Locality Interval: time interval during which the program exhibits some addressing stability.
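The two kinds of locality can be made concrete with a small measurement over an address trace. The sketch below is only one possible way to quantify them: the function `locality_profile`, the window of 8 accesses and the 16-byte distance are all illustrative assumptions, not definitions from the lecture.

```python
def locality_profile(addresses, window=8, distance=16):
    """Return (temporal, spatial) fractions for an address trace.

    temporal: accesses to an address already seen among the previous
              `window` accesses;
    spatial:  accesses to a new address lying within `distance` bytes of
              one of the previous `window` accesses.
    """
    temporal = spatial = 0
    for i, addr in enumerate(addresses):
        recent = addresses[max(0, i - window):i]
        if addr in recent:
            temporal += 1
        elif any(abs(addr - r) <= distance for r in recent):
            spatial += 1
    total = len(addresses)
    return temporal / total, spatial / total


# Synthetic trace of a loop that walks twice over a 4-element array of
# 4-byte words: the two instruction addresses repeat on every iteration,
# while the data addresses advance by 4 bytes.
loop_body = [0x400, 0x404]                 # two instruction fetches
data = [0x1000 + 4 * i for i in range(4)]  # array elements
trace = []
for _ in range(2):
    for element in data:
        trace += loop_body + [element]

print(locality_profile(trace))
```

In this synthetic trace most instruction fetches count as temporal locality and most data accesses as spatial locality.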

Cache Memories

• Cache memories intercept the processor's accesses to memory, trying to serve the processor's requests faster.

• Access cycle example:

[Figure: timing diagram of an access cycle, with clock periods T1 and T2.]

T1: the address and the read/write command are issued;

T2: the processor waits for the answer;

Tw: possible additional wait periods.

Access Diagram

Cache access: t_hit = 2T

Memory access: t_mem = 6T

(P = processor, C = cache, M = memory; 1 = T1, 2 = T2, w = wait period Tw)

Cycle                     1  2  3  4  5  6  7

Without cache (6T)
  P→M                     1  w  w  w  w  2

With cache: hit (2T)
  P→C                     1  2
  C→M

With cache: miss (7T)
  P→C                     1  w  w  w  w  w  2
  C→M                        1  w  w  w  w  2

Cache Statistical Indicators

Hit: the address to be accessed is already in cache, which immediately provides the corresponding data.

t_h: cache access time, upon a hit

p_h: fraction of successful cache accesses (hit rate)

Miss: the address to be accessed is not in cache; the corresponding data must be read from primary memory.

t_m: cache access time, upon a miss

p_m: fraction of failed cache accesses (miss rate), p_m = 1 − p_h

t_p: miss penalty, t_p = t_m − t_h

Computation of Mean Access Times

Mean Access Time:

t_access = p_h × t_h + p_m × t_m

= (1 − p_m) × t_h + p_m × (t_h + t_p)

= t_h + p_m × t_p

Alternative measure:

Mean Number of Misses per Instruction: miss rate times the mean number of memory accesses per instruction.
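The two forms of the mean access time, and the alternative measure, can be checked numerically. This is a minimal sketch: the function names are hypothetical, the timings (hit = 2T, miss = 7T) come from the access diagram, and the 95% hit rate and 1.5 memory accesses per instruction are illustrative values only.

```python
def mean_access_time(hit_rate, t_hit, t_miss):
    """Mean access time: p_h * t_h + p_m * t_m, with p_m = 1 - p_h."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * t_hit + miss_rate * t_miss


def mean_access_time_penalty(hit_rate, t_hit, t_penalty):
    """Equivalent form: t_h + p_m * t_p, with t_p = t_m - t_h."""
    return t_hit + (1.0 - hit_rate) * t_penalty


def misses_per_instruction(miss_rate, accesses_per_instruction):
    """Mean number of misses per instruction."""
    return miss_rate * accesses_per_instruction


# Timings in clock periods T: hit = 2T, miss = 7T, so the penalty is 5T.
T_HIT, T_MISS = 2, 7
print(mean_access_time(0.95, T_HIT, T_MISS))
print(mean_access_time_penalty(0.95, T_HIT, T_MISS - T_HIT))
print(misses_per_instruction(0.05, 1.5))
```

Both forms give the same value, since the second is only an algebraic rearrangement of the first.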

Cache Organization

Fully Associative Cache: each address can be stored in any cache position.

[Figure: fully associative cache. The address is split into Tag and Offset; the Tag is compared (=) with the stored tags to produce the Hit signal and select the Data.]

Associative memory: expensive and slow! Only used in very small caches.

Direct Mapped Cache: each address can only be stored in one specific cache position.

[Figure: direct mapped cache. The address is split into Tag, Index and Offset; the Index selects a cache row through a decoder, and the stored tag is compared (=) with the Tag to produce the Hit signal and the Data.]

Simpler cache, but with a greater number of conflicts.

n-way set associative cache: each address can be stored in one of n possible associative sets (ways) of the cache.

[Figure: set associative cache with two Tags/Data arrays. The address is split into Tag, Index and Offset; the Index selects one row in each array, both stored tags are compared (=) with the Tag, and the decision logic and a multiplexer produce the Hit signal and the Data.]

Intermediate solution.

Cache Block

• Temporal Locality:

◮ Keep the most recently accessed addresses in cache;

• But... how can we take advantage of spatial locality?

◮ Also load into the cache a set of memory positions contiguous to the accessed address;

◮ Instead of a single memory position, each cache row corresponds to a whole block of memory positions.

Cache Organization

Selection of the desired word, within the cache block.

[Figure: direct mapped cache with multi-word blocks. The Index selects the cache row, the stored tag is compared (=) with the Tag to produce the Hit signal, and the Offset selects the desired word (Data) within the block.]

Cache Management Policies

• Cache management policies provide answers to the following questions:

◮ Where should a block be stored in cache?

◮ How to find a given block in cache?

◮ Which block should be removed from cache?

◮ What happens in a write operation?

◮ How should a block be loaded into cache?

Block Position

Where should a block be stored in cache?

• Fully Associative Cache:

◮ The block can be stored in any cache position;

• Direct Mapped Cache:

◮ Each block can be stored in a single, specific cache position, defined by the index bits;

Address: | Tag | Index | Offset |

• n-Way Set Associative Cache:

◮ The block has n possible positions, one within each set (way); its position within each one is defined by the index bits.
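The Tag / Index / Offset split and the resulting candidate positions can be sketched as follows. This is an illustrative sketch that assumes byte addresses and power-of-two sizes; the names `split_address` and `candidate_positions` and the example cache geometry (16-byte blocks, 256 rows) are hypothetical.

```python
def split_address(address, block_size, num_rows):
    """Split an address into (tag, index, offset).

    block_size: bytes per cache block (power of two);
    num_rows:   rows per way (power of two); use 1 for a fully
                associative cache, which has no index field.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_rows.bit_length() - 1
    offset = address & (block_size - 1)
    index = (address >> offset_bits) & (num_rows - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


def candidate_positions(address, block_size, num_rows, ways):
    """List the (way, row) positions where the block may be stored."""
    _, index, _ = split_address(address, block_size, num_rows)
    return [(way, index) for way in range(ways)]


address = 0x408ED4
print(split_address(address, block_size=16, num_rows=256))
print("direct mapped:", candidate_positions(address, 16, 256, ways=1))
print("2-way:        ", candidate_positions(address, 16, 256, ways=2))
print("fully assoc.: ", candidate_positions(address, 16, 1, ways=4))
```

A direct mapped cache is the special case with one way, and a fully associative cache is the special case with a single row and as many ways as blocks.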

Block Identification

How to find a given block in cache?

[Figure: block identification. The Index field of the address selects a cache row through the decoder; the tag stored in that row is compared (=) with the Tag field of the address, producing the Hit signal and the Data.]
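The lookup described above (select a row with the index, then compare tags) can be modelled in a few lines. This is a minimal sketch of a direct mapped cache that stores tags only, with no data; the class name `DirectMappedCache` and the tiny 4-row geometry are assumptions chosen for illustration.

```python
class DirectMappedCache:
    """Minimal direct mapped cache model that only tracks tags (no data)."""

    def __init__(self, num_rows, block_size):
        self.num_rows = num_rows
        self.block_size = block_size
        self.tags = [None] * num_rows  # None plays the role of valid = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block_number = address // self.block_size  # drops the offset bits
        index = block_number % self.num_rows       # row selected by the index
        tag = block_number // self.num_rows        # remaining upper bits
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


cache = DirectMappedCache(num_rows=4, block_size=16)
for address in (0x00, 0x04, 0x40, 0x00):
    print(hex(address), "hit" if cache.access(address) else "miss")
```

Addresses 0x00 and 0x40 share the same index in this geometry, so they evict each other: an example of the conflicts that direct mapping is prone to.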

Substitution Policy

Which block should be removed from cache?

⇒ LRU (Least Recently Used): remove the block that has been unused for the longest time.

⇒ FIFO (First-In First-Out): remove the block that has been in cache for the longest time.

⇒ Random

Example: miss-rate variation:

            2-way                 4-way                 8-way
Capacity    LRU    RND    FIFO    LRU    RND    FIFO    LRU    RND    FIFO
16 kB       11.4   11.7   11.6    11.2   11.5   11.3    10.9   11.2   11.0
64 kB       10.3   10.4   10.4    10.2   10.2   10.3    10.0   10.1   10.0
256 kB       9.2    9.2    9.3     9.2    9.2    9.3     9.2    9.2    9.3
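The three policies can be compared on a toy example. The sketch below simulates a single associative set holding `capacity` blocks; the function `count_misses` and the access sequence are hypothetical and are not related to the measurements in the table.

```python
import random
from collections import OrderedDict


def count_misses(blocks, capacity, policy="LRU", seed=0):
    """Count misses for a sequence of block numbers in one associative set.

    policy: "LRU", "FIFO" or "RND" (random, seeded for reproducibility).
    """
    rng = random.Random(seed)
    resident = OrderedDict()  # insertion order doubles as age order
    misses = 0
    for block in blocks:
        if block in resident:
            if policy == "LRU":
                resident.move_to_end(block)  # mark as most recently used
            continue
        misses += 1
        if len(resident) == capacity:
            if policy == "RND":
                victim = rng.choice(list(resident))
            else:
                victim = next(iter(resident))  # oldest entry
            del resident[victim]
        resident[block] = True
    return misses


sequence = [1, 2, 1, 3, 1, 4, 1, 2]
for policy in ("LRU", "FIFO", "RND"):
    print(policy, count_misses(sequence, capacity=2, policy=policy))
```

In this sequence block 1 is reused constantly, so LRU keeps it resident while FIFO evicts it once it becomes the oldest entry.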

Writing Policies

What happens in a write operation?

• Execution statistics: Loads ≈ 37%, Stores ≈ 10%

◮ Writes correspond to about 21% of data accesses (10 / (37 + 10))

• Taking instruction fetches into account, about 7% of all memory accesses are writes (10 / (100 + 37 + 10))

⇒ Optimize reads! But do not ignore write operations.

• Unlike a read, a write operation can only start after we know whether the access is a hit or a miss.

Write Through: the write operation is performed both in cache and in primary memory:

• Easier to implement;

• Cache and memory are always consistent;

• A read miss never causes a memory write;

• Can be optimized using a write buffer.

Write Back: the write operation is performed only in cache (primary memory is updated only when the modified block is replaced):

• Writes are completed at cache speed;

• Reduction of memory traffic.

Allocation Policy

What should be done after a write miss?

Write Allocate: the block is allocated and copied into cache.

No-Write Allocate: the cache is not updated upon a write miss; the write goes only to primary memory (the cache is updated only if the block had previously been allocated there).

Both alternatives may be used with either writing policy. However, the following combinations are the most frequent:

Write Back, Write Allocate: possible future write operations to the same address are done in cache.

Write Through, No-Write Allocate: the reasoning is that even if there are subsequent writes to that block, the writes must still go to the lower-level memory, so there is little to gain in keeping it in cache.
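The effect of the writing and allocation policies on memory traffic can be illustrated with a small simulation. This is a sketch under simplifying assumptions: a direct mapped cache that tracks only tags and dirty bits, a trace made of writes only, and a final flush of dirty blocks. The function `simulate_writes` and its parameters are hypothetical.

```python
def simulate_writes(addresses, write_back, write_allocate,
                    num_rows=4, block_size=16):
    """Count memory traffic caused by a sequence of write addresses.

    Models a direct mapped cache that tracks only tags and dirty bits.
    Returns (memory_writes, block_loads).
    """
    tags = [None] * num_rows
    dirty = [False] * num_rows
    memory_writes = block_loads = 0
    for address in addresses:
        block_number = address // block_size
        index, tag = block_number % num_rows, block_number // num_rows
        hit = tags[index] == tag
        if not hit and write_allocate:
            if dirty[index]:
                memory_writes += 1      # write the evicted dirty block back
            tags[index], dirty[index] = tag, False
            block_loads += 1            # fetch the block before writing to it
            hit = True
        if hit and write_back:
            dirty[index] = True         # memory is updated only on eviction
        else:
            memory_writes += 1          # write through, or no-allocate miss
    # Flush: dirty blocks still in cache must eventually reach memory.
    memory_writes += sum(dirty)
    return memory_writes, block_loads


same_block = [0x100 + 4 * (i % 4) for i in range(8)]  # 8 writes, one block
print("write back + write allocate:      ", simulate_writes(same_block, True, True))
print("write through + no-write allocate:", simulate_writes(same_block, False, False))
```

With repeated writes to the same block, write back with write allocate needs one block load and one final memory write, whereas write through sends every write to memory regardless of the allocation policy.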

Loading Policies

How should a block be loaded into cache?

Blocking: the requested word is only sent to the processor after the whole block has been loaded into cache:

• Simpler to implement;

• According to spatial locality, the next access is likely to be to the same block.

Non Blocking:

Early Restart: fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the processor and let the processor continue execution;

Critical Word First: request the missed word first from memory and send it to the processor as soon as it arrives; let the processor continue execution while filling the rest of the words in the block.

• Greater impact in caches where loading a block requires several memory accesses.
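The difference between the three loading policies can be estimated with simple arithmetic. The sketch below assumes a hypothetical memory timing (6 cycles for the first word of a transfer, 1 cycle for each further word); the function `cycles_until_word` and these values are illustrative only.

```python
def cycles_until_word(policy, words_per_block, requested_word,
                      first_word_latency=6, cycles_per_word=1):
    """Cycles between the miss and the delivery of the requested word.

    Assumed (hypothetical) memory timing: the first word of a transfer
    arrives after `first_word_latency` cycles and each further word after
    `cycles_per_word` more cycles. `requested_word` is 0-based.
    """
    if policy == "blocking":
        words_before_delivery = words_per_block     # wait for the whole block
    elif policy == "early restart":
        words_before_delivery = requested_word + 1  # words 0..requested, in order
    elif policy == "critical word first":
        words_before_delivery = 1                   # requested word comes first
    else:
        raise ValueError(f"unknown policy: {policy}")
    return first_word_latency + (words_before_delivery - 1) * cycles_per_word


for policy in ("blocking", "early restart", "critical word first"):
    print(policy, cycles_until_word(policy, words_per_block=8, requested_word=5))
```

The benefit of the non-blocking policies grows with the block size, and early restart helps most when the requested word lies near the start of the block.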

Control Bits

• Valid: indicates whether the value associated with a given tag is valid:

◮ Used, for example, to invalidate all cache positions when the system is (re-)initialized, or to invalidate specific positions that were changed directly in primary memory by other agents;

• Dirty: indicates whether the value stored in cache is more recent than the value stored in primary memory:

◮ Only used in write-back caches, to indicate that primary memory must be updated before the block is replaced;

• R/W: indicates whether the memory position can be written or is read-only.

• LRU: indicates that this associative set (way) was the last one to be accessed:

◮ Used in 2-way set associative caches to implement an LRU substitution policy;

• Accessed: indicates that this associative set (way) has been accessed since the last time this bit was reset to zero:

◮ Approximates an LRU policy, by resetting the bit to zero at regular time intervals and setting it to 1 whenever the associative set is accessed.

Next Class

• Miss Penalty Reduction:

◮ Multi-level caches;

◮ Giving reads priority over writes;

◮ Victim caches;

• Miss Rate Reduction:

◮ Analysis of the misses;

◮ Increasing the block size;

◮ Increasing the cache capacity;

◮ Increasing the associativity level;

◮ Way prediction.
