Chapter 7: Memory System Design
7-1 Chapter 7—Memory System Design
Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan
Chapter 7: Memory System Design
Topics
Introduction: The Components of the Memory System
The CPU - Memory Interface
Memory Structure
Memory Boards and Modules
Two-Level Memory Hierarchy
The Cache
Virtual Memory
Components of a Computer System
• CPU
• Memory System
• Input/Output System
Memory System
• The memory system houses the information that is processed by the computer system:
• Data - variables, images, documents, etc.
• Programs - operating system, applications
• The memory system has many levels and categories:
• CPU registers, cache, main memory, disk, tape
• Volatile, Non-volatile
• Speed
• Access pattern
• Capacity
Tbl 7.3 The Memory Hierarchy, Cost, and Performance
Some typical values:

Component        CPU (registers)    Cache      Main memory  Disk memory  Tape memory
Access pattern   Random             Random     Random       Direct       Sequential
Capacity, bytes  64–1024            8–1024 KB  8–64 MB      1–10 GB      1 TB
Latency          1–10 ns            20 ns      50 ns        10 ms        10 ms–10 s
Block size       1 word             16 words   16 words     4 KB         4 KB
Bandwidth        System clock rate  8 MB/s     1 MB/s       1 MB/s       1 MB/s
Cost/MB          High               $500       $30          $0.25        $0.02
The Main Memory System
• The system’s disk space and tape backup facility are part of the memory system
• Information storage (programs, documents, etc.)
• Increasing the amount of storage space available to programs (virtual memory)
• However, for this discussion, we will concentrate on the construction and operation of the physical memory in the system (cache, main memory)
• Interfacing memory to the CPU
• Memory organization
• Implementation
Components of the Main Memory System
• The main memory system typically consists of two types of memory
• ROM - read only memory - used to hold the non-volatile part of the system information
• RAM - random access memory - used to hold the system information that can be lost when the system powers down
• The term RAM for the volatile portion of the memory is somewhat of a misnomer, since ROM is also random access; nevertheless, that is the term that has become standard
• Every system must contain some amount of non-volatile ROM to tell the CPU what to do when it first powers up
• ROM contains the entire program that the CPU executes - RAM is used only for temporary data storage (dedicated microcontrollers)
• ROM holds a small program that loads the main program (an application or operating system) into RAM and executes it - called a “boot ROM” or “bootstrap” program (complex embedded systems, desktop systems, mainframes)
Tbl 7.1 Some Memory Properties

Symbol   Definition                          Intel 8088     Intel 8086     PowerPC 601
w        CPU word size                       16 bits        16 bits        64 bits
m        Bits in a logical memory address    20 bits        20 bits        32 bits
s        Bits in smallest addressable unit   8 bits         8 bits         8 bits
b        Data bus size                       8 bits         16 bits        64 bits
2^m      Memory word capacity, s-sized wds   2^20 words     2^20 words     2^32 words
2^m × s  Memory bit capacity                 2^20 × 8 bits  2^20 × 8 bits  2^32 × 8 bits
Big-Endian and Little-Endian Storage
0xABCD: AB = most significant byte, CD = least significant byte

            Little-Endian    Big-Endian
Address 0:  CD               AB
Address 1:  AB               CD
• When data types having a word size larger than the smallest addressable unit are stored in memory (e.g., a 32-bit floating-point number stored in a memory addressed in 16-bit words), there are two ways to do it:
• Store the least significant part of the word at the lowest address (little-Endian, little end first)
• Store the most significant part of the word at the lowest address (big-Endian, big end first)
• Example: The hexadecimal 16-bit number 0xABCD, stored at address 0 in a byte addressable memory:
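The 0xABCD example can be checked with Python's struct module (a quick sketch; the "<" format prefix packs little-endian, ">" packs big-endian):

```python
import struct

value = 0xABCD  # most significant byte = 0xAB, least significant byte = 0xCD

little = struct.pack("<H", value)  # little-endian: little end (0xCD) first
big = struct.pack(">H", value)     # big-endian: big end (0xAB) first

print(little.hex())  # cdab -> address 0 holds 0xCD, address 1 holds 0xAB
print(big.hex())     # abcd -> address 0 holds 0xAB, address 1 holds 0xCD
```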
Tbl 7.2 Memory Performance Parameters

Symbol             Definition         Units       Meaning
t_a                Access time        time        Time to access a memory word
t_c                Cycle time         time        Time from start of access to start of next access
k                  Block size         words       Number of words per block
                   Bandwidth          words/time  Word transmission rate
t_l                Latency            time        Time to access first word of a sequence of words
t_bl = t_l + k/bw  Block access time  time        Time to access an entire block of words
                   (bw = bandwidth)

(Information is often stored and moved in blocks at the cache and disk level.)
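As a worked sketch of the block-access-time formula above (the numbers here are hypothetical, chosen only for illustration):

```python
def block_access_time(t_l, k, bandwidth):
    """Time to access a k-word block: t_bl = t_l + k / bandwidth.

    t_l       -- latency (seconds) to reach the first word
    k         -- block size in words
    bandwidth -- transfer rate in words per second
    """
    return t_l + k / bandwidth

# Hypothetical values: 50 ns latency, 1e8 words/s (10 ns/word), 16-word block.
t_bl = block_access_time(50e-9, 16, 1e8)
print(t_bl)  # 2.1e-07 -> 210 ns: 50 ns latency + 160 ns of transfer
```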
Fig 7.1 The CPU–Memory Interface
Sequence of events:
Read:
1. CPU loads MAR, issues Read, and REQUEST
2. Main memory transmits words to MDR
3. Main memory asserts COMPLETE
Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST
2. Value in MDR is written into address specified by MAR
3. Main memory asserts COMPLETE
[Figure: the CPU's register file, MAR (m bits), and MDR (w bits) connect to main memory (addresses 0 to 2^m − 1, s bits each) over an m-bit address bus A0–A(m−1), a b-bit data bus D0–D(b−1), and the control lines R/W, REQUEST, and COMPLETE.]
Fig 7.1 The CPU–Memory Interface (cont’d.)
Additional points:
• If b < w, main memory must make w/b b-bit transfers
• Some CPUs allow reading and writing of word sizes < w
  Example: Intel 8088: m = 20, w = 16, s = b = 8; 8- and 16-bit values can be read and written
• If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted
• Some systems use separate R and W lines, and omit REQUEST
Specific Example - the SRC Memory Interface
[Figure: the SRC CPU and the memory subsystem are connected by R, W, ADDR, DATA, Done, and MEMStrobe lines.]
• Memory Read operation
  • CPU places address on ADDR bus and asserts Read line
  • Memory places valid data on the DATA bus and asserts Done and MEMStrobe
• Memory Write operation
  • CPU places address on ADDR bus, data on the DATA bus, and asserts Write line
  • Memory stores the data and asserts Done
SRC Memory Read Bus Cycle
[Timing diagram: System Clock, R, W, ADDR, DATA, Done, MEMStrobe. The CPU drives a valid address at the start of the read, the memory returns valid data to the CPU, and Done/MEMStrobe mark the end; the operation takes 1 to n clock cycles.]
SRC Memory Write Bus Cycle
[Timing diagram: System Clock, R, W, ADDR, DATA, Done, MEMStrobe. The CPU drives a valid address and valid data at the start of the write, and Done marks the end; the operation takes 1 to n clock cycles.]
Wait States
• Memory operations can take from 1 to n clock cycles
• If the memory is fast enough to perform the operation in a single clock cycle, the COMPLETE signal (Done and MEMStrobe in the SRC) can be asserted in the same clock cycle
• If the memory can not complete the request in a single clock cycle, it simply continues working on it and delays sending the COMPLETE signal.
• If the memory operation takes n > 1 clock cycles to complete, the memory is said to have inserted n − 1 “wait states” into the operation
• n = 1 is a zero-wait-state access - this is usually not possible given the increasing speed of processors vs. the speed of memory
SRC Wait State Examples
• Zero wait states (n = 1) [timing diagram]
• One wait state (n = 2) [timing diagram]
More SRC Wait State Examples
• Three wait states (n = 4) [timing diagram]
• Many wait states [timing diagram]
Addressing Memory - Decoding the Chip Selects
• The main memory system is typically made up of “blocks” of memory implemented as physically separate units
• Each block must be activated based on its portion of the address space of the CPU - this is accomplished using individual Chip Selects (CS) to activate each block
• Activation of the CS is accomplished by decoding the address from the CPU - one of the functions of the memory controller
[Figure: the memory controller sits between the SRC CPU (R, W, ADDR, DATA, Done, MEMStrobe) and the ROM, RAM1, and RAM2 blocks, decoding the address to drive each block's CS line.]
Building an Address Decoder
• Determine how you want the various ROM and RAM memory modules arranged in the CPU’s address space
• Calculate the starting and ending address for each block of memory
• Given a starting address of s and a memory block size of k, the ending address, e = s + (k-1)
• If the ending address of a previous memory block is e, the starting address of the next block, s = e+1
• Calculate the number of lower address bits that must be connected to the block to address the memory within it
• Determine the unique pattern on the upper address bits that will signify activity on that block
• Construct logic that will activate the CS for that block given the above pattern
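The start/end arithmetic above can be sketched in a few lines (the block names and sizes are the ones used in the next example; addresses are word addresses):

```python
# Lay out contiguous memory blocks and compute each block's address range
# using e = s + (k - 1), with the next block starting at e + 1.
blocks = [("ROM", 32 * 1024), ("RAM1", 16 * 1024), ("RAM2", 16 * 1024)]

start = 0x00000000
for name, size in blocks:
    end = start + (size - 1)          # e = s + (k - 1)
    print(f"{name}: 0x{start:08X}-0x{end:08X}")
    start = end + 1                   # next block starts at e + 1

# ROM: 0x00000000-0x00007FFF
# RAM1: 0x00008000-0x0000BFFF
# RAM2: 0x0000C000-0x0000FFFF
```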
Building an Address Decoder - An Example
• Consider a processor with a 32-bit address bus which uses word addressing, connected to the following memory:
• 32K word EEPROM starting at address 0x00000000
• 16K word RAM module above the EEPROM
• a second 16K word RAM module above the first
Memory Map (word addresses):
• 32K ROM block: 0x00000000 - 0x00007FFF (32K = 2^15, so 15 bits address memory within this block)
• 16K RAM block: 0x00008000 - 0x0000BFFF (16K = 2^14, so 14 bits address memory within this block)
• 16K RAM block: 0x0000C000 - 0x0000FFFF (16K = 2^14, so 14 bits address memory within this block)
Address Decoding Example (cont.)
• First block of ROM uses bits (14:0) to address it
  • Starts at address 0x00000000 and goes to 0x00007FFF
  • Selected when all bits above (14), i.e., bits (31:15), are ‘0’
• First block of RAM uses bits (13:0) to address it
  • Starts at address 0x00008000 and goes to 0x0000BFFF
  • Selected when bits (31:16) are ‘0’ and bits (15:14) are “10”
• Second block of RAM uses bits (13:0) to address it
  • Starts at address 0x0000C000 and goes to 0x0000FFFF
  • Selected when bits (31:16) are ‘0’ and bits (15:14) are “11”
[Figure: decode logic — ADDR(31:15) are NORed to form CS_ROM1, with ADDR(14:0) going to ROM1's address lines; ADDR(31:16) together with ADDR(15) and ADDR(14) are gated to form CS_RAM1 and CS_RAM2, with ADDR(13:0) going to each RAM's address lines.]
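The chip-select decode can be sketched behaviorally (a hypothetical helper of my own, not from the text; bit numbering as above):

```python
def chip_selects(addr):
    """Return which chip select is active for a 32-bit word address."""
    if (addr >> 15) == 0:                  # bits 31:15 all zero -> ROM
        return "CS_ROM1"
    if (addr >> 16) == 0:                  # bits 31:16 zero; examine bits 15:14
        if (addr >> 14) & 0b11 == 0b10:
            return "CS_RAM1"
        if (addr >> 14) & 0b11 == 0b11:
            return "CS_RAM2"
    return None                            # unmapped address

print(chip_selects(0x00007FFF))  # CS_ROM1
print(chip_selects(0x00008000))  # CS_RAM1
print(chip_selects(0x0000C000))  # CS_RAM2
```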
Another Address Decoding Example
• Decoding is more complex when a larger block sits above a smaller one
• Consider a processor with a 16-bit address bus which uses word addressing, connected to the following memory:
• 2K word EEPROM starting at address 0x00000000
• 16K word RAM module above the EEPROM
Memory Map (word addresses):
• 2K ROM block: 0x0000 - 0x07FF (2K = 2^11, so 11 bits address memory within this block)
• 16K RAM block: 0x0800 - 0x47FF (16K = 2^14, so 14 bits address memory within this block)
Another Decoding Example (cont.)
• Selecting the first ROM is simple
  • Selected when all bits above ADDR(10) are ‘0’
• Selecting (and addressing within) the RAM is more complex
  • The starting address of the RAM is 0x0800 = 0b 0000 1000 0000 0000
  • However, ADDR(13:0) are used to address within the RAM, and they must be ‘0’ to address the first RAM location
  • We would have to subtract 0b 1000 0000 0000 from the ADDR(13:0) value to get the proper address within the RAM - too complex
• Solution - adjust the memory map so that all blocks are “aligned” on the size of the largest block
Adjusted Memory Map (word addresses):
• 2K ROM block: 0x0000 - 0x07FF (2K = 2^11, so 11 bits address memory within this block)
• 14K block unused: 0x0800 - 0x3FFF
• 16K RAM block: 0x4000 - 0x7FFF (16K = 2^14, so 14 bits address memory within this block)
Another Decoding Example (cont.)
• First block of ROM uses bits (10:0) to address it
  • Starts at address 0x0000 and goes to 0x07FF
  • Selected when all bits above (10) are ‘0’
• First block of RAM uses bits (13:0) to address it
  • Starts at address 0x4000 and goes to 0x7FFF
  • Selected when bit (15) is ‘0’ and bit (14) is ‘1’
[Figure: decode logic — ADDR(15:11) are NORed to form CS_ROM1, with ADDR(10:0) going to ROM1's address lines; ADDR(15) and ADDR(14) are gated to form CS_RAM1, with ADDR(13:0) going to RAM1's address lines.]
Additional Functions of the Memory Controller
• As just discussed, the memory controller must decode the addresses from the CPU and select the proper memory block
• In addition, the memory controller must provide the proper handshaking between the CPU and the memory
• Most memory chips are asynchronous (they don't work on a clock) and have no outputs that indicate when the data is valid - you simply have to apply the correct inputs and wait the proper amount of time
• Some memory chips require a sequence of signals to be applied for different operations
• Each CPU type has a different set of control signals and a different bus cycle
• Finally, some memory controllers are required to perform complex address calculations to translate addresses from the CPU into a different format that the memory system requires
Fig 7.3 Conceptual Structure of a Memory Cell
Regardless of the technology, all RAM memory cells must provide these four functions: Select, DataIn, DataOut, and R/W.
This “static” RAM cell is unrealistic in practice, but it is functionally correct. We will discuss more practical designs later.
Fig 7.4 An 8-Bit Register as a 1-D RAM Array
Data bus is bidirectional and buffered
The entire register is selected with one select line, and uses one R/W line
[Figure: eight D cells d0–d7 share one Select line and one R/W line; each cell's DataIn/DataOut connects to one line of the bidirectional data bus.]
Fig 7.5 A 4 x 8 2-D Memory Cell Array
• R/W is common to all cells
• A 2-bit address drives the 2–4 line decoder, which selects one of the four 8-bit arrays
• Bidirectional 8-bit buffered data bus
[Figure: a 4 × 8 array of D cells (columns d0–d7); a 2–4 decoder on address bits A1, A0 selects one of the four rows.]
• This type of N-to-2^N decoder requires 2^N gates, which is practical for only very small arrays
Fig 7.6 A 64 K x 1 Static RAM Chip
• A roughly square array fits the IC design paradigm
• Selecting rows separately from columns means only 256 × 2 = 512 decoding/selection circuit elements instead of 65,536!
• CS, Chip Select, allows chips in arrays to be selected individually
• This chip requires 21 pins including power and ground, and so will fit in a 22-pin package.
[Figure: a 256 × 256 cell array; row address A0–A7 drives an 8–256 row decoder, column address A8–A15 drives a 256–1 mux and a 1–256 demux, and R/W and CS control the single data bit.]
This chip requires 24 pins including power and ground, and so will require a 24-pin package. Package size and pin count can dominate chip cost.
Fig 7.7 A 16 K x 4 SRAM Chip
There is little difference between this chip and the previous one, except that there are 4 64–1 multiplexers instead of 1 256–1 multiplexer.
[Figure: four 64 × 256 cell arrays; row address A0–A7 drives an 8–256 row decoder, column address A8–A13 drives 4 64–1 muxes and 4 1–64 demuxes, and R/W and CS control the 4-bit data path.]
Fig 7.8 Matrix and Tree Decoders
• 2-level decoders are limited in size because of gate fan-in. Most technologies limit fan-in to ~8.
• When decoders must be built with fan-in > 8, additional levels of gates are required.
• Tree and matrix decoders are two ways to design decoders with large fan-in:
  • 3-to-8 line tree decoder constructed from 2-input gates
  • 4-to-16 line matrix decoder constructed from 2-input gates
[Figures: a tree decoder built from a 2–4 decoder and 2-input gates produces m0–m7 from x0–x2; a matrix decoder built from two 2–4 decoders produces m0–m15 from x0–x3.]
Fig 7.9 Six-Transistor Static RAM Cell
This is a more practical design than the 8-gate design shown earlier.
A value is read by precharging the bit lines to a value 1/2 way between a 0 and a 1, while asserting the word line. This allows the latch to drive the bit lines to the value stored in the latch.
[Figure: the storage cell (a cross-coupled latch with +5 V active loads) sits on word line w_i; switches controlling access to the cell connect it to dual-rail bit lines b_i and b̄_i for reading and writing, which run through sense/write amplifiers — gated by the column select (from the column address decoder), CS, and R/W — to data line d_i. Additional cells share the bit lines.]
Fig 7.10 Static RAM Read Operation
Access time from Address—the time required of the RAM array to decode the address and provide value to the data bus.
[Timing diagram: Memory address, Read/write, CS, and Data; t_AA runs from address valid to data valid.]
Fig 7.11 Static RAM Write Operations
Write time—the time the data must be held valid in order to decode address and store value in memory cells.
[Timing diagram: Memory address, Read/write, CS, and Data; t_w is the write time, and data is latched on the rising edge of R/W.]
Fig 7.12 Dynamic RAM Cell Organization
Write: place value on bit line and assert word line.
Read: precharge bit line, assert word line, sense value on bit line with sense/write amplifier.
The capacitor will discharge in 4–15 ms. Refresh the capacitor by reading (sensing) the value on the bit line, amplifying it, and placing it back on the bit line, where it recharges the capacitor.
This need to refresh the storage cells of dynamic RAM chips complicates DRAM system design.
[Figure: a switch on word line w_j controls access to the cell, connecting the storage capacitor (charge = 1, no charge = 0) to a single bit line b_i, which runs through sense/write amplifiers — gated by the column select, CS, and R/W — to data line d_i. Additional cells share the bit line.]
Fig 7.13 Dynamic RAM Chip Organization
• Addresses are time-multiplexed on the address bus using RAS and CAS as strobes of rows and columns.
• CAS is normally used as the CS function.
• Notice pin counts:
  • Without address multiplexing: 27 pins including power and ground.
  • With address multiplexing: 17 pins including power and ground.
[Figure: a 1024 × 1024 cell array; A0–A9 feed row latches and a decoder (strobed by RAS) and 10 column address latches with 1–1024 muxes and demuxes (strobed by CAS); 1024 sense/write amplifiers and column latches sit between the array and the column logic; control logic handles RAS, CAS, and R/W; data in d_i, data out d_o.]
Figs 7.14, 7.15 DRAM Read and Write Cycles
Typical DRAM Read operation; typical DRAM Write operation.
Notice that it is the bit line precharge operation that causes the difference between access time and cycle time.
t_DHR is the data hold time from RAS.
[Timing diagrams: the memory address lines carry the row address (strobed by RAS) and then the column address (strobed by CAS); the read cycle shows access time t_A, cycle time t_C, t_RAS, and precharge time t_prechg; the write cycle shows the data hold time from RAS, t_DHR.]
DRAM Refresh and Row Access
• Refresh is usually accomplished by a “RAS-only” cycle. The row address is placed on the address lines and RAS asserted. This refreshes the entire row. CAS is not asserted. The absence of a CAS phase signals the chip that a row refresh is requested, and thus no data is placed on the external data lines.
• Many chips use “CAS before RAS” to signal a refresh. The chip has an internal counter, and whenever CAS is asserted before RAS, it is a signal to refresh the row pointed to by the counter, and to increment the counter.
• Most DRAM vendors also supply one-chip DRAM controllers that encapsulate the refresh and other functions.
• Page mode, nibble mode, and static column mode allow rapid access to the entire row that has been read into the column latches.
• Video RAMs (VRAMs) clock an entire row into a shift register, where it can be rapidly read out, bit by bit, for display.
ROM (Non-Volatile) Memory
[Fig 7.16 A 2-D CMOS ROM Chip: a row decoder driven by the address and CS selects a row, and the pattern stored on that row (e.g., 1 0 1 0) appears on the outputs.]
Tbl 7.4 ROM Types

ROM Type             Cost              Programmability    Time to Program  Time to Erase
Mask-programmed ROM  Very inexpensive  At factory only    Weeks            N/A
PROM                 Inexpensive       Once, by end user  Seconds          N/A
EPROM                Moderate          Many times         Seconds          20 minutes
Flash EPROM          Expensive         Many times         100 µs           1 s, large block
EEPROM               Very expensive    Many times         100 µs           10 ms, byte
Memory Boards and Modules
• There is a need for memories that are larger and wider than a single chip
• Chips can be organized into “boards”
  • Boards may not be actual, physical boards, but may consist of structured chip arrays present on the motherboard
• A board or collection of boards makes up a memory module
• Memory modules:
  • Satisfy the processor–main memory interface requirements
  • May have DRAM refresh capability
  • May expand the total main memory capacity
  • May be interleaved to provide faster access to blocks of words
This is a slightly different view of the memory chip than the previous ones.
Multiple chip selects ease the assembly of chips into chip arrays; the combined select is usually provided by an external AND gate.
Fig 7.17 General Structure of a Memory Chip
[Figure: a memory cell array with a row decoder and an I/O multiplexer; the chip takes an m-bit Address, s data lines, R/W, and multiple chip selects (CS) that must all be asserted to enable the chip.]
Fig 7.18 Word Assembly from Narrow Chips
p chips expand the word size from s bits to p × s bits.
All chips have common CS, R/W, and Address lines.
[Figure: p chips in a row; Select drives every chip's CS, the Address lines and R/W fan out to all chips, and each chip supplies s bits of the p × s-bit data word.]
Fig 7.19 Increasing the Number of Words by a Factor of 2^k
The additional k address bits are used to select one of 2^k chips, each of which has 2^m words.
Word size remains at s bits.
[Figure: the k high-order address bits drive a k-to-2^k decoder whose outputs feed the chip selects; the m low-order address bits and R/W go to every chip, and all chips share the s-bit data lines.]
Fig 7.20 Chip Matrix Using Two Chip Selects
Multiple chip select lines are used to replace the last level of gates in this matrix decoder scheme.
This scheme simplifies the decoding from use of a (q + k)-bit decoder to using one q-bit and one k-bit decoder.
[Figure: a matrix of chips; a horizontal decoder on q address bits drives the CS1 inputs and a vertical decoder on k address bits drives the CS2 inputs, together selecting one of 2^(m+q+k) s-bit words; the m remaining address bits and R/W go to every chip.]
Fig 7.21 Three-Dimensional Dynamic RAM Array
• CAS is used to enable the top decoder in the decoder tree.
• Use one 2-D array for each bit; each 2-D array is on a separate board.
[Figure: w boards, one per data bit; the multiplexed address (m/2 bits) feeds 2^kr and 2^kc decoders, with CAS enabling the top decoder of the tree; RAS, CAS, R/W, Address, and Data lines fan out to every board.]
Fig 7.22 A Memory Module and Its Interface
Must provide:
• Read and Write signals
• Ready: memory is ready to accept commands
• Address — to be sent with Read/Write command
• Data — sent with Write or available upon Read when Ready is asserted
• Module select — needed when there is more than one module
Control signal generator: for SRAM, just strobes data on Read and provides Ready on Read/Write.
For DRAM, it also provides CAS, RAS, and R/W, multiplexes the address, generates refresh signals, and provides Ready.
[Figure: the module's bus interface — the module select (k bits) and address (k + m bits) feed chip/board selection and an address register; a control signal generator drives Read, Write, and Ready; a data register buffers the w-bit data path to the memory boards and/or chips.]
Fig 7.23 Dynamic RAM Module with Refresh Control
[Figure: a refresh clock and control block arbitrates (Request/Grant/Refresh) with the memory timing generator; an address multiplexer chooses between the refresh counter and the halves of the address register (m/2 bits each) to drive the dynamic RAM array's address lines along with RAS, CAS, and R/W; board and chip selects and a data register complete the module.]
Memory System Performance
Breaking the memory access process into steps:
For all accesses:
• transmission of address to memory
• transmission of control information to memory (R/W, Request, etc.)
• decoding of address by memory
For a Read:
• return of data from memory
• transmission of completion signal
For a Write:
• transmission of data to memory (usually simultaneous with address)
• storage of data into memory cells
• transmission of completion signal
The next slide shows the access process in more detail.
Fig 7.26 Sequence of Steps in Accessing Memory
A “hidden refresh” cycle is shown; a normal cycle would exclude the pending refresh step.
[Figure: (a) static RAM — the command, address, and (for a write) data go to memory, the address is decoded, data is written or returned, and Complete is signaled; access time t_a and cycle time t_c are marked. (b) dynamic RAM — row address with RAS, then column address with CAS, the read or write, a possible pending refresh, and precharge before the next cycle can begin.]
Example SRAM Timings
Approximate values for static RAM Read timing:
• Address bus driver turn-on time: 40 ns
• Bus propagation and bus skew: 10 ns
• Board select decode time: 20 ns
• Time to propagate select to another board: 30 ns
• Chip select: 20 ns
Propagation time for address and command to reach the chip: 120 ns
• On-chip memory read access time: 80 ns
• Delay from chip to memory board data bus: 30 ns
• Bus driver and propagation delay (as before): 50 ns
Total memory read access time: 280 ns
Moral: 70 ns chips do not necessarily provide 70 ns access time!
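The two totals can be verified with a quick sum (the dictionary keys paraphrase the delay names above):

```python
# Delays on the outbound path: address and command from CPU to chip (ns).
outbound = {
    "address bus drivers turn-on": 40,
    "bus propagation and skew": 10,
    "board select decode": 20,
    "propagate select to other board": 30,
    "chip select": 20,
}

# Delays on the chip itself and the return path (ns).
on_chip_and_return = {
    "on-chip read access": 80,
    "chip to board data bus": 30,
    "bus driver and propagation": 50,
}

to_chip = sum(outbound.values())
total = to_chip + sum(on_chip_and_return.values())
print(to_chip)  # 120
print(total)    # 280
```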
Cache Memory
• Caches exploit two properties of program behavior:
  • Spatial locality: the property that if a given memory location is referenced, those locations physically near it are likely to be referenced “soon”
  • Temporal locality: the property that if a given memory location is referenced, it is likely to be referenced again, “soon”
• The solution is to place a small amount of very fast memory “close” to the CPU and use the above rules to keep copies of memory contents that are likely to be used in the future there
[Figure: CPU ↔ Cache (small, fast: 1 to 2 clock cycles; usually on the same chip as the CPU) ↔ Main Memory (large, slower: 10 to 100 clock cycles) ↔ Disk, etc. (very large, very slow: > 1000s of clock cycles).]
Cache Memory
• Caches exploit spatial and temporal locality by performing two functions:
• When a given word is requested from memory by the CPU, the cache fetches not only that word from memory, but also other words near it (spatial locality)
• The total group of memory words fetched by the cache is called a cache “block”
• The cache stores words (and their associated blocks) that have been recently requested by the CPU in case they are needed again (temporal locality)
• When a new block is requested, if the cache is full, it must decide which old block must be removed (placed back in main memory) to make room for the new block
[Figure: the CPU issues an address; the mapping function locates the word's block in the cache, and the cache exchanges blocks with main memory.]
Fig 7.30 The Cache Mapping Function
The cache mapping function is responsible for all cache operations:• Placement strategy: where to place an incoming block in the cache• Replacement strategy: which block to replace upon a miss• Read and write policy: how to handle reads and writes upon cache misses
Mapping function must be implemented in hardware
Three different types of mapping functions:
• Associative
• Direct mapped
• Block-set associative
Example: a 256 KB cache with 16-word blocks, backed by a 32 MB main memory
7-55 Chapter 7—Memory System Design
Cache Performance—Hits and Misses
Hit: the word requested by the CPU was found in the cache.
Miss: the word requested by the CPU was not found in the cache. (A miss results in a request to main memory for the block containing the word.)
Hit ratio (or hit rate): h = (number of hits) / (total number of references)
Miss ratio: 1 − h
tc = cache memory access time; tm = main memory access time
Access time: ta = h · tc + (1 − h) · tm
Example:
Cache access time = 2 clock cycles, main memory access time = 40 clock cycles, hit ratio = 95%:
ta = 0.95 × 2 + (1.0 − 0.95) × 40 = 3.9 clock cycles
Faster than main memory alone by a factor of 10!
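The access-time formula lends itself to a quick check in code. A minimal sketch using the slide's example numbers (illustrative only):

```python
def effective_access_time(h, t_cache, t_main):
    """ta = h*tc + (1 - h)*tm: hits served by the cache, misses by main memory."""
    return h * t_cache + (1 - h) * t_main

# The slide's numbers: 2-cycle cache, 40-cycle main memory, 95% hit ratio.
ta = effective_access_time(0.95, 2, 40)   # 3.9 clock cycles
```

Note how sensitive ta is to h: dropping the hit ratio to 90% nearly doubles the miss contribution.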
7-56 Chapter 7—Memory System Design
Memory Fields and Address Translation for Caches
Example of a processor-issued 32-bit address (bits 31…0, 32 bits wide):
• That same 32-bit address can be partitioned into two fields, a block field, and a word field
• The word field gives the offset of the word within the block specified in the block field
• If the block size is N words, the word field is log₂N bits
• The block number serves to uniquely identify the block within the cache (called the tag)
Block number (26 bits) | Word (6 bits)
With 2²⁶ blocks of 64 words each, the address 00…001001 001011 is a specific memory reference: block 9, word 11.
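The field split can be expressed with shifts and masks. A minimal sketch, assuming the 26-bit block / 6-bit word layout above:

```python
BLOCK_BITS = 6                       # 64-word blocks -> log2(64) = 6-bit word field

def split_address(addr):
    """Split an address into (block number, word within block)."""
    return addr >> BLOCK_BITS, addr & ((1 << BLOCK_BITS) - 1)

addr = (9 << BLOCK_BITS) | 11        # the slide's reference: block 9, word 11
block, word = split_address(addr)
```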
7-57 Chapter 7—Memory System Design
Associative mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory.*
*16 bits, while unrealistically small, simplifies the examples.
Fig 7.31 Associative Cache
[Figure: a 256-line cache (cache blocks 0–255, one 8-byte line each) with a tag memory holding a 13-bit tag and a 1-bit valid bit per line, and a main memory of 8192 blocks. Any MM block—0, 1, 2, 119, 421, …, 8191—may occupy any cache line; the 16-bit main memory address splits into a 13-bit tag field and a byte field.]
7-58 Chapter 7—Memory System Design
Fig 7.32 Associative Cache Mechanism
Because any block can reside anywhere in the cache, an associative (content-addressable) memory is used: all locations are searched simultaneously.
[Figure: the tag field of the main memory address is loaded into an argument register and compared against every entry of the associative tag memory at once; on a match with the valid bit set, a selector gates the addressed byte of the matching 8-byte cache line to the CPU.]
7-59 Chapter 7—Memory System Design
Advantages and Disadvantages of the Associative Mapped Cache
Advantage
• Most flexible of all—any MM block can go anywhere in the cache.
Disadvantages
• Large tag memory.
• The need to search the entire tag memory simultaneously means lots of hardware.
Replacement Policy is an issue when the cache is full. –more later–
Q.: How is an associative search conducted at the logic gate level?
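Leaving the gate-level question aside, the behavior of an associative lookup can be modeled in software. This sketch uses a Python dict as a stand-in for the parallel tag search; the eviction choice is a placeholder, not a real policy, and all names are illustrative:

```python
# Behavioral model only: real hardware compares every tag in parallel;
# a dict keyed by tag gives the same hit/miss answer in software.
class AssociativeCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}                  # tag -> cached block

    def lookup(self, tag):
        return self.lines.get(tag)       # None signals a miss

    def fill(self, tag, block):
        if tag not in self.lines and len(self.lines) >= self.num_lines:
            victim = next(iter(self.lines))   # placeholder eviction, not a real policy
            del self.lines[victim]
        self.lines[tag] = block

cache = AssociativeCache(num_lines=256)
cache.fill(421, "block-421")             # MM block 421 may go in any line
```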
Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:
7-60 Chapter 7—Memory System Design
Fig 7.33 Direct-Mapped Cache
Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number.
Now the cache need only examine the single group that its reference specifies.
[Figure: a 256-line cache (groups 0–255, one 8-byte line each) with a 5-bit tag and a valid bit per line. Main memory's 8192 blocks form 32 tag classes of 256 groups each: blocks 0–255 carry tag 0, 256–511 tag 1, 512–767 tag 2, …, 7936–8191 tag 31, so block 2305 (tag 9, group 1) competes for the same cache line as blocks 1, 257, 513, …. The 16-bit main memory address splits into a 5-bit tag, an 8-bit group, and a byte field; the cache address is just the group and byte fields.]
7-61 Chapter 7—Memory System Design
Fig 7.34 Direct-Mapped Cache Operation
1. Decode the group number of the incoming MM address to select the group.
2. Gate out the tag field of the selected entry.
3. Compare the cache tag with the incoming tag.
4. If Match AND Valid, it is a hit:
5. gate out the cache line,
6. and use the word field to select the desired word.
[Figure: the 8-bit group field drives an 8-to-256 decoder that selects one tag/valid/line entry; a 5-bit comparator checks the stored tag against the incoming tag, and Match AND Valid signals a cache hit (otherwise a cache miss). On a hit, a selector uses the byte field to gate the desired byte of the 8-byte cache line out.]
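The placement rule behind the figure can be sketched in a few lines; this toy function (hypothetical, using the example's 256 groups) just computes which line a block must use and what tag gets stored:

```python
NUM_GROUPS = 256                     # as in the figure's 256-line cache

def direct_mapped_slot(block_number):
    """Each MM block may occupy only the one cache line named by its group."""
    group = block_number % NUM_GROUPS    # low bits of the block number
    tag = block_number // NUM_GROUPS     # high bits, stored as the tag
    return group, tag

slot = direct_mapped_slot(2305)          # the figure's example block
```

Blocks 1 and 257 map to the same group, which is exactly the thrashing hazard discussed on the next slide.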
7-62 Chapter 7—Memory System Design
Direct-Mapped Caches
• The direct-mapped cache uses less hardware, but is much more restrictive in block placement.
• If two blocks from the same group are frequently referenced, the cache will “thrash”: repeatedly bring the two competing blocks into and out of the cache, degrading performance.
• The block replacement strategy is trivial.
• Compromise—allow several cache blocks in each group—the block-set-associative cache:
7-63 Chapter 7—Memory System Design
Fig 7.35 2-Way Set-Associative Cache
Example shows 256 groups, a set of two lines per group. Sometimes referred to as a 2-way set-associative cache.
[Figure: 256 groups, each holding a set of two 8-byte lines with their own 5-bit tags and valid bits. Main memory's 8192 blocks are grouped as in Fig 7.33 (tags 0–31), but now two blocks from the same group—e.g. MM blocks 2304 and 7680, both in group 0—can reside in the cache simultaneously. The main memory address splits into 5-bit tag, 8-bit set, and byte fields.]
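A behavioral sketch of the 2-way lookup, with hypothetical names and a deliberately naive eviction choice (replacement policies come later):

```python
# Each group holds a set of two (tag, data) ways; only the addressed
# group is searched, unlike the fully associative cache.
NUM_GROUPS = 256

sets = [[] for _ in range(NUM_GROUPS)]   # each group: up to two (tag, data) pairs

def lookup(block_number):
    group, tag = block_number % NUM_GROUPS, block_number // NUM_GROUPS
    for t, data in sets[group]:
        if t == tag:
            return data
    return None                          # miss

def fill(block_number, data):
    group, tag = block_number % NUM_GROUPS, block_number // NUM_GROUPS
    ways = sets[group]
    if len(ways) == 2:
        ways.pop(0)                      # evict one way (policy left open here)
    ways.append((tag, data))

fill(1, "b1")                            # blocks 1 and 257 share group 1...
fill(257, "b257")                        # ...but can now coexist in the set
```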
7-64 Chapter 7—Memory System Design
Getting Specific: The Intel Pentium Cache
• The Pentium actually has two separate caches—one for instructions and one for data. The Pentium issues 32-bit MM addresses.
• Each cache is 2-way set-associative.
• Each cache is 8 KB = 2¹³ bytes in size.
• There are 32 = 2⁵ bytes per line.
• Thus there are 64 = 2⁶ bytes per set, and therefore 2¹³/2⁶ = 2⁷ = 128 groups.
• This leaves 32 − 5 − 7 = 20 bits for the tag field:

Tag (20 bits) | Set/group (7 bits) | Word (5 bits)    (bits 31…0)
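The field split can be checked with shifts and masks; a sketch under the geometry above (the example address is arbitrary):

```python
def pentium_fields(addr):
    """Split a 32-bit address per the 8 KB, 2-way, 32-byte-line geometry:
    20-bit tag, 7-bit set (group), 5-bit word field."""
    word = addr & 0x1F               # low 5 bits: byte within the 32-byte line
    group = (addr >> 5) & 0x7F       # next 7 bits: one of 128 groups
    tag = addr >> 12                 # remaining 20 bits
    return tag, group, word

fields = pentium_fields(0x12345)     # arbitrary example address
```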
7-65 Chapter 7—Memory System Design
Cache Read and Write Policies
• Read and write cache hit policies:
• Write-through—updates both cache and MM upon each write.
• Write-back—updates only the cache; updates MM only upon block removal.
• A “dirty bit” is set upon the first write to indicate that the block must be written back.
• Read and write cache miss policies:
• Read miss—bring the block in from MM.
• Either forward the desired word as it is brought in, or
• wait until the entire line is filled, then repeat the cache request.
• Write miss:
• Write-allocate—bring the block into the cache, then update it.
• Write–no-allocate—write the word to MM without bringing the block into the cache.
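The write-hit distinction can be sketched as follows; the names and structure are illustrative, not any real controller's:

```python
# Write-back marks the line dirty and defers the MM update;
# write-through updates MM on every write.
class Line:
    def __init__(self, block):
        self.block = block
        self.dirty = False

def write_back(line, i, value):
    line.block[i] = value
    line.dirty = True            # MM is updated only when the line is evicted

def write_through(line, i, value, mm, block_addr):
    line.block[i] = value
    mm[block_addr][i] = value    # MM updated immediately

mm = {0: [0, 0, 0, 0]}
line = Line(list(mm[0]))         # cache holds a copy of MM block 0
write_back(line, 2, 99)          # cache changed; MM stays stale until eviction
```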
7-66 Chapter 7—Memory System Design
Block Replacement Strategies
• Not needed with direct-mapped cache
• Least Recently Used (LRU):
• Track usage with a counter. Each time a block is accessed:
• clear the counter of the accessed block,
• increment counters with values less than the one accessed,
• all others remain unchanged.
• When the set is full, remove the line with the highest count.
• Random replacement—replace a block at random.
• Even random replacement is a fairly effective strategy.
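The LRU counter scheme above can be written out directly; a sketch with hypothetical block names:

```python
def lru_access(counters, accessed):
    """The slide's scheme: clear the accessed block's counter, increment
    only counters that were below it; all others are unchanged."""
    old = counters[accessed]
    for block in counters:
        if counters[block] < old:
            counters[block] += 1
    counters[accessed] = 0

def lru_victim(counters):
    # Highest count = least recently used.
    return max(counters, key=counters.get)

# A full 4-way set; 0 = most recently used, 3 = least.
counters = {"A": 0, "B": 1, "C": 2, "D": 3}
lru_access(counters, "C")        # A and B move down one; D stays least recent
```

The invariant is that the counters always form a permutation of 0…n−1, so the set's entire recency order is maintained with one small counter per line.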
7-67 Chapter 7—Memory System Design
Getting Specific: The PowerPC 601 Cache
• The PPC 601 has a unified cache—that is, a single cache for both instructions and data.
• It is 32 KB in size, organized as 64 × 8 block-set associative, with blocks being 8 8-byte words organized as 2 independent 4-word sectors for convenience in the updating process.
• A cache line can be updated in two single-cycle operations of 4 words each.
• Normal operation is write-back, but write-through can be selected on a per-line basis via software. The cache can also be disabled via software.
Fig 7.36 The PowerPC 601 Cache Structure
[Figure: the cache is 64 sets (line 0 through line 63) of 8 lines each; each 64-byte line holds two sectors (sector 0, sector 1) of 8 words each, with a 20-bit address tag per line in the tag memory. The physical address splits into a 20-bit tag, a 6-bit line (set) number, and a word number.]
7-68 Chapter 7—Memory System Design
Virtual Memory
• Virtual memory is a memory organization in which the processor issues all memory references as effective addresses in a flat address space that does not directly correspond to addresses in physical memory.
• Virtual addresses issued by the processor must be translated into physical addresses.
• These physical addresses may correspond to addresses in main memory, OR to a section of memory that is stored on disk.
• Virtual memory has two major benefits:
• It allows the computer system to use a portion of its disk space as an extension to main memory, thus allowing larger programs to be executed.
• It facilitates multiprogramming—where the computer services more than one user simultaneously:
• each user can have the same amount of memory space in which to run programs, and
• each user can have a logically separate memory space for protection.
7-69 Chapter 7—Memory System Design
Virtual Memory - How it works
[Figure: individual programs' virtual address spaces (large and flat) map onto the CPU, main (physical) memory, and system disk space; only the portion of the active program currently being accessed by the CPU resides in main memory.]
The operating system (e.g. Unix, Windows 98, etc.) is responsible for keeping the information currently being executed by the CPU in main memory. Portions of programs that are not currently being accessed are stored temporarily on disk as needed.
7-70 Chapter 7—Memory System Design
Virtual Memory - Terms
• Portions of a program are transferred between main memory and disk in blocks, just as they are between main memory and the cache—here the blocks are called pages and the transfer process is called paging.
• When the CPU wants to access a portion of the program that is only on disk, the page must be fetched and brought into main memory—the triggering event is called a page fault, roughly synonymous with a cache miss.
• Pages are moved from disk to main memory only when a word in the page is requested by the processor - called Demand Paging
• In most virtual memory systems, pages can be placed anywhere in main memory (like a fully associative cache)
• Block placement and replacement decisions must be made each time a block is moved - just as with a fully associative cache
7-71 Chapter 7—Memory System Design
Virtual Memory - Implementation
[Figure: the CPU (with the MMU on the CPU chip) issues a logical address; the MMU, using its mapping tables, translates the resulting virtual address into a physical address presented to the cache, main memory, and disk.]
The memory management unit (MMU) is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory.
A word about addresses:
• Effective address—an address computed by the processor while executing a program. Synonymous with logical address.
• The term effective address is often used when referring to activity inside the CPU; logical address is most often used when referring to addresses viewed from outside the CPU.
• Virtual address—an intermediate address generated from the logical address by the MMU; often the same as the logical address.
• Physical address—the translation of the virtual address by the MMU into an address that can be presented to the memory unit.
(Note: every address reference must be translated.)
7-72 Chapter 7—Memory System Design
Virtual Addressing—Advantages
• Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
• No address relocation is required at load time.
• No need to fragment the program to accommodate memory size limitations.
• Cost-effective use of physical memory.
• Less expensive secondary (disk) storage can replace primary storage. (The MMU brings portions of the program into physical memory as required.)
• Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
• This allows access/security control at the most fundamental levels.
• It can be used to prevent buggy programs and intruders from causing damage to other users or the system.
This is the origin of those “bus error” and “segmentation fault” messages.
7-73 Chapter 7—Memory System Design
• This figure shows the mapping between virtual memory pages, physical memory pages, and pages in secondary memory. Page n - 1 is not present in physical memory, but only in secondary memory.
• The MMU manages this mapping.
Fig 7.41 Memory Management by Paging
[Figure: a program unit's virtual memory—pages 0, 1, 2, …, n − 1—maps onto scattered pages of physical memory and secondary memory; page n − 1 resides only in secondary memory.]
7-74 Chapter 7—Memory System Design
Page Tables
• For each portion of virtual memory (each page) of a given program, the system must keep a pointer to where the page is actually located—either in main (physical) memory or on disk (secondary memory).
• The group of pointers for a program is called a page table.
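A page table can be sketched as a map from virtual page number to either a physical page or a disk location; the 4 KB page size and all names here are assumptions for illustration:

```python
PAGE_SIZE = 4096                 # assumed page size, for illustration only

# One entry per page: either ("mem", physical page #) or ("disk", disk address).
page_table = {0: ("mem", 7), 1: ("mem", 3), 2: ("disk", 0x2A00)}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    where, loc = page_table[vpn]
    if where == "disk":
        # A real OS would now fetch the page from disk: a page fault.
        raise LookupError("page fault: page %d on disk" % vpn)
    return loc * PAGE_SIZE + offset

paddr = translate(1 * PAGE_SIZE + 20)    # virtual page 1, offset 20
```

The offset passes through unchanged; only the page number is translated.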
[Figure: the page table holds one entry per virtual page—the address of page 0, page 1, page 2, …, page n − 1—each pointing into either physical memory or secondary memory.]
7-75 Chapter 7—Memory System Design
Fig 7.42 Virtual Address Translation in a Paged MMU
• One table per user per program unit
• One translation per memory access
• Potentially large page table
A page fault will typically result in 100,000 or more cycles passing before the page has been brought from secondary storage to MM.
[Figure: the virtual address from the CPU splits into a page number and an offset in the page. The page number, added to a page table base register (and bounds-checked against a page table limit register), selects a page table entry containing access-control bits (presence bit, dirty bit, usage bits) and either a physical page number or a pointer to secondary storage. On a hit (page in primary memory) the physical page number is concatenated with the offset to form the physical address of the desired word in main memory; on a miss (page fault, page in secondary memory) the entry is translated to a disk address.]
7-76 Chapter 7—Memory System Design
Page Placement and Replacement
Page tables are direct mapped, since the physical page is computed directly from the virtual page number.
But physical pages can reside anywhere in physical memory.
Page tables such as those on the previous slide result in large page tables, since there must be a page table entry for every page in the program unit.
Some implementations resort to hash tables instead, which need have entries only for those pages actually present in physical memory.
Replacement strategies are generally LRU, or at least employ a “use bit” to guide replacement.
7-77 Chapter 7—Memory System Design
Fast Address Translation: Regaining Lost Ground
• The concept of virtual memory is very attractive, but it leads to considerable overhead:
• There must be a translation for every memory reference.
• There must be two memory references for every program reference:
• one to retrieve the page table entry,
• one to retrieve the value.
• Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.
The answer: a small cache in the processor that retains the last few virtual to physical translations: a Translation Lookaside Buffer, TLB.
The TLB contains not only the virtual to physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly.
The TLB is usually implemented as a fully associative cache:
7-78 Chapter 7—Memory System Design
Fig 7.43 Translation Lookaside Buffer Structure and Operation
[Figure: the page number of the virtual address from the CPU is looked up associatively in the TLB, whose entries hold a virtual page number, access-control bits (presence bit, dirty bit, valid bit, usage bits), and a physical page number. On a TLB hit the page is in primary memory: the physical page number is concatenated with the word field to form the physical address of the desired word in main memory or cache. On a TLB miss, the physical page is looked up in the page table.]
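The TLB-then-page-table order can be sketched behaviorally (names hypothetical; a real TLB is a small associative hardware cache, not a dict):

```python
tlb = {}                             # virtual page # -> physical page #
page_table = {0: 7, 1: 3, 5: 12}     # backing translation (lives in main memory)

def translate(vpn):
    if vpn in tlb:
        return tlb[vpn], "tlb hit"   # no extra memory reference needed
    ppn = page_table[vpn]            # TLB miss: costs a page-table access
    tlb[vpn] = ppn                   # cache the translation for next time
    return ppn, "tlb miss"

first = translate(5)
second = translate(5)                # same page again: now a TLB hit
```

Because most references land on recently used pages, the common case avoids the second memory reference entirely.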
7-79 Chapter 7—Memory System Design
Fig 7.44 Operation of the Memory Hierarchy
[Figure: a flowchart spanning the CPU, cache, main memory, and secondary memory. The CPU's virtual address first searches the TLB; on a TLB miss the page table is searched, and on a page-table miss a page fault fetches the page from secondary memory and updates MM, the cache, and the page table. Once a translation is found, the TLB is updated and a physical address is generated, which searches the cache; on a cache miss the cache is updated from MM, and the value is returned from the cache.]
7-80 Chapter 7—Memory System Design
Fig 7.45 PowerPC 601 MMU Operation
“Segments” are actually more akin to large (256 MB) blocks.
[Figure: the 32-bit logical address from the CPU splits into a 4-bit segment number, a 16-bit virtual page number, and a 12-bit word field. The segment number selects one of 16 registers holding a 24-bit virtual segment ID (VSID) plus access control and miscellaneous bits; the VSID concatenated with the virtual page number forms a 40-bit virtual page. The two-way set-associative UTLB (entries 0–127 in set 0 and set 1) compares the 40-bit virtual page in both sets through a 2–1 mux; a hit yields a 20-bit physical page number, which with the 12-bit word field forms the physical address driving the cache (hit—data d0–d31 to CPU; miss—cache load), while a UTLB miss starts a page table search.]
7-81 Chapter 7—Memory System Design
Chapter 7 Summary
• Most memory systems are multileveled—cache, main memory, and disk.
• Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
• Chips are organized into boards and modules.
• Larger, slower memory is attached to faster memory in a hierarchical structure.
• The cache to main memory interface requires hardware address translation.
• Virtual memory—the main memory–disk interface—can employ software for address translation because of the slower speeds involved.
• The hierarchy must be carefully designed to ensure optimum price-performance.