Memory hierarchy and paging
Electronic Computers M
How do we dream of a memory?
• Ideally: infinite capacity and zero access time
BUT
• The faster a memory is, the more expensive and power-consuming it is (and very often the bigger its physical size)
• The ideal characteristics are unattainable
• Alternative solution: a multiple-level memory hierarchy
• Big-capacity memory: slow access time
• Small-capacity memory: very fast access time
• Each level is therefore characterized by:
Access time, cost per byte, total capacity, transfer speed (bandwidth), size of the single transferred item
Memory hierarchy levels
CPU registers
Level I cache
Level II cache
Level III cache
Central memory
Disk
Tape
(Going up the hierarchy: bigger speed. Going down: bigger capacity)
N.B. Some cache levels can be missing in a CPU. How is the memory hierarchy handled? Caches are hardware managed (totally transparent to users); memory and disks: hardware, OS and user (files)
Capacity/access-time/costs
CPU registers: hundreds of bytes; <1 ns
Cache: KBytes-MBytes; 1-10 ns; $10/MByte
Central memory: GBytes; 100-300 ns; $1/MByte
Disk: thousands of GBytes / TBytes; 10 ms; $0.0016/MByte
Tape: "infinite"; seconds-minutes
CACHE: small and very fast memory. Discussed later
Characteristics
• Inclusion: all information of the upper levels (those increasingly nearer the CPU) is also present in the lower levels. Very often used (but not always)
• Coherency: data in different levels must be consistent, and therefore update policies must be implemented:
• Write-through: immediate update of the information blocks
• Write-back: the update is delayed until mandatory (i.e. a data replacement or a request by another processor)
• A replacement policy must therefore be defined
• N.B.: information blocks in caches are called «lines» and in the central memory «pages»
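The difference between the two update policies can be sketched as follows. This is a minimal illustrative model (the class and method names are invented, not any real cache's interface): a write-through cache updates the lower level at once, while a write-back cache only marks the line dirty and writes it back at replacement time.

```python
# Illustrative sketch of write-through vs write-back (hypothetical names).

class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.lines = {}        # cached copies: address -> value
        self.dirty = set()     # modified lines not yet in memory
        self.memory = {}       # backing store (the lower level)

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)       # defer the memory update
        else:
            self.memory[addr] = value  # write-through: update at once

    def evict(self, addr):
        # On replacement a dirty line must first be written back
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

wt = Cache(write_back=False)
wt.write(0x100, 7)
assert wt.memory[0x100] == 7       # memory already consistent

wb = Cache(write_back=True)
wb.write(0x100, 7)
assert 0x100 not in wb.memory      # memory still stale
wb.evict(0x100)
assert wb.memory[0x100] == 7       # updated only at replacement
```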
Locality principle
• Each program, in any phase of its execution, uses only a small portion of the memory data/instructions
• Two locality types:
Temporal locality: when a data item has been accessed, it is very likely that the same item will be accessed again in the near future (e.g. loops)
Spatial locality: when a data item has been accessed, it is very likely that items at nearby addresses will be accessed (e.g. vectors, matrices, linear code…)
Working Set: the portion of memory a program uses in a given phase of its execution
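A toy sketch of the two locality types, assuming a matrix stored row by row (all names here are illustrative): the row-major traversal touches consecutive addresses (spatial locality), while the accumulator is re-accessed at every step (temporal locality).

```python
# Toy illustration of spatial and temporal locality.

N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]

# Spatial locality: row-major order visits consecutive addresses,
# so each cache line brought in is fully exploited.
row_major = [matrix[r][c] for r in range(N) for c in range(N)]

# Temporal locality: 'total' is re-used on every loop iteration,
# so it stays in a register or in the cache.
total = 0
for v in row_major:
    total += v

assert row_major == list(range(N * N))
assert total == sum(range(N * N))
```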
Memory hierarchy: general issues
• It solves the following problems:
The speed difference between processors and memories
The need for big central memories
• The key parameter in the balance between cache and central memory is speed; the transferred elements (indivisible, never a portion of them) are the lines (32-256 or more bytes; the size depends on the number of cache levels)
• The key parameter in the balance between the central memory and the disks is capacity; the transferred elements are the pages (4 KB-128 KB), that is, fixed-size blocks of either programs or data (see later for their use)
• A computer can have either, neither or both of them (caching – paging)
• Exploiting the memory hierarchy and the locality principle we achieve two goals:
A (virtual) memory space is made available to the programmer, whose size is equal to the addressable central memory space (which depends on the parallelism of the computer addresses). The physical central memory is always smaller than the addressable space. The central memory is much slower than the cache memory.
The maximum access speed is granted to the processor, which in most cases accesses only the cache, which is much faster than the central memory.
This implies that faults must be handled, that is, cases when a memory level (either cache or central) DOES NOT contain the requested data and must get them from a lower-level memory, for instance cache lines (from the central memory) or central memory pages (from disk). Double faults are obviously possible but unlikely if the system is well managed.
Terminology
• There is a HIT when the requested data are present in the hierarchy level to which the request was addressed (i.e. the first-level cache for the processor, or the central memory for the last-level cache)
• There is a MISS when the data ARE NOT present in the hierarchy level to which the request was addressed and must be retrieved recursively from the lower levels
[Figure: the cache holds block A; the central memory holds blocks A, B, …, N, R]
There is a HIT when the processor requests data belonging to block A and a MISS if it requests data belonging to block B. In case of a MISS, the time for accessing the lines of block B (miss penalty) depends on the request time, the time for extracting block B and the transfer time between levels. This time increases with the distance of the data (number of levels) from the CPU (it varies from a few to thousands of clock cycles). The bigger the block, the bigger the transfer time, but the miss rate (the probability of a miss in a data block) decreases. A reasonable balance must therefore be found in order to keep miss rate × miss penalty to a minimum.
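The balance just described is usually captured by the average access time, hit time + miss rate × miss penalty. A small sketch with assumed (not measured) figures shows why a bigger block is not always better: the lower miss rate can be outweighed by the higher penalty.

```python
# Average access time: hit_time + miss_rate * miss_penalty.
# All numbers below are assumed for illustration only.

def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

# A 2% miss rate with a 100 ns penalty adds only ~2 ns on average:
small_block = average_access_time(1.0, 0.02, 100.0)
assert abs(small_block - 3.0) < 1e-9

# Doubling the block may halve the miss rate but raise the penalty:
large_block = average_access_time(1.0, 0.01, 250.0)
assert abs(large_block - 3.5) < 1e-9
assert large_block > small_block   # the bigger block loses here
```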
Problems
• Where can data (lines) of block B of the main memory be placed in the cache (block placement)? For instance, could they replace data of block A?
• How can data be found in the cache (block identification)?
• How can we choose the block (line) to be replaced when the cache is already full? Many policies exist (see later: cache coherency and BTB)
• What happens when we write a line (write strategy), for instance a line of block A? Normally a write-back policy is used (see later: caches)
Virtual Memory
• The concept of a logical (virtual) address space vs. a physical address space is the basis of memory management:
Logical addresses – generated by the CPU and also known as virtual addresses or linear addresses of the data
Physical addresses – the real, physical addresses where the requested data are stored
• Memory Management Unit: a hw/sw device which maps the logical (virtual) addresses to physical addresses. Used only in medium- to high-performance processors
• The programmer deals only with the logical addresses and is always totally unaware of the physical addresses where the requested data are located
Paging
• The physical memory is subdivided by the hw into fixed-size (a power of 2) blocks called frames
• The logical (virtual) memory is for the programmer a sequence of consecutive addresses, which the hw interprets as subdivided into blocks of equal size (pages). Pages and frames are of the same size
• The OS manages the frames (free or occupied)
• In order to execute a program, at any time only n of its pages are needed (working set). For the execution a program therefore needs only n frames, not necessarily contiguous (normally they are never contiguous), where the working set can be stored
• A mapping system is therefore needed (the page table, which contains the initial physical addresses of all frames where the program pages are stored)
• Memory fragmentation (except in the last frame of a program) is therefore avoided
• The CPU's virtual address is normally interpreted by the hw as made of two components:
The m MSBits are the page number, that is, the index in a table (the page table) which allows the retrieval of the initial physical address of the corresponding frame
The n LSBits are the offset in the page, that is, the value which must be added to the initial physical address to retrieve the data. Since pages (and frames) are always of the same size and a power of 2 (they are aligned: the initial address of each one has its n LSBits equal to zero), the offset must only be concatenated to the MSBits
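The split and the final concatenation can be sketched in a few lines, assuming (as in the later examples) 4 KB pages, i.e. a 12-bit offset; the frame base address used below is an invented value for illustration.

```python
# Virtual address = page number (MSBits) | offset (n LSBits).
# Assumes 4 KB pages, i.e. a 12-bit offset.

PAGE_BITS = 12
PAGE_SIZE = 1 << PAGE_BITS

def split(virtual_address):
    page_number = virtual_address >> PAGE_BITS    # m MSBits
    offset = virtual_address & (PAGE_SIZE - 1)    # n LSBits
    return page_number, offset

def join(frame_base, offset):
    # Frames are aligned (LSBits zero), so the offset is simply
    # concatenated to the frame base, no real addition is needed.
    return frame_base | offset

page, off = split(0x00803019)
assert (page, off) == (0x803, 0x19)
assert join(0x5000, off) == 0x5019   # 0x5000 is an assumed frame base
```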
Paging
[Figure: the logical memory is a sequence of pages; the page table maps each page number to the initial address (MSBs) of a frame in physical memory]
Virtual address: page number (k bits) | offset (n bits)
Physical address: frame number (h bits) | offset (n bits)
The page table has 2^k elements, and k >> h (the virtual space is much bigger than the physical one)
Paging: address translation
[Figure: the processor-generated address is split into virtual page number and offset in page. The virtual page number, added to the page table initial address, selects a page descriptor containing the status bits and the physical page number (the page initial address, always aligned: LSBits always zero!). The offset is then concatenated (joined) to it, producing the physical address of the datum]
Page table implementation
• The page tables (one for each task!) are stored in the central memory
• A table base address register must point to the page table initial address. The size of each page table corresponds to the virtual memory size divided by the page size and multiplied by the number of bytes of each table entry
• The OS must manage another table indicating which physical frames are free and/or occupied
• In order to avoid a double memory access for each data access, a special cache must exist (the Translation Lookaside Buffer) which provides the physical page address without accessing the page table in main memory
Translation Lookaside Buffer (TLB)
[Figure: the processor-generated address is split into virtual page number and offset. On a TLB hit, the physical page number is obtained directly from the TLB (within the processor); on a TLB miss, it is obtained by adding the virtual page number to the page table initial address and reading the descriptor (status + physical page number). The physical page number joined with the offset gives the datum physical address]
The TLB stores the translation (virtual to physical) of the last n addresses. It is a cache
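A minimal TLB model, under stated assumptions: fully associative, LRU replacement, with the in-memory page table modelled as a plain dictionary (real TLBs differ in associativity and replacement). The frame numbers reused below (27, 44, 16) are the ones from the page table example a few slides later.

```python
# Minimal TLB sketch: small cache of the last translations, LRU eviction.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page -> physical frame
        self.page_table = page_table   # fallback: the table in memory
        self.hits = self.misses = 0

    def translate(self, page_number):
        if page_number in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_number)  # mark recently used
            return self.entries[page_number]
        self.misses += 1                           # walk the page table
        frame = self.page_table[page_number]
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)       # evict the LRU entry
        self.entries[page_number] = frame
        return frame

tlb = TLB(capacity=2, page_table={0: 27, 1: 44, 3: 16})
assert tlb.translate(0) == 27      # miss: page table walked
assert tlb.translate(0) == 27      # hit: no memory access
assert tlb.translate(1) == 44      # miss
assert tlb.translate(3) == 16      # miss: evicts page 0 (LRU)
assert (tlb.hits, tlb.misses) == (1, 3)
```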
Paging (x86): status bits
Access protection: each page can be defined as read-only, read/write, user, system, etc.
Dirty bit: it indicates whether the page content has been modified. When modified, the page must be written back to the bulk memory when replaced
Reference bits: they indicate whether the page was accessed (used by the replacement algorithm)
Present/Missing: a virtual page may or may not be in the physical memory. In the latter case the page descriptor stores the location address of the page in the bulk memory
Valid/Invalid: it indicates whether a virtual page corresponds to a frame of the physical memory
Paging: page size
Big pages (bigger than 32 KB):
• Reduced overall access time (latency)
• Reduced transfer time (reduced page-miss frequency)
• Smaller page table size
• Bigger internal fragmentation
Small pages (typically 4-8 KB):
• Increased access time (increased seek time)
• Increased transfer time (increased page-miss frequency)
• Bigger page table size
• Thrashing
• Smaller internal fragmentation
Normally the page size lies between 4 KB and 256 KB
Page fault
• Pages are loaded «on demand», that is, when one of their data is requested and the page is not already in memory: an OS trap is generated in this case
• The OS checks whether an invalid access took place (task aborted) or the page is simply not yet in memory (page fault)
• In the latter case the OS checks whether a free frame is available and stores the requested page there. When no free frame is available, an occupied frame is freed: if modified (dirty bit), its page is first written back to the bulk memory. The page table is then updated
• The OS restarts the interrupted instruction of the interrupted task (restartable instruction)
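The fault handler's steps can be sketched as follows. This is a simplified model with invented names: the victim choice shown here (first table entry) stands in for a real replacement policy, and the dirty-bit write-back is the point being illustrated.

```python
# Demand-paging sketch: free a frame if needed (writing back if dirty),
# then load the requested page and update the page table.

def handle_page_fault(page, page_table, free_frames, disk, memory):
    if not free_frames:
        # Pick a victim (a real OS would use a replacement policy here)
        victim, entry = next(iter(page_table.items()))
        if entry["dirty"]:
            disk[victim] = memory[entry["frame"]]   # write back
        free_frames.append(entry["frame"])
        del page_table[victim]
    frame = free_frames.pop()
    memory[frame] = disk[page]                      # load on demand
    page_table[page] = {"frame": frame, "dirty": False}
    return frame

disk = {7: "code", 9: "data"}                # pages currently on disk
memory, page_table, free_frames = {}, {}, [0]  # a single free frame

handle_page_fault(7, page_table, free_frames, disk, memory)
assert memory[page_table[7]["frame"]] == "code"

handle_page_fault(9, page_table, free_frames, disk, memory)  # evicts 7
assert 7 not in page_table
assert memory[page_table[9]["frame"]] == "data"
```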
If the page is not in memory?
[Figure: the processor sends a virtual address to the translation mechanism. On a hit the page is already in main memory, and the offset selects the datum. On a miss (fault) the OS fault handler loads the page from the bulk memory (disk). The translation mechanism must be «always» available in memory (at least the portion needed)]
Page table organisation
N.B. a different page table exists for each task
[Figure: the memory holds the OS, the page tables of the processes (TP-P1, TP-P2, …) and some of their pages (here physical frames 16, 27 and 44); the remaining pages are on disk, in the file system]

Process 1 page table (TP-P1):
Page number | Phys. frame or disk address | Status
0           | 27                          | M (in memory)
1           | 44                          | M (in memory)
2           | 8714                        | D (on disk)
3           | 16                          | M (in memory)

i.e.: page 1 is located in frame 44; page 2 is on disk sector 8714
Page table size
• Consider a virtual address space with 36-bit address parallelism and frames/pages of 16 KBytes (16 KBytes corresponds to a 14-bit offset; the page number therefore consists of 36 − 14 = 22 bits)
• The page table contains 2^22 descriptors (2^2 × 2^10 × 2^10 = 4 × 1024 × 1024 = 4M), each of 22 bits (pages are aligned, which means that their initial addresses have the 14 LSBits equal to zero; those bits need not be stored in the descriptor). If there are 10 status bits, 4 bytes per descriptor are needed. The page table of each process is therefore 4M × 4 bytes = 16 MBytes!
• The total memory space for the page tables is therefore 16 MBytes × number of active processes (very often hundreds): an unacceptable memory occupancy
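The slide's arithmetic can be reproduced directly (same assumptions: 36-bit virtual addresses, 16 KB pages, 4-byte descriptors):

```python
# Page table size = (virtual space / page size) * descriptor size.

ADDRESS_BITS = 36
PAGE_SIZE = 16 * 1024          # 16 KB -> 14-bit offset
DESCRIPTOR_BYTES = 4           # 22 address bits + 10 status bits

offset_bits = PAGE_SIZE.bit_length() - 1      # log2(16 KB) = 14
page_number_bits = ADDRESS_BITS - offset_bits # 36 - 14 = 22
entries = 1 << page_number_bits               # 2**22 descriptors
table_bytes = entries * DESCRIPTOR_BYTES

assert offset_bits == 14
assert entries == 4 * 1024 * 1024             # 4M descriptors
assert table_bytes == 16 * 1024 * 1024        # 16 MB per process!
```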
Multiple-level page table
Hierarchical organisation (case of 4 KB pages and 32-bit addresses)
Virtual address (32 bits): level I index, 10 bits | level II index, 10 bits | offset in page, 12 bits
• A system register (one for each process) holds the initial physical address of the level I table
• Level I table: 4 KB (points to 1024 level II tables; 4 bytes per entry: address + status). Loaded when the task is started, it is always present in memory
• Level II tables: 4 KB each (each points to 1024 data/code pages; 4 bytes per entry: address + status). They are loaded only when necessary
• Each entry fits in 4 bytes: addresses are 32 bits, but pages are aligned (12 LSBits = 0), so 20 address bits + 12 status bits = 32 bits
• Each table (level I or II) is 1024 (2^10) entries × 4 bytes = 4 KB, that is, exactly the size of a page!
The level II entries hold the initial physical addresses of the user pages (data/code), i.e. the task physical pages
Hierarchical organisation (4 KB pages)
[Figure: the virtual address is split into level I index (10 bits), level II index (10 bits) and offset (12 bits).
Level I table: 1024 entries of 4 bytes; each entry stores the physical initial address + status of a level II table.
Level II table: 1024 entries of 4 bytes; each entry stores the physical initial address + status of a physical memory page (contiguous addresses); each level II table therefore covers 1024 × 4 KB = 4 MB.
Physical page: 4 KB (12-bit offset, contiguous addresses); the offset selects the datum (in this example a 16-bit word).
The 4 GB virtual memory is conceptually divided into 1024 blocks of 4 MB each]
Hierarchical organisation (4 KB pages): worked example
Ex. address 00803019H -> 0000000010 0000000011 000000011001
Level I: slot 2; level II: slot 3; offset: 25 (decimal)
[Figure: entry n. 2 of the level I table points to a level II table; entry n. 3 of that table points to the physical page; the offset 25 selects the datum (a byte in this example; the size of the addressed data depends on the operation code). Total virtual space: 4 GB]
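The worked example can be checked with a few lines of bit arithmetic (same 10+10+12 split as above):

```python
# Two-level split of a 32-bit virtual address: 10 + 10 + 12 bits.

def split_two_level(addr):
    level1 = (addr >> 22) & 0x3FF   # 10 MSBits: slot in the level I table
    level2 = (addr >> 12) & 0x3FF   # next 10 bits: slot in the level II table
    offset = addr & 0xFFF           # 12 LSBits: offset in the 4 KB page
    return level1, level2, offset

# Reproduces the slide: level I slot 2, level II slot 3, offset 25
assert split_two_level(0x00803019) == (2, 3, 25)
```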
Hierarchical organisation
Each level II table is a page which does not contain data BUT the physical addresses of the pages of the requested data.
Upon a context switch only the level I table (4 KB) must be present in memory, while the level II tables are recalled only when needed, using a Least Recently Used mechanism similar to that used for the data pages.
In modern processors, where the address parallelism exceeds 38 bits, 3-level hierarchical page systems are implemented.
As already pointed out, each data access would otherwise require multiple memory accesses: unacceptable. The Translation Lookaside Buffer (TLB) mechanism is used, which stores the last translations between logical and physical addresses, drastically reducing the access delay (except for the memory data access itself, which in turn is reduced with code/data caches – see later). N.B.: page table changes (for instance the initial address of a data page) are NOT automatically reflected in the TLB, which must also be cleared upon a context switch. The OS is responsible for this consistency.