chapter 8, main memory
1
Chapter 8, Main Memory
2
8.1 Background
• When a machine language program executes, it may cause memory address reads or writes
• From the point of view of memory, it is of no interest what the program is doing
• All that is of concern is how the program/operating system/machine manage access to the memory
3
Address binding
• The O/S manages an input queue in secondary storage of jobs that have been submitted but not yet scheduled
• The long term scheduler takes jobs from the input queue, triggers memory allocation, and puts jobs into physical memory
• PCB’s representing the jobs go into the scheduling system’s ready queue
4
• The term memory address binding refers to the system for determining how memory references in programs are related to the actual physical memory addresses where the program resides
• In short, this aspect of system operation stretches from the contents of high level language programs to the hardware the system is running on
5
Variables and memory
• 1. In high level language programs, memory addresses are symbolic.
• Variable names make no reference to an address space in the program
• But in the compiled, loaded code, the variable name is associated with a memory address that doesn’t change during the course of a program run
• This memory location is where the value of the variable is stored
6
Relative memory addresses
• 2. When a high level language program is compiled, typically the compiler generates relative addresses.
• This means that the loaded code contains address references into the data and code space starting with the value 0
• Instructions which have variable operands, for example, refer to the variables in terms of offsets into the allocated memory space beginning at 0
7
Loader/linkers
• 3. An operating system includes a loader/linker.
• This is part of the long term scheduler functionality.
• When the program is placed in memory, assuming (as is likely) that its base load address is not 0, the relative addresses it contains don’t agree with the physical addresses it occupies
8
Resolving relative addresses to absolute addresses
• A simple approach to solving this problem is to have the loader/linker convert the relative addresses of a program to absolute addresses at load time.
• Absolute addresses are the actual physical addresses where the program resides
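The load time conversion described above can be sketched in a few lines. This is a minimal illustration, not a real loader: the function name `relocate`, the sample offsets, and the base address 14000 are all assumptions made up for the example.

```python
def relocate(relative_addresses, load_base):
    """Convert relative addresses (offsets from 0) into absolute addresses."""
    return [load_base + rel for rel in relative_addresses]

# Hypothetical program whose instructions refer to offsets 0, 4 and 12,
# loaded at an assumed physical base address of 14000.
absolute = relocate([0, 4, 12], load_base=14000)
print(absolute)
```

Once this rewriting has been done, the program only works at the address it was loaded at, which is why the scheme assumes programs do not move.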
9
• Note the underlying assumptions of this scenario
• 1. Programs can be loaded into arbitrary memory locations
• 2. Once loaded, the locations of programs in memory don’t change
10
Compile time address binding
• There are several different approaches to binding memory access in programs to actual locations
• 1. Binding can be done at compile time
• If it’s known in advance where in memory a program will be loaded, the compiler can generate absolute code
11
Load time address binding
• 2. Binding can be done at load time
• This was the simple approach described earlier
• The compiler generates relocatable code
• The loader converts the relative addresses to actual addresses at the time the program is placed into memory.
12
Run time address binding
• 3. Binding can be done at execution time
• This is the most flexible approach
• Relocatable code (containing relative addresses) is actually loaded
• At run time, the system converts each relative memory address reference to a real, absolute address
13
• Implementing such a system removes the restriction that a program is always in the same address space
• This kind of system supports advanced memory management systems like paging and virtual memory, which are the topics of chapters 8 and 9, on memory
14
• In simple terms, you see that run time, or dynamic address binding supports medium term scheduling
• A job can be offloaded and reloaded without needing either to reload it to the same address or go through the address binding process again
15
• The diagram on the following overhead shows the various steps involved in getting a user written piece of high level code into a system and running
16
17
Logical vs. physical address space
• The address generated by a program running on the CPU is a logical address
• The address that actually gets manipulated in the memory management unit of the CPU—that ends up in the memory management unit memory address register—is a physical address
• Under compile time or load time binding, the logical and physical addresses are the same
18
• Under run time/execution time binding, the logical and physical addresses differ
• Logical addresses can be called virtual addresses.
• The book uses the terms interchangeably
• However, for the time being, it’s better to refer to logical addresses, so you don’t confuse this concept with the broader concept of virtual memory, the topic of Chapter 9
19
• Overall, the physical memory belonging to a program can be called its physical address space
• The complete set of possible memory references that a program would generate when running can be called its logical address space (or virtual address space)
20
• For efficiency, memory management in real systems is supported in hardware
• The mapping from logical to physical is done by the memory management unit (MMU)
• In the simplest of schemes, the MMU contains a relocation register
• Suppose you are doing run time address binding
21
• The MMU relocation register contains the base address, or offset into main memory, where a program is loaded
• Converting from a relative address to an absolute address means adding the relative address generated by the running program to the contents of the relocation register
22
• When a program is running, every time an instruction makes reference to a memory address, the relative address is passed to the MMU
• The MMU is transparent.
• It does everything necessary to convert the address
23
• For a simple read, for example, given a relative address, the MMU returns the data value found at the converted address
• For a simple write, the MMU takes the given data value and relative address, and writes the value to the converted address
• All other memory access instructions are handled similarly
• An illustrative diagram of MMU functionality follows
24
Memory management unit functionality with relative addresses
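As a rough stand-in for the MMU behavior described on the preceding overheads, the sketch below models run time binding with a single relocation register. The class name `SimpleMMU`, the list used as physical memory, and the base address 40 are assumptions for illustration only; a real MMU does this in hardware.

```python
class SimpleMMU:
    """Toy MMU: every relative address has the relocation register added to it."""

    def __init__(self, memory, relocation):
        self.memory = memory          # list standing in for physical memory
        self.relocation = relocation  # base address where the program is loaded

    def translate(self, relative):
        # Relative (logical) address -> absolute (physical) address.
        return self.relocation + relative

    def read(self, relative):
        return self.memory[self.translate(relative)]

    def write(self, relative, value):
        self.memory[self.translate(relative)] = value


memory = [0] * 64                       # assumed 64-word physical memory
mmu = SimpleMMU(memory, relocation=40)  # program assumed to be loaded at 40
mmu.write(3, 99)                        # the program only ever sees address 3
print(mmu.translate(3), mmu.read(3))
```

The program never sees the value 40. Moving the program would only require copying its memory and changing the relocation register.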
25
• Although the simple diagram doesn’t show it, logical address references generated by a program can be out of range
• In principle, these would cause the MMU to generate out of range physical addresses
• However, the point is that under relative addressing, the program lives in its own virtual world
26
• The program deals only in logical addresses while the system handles mapping them to physical addresses
• It will be shown shortly how the possibility of out of range references can be handled by the MMU
27
• The previous discussion illustrated addressing in a very basic way
• What follows are some historical enhancements, some of which led to the characteristics of complete, modern memory management schemes
– Dynamic loading
– Dynamic linking and shared libraries
– Overlays
28
Dynamic loading
• Dynamic loading is a precursor to paging, but it isn’t efficient enough for a modern environment
• It is reminiscent of medium term scheduling
• One of the assumptions so far has been that a complete program had to be loaded into memory in order to run
• Consider the alternative scenario given on the next overhead
29
• 1. Separate routines of an application are stored on the disk in relocatable format
• 2. When a routine is called, first it’s necessary to check if it’s already been loaded.
– If so, control is transferred to it
• 3. If not, the loader immediately loads it and puts an entry in the application’s address table
– An application address table entry contains the value that would go into the relocation register for each of the routines, when it’s running
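The check-then-load sequence above might look roughly like the following. Everything here is a hypothetical model: the class `DynamicLoader`, the routine names, the sizes, and the starting address 1000 are invented for the sketch, and "loading" is reduced to reserving an address range.

```python
class DynamicLoader:
    """Toy model of dynamic loading with a per-application address table."""

    def __init__(self, first_free_address=1000):
        self.address_table = {}   # routine name -> base (relocation) address
        self.next_free = first_free_address
        self.loads = 0            # counts simulated loads from disk

    def call(self, routine, size):
        # Step 2: check whether the routine has already been loaded.
        if routine not in self.address_table:
            # Step 3: load it and record its base address in the table.
            self.address_table[routine] = self.next_free
            self.next_free += size
            self.loads += 1
        # Control would be transferred to this address.
        return self.address_table[routine]


loader = DynamicLoader()
first = loader.call("sort", size=200)     # not loaded yet, so it is loaded
other = loader.call("report", size=100)   # a second routine
again = loader.call("sort", size=200)     # already loaded, no disk access
print(first, other, again, loader.loads)
```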
30
Dynamic linking and shared libraries
• To understand dynamic linking, consider what static linking would mean
• If every user program that used a system library had to have a copy of the system code bound into it, that would be static linking
• This is clearly inefficient.
• Why make multiple copies of shared code in loaded program images?
31
• Under dynamic linking, a user program contains a special stub where system code is called
• At run time, when the stub is encountered, a system call checks to see whether the needed code has already been loaded by another program
• If not, the code is loaded and execution continues
• If the code was already loaded, then execution continues at the address where the system had loaded it
32
• Dynamic linking of system libraries supports both transparent library updates and the use of different library versions
• If user code is dynamically linked to system code, if the system code changes, there is no need to recompile the user code.
• The user code doesn’t contain a copy of the system code
33
• If different versions of libraries are needed, this is straightforward
• Old user code will use whatever version was in effect when it was written
• New versions of libraries need new names (i.e., names with version numbers) and new user code can be written to use the new version
34
• If it is desirable for old user code to use the new library version, the old user code will have to be changed so that the stub refers to the new rather than the old
• Obviously, the ability to do this is all supported by system functionality
35
• The fundamental functionality, from the point of view of memory management, is shared access to common memory
• In general, the memory space belonging to one process is disjoint from the memory space belonging to another
• However, the system may support access to a shared system library in the virtual memory space of more than one user process
36
Overlays
• This is a technique that is very old and has little use in modern computers/operating systems
• It is possible that it would have some application in modern environments, like cell phones, where physical memory was extremely limited
37
• Suppose a program ran sequentially and could be broken into two halves where no loop or branch in the second half reached back into the first
• Suppose that the system provided a facility so that a running program could load an executable image into its memory space
• This is reminiscent of forking where the fork() is followed by an exec()
38
• Suppose those requirements were met and memory was large enough to hold half of the program but not all of the program
• Write the first half and have it conclude by loading the second half
39
• This is not simple to do, it requires system support, it certainly won’t solve all of your problems, and it would be prone to mistakes
• However, something like this may be necessary if memory is tiny and the system doesn’t support advanced techniques like paging and virtual memory
40
8.2 Swapping
• Swapping was mentioned before as the action taken by the medium term scheduler
• Remember to keep the term distinct from switching, which refers to switching loaded processes on and off of the CPU
• In this section, swapping will refer to the approach used to support multi-programming in systems with limited memory
41
• Elements of swapping existed in early versions of Windows
• Swapping continues to exist in Unix environments
42
• This is the scenario for swapping:
• Execution images for >1 job may be in memory
• The long term scheduler picks a job from the input queue
• There isn’t enough memory for it
• So the image of a currently inactive job is swapped out and the new job is swapped in
43
• Medium term scheduling does swapping on the grounds that the multi-programming level is too high
• In other words, the CPU is the limiting resource
• Swapping as discussed now is implemented because memory space is limited
• Note that swapping for either reason isn’t suitable for interactive type processes
44
• Swapping is slow because it writes to a swap space in secondary storage
• Swapping can be useful as a protection against limited resources, whether CPU (medium term scheduling) or memory (swapping as described here)
• However, transferring back and forth from the disk is definitely not a time-effective strategy for supporting multi-programming, let alone multi-tasking, on a modern system
45
8.3 Contiguous Memory Allocation
• Along with the other assumptions made so far, such as the fact that all of a program has to be loaded into memory, another assumption is made
• In simple systems, the whole program is loaded, in order, from beginning to end, in one block of physical memory
46
• Referring back to earlier chapters, the interrupt vector table is assigned a fixed memory location
• O/S code is assigned a fixed location
• User processes are allocated contiguous blocks in the remaining free memory
• Valid memory address references for relocatable code are determined by a base address and a limit value
47
• The base address corresponds to relative address 0
• The limit tells the amount of memory allocated to the program
• In other words, the limit bounds the valid relative addresses: for a limit of L, they run from 0 to L – 1
48
• The limit register contains the bound on relative addresses: every relative address generated must be less than this value.
• The relocation register contains the base address allocated to the program
• Keep in mind that when context switching, these registers are among those that the dispatcher sets
• The following diagram illustrates the MMU in more detail under these assumptions
49
MMU functionality with relative addresses, contiguous memory allocation, and limit and relocation registers
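In place of the diagram, here is a minimal sketch of the check the MMU performs with both registers. It assumes the common convention that a relative address is valid when it is less than the limit. The names `AddressTrap` and `translate_checked` and the register values are made up for the example.

```python
class AddressTrap(Exception):
    """Stands in for the hardware trap raised on an out of range address."""


def translate_checked(relative, relocation, limit):
    # First compare against the limit register...
    if not 0 <= relative < limit:
        raise AddressTrap(f"relative address {relative} is outside 0..{limit - 1}")
    # ...then add the relocation register to form the physical address.
    return relocation + relative


# Assumed register contents for one process: base 14000, limit 500.
print(translate_checked(100, relocation=14000, limit=500))
```

Because the comparison happens before the addition, an out of range reference never reaches physical memory. It traps to the operating system instead.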
50
Memory allocations
• A simple scheme for allocating memory is to give processes fixed size partitions
• A slightly more efficient scheme would vary the partition size according to the program size
• The O/S keeps a table or list of free and allocated memory
51
• Part of scheduling becomes determining whether there is enough memory to load a new job
• Under contiguous allocation, that means finding out whether there is a “hole” (window of free memory) large enough for the job
• If there is a large enough hole, in principle, that makes things “easy” (stay tuned)
52
• If there isn’t a large enough hole you have two choices:
• A. Wait and schedule the new process when a large enough hole becomes available
• B. Set the current new job aside and have the scheduler search for jobs in the input queue that are small enough to fit into available holes
53
The dynamic storage allocation problem
• This is a classic problem of memory management
• The assumption is that scattered throughout memory are various holes of contiguous memory large enough for the process to be loaded into them
• The question is how to choose which of those holes to load the process into
54
• Historically, three algorithms have been considered
• 1. First fit: Put a process into the first hole found that’s big enough for it.
– This is fast and allocates memory efficiently
• 2. Best fit: Look for the smallest hole that is still big enough, i.e., the one closest in size to what’s needed.
– This is not as fast and it’s not clearly better in allocation
55
• 3. Worst fit: This essentially means, load the job into the largest available hole.
– In practice this performs as well as its name…
• Note that for any of these three choices, the question is not where in the hole to load the process
• For the sake of argument, assume that it will be loaded at the beginning of the hole
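The three strategies differ only in which hole they pick, which a short sketch can make concrete. The representation of a hole as a `(start, length)` pair, the function names, and the sample hole list are assumptions for illustration.

```python
def first_fit(holes, size):
    """Return the start of the first hole that is big enough, else None."""
    for start, length in holes:
        if length >= size:
            return start
    return None


def best_fit(holes, size):
    """Return the start of the smallest hole that is big enough, else None."""
    fits = [(length, start) for start, length in holes if length >= size]
    return min(fits)[1] if fits else None


def worst_fit(holes, size):
    """Return the start of the largest hole that is big enough, else None."""
    fits = [(length, start) for start, length in holes if length >= size]
    return max(fits)[1] if fits else None


# Hypothetical free list: (start address, length), in address order.
holes = [(0, 100), (300, 500), (900, 200)]
print(first_fit(holes, 150), best_fit(holes, 150), worst_fit(holes, 150))
```

First fit can stop at the first match, while best fit and worst fit have to examine every hole unless the free list is kept sorted by size.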
56
External fragmentation
• External fragmentation describes the situation when memory has been allocated to processes leaving lots of scattered, small holes
• If sufficiently small, the holes are wasted memory space under contiguous loading
• Even though worst fit doesn’t work, the idea behind it was to leave usable size holes
57
• Empirical studies have shown that for an amount of allocated memory measured as N, an amount of memory approximately equal to 0.5N will be lost due to external fragmentation
• This is known as the 50% rule
• In other words, under contiguous memory allocation about 1/3 of memory (0.5N out of a total of N + 0.5N) is wasted due to unusable, small memory holes external to the blocks that are successfully allocated
58
Block allocation
• In reality, memory is typically allocated in fixed size blocks rather than exact byte counts corresponding to process size
• Keeping track of arbitrary, varying amounts of memory allocation is not practical due to the overhead involved
• A block may consist of 1KB or some other measure of similar magnitude or larger
59
• Under block allocation, a process is allocated enough contiguous blocks to contain the whole program
• External fragmentation still results under block allocation
• The smallest possible hole will be one block
60
Internal fragmentation
• Something called internal fragmentation also results from block allocation
• This refers to the wasted memory in the last block allocated to a process
• Internal fragmentation on average is equal to ½ of the size of one block
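A small calculation shows where the waste in the last block comes from. The function names and the sample sizes (a 2500 byte process, 1024 byte blocks) are assumptions chosen for the example.

```python
def blocks_needed(process_size, block_size):
    # Ceiling division using integers only.
    return -(-process_size // block_size)


def internal_fragmentation(process_size, block_size):
    # Bytes allocated but unused in the last block.
    allocated = blocks_needed(process_size, block_size) * block_size
    return allocated - process_size


print(blocks_needed(2500, 1024))           # 3 blocks
print(internal_fragmentation(2500, 1024))  # 572 bytes unused

# Average waste over every process size from 1 to 1024 bytes.
average = sum(internal_fragmentation(s, 1024) for s in range(1, 1025)) / 1024
print(average)                             # 511.5, about half a block
```

The half-a-block average holds under the assumption that process sizes are spread evenly relative to the block size.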
61
Picking a block size
• Picking a block size is a classic case of balancing extremes
• If block size is large enough, each process will only need one block.
• This degenerates into fixed partitions for processes, with large waste due to internal fragmentation
62
• If block size is small enough, you approach allocating byte by byte, which is undesirable due to record keeping overhead
• If the blocks are small, internal fragmentation is insignificant, but this is not an overriding advantage
63
• Block allocation is a desirable enhancement of contiguous memory allocation, but it’s still contiguous memory allocation
• It is reasonable to assume that external fragments, even if measured in units of blocks rather than bytes, can become small enough to be unusable
64
Memory compaction
• Memory compaction is an approach to solving the fragmentation resulting from contiguous memory allocation
• Compaction refers to relocating programs loaded in memory in order to reduce fragmentation
• Relocation is a system process that happens dynamically, without unloading the user processes
65
• If programs use absolute memory addresses, they simply can’t be relocated.
• Memory couldn’t be compacted without recompiling the programs.
• This would require unloading them and loading the recompiled code
• This is out of the question
• It would not be a dynamic process
66
• If programs use relative memory addresses, they are relocatable.
• Even during run time, they can be moved to new memory locations
• The system relocation process accomplishes this by moving the program images and updating the relocation (base) register values for the user processes
67
• Relocation makes it possible to squeeze the loaded programs together in memory, squeezing out the unusable fragments
• Compaction is not an inexpensive proposition
• This is just another example of a trade-off
• Do the savings in memory usage justify the cost of relocating processes on the fly?
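Compaction can be pictured as sliding every loaded program toward low memory and recording its new base address. The sketch below models only the bookkeeping, not the copying of memory contents. The function name `compact` and the `(name, base, size)` tuples are hypothetical.

```python
def compact(processes, start=0):
    """Slide processes together; return (relocated list, first free address).

    processes is a list of (name, base, size) tuples.
    """
    next_base = start
    relocated = []
    # Move processes in address order so nothing is overwritten.
    for name, base, size in sorted(processes, key=lambda p: p[1]):
        relocated.append((name, next_base, size))  # new relocation register value
        next_base += size
    return relocated, next_base


# Assumed layout with holes at 100..249 and 300..399.
loaded = [("A", 0, 100), ("B", 250, 50), ("C", 400, 200)]
relocated, free_start = compact(loaded)
print(relocated, free_start)
```

After compaction, all free memory forms one hole starting at `free_start`. The cost is that every moved process has to be copied, which is the trade-off raised above.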
68
8.4 Paging
• Paging is a big deal
• Fundamentally, paging is a memory management technique that makes it possible to load a program into non-contiguous memory
69
• A page is a fixed-size block
• A program may be large enough that it has to be loaded into more than one page
• Under block allocation, as just described, the blocks for a program have to be contiguous
• Under paging, the allocated pages do not have to be contiguous
70
• Paging solves two problems:
• 1. Under paging, external fragmentation is not a problem.
– Even a single, isolated, unallocated page is still usable
– It can be allocated as part of a non-contiguous allocation
• Another way of putting this is that with paging, memory compaction will never be needed
71
• 2. Under paging, fragmentation in the swap space in secondary storage is also eliminated
• When memory compaction was discussed above, its relationship to swapping was not mentioned
• It turns out that memory compaction and swapping are incompatible
72
• Under swapping, what is kept in secondary storage is not just a “list” of the jobs in memory
• What is kept is an image of what’s loaded in memory
• If the location of things in memory is reorganized, that would require reorganizing the image in secondary storage
• This is not practical because it would take too long
• Memory compaction is not necessary with paging, so this problem is solved
73
How paging is implemented
• Paging is based on the idea that the O/S can maintain data structures that match given blocks in physical memory with given ranges of virtual addresses in programs
• Physical memory is conceptually broken into fixed size frames
• Logical memory is broken into pages of the same size
74
• In contiguous memory allocation there was a limit register and a relocation register
• In paging, the idea of a limit doesn’t really apply
• All of the pages are of the same size
• The concept of relocation is handled by a table which matches page addresses with frames
75
• In other words, under paging, addresses are kept track of with a look-up table
• For a given logical page address, the system maintains a table entry telling which physical frame it’s stored in
• In paging there are special registers for placing the logical address and forming the physical address using the look-up value
76
• It is important to understand that under paging, allocation isn’t contiguous, but complete programs do have to be loaded
• For a program of x pages, x frames will be needed
• The number of frames allocated will differ for programs of different sizes
77
• It is worthwhile to take a look at the sequence in the development of memory allocation in terms of two constraints:
– Does a whole program have to be loaded?
– Does it have to be in contiguous memory?
• Step one was contiguous memory allocation
– A whole program has to be loaded
– It has to be loaded into contiguous memory
– A refinement of this was using blocks to allocate contiguous memory
78
• Step two is paging
– Complete programs have to be loaded
– The frames they are allocated to do not have to be contiguous
• Step three will be virtual memory
– Complete programs will not have to be loaded
– Allocated memory will not have to be contiguous
• Virtual memory, which has neither constraint, is the ultimate development in memory allocation
79
Implementation Details for Simple Paging
• Every (logical) address generated by the CPU takes this form:
• Page part (p) | offset part (d)
• The page part is a page id
• (Do not confuse this with pid, which will be used to identify processes.)
80
• The offset part is the location of a given word within the page that contains it
• More specifically, let an address consist of m bits
• Then a logical address can be pictured as shown on the next overhead
81
82
• The addresses are binary numbers
• There are m bits for the address overall
• That means the address space consists of 2^m bytes
• The range of valid addresses goes from 0 to 2^m – 1
83
• The components of the address fall neatly into two parts
• The (m – n) bits for p can be treated separately as a page number in the range from 0 to 2^(m – n) – 1
• The n bits for d can be treated separately as an offset in the range from 0 to 2^n – 1
• The fact that n bits are reserved for the offset into a page implies that the size of a page is 2^n bytes
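Splitting an address into p and d is just a shift and a mask, as the sketch below shows. The function name `split_address` and the choice of m = 16 and n = 10 (1024 byte pages) are assumptions for the example.

```python
def split_address(address, m, n):
    """Split an m-bit logical address into (page number p, offset d)."""
    if not 0 <= address < 2 ** m:
        raise ValueError("address does not fit in m bits")
    page = address >> n                # the high-order (m - n) bits
    offset = address & ((1 << n) - 1)  # the low-order n bits
    return page, offset


# With m = 16 and n = 10, there are 64 pages of 1024 bytes each.
print(split_address(5000, m=16, n=10))  # (4, 904), since 5000 = 4 * 1024 + 904
```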
84
Forming an address from a page table
• Paging is based on maintaining a page table
• For some page value p, the corresponding frame value f is looked up in the page table
• The lookup is done at offset p in the table
• The offset, d, is unchanged
85
• The physical address is formed by appending the binary value for d to the binary value for f
• The result is f | d
• The formation of a physical address from a logical address, p | d, using a page table, is illustrated in the following diagram
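The same translation can be written out as a short sketch. The function name `translate_paged`, the four entry page table, and the tiny 4 byte page size (n = 2) are assumptions chosen to keep the numbers small.

```python
def translate_paged(logical, page_table, n):
    """Form the physical address f | d from the logical address p | d."""
    page = logical >> n                # p: index into the page table
    offset = logical & ((1 << n) - 1)  # d: carried over unchanged
    frame = page_table[page]           # f: looked up at offset p in the table
    return (frame << n) | offset       # append d to f


# Hypothetical page table: page 0 -> frame 5, 1 -> 6, 2 -> 1, 3 -> 2.
page_table = [5, 6, 1, 2]
print(translate_paged(13, page_table, n=2))  # page 3, offset 1 -> frame 2 -> 9
```

Logical addresses 0 through 3 all fall in page 0 and map to physical addresses 20 through 23, even though page 1 lands in a completely different frame.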
86
87
The contents of a page table
• In theory you could have a global page table containing entries for all processes
• In practice, each process may have its own page table which is used when that process is scheduled
• When a process is initially scheduled by the long term scheduler, its page table would be populated with the frames allocated to it
88
• When the short term scheduler context switches between processes, an address register pointing to the page table would be changed
• The use of the page table for a single process can be illustrated with a simple example
• Each page table entry gives a base physical address (of a frame) for a given page in the process
89
90
• Note again that under paging there is no external fragmentation
• Every empty physical memory space is a usable frame
• Internal fragmentation will average one half of a frame per process
91
Page sizes
• In modern systems page sizes vary in the range of around 512 bytes to 16MB
• The smaller the page size, the smaller the internal fragmentation
• However, if the memory space is large, there is overhead in allocating small pages and maintaining a page table with lots of entries
92
• As hardware resources have become less costly, larger memory spaces have become available
• Page sizes have grown correspondingly large
• Page sizes of 2K-8K may be considered representative of an average, modern system
93
Summary of paging ideas
• 1. The logical view of the address space is separate from the physical view.
– The system provides the mapping between logical pages and physical frames
– Code is relocatable on a page by page basis
– Code addresses are not absolute
94
• 2. The logical view of the address space is of contiguous memory.
• Paging is completely hidden by the MMU.
– Allocation of frames is not contiguous
– However, programs have to be loaded in their entirety
95
• 3. Although the discussion has been in terms of the page table, in reality there is also a global frame table.
– The frame table provides the system with ready look-up of which frames have been allocated
– It tells which processes they’ve been allocated to
– And importantly for future allocations, it tells which frames are free and still available for allocation
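A frame table along these lines could be modeled as follows. This is only a sketch of the bookkeeping: the class `FrameTable`, its methods, and the pids are hypothetical, and the policy of handing out the lowest numbered free frames is an assumption.

```python
class FrameTable:
    """Toy global frame table: one entry per physical frame."""

    def __init__(self, frame_count):
        # owner[f] is the pid holding frame f, or None if the frame is free.
        self.owner = [None] * frame_count

    def free_frames(self):
        return [f for f, pid in enumerate(self.owner) if pid is None]

    def allocate(self, pid, pages_needed):
        """Return a page table (list of frames) for pid, or None if it won't fit."""
        free = self.free_frames()
        if len(free) < pages_needed:
            return None
        page_table = free[:pages_needed]  # frames need not be contiguous
        for frame in page_table:
            self.owner[frame] = pid
        return page_table


table = FrameTable(6)
table.owner[1] = 7  # assume frames 1 and 3 already belong to process 7
table.owner[3] = 7
new_page_table = table.allocate(pid=1, pages_needed=3)
print(new_page_table, table.free_frames())
```

The list returned by `allocate` plays the role of the per-process page table: entry p holds the frame for page p.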
96
• 4. There is a page table for each process.
– It keeps track of memory allocation from the process point of view
– It supports the translation from logical to physical addresses
97
Hardware support for paging
• A page table has to hold the mapping from logical pages to physical frames for a single process
• Note that the page table resides in memory
• The minimum hardware support for paging is a dedicated register on the chip which holds the address of the page table of the currently running process
98
• With this minimal support, for each logical memory address generated by a program, two accesses to actual memory would be necessary
• The first access would be to the page table, the second to the frame address found there
• This is expensive
• In order to support non-contiguous memory allocation, the cost of an access has doubled
99
• In order to be viable, paging needs additional hardware support.
• There are two basic choices
• 1. Dedicated registers
• 2. Translation look-aside buffers
100
• 1. Have a complete set of dedicated registers for the page table.
– That is, each page table entry would reside in a register
– There would have to be as many registers as the maximum number of frames that could be allocated per process
– This is fast, but the hardware cost (monetary and real estate on the chip) becomes impractical if the memory space is large
101
• 2. The chip will contain hardware elements known as translation look-aside buffers (TLB’s).
– This is the current state of the art, and it will be explained below
102
Translation look-aside buffers
• Translation look-aside buffers are in essence a special set of registers which support look-up.
• In other words, they are table-like.
• They are designed to contain keys, p, page identifiers; and values, f, the matching frame identifiers
• They are different from dedicated registers
• They are designed to hold a subset of the page table
103
• TLB’s have an additional, special characteristic.
• They are not independent buffers.
• They come as a collection
• The “look-aside” part of the name is meant to suggest that when a search value is “dropped” onto the TLB, for all practical purposes, all of the buffers are searched for that value simultaneously.
104
• If the search value is present, the matching value is found within a fixed number of clock cycles
• In other words, look-up in a TLB does not involve linear search or any other software search algorithm.
• There is no order of complexity to searching depending on the number of entries in the collection of TLB’s.
• Response time is fixed and small
105
• TLB’s are like a highly specialized cache
• The set of TLB’s doesn’t store a whole page table
• When a process starts accessing pages, this requires reading the page table and finding the frame
• Once a page has been read the first time, an entry is made for it in the TLB
• Subsequent reads to that page will not require reading from the page table in memory
106
• Just like with caching, some accesses to memory will result in a TLB “hit” and some will result in a TLB “miss”
• A hit is very economical
• With a hit, a memory access requires a reference to the TLB followed by one main memory access
107
• A miss requires reading the page table and replacing (the LRU) entry in the TLB with the most recent page accessed
• In other words, a miss incurs the full “double” cost of accessing the memory twice
• The first access updates the TLB and the second finds the desired memory address
• Memory management with TLB’s is shown in the following diagram
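The hit and miss behavior can be simulated in software, with the caveat that a real TLB searches all entries in parallel in hardware. The sketch below models only the bookkeeping. The class `TLB`, its counters, and the two entry capacity are assumptions, and LRU replacement is modeled with Python's `collections.OrderedDict`.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB holding a subset of the page table, with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, page, page_table):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)    # mark as most recently used
            return self.entries[page]
        self.misses += 1
        frame = page_table[page]              # the extra access to memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[page] = frame
        return frame


page_table = [5, 6, 1, 2]  # hypothetical page table in memory
tlb = TLB(capacity=2)
frames = [tlb.lookup(p, page_table) for p in [0, 1, 0, 2, 1]]
print(frames, tlb.hits, tlb.misses)
```

With only two entries, the reference to page 2 evicts page 1, so the final reference to page 1 misses again. A larger TLB or better locality would raise the hit ratio.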
108
109
• In the following diagram, the page table is shown in memory, where it’s located.
• The ALU, TLB’s, and logical and physical address registers are all in the CPU.
• The TLB’s and address registers are in the MMU of the CPU.
• A program running in the ALU generates a logical memory address which is passed to the MMU, which translates it to a physical address and reads from or writes to it.
110
111
• Note the following things about the diagram
• The page table is complete, so a search of the page table simply means jumping to offset p in the table
• The TLB is a subset, so it has to have both key, p, and look-up, f values in it
• The diagram shows addressing, but it doesn’t attempt to show, through arrows or other notation, the replacement of TLB entries on a miss
112
TLB hits and misses
• Paging costs can be summarized in this way
• On a hit: TLB access + memory access
• On a miss: TLB access + memory access to page table + memory access to desired page
• The book states that typical TLB’s are in the range from 16 to 512 entries
• With this number of TLB entries, a hit ratio of 80%-98% can be achieved
113
Calculating the cost of paging
• Given a hit ratio and some sample values for the time needed for TLB and memory access, weighted averages for the cost of paging can be calculated
• For example, let the time needed for a TLB search be 20 ns.
• Let the time needed for a main memory access be 100 ns.
114
• Cost of TLB hit: 20 + 100 = 120 ns
• Cost of TLB miss: 20 + 100 + 100 = 220 ns
• Let the hit ratio be 80%
• Then the overall, weighted cost of paging is: 0.8(120) + 0.2(220) = 140 ns
115
• In other words, if you could always access memory directly, it would take 100 ns.
• With paging, it takes on average 140 ns.
• Paging imposes a 40% overhead on memory access
• On the other hand, without TLB’s, every memory access would cost 100 ns. + 100 ns., which would mean a 100% overhead on memory access
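The weighted average generalizes to any hit ratio and timings. The helper below is a sketch of that calculation. The function name `effective_access_time` is made up, and the sample values are the ones used in the example above.

```python
def effective_access_time(hit_ratio, tlb_ns, memory_ns):
    """Weighted average cost of one memory reference under paging with a TLB."""
    hit_cost = tlb_ns + memory_ns        # TLB search + one memory access
    miss_cost = tlb_ns + 2 * memory_ns   # TLB search + page table + data
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


eat = effective_access_time(hit_ratio=0.80, tlb_ns=20, memory_ns=100)
overhead = (eat - 100) / 100
print(round(eat), round(overhead * 100))  # time in ns, overhead in percent
```

Raising the hit ratio to 98% with the same timings brings the average down to about 122 ns, which shows how strongly the overhead depends on the hit ratio.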
116
Justification for paging
• Why would you live with a 40% overhead cost on memory accesses?
• Remember the reasons for introducing the idea of paging:
• It allows for non-contiguous memory allocation
117
• This solves the problem of external fragmentation in memory
• As long as the page size strikes a balance between large and small, internal fragmentation is not great
• There is also a potential benefit in reducing fragmentation in swap space, but supporting non-contiguous memory allocation is the main event
118
Having a global page table
• The previous discussion has referred to a page table as belonging to one process
• This would mean there would be many page tables
• When a new process was scheduled, the TLB would be flushed so that pages belonging to the new process would be loaded.
119
• The alternative is to have a single, unified page table
• This means that each page table entry, in addition to a value for f, would have to identify which process it belonged to
• Formally, the identifier is known as an ASID, an address space id
• Informally, it’s a process id, pid, plus a page id, p
120
• Such a table would work like this:
• When a process generated a page id, the TLB would be searched for that (pid, p) pair
• If found, this is a TLB hit
• Everything is good
121
• If not, this is simply a page miss
• Replacement would occur using the usual algorithm for replacement on a miss
• With a page table like this, there is no need for flushing when a new process is scheduled
• In effect, the TLB is flushed entry by entry as misses occur
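• A minimal sketch of a TLB whose entries are keyed by the (pid, p) pair might look like the following. The class name, the capacity, and the first-in-first-out replacement rule are all assumptions for illustration; real TLB’s are hardware and may use a different replacement policy

```python
from collections import OrderedDict


class AsidTlb:
    """Toy TLB whose entries are keyed by (pid, p) and hold a frame id f."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (pid, p) -> f, oldest entry first

    def lookup(self, pid, p):
        """Return the frame id on a hit, or None on a miss."""
        return self.entries.get((pid, p))

    def insert(self, pid, p, f):
        """Load an entry after a miss, evicting the oldest entry if full."""
        if (pid, p) not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # FIFO replacement (an assumption)
        self.entries[(pid, p)] = f


tlb = AsidTlb(capacity=2)
tlb.insert(pid=1, p=0, f=7)
tlb.insert(pid=2, p=0, f=3)   # same page id, different process: no flush needed
tlb.insert(pid=2, p=1, f=9)   # TLB is full, so the oldest entry (1, 0) is replaced
```

• Entries for process 1 are not flushed when process 2 is scheduled; they simply get replaced one at a time as process 2 takes misses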
122
Implementing protection in the page table with valid and invalid bits
• Recall that a page table functions like a set of base and limit registers
• Each page address is a base, and the fixed page size functions as a limit
• If a system maintains page tables of length n, then the maximum amount of memory that could theoretically be allocated to a process is n pages, or n * (page length) bytes
123
• In practice, processes do not always need the maximum amount of memory and will not be allocated that much
• This information can be maintained in the page table by the inclusion of a valid/invalid bit
124
• If a page table entry is marked “i”, this means that if a process generates that logical page, it is trying to access an address outside of the memory space that was allocated to it
• A diagram of the page table follows
125
126
A page table length register
• An alternative to valid/invalid bits is a page table length register (PTLR)
• The idea is simple—this register is like a limit register for the page table
• The range of logical addresses for a given process begins at page 0 and goes to some maximum which is less than the absolute maximum size allowed for a page table
• When a process generates a page, it is checked against the PTLR to see if it’s valid
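• The two protection checks can be compared in a short sketch. The function names, the exception class, and the sample frame ids are hypothetical, chosen only to illustrate the idea

```python
class AddressError(Exception):
    """Raised when a process generates a page outside its allocation."""


def frame_by_valid_bit(page_table, p):
    """page_table is a full-length list of (frame_id, valid) pairs."""
    frame_id, valid = page_table[p]
    if not valid:
        raise AddressError(f"page {p} is marked invalid")
    return frame_id


def frame_by_ptlr(page_table, ptlr, p):
    """page_table holds frame ids for pages 0..ptlr-1; ptlr plays the role of the PTLR."""
    if p >= ptlr:
        raise AddressError(f"page {p} is beyond the page table length {ptlr}")
    return page_table[p]


# A process allocated 3 pages on a system whose page tables have 4 entries.
with_bits = [(5, True), (2, True), (8, True), (0, False)]
with_ptlr = [5, 2, 8]
```

• Both versions reject logical page 3 for this process; the first consults a bit stored in the entry, the second compares the page id against a single register value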
127
• The valid/invalid bit scheme can be extended to support finer protections
• For example, read/write/execute protections can be represented by three bits
• You typically think of these protections as being related to a file system
128
• In theory, different pages of a process could have different attributes
• This may be especially important if you are dealing with shared memory accessible to >1 process
• It is also likely to be complicated in practice, and the idea won’t be pursued further here
129
8.5 Structure of the Page Table
130
• The topic of this section is the structure of page tables
• Before considering the structure, it’s helpful to consider the sizes of address spaces that a page table may have to support
• Modern systems may support address spaces in the range of 2^32 to 2^64 bytes
• 2^32 is 4 Gigabytes
• 2^64 ~= 18.4+ x 10^18
131
• The higher value is what you get if you allow all 64 bits of a 64 bit architecture to be used as an address
• Note that this is 16 x 2^60, but by this stage the powers of 2 and the powers of 10 do not match up the way they do where we casually equate 2^10 to 10^3
132
This is just a digression
• According to Wikipedia
• Standard prefixes for the SI units of measure
• Multiples (name, symbol, factor): deca- da 10^1, hecto- h 10^2, kilo- k 10^3, mega- M 10^6, giga- G 10^9, tera- T 10^12, peta- P 10^15, exa- E 10^18, zetta- Z 10^21, yotta- Y 10^24
• Subdivisions (name, symbol, factor): deci- d 10^−1, centi- c 10^−2, milli- m 10^−3, micro- µ 10^−6, nano- n 10^−9, pico- p 10^−12, femto- f 10^−15, atto- a 10^−18, zepto- z 10^−21, yocto- y 10^−24
133
• The reality is that modern systems support logical address spaces too large for simple page tables
• In order to support these address spaces, hierarchical or multi-level paging may be used
• Take the lower of the address spaces given above, 2^32
• Let the page size be 2^12 or 4 KB
134
• 2^32 bytes of memory divided into pages of size 2^12 bytes means a total of 2^20 pages
• The corresponding physical address space would consist of 2^20 frames
• That means that each page table entry would have to be at least 20 bits long, in order to hold the frame id
135
• For the sake of argument, suppose each page table entry is actually 32 bits or 4 bytes long
• This would allow for validity and protection bits in addition to the 20 bits for frame id
• It’s also simpler to argue using powers of 2 rather than speaking in terms of a table entry of length 3 bytes, for example
136
• A page table with 2^20 entries each of size 2^2 bytes means the page table is of length 2^22, or 4 MB
• But a page itself under this scenario was only 2^12, or 4 KB
• In other words, it would take 1 K of pages to hold the complete page table for a process that had been allocated the theoretical maximum amount of memory possible
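• The arithmetic behind these figures can be reproduced directly. The constant names below are made up for the sketch; the values are the ones assumed in the discussion (32 bit addresses, 4 KB pages, 4 byte entries)

```python
ADDRESS_BITS = 32      # size of a logical address
PAGE_BITS = 12         # page size 2^12 = 4 KB
ENTRY_BYTES = 4        # assumed page table entry size, 2^2 bytes

pages = 2 ** (ADDRESS_BITS - PAGE_BITS)   # number of page table entries
table_bytes = pages * ENTRY_BYTES         # length of the whole page table
page_bytes = 2 ** PAGE_BITS
pages_for_table = table_bytes // page_bytes

print(pages, table_bytes, pages_for_table)
```

• The three values are 2^20 entries, 2^22 bytes (4 MB), and 2^10 (1 K) pages needed just to hold the table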
137
• To restate the result in another way, the page table won’t fit into a single page
• In theory, it might be possible to devise a hybrid system where the memory for page tables was allocated and addressed by the O/S as a monolithic block instead of in pages
• Then that page table would support paging of user memory
138
• Having two different addressing schemes in the same system could be a mess and leads to questions like, could there be fragmentation in the monolithic page table block?
• It may be preferable not to have the page table consist of monolithic (and contiguous) memory
139
Multi-Level Paging
• One solution to the problem is hierarchical or multi-level paging
• The underlying idea is to come up with a scheme where a large page table can be managed as a collection of individual pages
• In one of its forms, multi-level paging is similar to indexing
• The book refers to this as a forward-mapped page table
140
• Under multi-level paging, given a logical page value, you don’t look up the frame id directly
• You first look up an entry that identifies another page of the page table, and that page contains the desired frame id
• The book mentions that this kind of scheme was used by the Pentium II
141
• What comes next will cause some repetition later
• I will show the diagrams for multi-level paging
• Then I’ll go through the details based on counting the bits of addresses and what they stand for
• Then the diagrams will be repeated again
142
The Form of an Address
• A 32 bit address can be divided into 2 “page” parts plus the usual offset into a page
143
The “Page” Parts of the Address are “Forward Mapped” in Two Stages to Arrive at the Correct Frame
144
Example Snapshot of Forward Mapping in Greater Detail
145
The Form of a Multi-Level Paging Address
• A logical address of 32 bits can be divided into blocks of 10, 10, and 12 bits
• 10 + 10 = 20 bits correspond to the page identifier
• Treating the first 20 bits as two blocks of 10 bits will result in a two level page table
• The remaining 12 bits of an address correspond to d, the offset into a page/frame
• The size of a page/frame is 2^12 bytes
146
• A page table entry has to be at least 20 bits, to contain the value of a frame id, f
• In order to have nice numbers to work with, we’ll assume that a page table entry is actually 32 bits = 4 bytes = 2^2 bytes
• This would allow the inclusion of validity bits, permissions, etc.
• 2^10 page table entries at 2^2 bytes each will fit exactly into a page of 2^12 bytes
147
• The first 10 bits in an address will be used as an offset into an outer page table
• The entry found in that table will refer to one of 2^10 inner page tables
• The second 10 bits in the address will be used as an offset into that inner page table
• The entry found there will refer to the frame that is in the allocated address space of the process
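• The 10/10/12 breakdown and the two look-ups can be sketched as follows. Holding the page table as nested Python dictionaries, and the names split_address and translate, are assumptions made to keep the example short; a real table lives in page-sized blocks of memory

```python
def split_address(addr):
    """Split a 32 bit logical address into (p1, p2, d) blocks of 10, 10, 12 bits."""
    p1 = (addr >> 22) & 0x3FF   # first 10 bits: offset into the outer page table
    p2 = (addr >> 12) & 0x3FF   # second 10 bits: offset into an inner page table
    d = addr & 0xFFF            # last 12 bits: byte offset into the page/frame
    return p1, p2, d


def translate(outer_table, addr):
    """Walk a two level table held as nested dicts: outer[p1][p2] -> frame id."""
    p1, p2, d = split_address(addr)
    inner_table = outer_table[p1]   # first memory access on a real system
    frame_id = inner_table[p2]      # second memory access
    return (frame_id << 12) | d     # physical address = frame base + offset


# Hypothetical mapping: logical page (p1=1, p2=2) has been allocated frame 5.
outer = {1: {2: 5}}
logical = (1 << 22) | (2 << 12) | 0x0AB
physical = translate(outer, logical)
```

• Note that d passes through unchanged; only the 20 page bits are replaced by a frame id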
148
• The two levels of addressing correspond to multiplication
• The outer page table has 2^10 entries
• Each of these refers to an inner page table with 2^10 entries
• Together, the two sets of 10 bits in the address make 2^(10+10) pages addressable
149
• The page size is 2^12 bytes
• The last 12 bits of the address do not represent 2^10 entries of size 2^2
• The value in those 12 bits is a direct byte offset into the memory page/frame identified by the entry found in the second page table
• The diagrams are repeated on the following overheads in order to illustrate all this
150
This is the form of a page address
151
• The following diagram shows how a logical address maps to a physical address through multiple levels
• Note that the address of the outer page table page is contained in the special page table register
• P1 and P2 are page table entry offsets, in the range 0 to 2^10 - 1, with 2^10 entries at 2^2 bytes each per page table page
• d is a literal byte offset into a page, in the range 0 to 2^12 - 1
152
153
• The following diagram shows the multiple layers of the page table in greater detail
154
155
Calculating the cost of paging using a multi-level page table
• The cost of a page miss will be higher for a two level page table than for a one level table
• This is because three hits to memory would be needed to find a missing address rather than two
• Two of the three hits would be to the outer and inner pages of the page table, respectively
• The third hit would be to the frame of the address desired
156
• As before, let the time needed for a TLB search be 20 ns.
• Let the time needed for a main memory access be 100 ns.
• Cost of TLB hit: 20 + 100 = 120
• Cost of TLB miss: 20 + 100 + 100 + 100 = 320
157
• In the calculation for the miss, the first 100 is the outer page table, the second 100 is the inner page table, the third 100 is the access to the desired address
• Let the hit ratio be 98%
• Then the overall, weighted cost of paging is: .98(120) + .02(320) = 124
• The overhead cost of paging under this scheme is 24%
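• The same weighted average can be written for any number of page table levels. This is a sketch; the function name and the whole-number percentage parameter are assumptions for illustration

```python
def multilevel_paging_cost(hit_percent, tlb_ns, mem_ns, levels):
    """Average cost in ns of a memory reference with a page table of `levels` levels."""
    hit_cost = tlb_ns + mem_ns
    # On a miss: one memory access per page table level, then the desired page.
    miss_cost = tlb_ns + levels * mem_ns + mem_ns
    return (hit_percent * hit_cost + (100 - hit_percent) * miss_cost) / 100


two_level_98 = multilevel_paging_cost(98, 20, 100, levels=2)
two_level_80 = multilevel_paging_cost(80, 20, 100, levels=2)
print(two_level_98, two_level_80)
```

• At a 98% hit ratio the two level cost is 124 ns.; at the earlier 80% hit ratio it would rise to 160 ns., which shows how much the scheme depends on the TLB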
158
• Once again, a cost is incurred.
• There is a trade-off involved.
• We want to be able to use paging as the access mechanism throughout the memory subsystem.
• A high TLB hit rate will be needed in order to make this possible.
159
Larger address spaces
• Observe what happens if you go to a 64 bit address space and a page size of 4KB
• Sample address breakdowns are shown on the next overhead for two and three level paging
• If you only break the address into three or four parts, the number of bits for one of the parts is so high that you again have the problem that a level of the page table won’t fit into a single page
160
161
• Depending on page size, some 32 bit systems would go to 3 or 4 levels
• To implement multi-level paging in a 64 bit system, you might need 6 levels
• This is too deep to be practical
• A page miss would involve seven accesses to memory
• This makes the cost of paging too high
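• The level count can be estimated with a short calculation. The sketch assumes 4 KB pages and 4 byte page table entries, as in the 32 bit example, and requires every level of the table to fit in one page; the function name is made up

```python
def levels_needed(address_bits, page_bits=12, entry_bytes=4):
    """Levels required so that every level of the page table fits in one page."""
    entries_per_page = (2 ** page_bits) // entry_bytes
    bits_per_level = entries_per_page.bit_length() - 1   # log2 of a power of 2
    page_id_bits = address_bits - page_bits
    return -(-page_id_bits // bits_per_level)            # ceiling division


print(levels_needed(32), levels_needed(64))
```

• Under these assumptions a 32 bit address needs 2 levels and a 64 bit address needs 6, since 52 page bits are consumed 10 at a time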
162
Hashed page tables--Hashing
• Hashed page tables provide an alternative approach to multi-level paging in a large address space
• The first thing you need to keep in mind is what hashing is, how it works, and what it accomplishes
163
How hashing works
• You may have a widely dispersed set of n different x values in a given domain
• You have a specific, compact set of y values that you want to map to in the range.
• You need a hashing function, y = f(x), that converts x values into the desired set of y values in the range
164
• In the ideal case, there would be a set of exactly n different, contiguous y values that the x’s map to
• That would mean that no two x values would ever collide, namely give f(x1) = f(x2) = y
• However, absence of collisions doesn’t typically happen
165
• f() needs to be devised so that the likelihood that any two x values will give the same y value is small
• f() also has to be quick and easy to compute
• In practice the range will be somewhat larger than n and collisions may occur
• The most common kind of hashing function is based on division and remainders
166
Division-remainder hashing
• Choose z to be the smallest prime number larger than n
• Then let f(x) = x % z
• f(x) will fall into the range [0, z – 1]
167
• Hashing makes it possible to create a look-up table that doesn’t require an index or any sorting or searching
• Let there be z entries in the table, at offsets 0 through z – 1
• Store the entry for x at the offset y = f(x) in the table
• When x occurs again and you want to look up the corresponding value in the table, compute y = f(x) and read the entry at that offset y
168
• Note that the value, x, has to be repeated as part of the table entry, along with the look-up value that goes along with it
• Repeating x is necessary in order to resolve collisions
• An example of a hashing algorithm and the resulting hash table is illustrated in the following diagram
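• A small sketch of division-remainder hashing follows. The helper names and the sample keys are invented for illustration, and collisions are only reported here rather than resolved

```python
def is_prime(k):
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def smallest_prime_above(n):
    z = n + 1
    while not is_prime(z):
        z += 1
    return z


def build_table(pairs):
    """Store (x, value) pairs at offset x % z; report any colliding keys."""
    z = smallest_prime_above(len(pairs))
    table = [None] * z              # offsets 0 .. z - 1
    collisions = []
    for x, value in pairs:
        y = x % z
        if table[y] is None:
            table[y] = (x, value)   # x is stored so a later look-up can verify it
        else:
            collisions.append(x)    # a real table would link or overflow here
    return z, table, collisions


def look_up(table, x):
    entry = table[x % len(table)]
    if entry is not None and entry[0] == x:
        return entry[1]
    return None


z, table, collisions = build_table([(1007, "a"), (52, "b"), (9030, "c"), (411, "d")])
```

• With n = 4 keys, z is 5; keys 1007 and 52 both hash to offset 2, so 52 collides, and the stored copy of x is what keeps a look-up of 52 from wrongly returning the value for 1007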
169
170
Hashed page tables—Why?
• Consider again the background of multi-level paging and its disadvantages
• A multi-level page table provides a tree-like way of using pages to access the whole memory address space
• Each level in the tree corresponds to a block of bits in an address
171
• As the address space grows large, multiple levels become necessary to resolve a given logical address into a physical address in memory
• The more levels there are, the more memory accesses are needed to arrive at the desired address
172
• Consider the following scenario, where two of the foregoing assumptions are reversed:
• 1. Suppose that the page table can fit into a single page.
– This may be possible for a sufficiently large page size on a system supporting many relatively small interactive processes
173
• Expanding on assumption 1:
• Suppose a page can hold m bytes
• Suppose a page can hold n page table entries
• The idea simply is that for m and n large enough, m * n might be a sufficient allocation
• Then the complete page table would fit on a single page
174
• 2. Suppose also that the logical address space for the pages does not fall neatly into the range of 0-n.
– Generating addresses for something like a shared library could cause this
– Non-relocatable code could cause this
175
• Under these altered assumptions, what would be the best way to organize and access the page table?
• The key observation is that the page parts of the logical addresses generated by a process do not necessarily fall neatly in the range of 0 to n – 1
176
• Whatever range the page parts of the addresses do fall into, you would like them to map into the range 0 to n – 1
• That would mean that you could have a look-up/page table of n entries which tell which physical frame has been allocated to each logical page.
• Hashing supports this kind of mapping from a possibly widely varying set of logical page values to a range from 0 to n – 1.
177
Making the Hashed Page Table
• When a logical page is allocated a frame, the virtual page id, p, is hashed to a location in the hash table in the range [0, z – 1]
• The hash/page table entry contains p, to account for collisions, and the id, f, of the allocated frame
• The offset portion of the address is carried along as usual
• Collisions could occur
178
• Collisions may be handled with links rather than overflow
• If two logical pages, q and p, hash to the same location, their corresponding frames, s and r, are contained in separate nodes
• The hash key, p, is included in the entry so that the correct node can be identified when a collision occurs
• An example is illustrated on the following overhead
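• A hashed page table with linked collision handling might be sketched like this. The class and method names are hypothetical, Python lists stand in for the linked nodes, and z = 7 is an arbitrary small prime

```python
class HashedPageTable:
    """Hash table of z buckets; each bucket is a chain of (p, f) nodes."""

    def __init__(self, z):
        self.z = z
        self.buckets = [[] for _ in range(z)]

    def allocate(self, p, f):
        """Record that logical page p has been allocated frame f."""
        self.buckets[p % self.z].append((p, f))

    def frame_for(self, p):
        """Follow the chain at p's bucket until the stored key matches p."""
        for key, f in self.buckets[p % self.z]:
            if key == p:
                return f
        return None   # no entry: the page is outside the process's allocation

    def translate(self, p, d, page_bits=12):
        """Combine the frame found for p with the offset d, carried along unchanged."""
        f = self.frame_for(p)
        if f is None:
            raise KeyError(p)
        return (f << page_bits) | d


hpt = HashedPageTable(z=7)
hpt.allocate(p=3, f=20)     # 3 % 7 == 3
hpt.allocate(p=10, f=11)    # 10 % 7 == 3, so it collides with p=3 and is chained
```

• Both pages land in bucket 3; the stored key p is what lets the look-up pick the right node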
179
180
• Suppose the logical address space for a process was contiguous, starting with 0
• Then it might not really be necessary to hash
• Entries could just be placed at the offset corresponding to the logical address
• Notice that the hashing scheme, by contrast, will support a logical address space for an arbitrary selection of p values in addresses
181
Clustered page tables
• The book doesn’t give a very detailed explanation of this
• The general idea appears to be that memory can be allocated so that these properties hold:
• Several different (say 16) page id’s, p, will hash to the same entry in the page table
• This entry will then have no fewer than 16 linked nodes, one for each page, (and possibly more, due to collisions)
182
• Honestly, it’s not clear to me what advantage this gives
• The length of the page table would be reduced by a factor of 16, but it seems that the linked entries would effectively increase its width by a factor of 16
• I have no more to say about this, and there will be no test questions on it
183
Inverted page tables
• Inverted page tables are an important alternative to multi-level page tables and hashed page tables
• Recall that with process level page tables as explained so far:
• 1. The system has to maintain a global frame table that tells which frames are allocated to which processes
184
• 2. The system has to maintain a page table for each process, that makes it possible to look up the physical frame that is allocated to a given logical address
• Simple illustrations of both of these things are given on the next overhead
185
186
• An inverted page table is an extension of the frame table
• Instead of many page tables, one for each process, there is one master table
• The offsets into the table represent the frame id’s for the whole physical memory space
• The table has two columns, one for pid, the process that the frame/page belongs to, and one for p, the logical page id of the page
187
188
• The use of an inverted page table to resolve a logical address is shown in the diagram on the next overhead
• The key thing to notice about the process is that it is necessary to do linear search through the inverted page table, looking for a match on the pid that generated the address and the logical address that was generated
• The offset into the table identifies the frame that was allocated to the page
189
190
• On the one hand, you’ve gone from one frame table and many page tables to one, unified, inverted page table
• On the other hand, searching the inverted page table is the cost of this approach
• There is no choice except for simple, linear search because the random allocation of frames means that the table entries are not in any order
• It is not possible to do binary search or anything else
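• The linear search can be written in a few lines. The table contents below are invented, and a Python list stands in for the table in memory

```python
def find_frame(inverted_table, pid, p):
    """Linear search; the offset of the matching entry is the frame id."""
    for frame_id, entry in enumerate(inverted_table):
        if entry == (pid, p):
            return frame_id
    return None   # no match: the address is outside the process's allocation


# One entry per physical frame: (pid, p), or None if the frame is free.
inverted = [(2, 0), (1, 3), None, (1, 0), (2, 1)]
```

• Looking up (pid 1, page 0) has to pass over three entries before finding the match at offset 3, which is the cost this approach pays on every miss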
191
Hashed Inverted Page Tables
• This is where hashing and inverted page tables come together
• The way to get direct access to a set of values in random order is to hash
• Let n be the total number of pages/frames in the system, and devise a hashing function that will provide this mapping:
• f(pid, p) → [0, n – 1]
• Use this function to allocate frames to processes
192
• Then, when the logical address (pid, p) is generated, hash it, giving f(pid, p)
• In theory, the hash function value itself could be the frame id, f
• You still have to do table look-up because of the possibility of collisions
• No linked nodes are shown—maybe overflow is a better strategy for recording collisions
• Look-up consists of going to offset f in the table and checking there for the key values (pid, p).
193
• If (pid, p) is found, then f is the corresponding frame
• You don’t have to do linear search
• If the values are not found, check for overflow or linking until you find the desired values
• Note that if you don’t find the desired values at all, the process has tried to access an address that is out of range.
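• One way to picture allocation and look-up by hashing is the sketch below. The hash function, the class name, and the choice of overflow into the next free slot (linear probing) are assumptions for illustration, and frames are never freed in this toy version

```python
class HashedInvertedTable:
    """Inverted page table in which the slot index is the frame id."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n   # slot f holds the (pid, p) that owns frame f

    def _home(self, pid, p):
        return (pid * 31 + p) % self.n   # illustrative hash, not from the book

    def allocate(self, pid, p):
        """Give (pid, p) the frame it hashes to, or the next free one on a collision."""
        start = self._home(pid, p)
        for step in range(self.n):
            f = (start + step) % self.n
            if self.slots[f] is None:
                self.slots[f] = (pid, p)
                return f
        raise MemoryError("no free frames")

    def frame_for(self, pid, p):
        """Hash, then check the key; follow the overflow slots if it does not match."""
        start = self._home(pid, p)
        for step in range(self.n):
            f = (start + step) % self.n
            if self.slots[f] is None:
                return None          # empty slot ends the probe: out of range
            if self.slots[f] == (pid, p):
                return f
        return None


table = HashedInvertedTable(n=8)
first = table.allocate(pid=1, p=2)    # hashes to slot 1
second = table.allocate(pid=2, p=3)   # also hashes to slot 1, overflows to slot 2
```

• The second allocation collides with the first, so its look-up checks slot 1, sees a different key, and finds its own entry in the next slot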
194
Hashed Inverted Page Tables and TLB’s
• The most recent discussions have left TLB’s behind, but they are still relevant as hardware support for addressing
• In a system that uses a hashed inverted page table with TLB’s, the TLB entries are a subset of the hashed inverted page table
• The TLB entries appear in whatever order results from random replacement
• The entries are not in hash (sorted) order
195
• When an entry from the inverted hashed page table is put into the TLB, in addition to the pid of the process and the logical page id, p, the entry has to include the offset (index, i) into the page table, namely, the frame id, f
• This is what you look up in the TLB
• A diagram of the use of a hashed inverted page table with TLB’s is given on the following overhead
196
197
• Remember that since the table is stored in memory, that adds an extra memory access to the overall cost of addressing on a TLB miss
• Also note that in order to accommodate all frames in the system, in reality the table would probably be bigger than a page
• The table that supports paging is bigger than a page itself
198
• In other words, we’ve come back around to the problem which motivated multi-level paging in the first place
• The solution is essentially the solution that was rejected earlier
• Store the table in system space
• Access the table through a special scheme, a non-paged, system memory access mechanism
• Although not paged itself, the table would support paging in user applications
199
Inverted Hashed Page Tables and Frame Tables
• The inverted hashed page table discussion was based on allocating frames by hashing
• Since the inverted hashed page table included pid’s, it was global in scope, suggesting that all allocation information could be contained in it
• No mention was made of having a frame table in addition to the page table
• This simplified things and made the diagram easier to draw
200
• In reality, you would still want to support a separate frame table
• Allocation would not necessarily be done on the basis of hashing
• There would be a hash table which provided entry into the frame table
• The frame table would then contain the value for the frame that had been allocated to a given (pid, p) pair
201
• The idea is that the hash value, f(pid, p) = h, takes you to an offset in the hash table.
• What you look up in the hash table is the value i, which is the frame id that was assigned to pid|p
• In other words, i is the index or offset to the corresponding entry in the frame table.
• This is illustrated on the next overhead
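• The separation between the hash table and the frame table can be sketched as follows. The hash function, table sizes, and names are invented for the example, and frames are handed out by the caller to stand in for whatever allocation policy the system uses

```python
def h(pid, p, size):
    return (pid * 31 + p) % size    # illustrative hash function (an assumption)


HASH_SIZE = 5
frame_table = [None] * 8                      # index i is the frame id
hash_table = [[] for _ in range(HASH_SIZE)]   # each slot chains (pid, p, i) nodes


def allocate(pid, p, i):
    """Record that frame i, chosen by any allocation policy, holds page (pid, p)."""
    frame_table[i] = (pid, p)
    hash_table[h(pid, p, HASH_SIZE)].append((pid, p, i))


def frame_for(pid, p):
    """Hash to a slot, then match on (pid, p) to find the frame table index i."""
    for key_pid, key_p, i in hash_table[h(pid, p, HASH_SIZE)]:
        if (key_pid, key_p) == (pid, p):
            return i
    return None


allocate(pid=1, p=4, i=6)
allocate(pid=2, p=0, i=0)
```

• Here frame 6 was chosen for (1, 4) without regard to its hash value; the hash table only records where to find it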
202
203
• The end result of this would be that the inverted hashed page table entry would include values for pid, p, and i
• The diagram for the inverted hashed page table will not be redrawn to show this
• You still use hashing to enter that table
• When you do, what you find is an i value that went into the frame table at the original allocation
204
Shared pages
• Shared memory between processes can be implemented by mapping pages of their shared logical addresses to the same physical frames
• An operating system may support interprocess communication (IPC) this way
• It is also a convenient way to share (read only) data
• It’s also possible to share code, such as libraries which >1 process need in order to run
205
Reentrant code is shareable
• In order for code to be shareable, it has to be reentrant
• Reentrant means that there is nothing in the code which causes it to modify itself
• Consider the MISC sumtenV1.txt example
• It is divided into a data segment and a code segment
• Two processes could share the code as long as the accesses to memory variables were mapped to separate copies of the variables
206
• Every memory access that a program makes has to pass through the O/S
• The O/S is responsible for protecting the memory allocated to one process from being accessed by another
• The O/S is also responsible for supporting shared access and for detecting when shared memory may be being misused
207
• Threads are a good, concrete example of shared code
• We have considered some of the problems that can occur when threads share references to common objects
• If they share no references, then they are completely trouble free
208
Inverted page tables don’t support shared memory very well
• An inverted page table is a global structure for all frames in a system
• It effectively maps one logical page belonging to one process to one physical frame
• This makes it difficult to support shared memory pages between different processes by mapping them to the same frame
• To support shared memory, it would be necessary to add linking to the table or add other data structures to the system
209
8.6 Segmentation
• The idea behind segmentation is that the application writer doesn’t view memory simply as a linear array of bytes
• Also, the actual relative physical location of different program modules is not important
• Applications can be viewed in terms of logical program units
• Each separate logical unit could be identified by its base address in memory, and its length
210
• Segmentation supports the user view of memory
• A segmented address takes this form <segment id, offset into segment>
• The segment id translates into a base address
• The segmented address then has to translate into pages or whatever scheme is actually used to allocate memory
211
Implementation of segmentation
• A system with segmented addresses would have to support them in application software
• System implementations of compilation, linking, loading, and address resolution would all be adapted to use segmented addresses
• In a sense, segments may be reminiscent of simple contiguous allocation of blocks of memory in varying sizes
212
• Segments may also be thought of, very roughly, as (comparatively large) pages of varying size
• Just like with paging, hardware support in the MMU makes the translation possible
• The diagram on the next overhead shows how segmented addresses are resolved
213
214
• This is similar to one of the earliest diagrams showing in general how page addresses were resolved
• The segment table is like a set of base-limit pairs, one for each segment
• Just like with pages, in the long run you would probably want some sort of TLB support
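• Resolving a segmented address against a table of base-limit pairs can be sketched as follows. The function name and the sample base and limit values are made up for illustration

```python
def resolve(segment_table, s, offset):
    """Translate <segment id, offset into segment> using (base, limit) pairs."""
    base, limit = segment_table[s]
    if not 0 <= offset < limit:
        raise ValueError(f"offset {offset} is outside segment {s} of length {limit}")
    return base + offset


# Hypothetical segment table: one (base, limit) pair per segment id.
segments = [(1400, 1000), (6300, 400), (4300, 400)]
address = resolve(segments, 2, 53)
```

• Offset 53 into segment 2 resolves to 4300 + 53; an offset of 400 into the same segment would be rejected because it is not less than the limit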
215
• For the purposes of this brief introduction to the idea of segments, segments and pages are treated separately
• In real, modern systems with segmentation, the segments are subdivided into pages which are accessed through a paging mechanism
• In other words, segments are a layer in the memory addressing scheme that lies on top of the paging mechanism
216
Protection and sharing with segmentation
• The theory is that protection and sharing make more logical sense under a segmented scheme
• Instead of worrying about protection and sharing at a page level, the assumption is that the same protection and sharing decisions would logically apply to a complete segment
217
• In other words, protection is applied to semantic constructs like “data block” or “program block”
• Under a segmented scheme, semantically different blocks would be stored in different segments
• Similarly with sharing
• If two processes need to share the same block, let the block be stored in a given segment, and give both processes access to the segment
218
• Although perhaps clearer than paged sharing, segmented sharing doesn’t solve all of the problems of sharing
• Two processes may know the same, shared code by different symbolic names
• There has to be a mapping from the symbolic name to the base address of the allocated memory
219
• The memory space of the shared code will not be contiguous with the memory space of the processes that share it
• A process that shares code will generate addresses in its memory space
• Then when it enters the shared code, the execution, on behalf of the process, will generate addresses in the shared code memory space
220
• The system has to support the resolution of addresses when processes cross the boundary from unshared to shared code
• Potentially, ifs or jumps across boundaries have to be supported (from one address space to another) and the return from shared code has to go to the address space of whichever process called it
221
Segmentation and fragmentation
• Segmentation, in the sense that it’s like contiguous memory allocation, suffers from the problem of external fragmentation
• The difference is that a single process consists of multiple segments and each segment is loaded into contiguous memory
• The ultimate solution to this problem is to break the segments into pages
222
8.7 Example: The Intel Pentium
• The reality is that the Intel 8086 architecture has had segmented addressing from the beginning.
• The Motorola 68000 didn’t.
• The following details are given in the same spirit that the information about scheduling and priorities was given in the chapter on scheduling
• Namely, to show that real systems tend to have many disparate features, and overall they can be somewhat complex
223
• The following summary was prepared from a previous edition of the book, so it may not agree completely with the current edition
• However, don’t worry
• There will be no test questions, for example, which ask for detailed specifics of segmented addressing
• Any questions on this topic will be general or conceptual
224
• Some information about Intel addressing
• The maximum size of a segment is 4GB (2^32)
• The size of a page is 4KB (2^12)
• That means a segment may consist of up to 2^20 or 1M of pages
• The maximum number of segments per process is 16K (2^14)
• This means in theory, if all maxima applied, a process could have a large address space (2^46)
225
• The logical address space of a process is divided into two partitions, each of up to 8K segments
• Partition 1 is private to the process.
– Information about its segments is stored in the local descriptor table
• Partition 2 contains segments shared among processes.
– Information about these segments is stored in the global descriptor table
226
• The first part of a logical address is known as a selector
• It consists of these parts:
– 13 bits for segment id, s
– 1 bit for global vs local, g
– 2 bits for protections
– 16 bits in total
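• The 13/1/2 bit layout of a selector can be unpacked with shifts and masks. The sketch assumes the fields are packed from the high bits to the low bits in the order listed; the function name and the sample value are invented

```python
def decode_selector(selector):
    """Split a 16 bit selector into (s, g, protection) fields of 13, 1, 2 bits."""
    s = (selector >> 3) & 0x1FFF      # top 13 bits: segment id
    g = (selector >> 2) & 0x1         # next bit: global vs local
    protection = selector & 0x3       # low 2 bits: protections
    return s, g, protection


# Build a sample selector for segment 5 with g = 1 and protection value 3.
sample = (5 << 3) | (1 << 2) | 3
fields = decode_selector(sample)
```

• 13 bits of segment id per partition gives the 8K segments mentioned above, and the g bit doubles that to the 16K maximum per process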
227
• Within each segment, an address is paged
• It takes two levels to hold the page table
• The page address takes the form described earlier:
– 10 bits for outer page of page table
– 10 bits for inner page of page table
– 12 bits for offset
– (At 4 bytes per page table entry, you can fit 2^10 entries into a 4KB page)
228
• Notice that you’ve got both 14 bits for segment id + global vs. local and 32 bits for the paged address (page id plus offset)
• This means that in a 32 bit architecture you can’t simultaneously have the maximum number of segments and the maximum number of pages
• There is a limit on how many segments total you can have, but there is flexibility in where they’re located in memory
229
• The diagram shown on the next overhead is supposed to summarize how a segmented logical address is resolved to a physical address
• Read it and weep
• In between your tears, remember, you will not have to know this for a test
230
231
The End