Paging, Page Tables, and Such
Andrew Whitaker
CSE451
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Review: Working Sets
[Figure: requests/second of throughput vs. number of page frames allocated to a process. Throughput is poor in the thrashing region, rises steeply as frames are added, and flattens once the working set fits in memory; frames beyond that point are over-allocation.]
Page Replacement
What happens when we take a page fault and we’ve run out of memory?
Goal: keep each process's working set in memory. Giving more than the working set is not necessary.
Key issue: how do we identify working sets?
Belady’s Algorithm
Evict the page that won't be used for the longest time in the future. This page is probably not in the working set; if it is in the working set, we're thrashing.
This is optimal! It minimizes the number of page faults.
Major problem: this requires a crystal ball. There is no good way to predict future memory accesses.
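Because the whole reference string is known after the fact, Belady's policy is easy to simulate offline. A minimal sketch (class and method names are my own):

```java
import java.util.HashSet;
import java.util.Set;

public class Belady {
    // Count page faults under the optimal policy: on a fault with all
    // frames full, evict the resident page whose next use is farthest
    // in the future (or that is never used again).
    static int faults(int[] refs, int frames) {
        Set<Integer> resident = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            if (resident.contains(refs[i])) continue;   // hit
            faults++;
            if (resident.size() == frames) {
                int victim = -1, farthest = -1;
                for (int page : resident) {
                    int next = Integer.MAX_VALUE;       // never used again
                    for (int j = i + 1; j < refs.length; j++) {
                        if (refs[j] == page) { next = j; break; }
                    }
                    if (next > farthest) { farthest = next; victim = page; }
                }
                resident.remove(victim);
            }
            resident.add(refs[i]);
        }
        return faults;
    }
}
```

No real OS can run this, but it gives a lower bound against which FIFO, LRU, and friends can be measured.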
How Good are These Page Replacement Algorithms?
LIFO: the newest page is kicked out
FIFO: the oldest page is kicked out
Random: a random page is kicked out
LRU: the least recently used page is kicked out
Temporal Locality
Assumption: recently accessed pages will be accessed again soon. Use the past to predict the future.
LIFO is horrendous. Random is also pretty bad. FIFO is mediocre. LRU is pretty good.
VAX/VMS used a form of FIFO because of hardware limitations.
Implementing LRU: Approach #1
One (bad) approach, on each memory reference:

```java
long timeStamp = System.currentTimeMillis();
sortedList.insert(pageFrameNumber, timeStamp);
```

Problem: this is too inefficient. A time stamp plus a data-structure manipulation on every memory operation is far too complex for hardware.
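For contrast, exact LRU bookkeeping is easy when software is allowed to track each access; a sketch using Java's access-ordered LinkedHashMap (names are illustrative, and this is still far too slow to run on every memory reference):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruTable {
    // Access-ordered LinkedHashMap: iteration order runs from least- to
    // most-recently-used, and the eldest entry is evicted automatically
    // when the frame budget is exceeded.
    private final LinkedHashMap<Integer, Integer> frames;

    LruTable(int capacity) {
        frames = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> e) {
                return size() > capacity;
            }
        };
    }

    void access(int page, int frame) { frames.put(page, frame); }  // hit or fill
    boolean resident(int page) { return frames.containsKey(page); }
}
```

The point of the slide stands: this kind of per-access data-structure update is fine for a software cache, but not for hardware on the memory path.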
Making LRU Efficient
Use hardware support: a reference bit is set when a page is accessed, and can be cleared by the OS.
Trade off accuracy for speed: it suffices to find a "pretty old" page.
[PTE diagram: page frame number | prot | M | R | V]
Approach #2: LRU Approximation with Reference Bits
For each page, maintain a set of reference bits; call it a reference byte.
Periodically, shift the HW reference bit into the highest-order bit of the reference byte. Suppose the reference byte was 10101010: if the HW bit was set, the new reference byte becomes 11010101.
The frame with the lowest value is the LRU page.
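The shift step is a one-liner; a sketch (the helper name is mine):

```java
public class Aging {
    // Shift the hardware reference bit into the high-order bit of the
    // 8-bit reference byte; older history moves toward the low bits.
    static int age(int refByte, boolean hwBit) {
        return ((hwBit ? 0x80 : 0) | (refByte >>> 1)) & 0xFF;
    }
}
```

Recently referenced pages accumulate high-order 1s, so comparing bytes as unsigned integers orders frames roughly by recency.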
Analyzing Reference Bits
Pro: does not impose overhead on every memory reference, and the scan interval can be configured.
Con: scanning all page frames can still be inefficient. E.g., 4 GB of memory with 4 KB pages => ~1 million page frames.
Approach #3: LRU Clock
Use only a single bit per page frame. Basically, this is a degenerate form of reference bits.
On page eviction, scan through the list of reference bits:
If the value is zero, replace this page.
If the value is one, set the value to zero and keep scanning.
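The scan above can be sketched in a few lines, assuming a simple array of reference bits (class and method names are mine):

```java
public class Clock {
    private final boolean[] refBit;   // one reference bit per page frame
    private int hand = 0;             // current clock-hand position

    Clock(int frames) { refBit = new boolean[frames]; }

    // The hardware sets the bit whenever the frame's page is accessed.
    void access(int frame) { refBit[frame] = true; }

    // Sweep the hand: clear set bits (giving those pages a second
    // chance) until a clear bit is found; that frame is the victim.
    int evict() {
        while (refBit[hand]) {
            refBit[hand] = false;
            hand = (hand + 1) % refBit.length;
        }
        int victim = hand;
        hand = (hand + 1) % refBit.length;  // resume after the victim
        return victim;
    }
}
```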
Why “Clock”?
[Figure: reference bits (0s and 1s) arranged in a circle, with a clock hand sweeping around them]
Typically implemented with a circular queue
Analyzing Clock
Pro: very low overhead. It only runs when a page needs to be evicted, and takes the first page that hasn't been referenced.
Con: it isn't very accurate (one measly bit!), and it degenerates into FIFO if all reference bits are set.
Pro: but the algorithm is self-regulating. If there is a lot of memory pressure, the clock runs more often (and is more up-to-date).
When Does LRU Do Badly?
LRU performs poorly when there is little temporal locality, e.g., a sequential scan: 1 2 3 4 5 6 7 8
Example: many database workloads:

SELECT * FROM Employees WHERE Salary < 25000
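The failure mode shows up directly in simulation: looping a sequential scan over one more page than there are frames makes LRU miss on every single access. A sketch (names are mine):

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

public class ScanSim {
    // Count LRU misses over a reference string; the LinkedHashSet keeps
    // resident pages ordered from least- to most-recently-used.
    static int lruMisses(int[] refs, int frames) {
        LinkedHashSet<Integer> resident = new LinkedHashSet<>();
        int misses = 0;
        for (int page : refs) {
            if (resident.remove(page)) {
                resident.add(page);                // hit: refresh recency
            } else {
                misses++;
                if (resident.size() == frames) {   // full: evict the LRU page
                    Iterator<Integer> it = resident.iterator();
                    it.next();
                    it.remove();
                }
                resident.add(page);
            }
        }
        return misses;
    }
}
```

With 4 frames and a repeated scan over pages 1..5, the page LRU just evicted is always the next one needed.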
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Review: Mechanics of address translation
[Figure: a virtual address (virtual page # | offset) indexes the page table by virtual page # to obtain a page frame #; the physical address (page frame # | offset) then selects a location in physical memory, shown as page frames 0 through Y.]
Problem: page tables live in memory
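The translation itself is just index-and-concatenate; a sketch assuming 4 KB pages and a flat page table whose entries are all valid (names are mine):

```java
public class Translate {
    static final int PAGE_SHIFT = 12;                  // assume 4 KB pages
    static final long OFFSET_MASK = (1L << PAGE_SHIFT) - 1;

    // pageTable[vpn] holds the page frame number; assume every entry valid.
    static long translate(long vaddr, long[] pageTable) {
        long vpn = vaddr >>> PAGE_SHIFT;       // index into the page table
        long offset = vaddr & OFFSET_MASK;     // passes through unchanged
        return (pageTable[(int) vpn] << PAGE_SHIFT) | offset;
    }
}
```

Note that `pageTable` lives in (simulated) memory, which is exactly the problem: without help, every load or store would cost an extra memory read.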
Making Paging Fast
We must avoid a page table lookup for every memory reference; this would double memory access time.
Solution: the Translation Lookaside Buffer, a fancy name for a cache.
The TLB stores a subset of PTEs (page table entries).
TLBs are small and fast (16-48 entries), so they can be accessed "for free."
TLB Details
In practice, most (> 99%) memory translations are handled by the TLB.
Each processor has its own TLB. The TLB is fully associative: any TLB slot can hold any PTE, the full VPN is the cache "key," and all entries are searched in parallel.
Who fills the TLB? Two options: hardware (x86) walks the page table on a TLB miss; a software routine (MIPS, Alpha) fills the TLB on a miss.
The TLB itself needs a replacement policy, usually implemented in hardware (LRU).
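The lookup path can be modeled as a map consulted before the page-table walk. A toy sketch of the fill-on-miss behavior (names are mine; it omits the fixed capacity and replacement policy a real TLB has):

```java
import java.util.HashMap;
import java.util.Map;

public class Tlb {
    private final Map<Long, Long> entries = new HashMap<>(); // VPN -> frame
    private final long[] pageTable;                          // the slow path
    long hits = 0, misses = 0;

    Tlb(long[] pageTable) { this.pageTable = pageTable; }

    long frameFor(long vpn) {
        Long frame = entries.get(vpn);
        if (frame != null) { hits++; return frame; }  // TLB hit
        misses++;
        frame = pageTable[(int) vpn];   // walk the page table on a miss
        entries.put(vpn, frame);        // fill the TLB for next time
        return frame;
    }
}
```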
What Happens on a Context Switch?
Each process has its own address space, so each process has its own page table, so page-table entries are only relevant for a particular process.
Thus, the TLB must be flushed on a context switch. This is why context switches are so expensive.
Ben’s Idea
We can avoid flushing the TLB if entries are associated with an address space
When would this work well? When would this not work well?
[PTE diagram: page frame number | prot | M | R | V, extended with an ASID (address-space ID) field]
TLB Management Pain
The TLB is a cache of page table entries, so the OS must ensure that page tables and TLB entries stay in sync.
Massive pain: TLB consistency across multiple processors.
Q: How do we implement LRU if reference bits are stored in the TLB?
One answer: we don't. Windows uses FIFO for multiprocessor machines.
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Page Table Overhead
For large address spaces, page table sizes can become enormous.
Example: the Alpha architecture has a 64-bit address space and 8 KB pages.
Num PTEs = 2^64 / 2^13 = 2^51
Assuming 8 bytes per PTE: Num Bytes = 2^54 = 16 petabytes
And this is per-process!
Optimizing for Sparse Address Spaces
Observation: very little of the address space is in use at a given time. This is why virtual memory works.
Basic idea: only allocate page tables where we need to, and fill in new page tables on demand.
[Figure: a virtual address space that is mostly empty, with a few regions in use]
Implementing Sparse Address Spaces
We need a data structure to keep track of the page tables we have allocated, and this structure must be small; otherwise, we've defeated our original goal.
Solution: multi-level page tables. Page tables of page tables. "Any problem in CS can be solved with a layer of indirection."
Two level page tables
[Figure: two-level translation. The virtual address is (master page # | secondary page # | offset). The master page # indexes the master page table, whose entries point to secondary page tables (some entries are empty); the secondary page # indexes a secondary page table to obtain the page frame number, which is combined with the offset to form the physical address.]
Key point: not all secondary page tables must be allocated
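A sketch of the two-level walk, assuming a 32-bit address split 10/10/12 and `null` for unallocated secondary tables (names and the split are my choices for illustration):

```java
public class TwoLevel {
    static final int OFFSET_BITS = 12;      // 4 KB pages
    static final int SECONDARY_BITS = 10;   // 10/10/12 split of 32 bits

    // master[i] == null means that secondary page table was never allocated.
    long[][] master = new long[1 << 10][];

    // Returns the physical address, or -1 to signal a fault on an
    // unallocated secondary table.
    long translate(long vaddr) {
        int masterIdx = (int) (vaddr >>> (OFFSET_BITS + SECONDARY_BITS));
        int secondaryIdx = (int) ((vaddr >>> OFFSET_BITS) & ((1 << SECONDARY_BITS) - 1));
        long offset = vaddr & ((1 << OFFSET_BITS) - 1);
        long[] secondary = master[masterIdx];
        if (secondary == null) return -1;
        return (secondary[secondaryIdx] << OFFSET_BITS) | offset;
    }
}
```

The sparsity win is visible in the structure: the master table is always resident, but a 4 MB region of unused address space costs only one null entry instead of a whole secondary table.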
Generalizing
Early architectures used 1-level page tables.
VAX and x86 used 2-level page tables.
SPARC uses 3-level page tables.
The Alpha and the 68030 use 4-level page tables.
Key thing is that the outer level must be wired down (pinned in physical memory) in order to break the recursion
Cool Paging Tricks
Basic Idea: exploit the layer of indirection between virtual and physical memory
Trick #1: Shared Memory
Allow different processes to share physical memory
[Figure: two virtual address spaces mapping some of their pages to the same page frames in physical memory]
Trick #2: Copy-on-write
Recall that fork() copies the parent's address space to the child. This is inefficient, especially if the child calls exec.
Copy-on-write allows for a fast "copy" by using shared pages. If the child tries to write to a page, the OS intervenes and makes a copy of the target page.
Implementation: pages are shared as "read-only," and the OS intercepts write faults.
[PTE diagram: page frame number | prot | M | R | V]
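The mechanism can be illustrated with a toy model where two page tables share the same page objects after a simulated fork, and a write remaps the writer to a private copy (a real OS does this in the write-fault handler, and only while the page is still shared; here we always copy for simplicity, and all names are mine):

```java
public class CowDemo {
    // Toy model: a "page" is a small int array shared by reference.
    static class Page { int[] data = new int[4]; }

    Page[] table;
    CowDemo(Page[] shared) { table = shared.clone(); }  // share pages, don't copy them

    // Simulates the OS's response to a write fault on a shared page.
    void write(int vpn, int off, int val) {
        Page priv = new Page();
        priv.data = table[vpn].data.clone();
        table[vpn] = priv;                  // remap to the private copy
        priv.data[off] = val;
    }

    int read(int vpn, int off) { return table[vpn].data[off]; }
}
```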
Trick #3: Memory-mapped Files
Normally, files are accessed with system calls: open, read, write, close.
Memory mapping allows a program to access a file with load/store operations.
[Figure: a region of the virtual address space mapped onto the file Foo.txt]