Post on 21-Dec-2015
1
Above: The Burroughs B5000 computer, the first commercial machine with virtual memory.
Right: The first experimental virtual memory: the Manchester Atlas computer, which had virtual memory backed by a magnetic drum.
2
COMP 740: Computer Architecture and Implementation
Montek Singh
Thu, Apr 23, 2009
Topic: Virtual Memory
3
Virtual Memory (App. C)
Several purposes:
- Main: allowing software to address more than physical memory
Other benefits:
- Provides for protection; facilitates multi-processing
- Enables relocation
- Enables programs to begin before loading fully (some implementations)
Programmers used to use overlays and manually control loading/unloading
4
Characteristics

Characteristic                     Cache-MM                    MM-disk
Access time ratio ("speed gap")    1:5 - 1:15                  1:10,000 - 1:1,000,000
Hit time                           1-2 cycles                  40-100 cycles
Hit ratio                          0.90 - 0.99                 0.99999 - 0.9999999
Miss (page fault) ratio            0.01 - 0.10                 0.00000001 - 0.000001
Miss penalty                       10-100 cycles               1M - 6M cycles
CPU during block transfer          blocking/non-blocking       task switching
Block (page) size                  16-128 bytes                4KB - 64KB
Implemented in                     hardware                    hardware + software
Mapping                            direct or set-associative   page table ("fully associative")
Replacement algorithm              not crucial                 very important (LRU)
Write policy                       many choices                write back
Direct access to slow memory       yes                         no
5
Segmentation and Paging
Paging system has a flat, linear address space
- 32-bit VA = (10-bit VPN1, 10-bit VPN2, 12-bit offset)
- If, for a given VPN1, we reach the max value of VPN2 and add 1, we reach the next page at address (VPN1+1, 0)
Segmented version has a two-dimensional address space
- 32-bit VA = (10-bit segment #, 10-bit page number, 12-bit offset)
- If, for a given segment #, we reach the max page number and add 1, we get an undefined value
- Segments are not contiguous
- Segments do not need to have the same size
  - Size can even vary dynamically
  - Implemented by storing an upper bound for each segment and checking every reference against it
Pure segmentation is not used today
- However, variable page sizes have been used to get some of the locality advantages of segmentation
6
Addressing
Always a "congruence mapping"
Assume:
- 4GB VM composed of 2^20 4KB pages
- 64MB DRAM main memory composed of 16384 page frames (of the same size)
Only those pages (of the 2^20) that are not empty actually exist
- Each is either in main memory or on disk
- Can be located with two mappings (implemented with tables)
Virtual address = (virtual page number, page offset): VA = (VPN, offset), 32 bits = (20 bits + 12 bits)
Physical address = (real page number, page offset): PA = (RPN, offset), 26 bits = (14 bits + 12 bits)
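The address split above can be sketched in a few lines of Python. The constants follow the 4KB-page / 64MB-DRAM numbers on this slide; the function names are illustrative, not from any real machine:

```python
# Address fields from the slide, assuming 4KB pages, 4GB VM, 64MB DRAM.
PAGE_OFFSET_BITS = 12            # 4KB pages => 12-bit offset
VPN_BITS = 20                    # 4GB / 4KB = 2^20 virtual pages
RPN_BITS = 14                    # 64MB / 4KB = 2^14 = 16384 page frames

def split_va(va: int) -> tuple[int, int]:
    """Split a 32-bit virtual address into (VPN, page offset)."""
    return va >> PAGE_OFFSET_BITS, va & ((1 << PAGE_OFFSET_BITS) - 1)

def make_pa(rpn: int, offset: int) -> int:
    """Combine a 14-bit real page number with the unchanged page offset."""
    return (rpn << PAGE_OFFSET_BITS) | offset
```

Note that the offset passes through translation untouched; only the page-number bits are mapped.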
7
Address Translation
RPN = f_MM(VPN)
- In reality, the VPN is mapped to a page table entry (PTE), which contains the RPN ...
- ... as well as miscellaneous control information (e.g., valid bit, dirty bit, replacement information, access control)
VA -> PA: (VPN, offset within page) -> (RPN, offset within page)
VA -> disk address
8
Single-Level, Direct Page Table in MM
Fully associative mapping:
- When a VM page is brought in from disk to MM, it may go into any of the real page frames
Simplest addressing scheme: one-level, direct page table
- (page table base address + VPN) -> PTE or page fault
- Assume that PTE size is 4 bytes
- Then the whole table requires 4 x 2^20 = 4MB of main memory
Disadvantage: 4MB of main memory must be reserved for page tables, even when the VM space is almost empty
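A minimal Python sketch of this one-level scheme, using a dictionary as a sparse stand-in for the 4MB array of PTEs (the table-size arithmetic matches the slide; the lookup function is illustrative):

```python
# Hypothetical one-level direct page table, indexed by the full 20-bit VPN.
PTE_SIZE = 4                        # bytes per entry, as assumed on the slide
NUM_PTES = 1 << 20                  # one PTE per virtual page

table_bytes = PTE_SIZE * NUM_PTES   # 4MB reserved even if VM is nearly empty

page_table = {}                     # sparse stand-in: VPN -> RPN for resident pages

def translate(vpn: int) -> int:
    """Return the RPN for vpn; a missing entry models a page fault."""
    if vpn not in page_table:
        raise KeyError("page fault")   # OS would fetch the page from disk
    return page_table[vpn]
```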
9
Single-Level Direct Page Table in VM
To avoid tying down 4MB of physical memory:
- Put the page tables in VM
- Bring into MM only those that are actually needed
- "Paging the page tables"
Needs only 1K PTEs (4KB) in main memory, rather than 4MB
Slows down access to VM pages by possibly needing disk accesses for the PTEs
10
Multi-Level Direct Page Table in MM
Another solution to the storage problem
- Break the 20-bit VPN into two 10-bit parts: VPN = (VPN1, VPN2)
This turns the original one-level page table into a tree structure
- (1st-level base address + VPN1) -> 2nd-level base address
- (2nd-level base address + VPN2) -> PTE or page fault
Storage situation much improved
- Always need the root node (1K 4-byte entries = 1 VM page)
- Need only a few of the second-level nodes
  - Allocated on demand
  - Can be anywhere in main memory
Negative: access time to the PTE has doubled
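The two-level walk just described can be sketched in Python; each node holds 1K entries (one 4KB page of 4-byte PTEs), and second-level nodes are allocated on demand. The names are illustrative:

```python
# Two-level page table: 20-bit VPN split into two 10-bit indices.
def split_vpn(vpn: int) -> tuple[int, int]:
    return vpn >> 10, vpn & 0x3FF           # (VPN1, VPN2)

# Root node always present; second-level nodes are None until allocated.
root = [None] * 1024                        # VPN1 -> second-level node

def walk(vpn: int) -> int:
    """Walk the tree: two lookups instead of one (the doubled access time)."""
    vpn1, vpn2 = split_vpn(vpn)
    level2 = root[vpn1]
    if level2 is None or level2[vpn2] is None:
        raise KeyError("page fault")
    return level2[vpn2]                     # the PTE (here just the RPN)
```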
11
Inverted Page Tables
Virtual address spaces may be vastly larger (and more sparsely populated) than real address spaces
- Less-than-full utilization of tree nodes in the multi-level direct page table becomes more significant
Ideal (i.e., smallest possible) page table would have one entry for every VM page actually in main memory
- Need 4 x 16K = 64KB of main memory to store this ideal page table
- Storage overhead = 0.1%
Inverted page table implementations are approximations to this ideal page table
- Associative inverted page table in special hardware (Atlas)
- Hashed inverted page table in MM (IBM, HP PA-RISC)
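A simplified model of the hashed variant: the table is sized by physical frames, not virtual pages, and the VPN is hashed to find its entry, with chaining for collisions. This is a generic sketch, not a specific IBM or HP PA-RISC design:

```python
# Hashed inverted page table sketch: one bucket array sized by frame count.
NUM_FRAMES = 16384                          # 64MB DRAM / 4KB frames

buckets = [[] for _ in range(NUM_FRAMES)]   # bucket -> [(vpn, rpn), ...]

def insert(vpn: int, rpn: int) -> None:
    """Record that virtual page vpn resides in frame rpn."""
    buckets[vpn % NUM_FRAMES].append((vpn, rpn))

def lookup(vpn: int) -> int:
    """Hash the VPN, then search the collision chain for a match."""
    for v, rpn in buckets[vpn % NUM_FRAMES]:
        if v == vpn:
            return rpn
    raise KeyError("page fault")
```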
12
Translation Lookaside Buffer (TLB)
To avoid two or more MM accesses for each VM access, use a small cache to store (VPN, PTE) pairs
- PTE contains the RPN, from which the RA can be constructed
- This cache is the TLB, and it exploits locality
  - DEC Alpha: 32 entries, fully associative
  - Amdahl V/8: 512 entries, 2-way set-associative
Processor issues a VA
- TLB hit: send RA to main memory
- TLB miss:
  - Make two or more MM accesses to the page tables to retrieve the RA
  - Send RA to MM
- (Any of these may cause a page fault)
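A minimal fully associative TLB model along the lines above: a small cache of (VPN, RPN) pairs consulted before the page tables. LRU replacement is modelled with an ordered dictionary; this is an illustrative sketch, not any particular machine's design:

```python
from collections import OrderedDict

class TLB:
    def __init__(self, entries: int = 32):        # e.g. DEC Alpha: 32 entries
        self.entries = entries
        self.map = OrderedDict()                  # VPN -> RPN, oldest first

    def lookup(self, vpn: int):
        """Return the cached RPN on a hit, or None on a TLB miss."""
        if vpn in self.map:                       # TLB hit
            self.map.move_to_end(vpn)             # mark as most recently used
            return self.map[vpn]
        return None                               # miss: walk the page tables

    def fill(self, vpn: int, rpn: int) -> None:
        """Install a translation after a table walk, evicting the LRU entry."""
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)          # evict least recently used
        self.map[vpn] = rpn
```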
13
TLB Misses
Causes for a TLB miss:
- VM page is not in main memory
- VM page is in main memory, but its entry has not yet been entered into the TLB
- VM page is in main memory, but its TLB entry has been removed for some reason (evicted as LRU, invalidated because the page table was updated, etc.)
Miss rates are remarkably low (~0.1%)
- Miss rate depends on the size of the TLB and on the VM page size (coverage)
Miss penalty varies from a single cache access to several page faults
14
Dirty Bits and TLB: Two Solutions
Solution 1: TLB is a read-only cache
- Dirty bit is contained only in the page table in MM
- TLB contains only a write-access bit, initially set to zero (denying writing of the page)
- On the first attempt to write the VM page:
  - An exception is caused
  - The handler sets the dirty bit in the page table in MM
  - ... and sets the write-access bit to 1 in the TLB
Solution 2: TLB is a read-write cache
- Dirty bit present in both the TLB and the page table in MM
- On the first write to a VM page, only the dirty bit in the TLB is set
- Dirty bit in the page table is brought up to date:
  - when the TLB entry is evicted
  - when the VM page and PTE are evicted
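The first (read-only TLB) scheme can be modelled with a short Python sketch; the dictionaries and function names are illustrative stand-ins for the hardware structures:

```python
# Model of the write-access-bit scheme: the first write traps, the handler
# sets the dirty bit in the MM page table, then enables writes in the TLB.
page_table_dirty = {}    # VPN -> dirty bit, the authoritative copy in MM
tlb_write_ok = {}        # VPN -> write-access bit held in the TLB

def load_entry(vpn: int) -> None:
    """Install a TLB entry with writing initially denied."""
    page_table_dirty.setdefault(vpn, False)
    tlb_write_ok[vpn] = False

def write(vpn: int) -> None:
    """A store to the page; the first one takes the exception path."""
    if not tlb_write_ok[vpn]:            # first write: exception
        page_table_dirty[vpn] = True     # handler sets dirty bit in MM
        tlb_write_ok[vpn] = True         # and enables writes in the TLB
    # ... subsequent writes proceed without trapping
```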
15
Virtual Memory Access Time
Assume the existence of a TLB, physical cache, MM, and disk
Processor issues a VA
- TLB hit: send RA to cache
- TLB miss: exception; access page tables, update TLB, retry
A memory reference may involve accesses to:
- TLB
- Page table in MM
- Cache
- Page in MM
Each of these can be a hit or a miss: 16 possible combinations
16
Virtual Memory Access Time (2)
Constraints among these accesses:
- Hit in TLB => hit in page table in MM
- Hit in cache => hit in page in MM
- Hit in page in MM => hit in page table in MM
These constraints eliminate eleven combinations:

Case             TLB    MM PTE   Cache   MM data   Comment
Cache hit        Hit    (Hit)    Hit     (Hit)     MM not checked
Cache miss       Hit    (Hit)    Miss    Hit       Cache updated
TLB miss         Miss   Hit      Hit     Hit       TLB updated, TLB access repeated
TLB+cache miss   Miss   Hit      Miss    Hit       TLB and cache updated
Page fault       Miss   Miss     Miss    Miss      Cache miss follows servicing of page fault
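As a rough illustration of how these cases combine, here is a back-of-the-envelope effective-access-time calculation. All latencies and hit rates below are assumed for illustration; only the ~0.1% TLB miss rate echoes a figure quoted earlier in these slides:

```python
# Illustrative effective access time (cycles); all numbers are assumptions.
tlb_hit = 0.999          # ~0.1% TLB miss rate, as quoted earlier
cache_hit = 0.95         # assumed cache hit rate
t_cache = 1              # cycles for a cache hit
t_mm = 100               # cycles per main-memory access
t_walk = 2 * t_mm        # two-level table walk on a TLB miss

# On a TLB hit: normal cache access. On a TLB miss: pay the walk first.
eat = (tlb_hit * (cache_hit * t_cache + (1 - cache_hit) * t_mm)
       + (1 - tlb_hit) * (t_walk + cache_hit * t_cache + (1 - cache_hit) * t_mm))
```

Even a 0.1% TLB miss rate adds a measurable fraction of a cycle per reference, because each miss costs two full memory accesses.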
17
Virtual Memory Access Time (3)
Number of MM accesses depends on page table organization
- MIPS R2000/R4000 accomplishes table walking with CPU instructions (eight instructions per page table level)
- Several CISC machines implement this in microcode, with the MC88200 having dedicated hardware for it
- RS/6000 implements this completely in hardware
TLB miss penalty is dominated by having to go to main memory
- Page tables may not be in the cache
- Further increase in miss penalty if the page table organization is complex
18
Page Size
Choices:
- Fixed at design time (most early VM systems)
- Statically configurable
  - At any moment, only pages of the same size exist in the system
  - MC68030 allowed page sizes between 256B and 32KB this way
- Dynamically configurable
  - Pages of different sizes coexist in the system
  - Alpha 21164, UltraSPARC: 8KB, 64KB, 512KB, 4MB
  - MIPS R10000, PA-8000: 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB
  - All pages are aligned
Dynamic configuration is a sophisticated way to decrease TLB misses
- Increasing the number of TLB entries increases processor cycle time
- Increasing the VM page size increases internal memory fragmentation
- Needs fully associative TLBs
19
Example 1: Alpha 21264 TLB
1. The VPN is extracted
2. Protections are checked
3. One of 40 entries is muxed out (or a miss is registered)
4. The physical page address is combined with the offset to generate the real address
20
Example 2: Hypothetical Virtual Memory
Figure C.24:
- 8KB page size
- 64-byte cache blocks
- L1: 8KB; L2: 4MB
- Caches are direct-mapped
- L1 is virtually indexed and physically tagged
- Possibly two sets of TLB and L1 (I & D)
21
THE END!
Thank you for your participation in this class!
Final Exam: Apr 29, 8-11am, SN155
- Open book, open notes
I will post office hours for Friday and Monday on the class website.
Graded project and homework will be in your mailbox tomorrow.