HY225 Lecture 12: DRAM and Virtual Memory
Dimitrios S. Nikolopoulos
University of Crete and FORTH-ICS
May 16, 2011
Dimitrios S. Nikolopoulos Lecture 12: DRAM and Virtual Memory 1 / 36
DRAM
Fundamentals
- Random-access memory using one transistor-capacitor pair per bit
- Capacitors leak, so cells need periodic refresh
- Composed of one or more memory arrays
  - Organized in rows and columns
  - Need sense amplifiers to compensate for the small voltage swing
DRAM cell
(figure: a memory array of rows and columns of DRAM cells connected by bit lines, with a row decoder, column decoder, sense amplifiers, and data in/out buffers)
DRAM
Fundamentals
- Each DRAM memory array outputs one bit
- DRAMs use multiple memory arrays to output multiple bits at a time
  - ×N indicates a DRAM with N memory arrays
  - ×16 and ×32 DRAMs are typical today
- Each collection of ×N arrays forms a DRAM bank
  - Banks can be read/written independently
×4 DRAM
(figure: four memory arrays, each with its own row decoder, column decoder, sense amplifiers, data in/out buffers, and bit lines)
Interleaved DRAM
DRAM memory bandwidth
- Limited bandwidth from one DRAM bank
- Increase bandwidth by delivering data from multiple banks
  - The processor-DRAM interconnect (e.g. a bus) has a higher clock frequency than any one DRAM
  - Bus control switches between multiple DRAM banks to achieve a high data rate
DIMMs and Ranks
(figure: one DRAM device with eight internal banks sharing an I/O link through a multiplexer; each bank is a ×4 array; one DIMM holding one DRAM rank)
Modern DRAM organization
Hierarchy of DRAM memories
- A system has multiple DIMMs
- Each DIMM has multiple DRAM devices in one or more ranks
- Each DRAM device has multiple banks
- Each bank has multiple memory arrays
- Concurrency across ranks and banks increases memory bandwidth
DRAM Performance
- DRAM is typically connected to the processor via a fixed-width clocked bus
- Bus clocks tend to be slower than CPU clocks
- Cache block read with no optimization:
  - Assume 1 bus cycle for the address transfer
  - 15 bus cycles per DRAM access to fetch a word
  - 1 bus cycle per word of data transferred
  - 4-word block, 1-word-wide DRAM
  - Miss penalty = 1 + 4 × 15 + 4 × 1 = 65 bus cycles
  - Bandwidth = 16 bytes / 65 bus cycles ≈ 0.25 bytes per bus cycle
Improving DRAM bandwidth
- Wide (4-word in the example) DRAM
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4 × 1 = 20 bus cycles
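The miss-penalty arithmetic of the last two slides can be checked with a short sketch (the cycle counts are exactly those of the example above):

```python
# Miss penalty for reading a 4-word cache block, in bus cycles,
# under the three DRAM organizations from the example.
ADDR = 1        # bus cycles to transfer the address
ACCESS = 15     # bus cycles per DRAM access
XFER = 1        # bus cycles per word of data transferred
WORDS = 4       # words per cache block

# One 1-word-wide DRAM: every word pays the access and transfer cost.
narrow = ADDR + WORDS * ACCESS + WORDS * XFER

# 4-word-wide DRAM and bus: one access and one transfer for the whole block.
wide = ADDR + ACCESS + XFER

# Four interleaved banks: the accesses overlap, but the words still
# cross the 1-word bus one at a time.
interleaved = ADDR + ACCESS + WORDS * XFER

print(narrow, wide, interleaved)                # 65 17 20
print(f"bandwidth: {16 / narrow:.2f} B/cycle")  # 16 bytes / 65 cycles ≈ 0.25
```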
Virtual Memory
- Use main memory as a cache for secondary (disk, flash) storage
- Managed jointly by the processor and the operating system
- Programs share main memory
  - With VM, each program gets a private virtual address space spanning the entire address range of the processor. The processor and OS try to keep the most frequently used parts of code and data in DRAM
  - The primary motivation for VM is protection. VM forbids programs from accessing the physical code and data addresses of other programs, unless sharing is requested by the programs and approved by the OS
- The processor and OS translate virtual addresses to physical addresses
- A VM block is called a page
- A VM translation miss is called a page fault
Address Translation
- Fixed-size pages (e.g. 4 KB)
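A minimal sketch of how a virtual address splits into a virtual page number and a page offset, assuming the 4 KB pages mentioned above:

```python
PAGE_SIZE = 4096                           # 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits

def split(vaddr):
    """Return (virtual page number, page offset) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, offset = split(0x12345678)
print(hex(vpn), hex(offset))   # 0x12345 0x678
```

Only the page number is translated; the offset passes through unchanged to the physical address.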
Page Tables
- Table storing page placement information in physical memory
- Array of page table entries (PTEs), indexed by virtual page number
- The page table is itself stored in physical memory
- A processor register points to the beginning of the page table in physical memory
- If a page is present in memory
  - The page table entry (PTE) stores the physical page number
  - Plus other status bits (valid, referenced, dirty)
- If a page is not present
  - The PTE points to the location on disk (or swap space, or another non-volatile storage medium) where the page is stored
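The lookup described above can be sketched as follows; the PTE fields follow the slide, while the page size and table layout are illustrative assumptions:

```python
# Sketch of translation through a single-level page table.
PAGE_SIZE = 4096

class PTE:
    def __init__(self, valid=False, ppn=0):
        self.valid = valid        # page present in physical memory?
        self.ppn = ppn            # physical page number (or disk location if invalid)
        self.referenced = False   # status bits from the slide
        self.dirty = False

def translate(page_table, vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table[vpn]         # in hardware: one extra memory reference
    if not pte.valid:
        raise RuntimeError("page fault: OS must fetch the page")
    pte.referenced = True
    return pte.ppn * PAGE_SIZE + offset

table = [PTE() for _ in range(16)]
table[2] = PTE(valid=True, ppn=7)
print(hex(translate(table, 2 * PAGE_SIZE + 0x10)))   # 0x7010
```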
Translation using a Page Table
Mapping Pages to Storage
Replacement and Writes
- To reduce the page fault rate, VM systems prefer approximations of the least-recently-used (LRU) replacement policy
  - A reference bit (aka use bit) per PTE is set to 1 on any access to the page
  - Periodically cleared to 0 by the OS
  - A page with reference bit = 0 has not been used recently
- Disk writes may take millions of clock cycles
  - VM writes an entire page at once, not individual words or bytes
  - Write-through is impractical
  - Use write-back
  - A dirty bit in the PTE is set when the page is written
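The reference-bit scheme above can be sketched end to end; the victim-selection rule among cold pages is an illustrative assumption:

```python
import random

class Page:
    def __init__(self):
        self.referenced = False
        self.dirty = False

def access(page, write=False):
    page.referenced = True        # set on any access to the page
    if write:
        page.dirty = True         # write-back: flush only on eviction

def os_clear_reference_bits(pages):
    """Run periodically by the OS so that old accesses age out."""
    for p in pages:
        p.referenced = False

def choose_victim(pages):
    """Approximate LRU: prefer a page whose reference bit is still 0."""
    cold = [p for p in pages if not p.referenced]
    return random.choice(cold) if cold else random.choice(pages)

pages = [Page() for _ in range(4)]
access(pages[0], write=True)
victim = choose_victim(pages)     # never the recently used pages[0]
if victim.dirty:
    pass                          # would write the page to disk first
```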
Problems of Single-Level Page Tables
- The page table maps all pages in the address space, regardless of whether they are used or not
- In practice many programs use only a small fraction of the address space
- The problem can be solved by keeping multiple page tables, each mapping a region (e.g. 4 MB) of virtual memory
- Most of the time programs occupy the top part of the address space for code and data and the bottom part for the stack
- The smaller page tables are linked together by a parent "master" table
- This organization is called a multi-level page table
Two-Level Page Table
(figure: a 32-bit virtual address split into a 10-bit page directory index, a 10-bit second-level table index, and a 12-bit page offset; the directory entry gives the second-level table base address, and the second-level entry gives the physical page number; each second-level table covers 1024 entries × 4 KB = 4 MB)
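The two-level index arithmetic can be sketched directly, using the 10/10/12 field widths of the figure; the table contents are illustrative:

```python
# Split a 32-bit virtual address for a two-level page table:
# 10-bit directory index | 10-bit table index | 12-bit page offset.
def split_two_level(vaddr):
    offset = vaddr & 0xFFF             # low 12 bits
    table_idx = (vaddr >> 12) & 0x3FF  # next 10 bits
    dir_idx = (vaddr >> 22) & 0x3FF    # top 10 bits
    return dir_idx, table_idx, offset

def translate(page_directory, vaddr):
    d, t, offset = split_two_level(vaddr)
    second_level = page_directory[d]   # second-level table base
    ppn = second_level[t]              # physical page number
    return (ppn << 12) | offset

# Each second-level table covers 1024 entries x 4 KB = 4 MB.
directory = {1: {2: 0x7}}
print(hex(translate(directory, (1 << 22) | (2 << 12) | 0x34)))   # 0x7034
```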
Three-Level Page Table
(figure: a 64-bit virtual address with 64 KB pages, split into three 16-bit indexes and a 16-bit page offset; the page directory entry gives the second-level table base address, the second-level entry gives the third-level base address, and the third-level entry gives the physical page number; each level holds 65536 entries)
Inverted Page Table
- Physical memory size is typically much smaller than virtual memory size
- Use one entry per physical page frame instead of one per virtual page
- Problem: given a virtual page number, how does the system find the mapping?
- Solutions: linear search, hash search
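The hash-search solution can be sketched as follows; the hash function and the collision-chain layout are illustrative assumptions:

```python
# Hashed inverted page table: one entry per physical frame, found via
# a hash of the virtual page number; collisions chain through frames.
NUM_FRAMES = 8

class Entry:
    def __init__(self, vpn=None, next_frame=None):
        self.vpn = vpn                 # virtual page mapped into this frame
        self.next_frame = next_frame   # next frame in the collision chain

frames = [Entry() for _ in range(NUM_FRAMES)]
hash_table = {}                        # bucket -> first frame number

def insert(vpn, frame):
    bucket = vpn % NUM_FRAMES          # illustrative hash function
    frames[frame] = Entry(vpn, hash_table.get(bucket))
    hash_table[bucket] = frame

def lookup(vpn):
    """Return the physical frame holding vpn, or None (page fault)."""
    frame = hash_table.get(vpn % NUM_FRAMES)
    while frame is not None:
        if frames[frame].vpn == vpn:
            return frame               # the frame number IS the PPN
        frame = frames[frame].next_frame
    return None

insert(vpn=0x12, frame=3)
insert(vpn=0x1A, frame=5)              # 0x1A % 8 == 0x12 % 8: collision
print(lookup(0x12), lookup(0x1A), lookup(0x99))   # 3 5 None
```

The table size is proportional to physical memory, not virtual memory, which is the point of the inverted organization.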
Linear Inverted Page Table
(figure: the virtual address splits into a virtual page number and an offset; a table with one entry per physical frame, each holding a valid bit and a virtual page number, is searched linearly; the index of the matching entry is the physical page number)
Hashed Inverted Page Table
(figure: the virtual page number is passed through a hash function that indexes a hash table; the hash table points into the frame table, whose entries hold a valid bit and a virtual page number; the index of the matching entry gives the PPN)
Fast Translation using a TLB
- Address translation with a page table in memory requires additional memory references
  - One to access the PTE
  - One for the actual memory access
- Accesses to the page table often have good locality
- Use a fast cache of PTEs within the processor
- This cache is called a Translation Look-aside Buffer (TLB)
- Typical size: 16-512 entries; 0.5-1 cycle for a hit, 10-100 cycles for a miss; 0.01%-1% miss rate
- Misses can be handled by hardware or software
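A sketch of a small fully associative TLB in front of the page table; the capacity and the LRU eviction policy are illustrative assumptions:

```python
from collections import OrderedDict

class TLB:
    """Tiny fully associative TLB with LRU eviction (illustrative)."""
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, in LRU order

    def translate(self, vpn, page_table):
        if vpn in self.entries:        # TLB hit: ~0.5-1 cycle
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        ppn = page_table[vpn]          # TLB miss: walk the page table
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return ppn

page_table = {0x12: 0x7, 0x13: 0x9}
tlb = TLB()
print(tlb.translate(0x12, page_table))   # miss, PTE fetched and cached: 7
print(tlb.translate(0x12, page_table))   # hit: 7
```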
Fast Translation using a TLB
TLB Misses
- If the page is in memory
  - Load the PTE from memory and retry the translation
  - The operation can be handled in hardware
    - Hardware complexity increases with multi-level page tables and inverted page tables
  - The operation can be handled in software
    - The translation miss raises an exception, and an operating system handler restores the translation
- If the page is not in memory: page fault
  - The operating system handles fetching the page from secondary storage (e.g. disk) and updating the page table
  - Execution of the program restarts at the faulting instruction
TLB Miss Handler
- A TLB miss indicates either
  - Page present, but PTE not in the TLB
  - Page not present
- Must recognize the TLB miss before the destination register is overwritten
  - Raise an exception
- The handler copies the PTE from memory into the TLB
  - Then restarts the instruction
  - If the page is not present, a page fault will occur
Page Fault Handler
- Use the faulting virtual address to find the PTE
- Locate the page on secondary storage (e.g. disk)
- Choose a page to replace
  - If dirty, write it to disk first
- Read the page into memory and update the page table
- Make the process runnable again
- Restart from the faulting instruction
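The steps above can be sketched end to end; the disk interface (a plain dict) and the victim-selection callback are illustrative assumptions:

```python
# Sketch of a page-fault handler; "disk" is a dict standing in for
# secondary storage, keyed by disk block number.
def handle_page_fault(faulting_vpn, page_table, memory, disk, choose_victim):
    # 1. Use the faulting virtual address to find the PTE.
    pte = page_table[faulting_vpn]
    # 2. Locate the page on secondary storage.
    disk_block = pte["disk_block"]
    # 3. Choose a page to replace; if dirty, write it to disk first.
    victim_vpn, frame = choose_victim()
    victim = page_table[victim_vpn]
    if victim["dirty"]:
        disk[victim["disk_block"]] = memory[frame]
    victim["valid"] = False
    # 4. Read the page into memory and update the page table.
    memory[frame] = disk[disk_block]
    pte.update(valid=True, ppn=frame, dirty=False)
    # 5. The process becomes runnable and restarts the faulting instruction.
    return frame

page_table = {
    0: {"valid": True, "ppn": 0, "dirty": True, "disk_block": 10},
    1: {"valid": False, "ppn": None, "dirty": False, "disk_block": 11},
}
memory = {0: b"old"}
disk = {10: b"", 11: b"wanted page"}
frame = handle_page_fault(1, page_table, memory, disk, lambda: (0, 0))
print(memory[frame])   # b'wanted page'; dirty victim written to block 10
```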
TLB and Cache Interaction
- If the cache tag uses the physical address
  - Need to translate before the cache lookup
- Alternative: use the virtual address
  - Complications due to aliasing
    - Different virtual addresses may be used for the same shared physical address
Kernel and User Mode
- Hardware needs a way to locate the page table of each program
- Hardware uses a special register, not visible to programs, for this purpose
- The register for page table indexing must only be accessible to the OS, not to user programs
- How are such hardware structures protected from user access?
- The system operates in two modes: user mode, where user programs run, and kernel mode, where the operating system runs
- Protected hardware structures are modified only in kernel mode
- Mode switching
  - The processor switches from user to kernel mode on system calls, exceptions, and interrupts
  - The operating system calls an exception handler for each case
Protecting Access to Memory
- The operating system can control access rights to memory for each page
- Each page can be designated read-only, writeable, or executable (rwx)
  - Code pages are typically read-only and executable
  - Data pages are read-only or read-write
- Each user program may request changes to the access permissions of its own pages
- Any request to change permissions must be made to the OS, which grants or rejects it
- Control bits stored in the PTE
  - Permission bits (rwx)
  - Valid bit (page present in memory)
  - Dirty bit (page written since it was brought into memory; used to write back on replacement)
  - Reference bit(s) (used to implement the replacement policy as an approximation of LRU)
TLB and Multiprogramming
- On a multiprogramming system the operating system performs process switching
- Each user program executes in a process with a private address space
- The operating system may provide the illusion that programs execute concurrently on the same processor by time-multiplexing the processor between processes
- What happens to the contents of the TLB when the operating system switches processes?
- First solution, TLB flushing: the operating system clears the contents of the TLB so that the new process starts with an empty TLB
- Second solution, TLB sharing between processes: the TLB hardware attaches a process identifier to each TLB entry. The identifier differentiates the page translations of different processes.
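The second solution can be sketched by tagging each TLB entry with a process identifier (often called an ASID); the structure below is an illustrative assumption:

```python
# TLB entries tagged with a process identifier, so a context switch
# needs no flush: entries of other processes simply never match.
class TaggedTLB:
    def __init__(self):
        self.entries = {}              # (pid, vpn) -> ppn

    def insert(self, pid, vpn, ppn):
        self.entries[(pid, vpn)] = ppn

    def lookup(self, pid, vpn):
        return self.entries.get((pid, vpn))   # None means TLB miss

tlb = TaggedTLB()
tlb.insert(pid=1, vpn=0x40, ppn=0x7)   # process 1's translation
tlb.insert(pid=2, vpn=0x40, ppn=0x9)   # same VPN, different process
print(tlb.lookup(1, 0x40), tlb.lookup(2, 0x40))   # 7 9
```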
Sharing and Protection
- In several cases it is beneficial for processes to share physical memory
- Example: two instances of a web browser belonging to different users (different address spaces). Both instances execute the same code, so replicating the code in physical memory is not economical
- The operating system implements sharing by mapping (potentially different) virtual pages of different processes to the same physical page
- Second example: two processes of the same user need to share data (e.g. a software pipeline)
- The user may request memory from the operating system and designate it as shared with other processes
Virtual Machines
- A host computer emulates multiple guest operating systems and machine resources
- The host can run multiple guest operating systems simultaneously and achieve isolation between guests
  - Avoids security and reliability problems
  - Aids sharing of hardware resources between users, each with a custom software environment
- Virtualization has a performance impact
  - Privileged operations of guest operating systems are emulated by a virtual machine monitor instead of being executed natively
- Examples
  - VMware
  - Xen
Virtual Machine Monitor
- The VMM maps the virtual hardware resources accessed by guest virtual machines to physical hardware resources
  - Memory, I/O devices, CPUs
- Guest code running at user level runs natively on the processor
- Privileged guest instructions trap to the VMM to access protected resources
- The guest OS may be different from the host OS
- The VMM handles I/O devices
  - Emulates I/O devices as generic virtual files accessible from guests
Example: Timer Virtualization
- On a native machine, on a timer interrupt
  - The OS suspends the current process, handles the interrupt, then selects and resumes the next process (context switching)
- With a VMM
  - The VMM suspends the current VM, handles the interrupt, then selects and resumes the next VM
- If a VM requires timer interrupts
  - The VMM emulates a virtual timer
  - It emulates an interrupt for the VM when a physical timer interrupt occurs
Instruction Set Support
- User and system modes
- Privileged instructions are available only in system mode
  - They trap to the system if executed in user mode
- All physical resources are accessible only through privileged instructions
  - Includes access to page table registers, interrupt control, and I/O registers
- Virtualization support is beginning to appear in common RISC and CISC ISAs