HY225 Lecture 12: DRAM and Virtual Memory

Dimitrios S. Nikolopoulos

University of Crete and FORTH-ICS

May 16, 2011

Dimitrios S. Nikolopoulos Lecture 12: DRAM and Virtual Memory 1 / 36

DRAM

Fundamentals

- Random-access memory using one transistor-capacitor pair per bit
- Capacitors leak, so cells need refresh
- Composed of one or more memory arrays
- Organized in rows and columns
- Need sense amplifiers to compensate for voltage swing

[Figure: DRAM organization. A memory array of DRAM cells arranged in rows and columns, with a row decoder, bit lines, sense amplifiers, a column decoder, and data in/out buffers.]

DRAM

Fundamentals

- Each DRAM memory array outputs one bit
- DRAMs use multiple memory arrays to output multiple bits at a time
  - ×N indicates a DRAM with N memory arrays
  - ×16, ×32 DRAMs typical today
- Each collection of ×N arrays forms a DRAM bank
  - Banks can be read/written independently

×4 DRAM

[Figure: a ×4 DRAM built from four memory arrays, each with its own row decoder, bit lines, sense amplifiers, column decoder, and data in/out buffers.]

Interleaved DRAM

DRAM memory bandwidth

- Limited bandwidth from one DRAM bank
- Increase bandwidth by delivering data from multiple banks
  - Processor-DRAM interconnect (e.g. bus) has a higher clock frequency than any one DRAM
  - Bus control switches between multiple DRAM banks to achieve a high data rate

DIMMs and Ranks

[Figure: one DIMM with one DRAM rank. Each DRAM device has eight internal banks that share one I/O link through a multiplexer (MUX); each bank is a ×4 group of memory arrays.]

Modern DRAM organization

Hierarchy of DRAM memories

- A system has multiple DIMMs
- Each DIMM has multiple DRAM devices in one or more ranks
- Each DRAM device has multiple banks
- Each bank has multiple memory arrays
- Concurrency in ranks and banks increases memory bandwidth

DRAM Performance

- DRAM typically connected to processor via fixed-width clocked bus
  - Bus clocks tend to be slower than CPU clocks
- Cache block read with no optimization:
  - Assume 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access to fetch a word
  - 1 bus arbitration cycle per word data transfer
- 4-word block, 1-word-wide DRAM
  - Miss penalty = 1 + 4 × 15 + 4 × 1 = 65 bus cycles
  - Bandwidth = 16 bytes / 65 bus cycles ≈ 0.25 bytes / bus cycle
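
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name and the 4-byte word size are assumptions (the slide implies the latter by equating a 4-word block with 16 bytes).

```python
WORD_BYTES = 4  # assumption: 32-bit words, so a 4-word block is 16 bytes


def miss_penalty(words, addr_cycles=1, dram_cycles=15, transfer_cycles=1):
    """Bus cycles to fetch a block word by word from a 1-word-wide DRAM."""
    return addr_cycles + words * dram_cycles + words * transfer_cycles


penalty = miss_penalty(words=4)
bandwidth = 4 * WORD_BYTES / penalty   # bytes per bus cycle

print(penalty)               # 65
print(round(bandwidth, 3))   # 0.246
```

The exact bandwidth is 16/65, which the slide rounds to 0.25 bytes per bus cycle.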

Improving DRAM bandwidth

- Wide (4-word in example) DRAM
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4 × 1 = 20 bus cycles
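
To compare the two organizations, the sketch below recomputes both miss penalties and the resulting bandwidth for a 16-byte block. The function names are hypothetical; the cycle counts are the ones used on the slides.

```python
BLOCK_BYTES = 16  # 4 words of 4 bytes each (word size is an assumption)


def wide_penalty(addr=1, dram=15, transfer=1):
    # One DRAM access fetches the whole block; one transfer moves it over a wide bus.
    return addr + dram + transfer


def interleaved_penalty(banks, addr=1, dram=15, transfer=1):
    # All banks are accessed in parallel, then one word is sent per bus cycle.
    return addr + dram + banks * transfer


for name, cycles in [("wide", wide_penalty()),
                     ("interleaved", interleaved_penalty(4))]:
    print(name, cycles, round(BLOCK_BYTES / cycles, 2))
# wide 17 0.94
# interleaved 20 0.8
```

Interleaving recovers most of the benefit of the wide organization while keeping a one-word-wide bus.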

Virtual Memory

- Use main memory as a cache for secondary (disk, flash) storage
  - Managed jointly by processor and operating system
- Programs share main memory
  - With VM, each program gets a private virtual address space, with the entire range of addresses available on the processor. Processor and OS try to keep the most frequently used parts of code and data in DRAM
  - Primary motivation for VM is protection. VM forbids programs from accessing the physical code and data addresses of other programs unless sharing is requested by programs and approved by the OS
- Processor and OS translate virtual addresses to physical addresses
  - VM block is called a page
  - VM translation miss is called a page fault

Address Translation

- Fixed-size pages (e.g. 4 KB)
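
With fixed-size pages, translation only replaces the upper bits of an address; the offset within the page passes through unchanged. A minimal sketch, assuming 4 KB pages as in the bullet above (the helper names are made up for illustration):

```python
PAGE_SIZE = 4096                           # 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits


def split(vaddr):
    """Return (virtual page number, page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def join(ppn, offset):
    """Build a physical address from a physical page number and an offset."""
    return (ppn << OFFSET_BITS) | offset


vpn, offset = split(0x12345678)
print(hex(vpn), hex(offset))      # 0x12345 0x678
print(hex(join(0x00042, offset))) # 0x42678
```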

Page Tables

- Table storing page placement information in physical memory
  - Array of page table entries, indexed by virtual page number
  - Page table is itself stored in physical memory
  - Processor register points to beginning of page table in physical memory
- If page is present in memory
  - Page table entry (PTE) stores physical page number
  - Plus other status bits (valid, referenced, dirty)
- If page is not present
  - PTE points to location on disk (or swap space, or other non-volatile storage medium) where page is stored
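
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not a real MMU layout: the `PTE` fields, the `disk_block` name, and the tiny 8-page address space are all hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096


@dataclass
class PTE:
    valid: bool = False       # page present in physical memory
    ppn: int = 0              # physical page number, meaningful when valid
    disk_block: int = 0       # location on secondary storage when not valid
    referenced: bool = False
    dirty: bool = False


class PageFault(Exception):
    pass


def translate(page_table, vaddr, write=False):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table[vpn]                 # indexed by virtual page number
    if not pte.valid:
        raise PageFault(f"VPN {vpn} is on disk at block {pte.disk_block}")
    pte.referenced = True
    pte.dirty = pte.dirty or write
    return pte.ppn * PAGE_SIZE + offset


page_table = [PTE() for _ in range(8)]    # tiny 8-page address space
page_table[2] = PTE(valid=True, ppn=5)
print(hex(translate(page_table, 2 * PAGE_SIZE + 0x10)))   # 0x5010
```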

Translation using a Page Table

Mapping Pages to Storage

Replacement and Writes

- To reduce page fault rate, VM systems prefer approximations of the least-recently used (LRU) replacement policy
  - Reference bit (aka use bit) per PTE set to 1 on access to the page
  - Periodically cleared to 0 by the OS
  - A page with reference bit = 0 has not been used recently
- Disk writes may take millions of clock cycles
  - VM needs to write entire block at once, not individual words or bytes
  - Write-through is impractical
  - Use write-back
  - Dirty bit in PTE set when page is written
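
One common way to turn reference bits into a replacement decision is the clock (second-chance) scan. The slide does not name a specific algorithm, so the sketch below is one possible approximation of LRU rather than the method of any particular OS; the function name and data layout are assumptions.

```python
def choose_victim(ref_bits, hand=0):
    """Clock (second-chance) scan over per-frame reference bits.

    Returns (victim_frame, new_hand). Frames whose bit is 1 get the bit
    cleared and are skipped once; the first frame found with bit 0 is evicted.
    """
    n = len(ref_bits)
    while True:
        if ref_bits[hand] == 0:
            return hand, (hand + 1) % n
        ref_bits[hand] = 0            # used recently: give it a second chance
        hand = (hand + 1) % n


bits = [1, 1, 0, 1]
victim, hand = choose_victim(bits)
print(victim, hand, bits)   # 2 3 [0, 0, 0, 1]
```

The scan always terminates, because every bit it passes is cleared, so the second lap must find a zero.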

Problems of Single-Level Page Tables

- Page table maps all pages in the address space regardless of whether they are used or not
- In practice many programs use only a small fraction of the address space
- Problem can be solved by keeping multiple page tables, each mapping a region (e.g. 4 MB) of virtual memory
- Most of the time programs occupy the top part of the address space for code and data and the bottom part for stack
- Smaller page tables linked together with a parent “master” table
- Organization called multi-level page table
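
A rough size comparison shows why the single-level table is wasteful. The numbers below are assumptions chosen for illustration: 32-bit virtual addresses, 4 KB pages, 4-byte PTEs, and a program that touches only three 4 MB regions.

```python
def flat_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a single-level table that maps every virtual page."""
    return (2 ** va_bits // page_bytes) * pte_bytes


flat = flat_table_bytes(va_bits=32, page_bytes=4096, pte_bytes=4)
print(flat // 2**20, "MB")        # 4 MB

# Multi-level: one master table plus one small table per 4 MB region in use.
ENTRIES_PER_TABLE = 1024          # 1024 entries x 4 KB pages = 4 MB per table
regions_in_use = 3                # hypothetical program: code, data, stack
tables = 1 + regions_in_use
multi = tables * ENTRIES_PER_TABLE * 4
print(multi // 2**10, "KB")       # 16 KB
```

Under these assumptions every process pays 4 MB for a flat table, while the multi-level layout costs 16 KB for the same program.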

Two-Level Page Table

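[Figure: a 32-bit virtual address split into a 10-bit page directory index, a 10-bit second-level page table index (together the virtual page number), and a 12-bit page offset. The page directory entry gives the second-level base address; the second-level page table entry gives the physical page number. Each second-level table has 1024 entries × 4 KB = 4 MB coverage.]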
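
The walk through a two-level table with the 10/10/12 split can be sketched like this. Python dictionaries stand in for the tables, and a missing key stands for an invalid entry; both choices are simplifications made for the example.

```python
def split_10_10_12(vaddr):
    """Return (directory index, second-level index, page offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def walk(directory, vaddr):
    """directory: directory index -> second-level table (index -> PPN)."""
    d, t, offset = split_10_10_12(vaddr)
    second_level = directory.get(d)
    if second_level is None or t not in second_level:
        return None                       # a real system would raise a page fault
    return (second_level[t] << 12) | offset


directory = {1: {2: 0x80}}                # only one second-level table allocated
vaddr = (1 << 22) | (2 << 12) | 0x34
print(hex(walk(directory, vaddr)))        # 0x80034
print(walk(directory, 0))                 # None
```

Only the second-level tables for regions actually in use need to exist, which is where the space saving comes from.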

Three-Level Page Table

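[Figure: a 64-bit virtual address split into four 16-bit fields: page directory index (bits 63-48), second-level index (bits 47-32), third-level index (bits 31-16), and page offset (bits 15-0); the first three fields form the virtual page number. The page directory holds second-level base addresses, each second-level page table (65536 entries) holds third-level base addresses, and the third-level table (65536 entries) yields the physical page number. Pages are 64 KB.]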

Inverted Page Table

- Physical memory size typically much smaller than virtual memory size
- Use one entry per physical page frame instead of virtual page
- Problem: given a virtual page number, how does the system find the mapping?
- Solutions: linear search, hash search

Linear Inverted Page Table

[Figure: the virtual address is split into virtual page number and offset. The table has one entry per physical page, each holding a valid bit (V), a virtual page number, and a physical page number; the entries are searched linearly for the virtual page number.]

Hashed Inverted Page Table

[Figure: the virtual address is split into virtual page number and offset. A hash function applied to the virtual page number indexes a hash table, which points into the inverted page table (entries with valid bit V, virtual page number, physical page number); the matching entry yields the PPN.]
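
The two search strategies for an inverted page table can be contrasted in a short sketch. It assumes a single address space (a real inverted table would also tag entries with a process identifier), a trivial modulo hash, and chained buckets; all names are hypothetical.

```python
def linear_lookup(frames, vpn):
    """frames[ppn] is the VPN mapped to that frame, or None if the frame is free."""
    for ppn, mapped in enumerate(frames):
        if mapped == vpn:
            return ppn
    return None


def build_hash(frames, buckets=4):
    """Hash table from bucket number to the frames whose VPN hashes there."""
    table = [[] for _ in range(buckets)]
    for ppn, mapped in enumerate(frames):
        if mapped is not None:
            table[mapped % buckets].append(ppn)
    return table


def hashed_lookup(frames, table, vpn):
    for ppn in table[vpn % len(table)]:   # only one bucket is searched
        if frames[ppn] == vpn:
            return ppn
    return None


frames = [0x40, None, 0x13, 0x44]         # 4 physical frames
table = build_hash(frames)
print(linear_lookup(frames, 0x44), hashed_lookup(frames, table, 0x44))  # 3 3
print(hashed_lookup(frames, table, 0x99))                               # None
```

Linear search costs time proportional to the number of physical frames, while the hashed variant inspects only the entries that collide in one bucket.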

Fast Translation using a TLB

- Address translation with a page table in memory requires additional memory references
  - One to access the PTE
  - One for the actual memory access
- Access to page table often has good locality
  - Use a fast cache of PTEs within the processor
  - Fast cache called a Translation Look-aside Buffer (TLB)
  - Typical size: 16–512 entries; 0.5–1 cycle for a hit, 10–100 cycles for a miss; 0.01%–1% miss rate
  - Misses can be handled by hardware or software
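
The typical figures above translate into an average translation cost as follows. The formula (hit time plus miss rate times miss penalty) is the usual average-access-time model, applied here as an assumption; the slide itself gives only the ranges.

```python
def avg_translation_cycles(hit_cycles, miss_penalty, miss_rate):
    """Average cycles per translation: hit time + miss rate x miss penalty."""
    return hit_cycles + miss_rate * miss_penalty


best = avg_translation_cycles(hit_cycles=0.5, miss_penalty=10, miss_rate=0.0001)
worst = avg_translation_cycles(hit_cycles=1, miss_penalty=100, miss_rate=0.01)
print(round(best, 4), round(worst, 4))   # 0.501 2.0
```

Even at the pessimistic end of the ranges, a TLB keeps the average cost of translation to about two cycles.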

Fast Translation using a TLB

TLB Misses

- If page is in memory
  - Load PTE from memory and retry translation
  - Operation can be handled in hardware
    - Hardware complexity increases with multi-level page tables, inverted page tables
  - Operation can be handled in software
    - Translation miss raises exception, operating system handler restores translation
- If page is not in memory, page fault
  - Operating system handles fetching of the page from secondary storage (e.g. disk) and updating the page table
  - Execution of program restarts at faulting instruction

TLB Miss Handler

- TLB miss indicates
  - Page present, but PTE not in TLB
  - Page not present
- Must recognize TLB miss before destination register is overwritten
  - Raise exception
- Handler copies PTE from memory to TLB
  - Then restarts instruction
  - If page not present, page fault will occur
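
A software-managed TLB miss can be modelled in a few lines. The sketch assumes a tiny TLB with FIFO eviction and a page table that lists only resident pages; the class and function names are hypothetical and the replacement choice is not taken from the slides.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class PageFault(Exception):
    pass


class TLB:
    def __init__(self, entries=4):
        self.entries = entries
        self.map = OrderedDict()              # VPN -> PPN, oldest first

    def lookup(self, vpn):
        return self.map.get(vpn)

    def refill(self, vpn, ppn):
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)      # evict the oldest entry (FIFO)
        self.map[vpn] = ppn


def access(tlb, page_table, vaddr):
    """page_table maps VPN -> PPN for pages that are present in memory."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = tlb.lookup(vpn)
    if ppn is None:                           # TLB miss: run the miss handler
        if vpn not in page_table:
            raise PageFault(vpn)              # page not present
        tlb.refill(vpn, page_table[vpn])      # copy the PTE into the TLB
        ppn = tlb.lookup(vpn)                 # restarted access now hits
    return ppn * PAGE_SIZE + offset


tlb = TLB()
print(hex(access(tlb, {3: 7}, 3 * PAGE_SIZE + 8)))   # 0x7008
```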

Page Fault Handler

- Use faulting virtual address to find PTE
- Locate page on secondary storage (e.g. disk)
- Choose page to replace
  - If dirty, write to disk first
- Read page into memory and update page table
- Make process runnable again
  - Restart from faulting instruction
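
The steps above can be sketched as a single function. Dictionaries and lists stand in for the page table, physical frames, and disk, and the victim frame is passed in by the caller; all of these are simplifications assumed for the example.

```python
def handle_page_fault(vpn, page_table, frames, disk, victim_frame):
    """page_table: VPN -> {"valid", "ppn", "dirty"}; frames: PPN -> (VPN, data) or None;
    disk: VPN -> data. victim_frame is chosen by the replacement policy."""
    old = frames[victim_frame]
    if old is not None:
        old_vpn, old_data = old
        old_pte = page_table[old_vpn]
        if old_pte["dirty"]:
            disk[old_vpn] = old_data              # write back before reusing the frame
        old_pte.update(valid=False, dirty=False)
    frames[victim_frame] = (vpn, disk[vpn])       # read the page into memory
    page_table[vpn] = {"valid": True, "ppn": victim_frame, "dirty": False}
    return victim_frame                           # caller restarts the faulting instruction


page_table = {
    0: {"valid": True, "ppn": 0, "dirty": True},
    1: {"valid": False, "ppn": None, "dirty": False},
}
frames = [(0, "new-A")]                           # one physical frame, holding page 0
disk = {0: "old-A", 1: "B"}
handle_page_fault(1, page_table, frames, disk, victim_frame=0)
print(disk[0], frames[0], page_table[0]["valid"])   # new-A (1, 'B') False
```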

TLB and Cache Interaction

- If cache tag uses physical address
  - Need to translate before cache lookup
- Alternative: use virtual address
  - Complications due to aliasing
    - Different virtual addresses may be used for the same shared physical address

Kernel and User Mode

- Hardware needs to have a way to locate the page table of each program
  - Hardware uses a special register, not visible to programs, for this purpose
  - The register for page table indexing must only be accessible to the OS, not user programs
- How are certain hardware structures protected from user access?
  - System operates in two modes: user mode, where user programs run, and kernel mode, where the operating system runs
  - Protected hardware structures modified only in kernel mode
- Mode switching
  - Processor switches from user to kernel mode at:
    - System calls, exceptions, interrupts
  - Operating system calls exception handler for each case

Protecting Access to Memory

- Operating system can control access rights to memory for each page
  - Each page can be designated as read-only, writable, or executable (rwx)
  - Code pages are typically read-only and executable
  - Data pages are read-only or read-write
- Each user program may request to change the access permissions of its own pages
  - Any request to change permissions must be made to the OS; the OS grants or rejects the request
- Control bits stored in PTE
  - Permission bits (rwx)
  - Valid bit (page present in memory)
  - Dirty bit (page written since it was brought into memory, used to write back on replacement)
  - Reference bit(s) (used to implement replacement policy, as an approximation to LRU)
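
The control bits listed above can be modelled as flags that are checked on every access. The bit positions below are made up for the example and do not correspond to any real architecture's PTE format.

```python
from enum import IntFlag


class PTEBits(IntFlag):       # hypothetical bit layout
    VALID = 1
    READ = 2
    WRITE = 4
    EXEC = 8
    DIRTY = 16
    REF = 32


class ProtectionFault(Exception):
    pass


def check_access(pte, needed):
    """Return the updated PTE bits, or raise if the access is not allowed."""
    if not (pte & PTEBits.VALID):
        raise LookupError("page fault: page not present")
    if (pte & needed) != needed:
        raise ProtectionFault(needed)
    pte |= PTEBits.REF                    # page was used
    if needed & PTEBits.WRITE:
        pte |= PTEBits.DIRTY              # page must be written back on replacement
    return pte


code_page = PTEBits.VALID | PTEBits.READ | PTEBits.EXEC
after = check_access(code_page, PTEBits.READ)
print(PTEBits.REF in after)               # True
try:
    check_access(code_page, PTEBits.WRITE)
except ProtectionFault:
    print("write to code page rejected")
```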

TLB and Multiprogramming

- On a multiprogramming system the operating system performs process switching
  - Each user program executes in a process with a private address space
  - Operating system may provide the illusion that programs execute concurrently on the same processor by time-multiplexing the processor between processes
- What happens to contents of TLB when the operating system switches processes?
  - First solution is TLB flushing: the operating system clears the contents of the TLB so that the new process starts with an empty TLB
  - Second solution is TLB sharing between processes: TLB hardware has a process identifier attached to each TLB entry. Process identifier enables differentiation between page translations of different processes.
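
Both solutions can be shown with one small model. The sketch keys each TLB entry by a (process identifier, virtual page number) pair; the class name and the unbounded dictionary are assumptions made to keep the example short.

```python
class TaggedTLB:
    def __init__(self):
        self.entries = {}                     # (pid, VPN) -> PPN

    def insert(self, pid, vpn, ppn):
        self.entries[(pid, vpn)] = ppn

    def lookup(self, pid, vpn):
        return self.entries.get((pid, vpn))   # None models a TLB miss

    def flush(self):
        self.entries.clear()


tlb = TaggedTLB()
tlb.insert(pid=1, vpn=0x10, ppn=0x80)
tlb.insert(pid=2, vpn=0x10, ppn=0x91)         # same VPN, different process
print(hex(tlb.lookup(1, 0x10)), hex(tlb.lookup(2, 0x10)))   # 0x80 0x91

tlb.flush()                                   # first solution: empty TLB after a switch
print(tlb.lookup(1, 0x10))                    # None
```

With tagged entries, a process that is switched back in may still find its translations in the TLB; with flushing, it always restarts with misses.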

Sharing and Protection

- In several cases it is beneficial for processes to share physical memory
  - Example: two instances of a web browser belonging to different users (different address spaces). Both instances execute the same code, therefore replicating the code in physical memory is not economical
  - Operating system implements sharing by mapping (potentially different) virtual pages of different processes to the same physical page
  - Second example: two processes of the same user need to share data (e.g. a software pipeline)
  - User may request memory from the operating system and designate it as shared with other processes

Virtual Machines

- Host computer emulates multiple guest operating systems and machine resources
  - Host can run multiple guest operating systems simultaneously and achieve isolation between guests
  - Avoids security and reliability problems
  - Aids sharing of hardware resources between users, each with a custom software environment
- Virtualization has a performance impact
  - Privileged operations of guest operating systems are emulated by a virtual machine monitor rather than executed natively
- Examples
  - VMware
  - Xen

Virtual Machine Monitor

- VMM maps virtual hardware resources accessed by guest virtual machines to physical hardware resources
  - Memory, I/O devices, CPUs
- Guest code running at user level runs natively on the processor
  - Privileged guest instructions trap to VMM to access protected resources
- Guest OS may be different from host OS
- VMM handles I/O devices
  - Emulates I/O devices as generic virtual files accessible from guests

Example: Timer Virtualization

- In native machine, on timer interrupt
  - OS suspends current process, handles interrupt, selects and resumes next process (context switching)
- With VMM
  - VMM suspends current VM, handles interrupt, selects and resumes next VM
- If a VM requires timer interrupts
  - VMM emulates a virtual timer
  - Emulates interrupt for VM when physical timer interrupt occurs

Instruction Set Support

- User and System Modes
- Privileged instructions only available in system mode
  - Trap to system if executed in user mode
- All physical resources only accessible through privileged instructions
  - Includes access to page table registers, interrupt control, I/O registers
- Virtualization support begins to appear in common RISC and CISC ISAs
