ece 550d fundamentals of computer systems and...

45
ECE 550D Fundamentals of Computer Systems and Engineering Fall 2016 Virtual Memory Tyler Bletsch Duke University Slides are derived from work by Andrew Hilton (Duke), Dan Sorin (Duke), and Amir Roth (Penn)

Upload: dophuc

Post on 07-Mar-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

ECE 550D Fundamentals of Computer Systems and Engineering

Fall 2016

Virtual Memory

Tyler Bletsch

Duke University

Slides are derived from work by Andrew Hilton (Duke), Dan Sorin (Duke), and Amir Roth (Penn)

Page 2: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

2

DRAM Packaging

• Just talked about DRAM: here is a picture of a DIMM • E.g., 8 DRAM chips, each chip is 4 or 8 bits wide

Page 3: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

3

Where We Are in This Course Right Now

• So far: • We know how to design a processor that can fetch, decode, and

execute the instructions in an ISA

• We understand how to design caches

• We know how to implement main memory in DRAM

• Now: • We learn about virtual memory

• Next: • We learn about the lowest level of storage (disks) and I/O

Page 4: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

4

End of memory hierarchy: Virtual Memory

• Virtual memory • Address translation and page tables

• A virtual memory hierarchy

Application

OS

Firmware Compiler

I/O

Memory

Digital Circuits

Gates & Transistors

CPU

Page 5: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

5

One last problem: How to fit into Memory

• Reasonable Memory: 4GB—64GB? • 32-bit address space: 4GB/program: run 1—16 programs?

• 64-bit address space: need 16 Billion GB for 1 program?

• Not going to work

• Instead: virtual memory • Give every program the illusion of entire address space

• Hardware and OS move things around behind the scenes

• How? • Functionality problem -> add level of indirection

• Good rule to know

Page 6: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

6

Virtual Memory

• Idea of treating memory like a cache • Contents are a dynamic subset of program’s address space

• Dynamic content management is transparent to program

• Actually predates “caches” (by a little)

• Original motivation: compatibility • IBM System 370: a family of computers with one software suite

+ Same program could run on machines with different memory sizes

• Caching mechanism made it appear as if memory was 2N bytes

• Regardless of how much memory there actually was

– Prior, programmers explicitly accounted for memory size

• Virtual memory • Virtual: “in effect, but not in actuality” (i.e., appears to be, but isn’t)

Page 7: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

7

CA

CH

ING

Cache

Copy if popular

Figure: caching vs. virtual memory

7

RAM

VIR

TU

AL

ME

MO

RY

(or SSD)

Hard disk

Load if needed

Drop

• Faster

• More expensive

• Lower capacity

• Slower

• Cheaper

• Higher capacity

Swap out (RW) or drop (RO)

Page 8: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

8

Demand Paging

Page

A chunk of memory with its own record in the memory

management hardware. Often 4kB.

Page 9: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

9

Virtual Memory

• Programs use virtual addresses (VA) • 0…2N–1

• VA size also referred to as machine size

• E.g., Pentium4 is 32-bit, Itanium is 64-bit

• Memory uses physical addresses (PA) • 0…2M–1 (M<N, especially if N=64)

• 2M is most physical memory machine supports

• VA→PA at page granularity (VP→PP) • By “system”

• Mapping need not preserve contiguity

• VP need not be mapped to any PP

• Unmapped VPs live on disk (swap)

Disk(swap)

Program

Main Memory

code heap stack

Page 10: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

10

Other Uses of Virtual Memory

• Virtual memory is quite useful • Automatic, transparent memory management just one use

• “Functionality problems are solved by adding levels of indirection”

• Example: multiprogramming • Each process thinks it has 2N bytes of address space

• Each thinks its stack starts at address 0xFFFFFFFF

• “System” maps VPs from different processes to different PPs

+ Prevents processes from reading/writing each other’s memory

Program1 … Program2

Page 11: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

11

Still More Uses of Virtual Memory

• Inter-process communication • Map VPs in different processes to same PPs

• Direct memory access I/O • Think of I/O device as another process

• Will talk more about I/O in a few lectures

• Protection • Piggy-back mechanism to implement page-level protection

• Map VP to PP … and RWX protection bits

• Attempt to execute data, or attempt to write insn/read-only data?

• Exception → OS terminates program

Page 12: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

12

Address Translation

• VA→PA mapping called address translation • Split VA into virtual page number (VPN) and page offset (POFS)

• Translate VPN into physical page number (PPN)

• POFS is not translated – why not?

• VA→PA = [VPN, POFS] → [PPN, POFS]

• Example above • 64KB pages → 16-bit POFS

• 32-bit machine → 32-bit VA → 16-bit VPN (16 = 32 – 16)

• Maximum 256MB memory → 28-bit PA → 12-bit PPN

POFS[15:0] virtual address[31:0] VPN[31:16]

POFS[15:0] physical address[27:0] PPN[27:16]

translate don’t touch

Page 13: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

13

Mechanics of Address Translation

• How are addresses translated? • In software (now) but with hardware acceleration (a little later)

• Each process is allocated a page table (PT) • Maps VPs to PPs or to disk (swap) addresses

• VP entries empty if page never referenced

• Translation is table lookup

struct {

union { int ppn, disk_block; }

int is_valid, is_dirty;

} PTE;

struct PTE pt[NUM_VIRTUAL_PAGES];

int translate(int vpn) {

if (pt[vpn].is_valid)

return pt[vpn].ppn;

}

PT

vpn

Disk(swap)

Page 14: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

14

High level operation

SEGFAULT

OK (fast)

OK (fast)

OK (but slow)

!

Virtual memory

Page table

Physical memory

HDD/SSD storage

Page 15: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

15

Address translation

Adapted from Operating System Concepts by Silberschatz, Galvin, and Gagne

Page 16: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

16

Address translation

00000000000000000111000000000101

Index Data Valid?

0 463 0

1 116 1

2 460 1

3 407 1

4 727 0

5 719 1

6 203 0

7 12 1

8 192 1

00000000000000001100000000000101

Virtual address:

Physical address:

Virtual page number Page offset

Physical page number Page offset

Page table:

Page 17: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

17

Virtual address space

• Enables sparse address spaces with holes left for growth, dynamically linked libraries, etc

• System libraries shared via mapping into virtual address space

• Shared memory by mapping pages read-write into virtual address space

• Pages can be shared during fork(), speeding process creation

Adapted from Operating System Concepts by Silberschatz, Galvin, and Gagne

Page 18: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

18

Structure of the page table

Page 19: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

19

Page Table Size

• How big is a page table on the following machine? • 4B page table entries (PTEs)

• 32-bit machine

• 4KB pages

• Solution • 32-bit machine → 32-bit VA → 4GB virtual memory

• 4GB virtual memory / 4KB page size → 1M VPs

• 1M VPs * 4B PTE → 4MB page table

• How big would the page table be with 64KB pages?

• How big would it be for a 64-bit machine?

• Page tables can get enormous • There are ways of making them smaller

Page 20: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

20

Multi-Level Page Table

• One way: multi-level page tables • Tree of page tables

• Lowest-level tables hold PTEs

• Upper-level tables hold pointers to lower-level tables

• Different parts of VPN used to index different levels

• Example: two-level page table for machine on last slide • Compute number of pages needed for lowest-level (PTEs)

• 4KB pages / 4B PTEs → 1K PTEs fit on a single page

• 1M PTEs / (1K PTEs/page) → 1K pages to hold PTEs

• Compute number of pages needed for upper-level (pointers)

• 1K lowest-level pages → 1K pointers

• 1K pointers * 32-bit VA → 4KB → 1 upper level page

Page 21: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

21

Multi-Level Page Table

• 20-bit VPN • Upper 10 bits index 1st-level table

• Lower 10 bits index 2nd-level table 1st-level

“pointers”

2nd-level

PTEs VPN[9:0] VPN[19:10]

struct {

union { int ppn, disk_block; }

int is_valid, is_dirty;

} PTE;

struct {

struct PTE ptes[1024];

} L2PT;

struct L2PT *pt[1024];

int translate(int vpn) {

struct L2PT *l2pt = pt[vpn>>10];

if (l2pt && l2pt->ptes[vpn&1023].is_valid)

return l2pt->ptes[vpn&1023].ppn;

}

pt “root”

Page 22: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

22

Multi-Level Page Table

• Have we saved any space? • Isn’t total size of 2nd level PTE pages same as single-

level table (i.e., 4MB)?

• Yes, but…

• Large virtual address regions unused • Corresponding 2nd-level pages need not exist

• Corresponding 1st-level pointers are null

• Example: 2MB code, 64KB stack, 16MB heap • Each 2nd-level page maps 4MB of virtual addresses

• 1 page for code, 1 for stack, 4 for heap, (+1 1st-level)

• 7 total pages for PT = 28KB (<< 4MB)

Page 23: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

23

Address Translation Mechanics

• The six questions • What? address translation

• Why? compatibility, multi-programming, protection

• How? page table

• Who performs it?

• When?

• Where does page table reside?

• Option I: process (program) translates its own addresses • Page table resides in process visible virtual address space

– Bad idea: implies that program (and programmer)…

• …must know about physical addresses

• Isn’t that what virtual memory is designed to avoid?

• …can forge physical addresses and mess with other programs

• Translation on L2 miss or always? How would program know?

Page 24: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

24

Who? Where? When? Take II

• Option II: operating system (OS) translates for process • Page table resides in OS virtual address space

+ User-level processes cannot view/modify their own tables

+ User-level processes need not know about physical addresses

• Translation on L2 miss

– Otherwise, OS SYSCALL before any fetch, load, or store

• L2 miss: interrupt transfers control to OS handler • Handler translates VA by accessing process’s page table

• Accesses memory using PA

• Returns to user process when L2 fill completes

– Still slow: added interrupt handler and PT lookup to memory access

– What if PT lookup itself requires memory access? Head spinning…

Page 25: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

25

The Translation Lookaside Buffer (TLB)

Page 26: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

26

Still a problem

• OS handling of every L2 miss is too slow!

• We need to accelerate the translation of VA to PA

Answer: translation buffer

A special cache for the page table

Page 27: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

27

Translation Buffer

• Functionality problem? Add indirection!

• Performance problem? Add cache!

• Address translation too slow? • Cache translations in translation buffer (TB)

• Small cache: 16–64 entries, often fully assoc

+ Exploits temporal locality in PT accesses

+ OS handler only on TB miss

CPU

D$

L2

Main

Memory

I$

TB

VPN PPN

VPN PPN

VPN PPN

“tag” “data” PA

VA

VA

VA VA

Page 28: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

28

TB Misses

• TB miss: requested PTE not in TB, but in PT • Two ways of handling

• 1) OS routine: reads PT, loads entry into TB (e.g., Alpha) • Privileged instructions in ISA for accessing TB directly

• Latency: one or two memory accesses + OS call

• 2) Hardware FSM: does same thing (e.g., Intel x86) • Store PT root pointer in hardware register

• Make PT root and 1st-level table pointers physical addresses

• So FSM doesn’t have to translate them

+ Latency: saves cost of OS call

Page 29: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

29

Page Faults

• Page fault: PTE not in TB or in PT • Page is simply not in memory

• Starts out as a TB miss, detected by OS handler/hardware FSM

• OS routine • OS software chooses a physical page to replace

• “Working set”: more refined software version of LRU

• Tries to see which pages are actively being used

• Balances needs of all current running applications

• If dirty, write to disk (like dirty cache block with writeback $)

• Read missing page from disk (done by OS)

• Takes so long (10ms), OS schedules another task

• Treat like a normal TB miss from here

Page 30: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

30

Demand Paging

• The swapper process is called the pager

Memory reference

Is in physical memory?

Success

Is page stored on disk?

Load it, success

Invalid reference, abort!

Y

N

N

Y

Adapted from Operating System Concepts by Silberschatz, Galvin, and Gagne

Hardware

Operating system

Page 31: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

31

Virtual Caches

• Memory hierarchy so far: virtual caches • Indexed and tagged by VAs

• Translate to PAs only to access memory

+ Fast: avoids translation latency in common case

• What to do on process switches? • Flush caches? Slow

• Add process IDs to cache tags?

• Does inter-process communication work? • Aliasing: multiple VAs map to same PA

• How are multiple cache copies kept in sync?

• Also a problem for I/O (later in course)

• Disallow caching of shared memory? Slow

CPU

D$

L2

Main

Memory

I$

TB

PA

VA

VA

VA VA

Page 32: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

32

Physical Caches

• Alternatively: physical caches • Indexed and tagged by PAs

• Translate to PA at the outset

+ No need to flush caches on process switches

• Processes do not share PAs

+ Cached inter-process communication works

• Single copy indexed by PA

– Slow: adds 1 cycle to thit

CPU

D$

L2

Main

Memory

I$

TB

PA

PA

VA VA

PA PA

TB

Page 33: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

33

Virtual Physical Caches

• Compromise: virtual-physical caches • Indexed by VAs

• Tagged by PAs

• Cache access and address translation in parallel

+ No context-switching/aliasing problems

+ Fast: no additional thit cycles

• A TB that acts in parallel with a cache is a TLB

• Translation Lookaside Buffer

• Common organization in processors today

CPU

D$

L2

Main

Memory

I$ TLB

PA

PA

VA VA

TLB

Page 34: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

34

Cache/TLB Access

• Two ways to look at VA • Cache: TAG+IDX+OFS

• TLB: VPN+POFS

• Can have parallel cache & TLB …

• If address translation doesn’t change IDX

• VPN/IDX don’t overlap

1:0 [31:12]

data

[11:2] <<

address

==

TLB hit/miss

0

1

1022

1023

2

==

==

==

VPN [31:16] POFS[15:0]

cache

TLB

cache hit/miss

Page 35: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

35

Cache Size And Page Size

• Relationship between page size and L1 I$(D$) size

• Forced by non-overlap between VPN and IDX portions of VA

• Which is required for TLB access

• I$(D$) size / associativity ≤ page size

• Big caches must be set associative

• Big cache more index bits (fewer tag bits)

• More set associative fewer index bits (more tag bits)

• Systems are moving towards bigger (64KB) pages

• To amortize disk latency

• To accommodate bigger caches

1:0 [31:12] IDX[11:2]

VPN [31:16] [15:0]

Page 36: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

36

Flavors of Virtual Memory

• Virtual memory almost ubiquitous today • Certainly in general-purpose (in a computer) processors

• But even some embedded (in non-computer) processors support it

• Several forms of virtual memory • Paging (aka flat memory): equal sized translation blocks

• Most systems do this

• Segmentation: variable sized (overlapping?) translation blocks

• IA32 uses this

• Makes life very difficult

• Paged segments: don’t ask

Page 37: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

37

Performance and the dangers of thrashing

Page 38: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

38

The Table of Time

38

Event Picoseconds ≈ Hardware/target Source

Average instruction time* 30 30 ps Intel Core i7 4770k (Haswell), 3.9GHz https://en.wikipedia.org/wiki/Instructions_per_second

Time for light to traverse CPU core (~13mm)

44 40 ps Intel Core i7 4770k (Haswell), 3.9GHz http://www.anandtech.com/show/7003/the-haswell-review-intel-core-i74770k-i54560k-

tested/5

Clock cycle (3.9GHz) 256 300 ps Intel Core i7 4770k (Haswell), 3.9GHz Math

Memory read: L1 hit 1,212 1 ns Intel i3-2120 (Sandy Bridge), 3.3 GHz http://www.7-cpu.com/cpu/SandyBridge.html

Memory read: L2 hit 3,636 4 ns Intel i3-2120 (Sandy Bridge), 3.3 GHz http://www.7-cpu.com/cpu/SandyBridge.html

Memory read: L3 hit 8,439 8 ns Intel i3-2120 (Sandy Bridge), 3.3 GHz http://www.7-cpu.com/cpu/SandyBridge.html

Memory read: DRAM 64,485 60 ns Intel i3-2120 (Sandy Bridge), 3.3 GHz http://www.7-cpu.com/cpu/SandyBridge.html

Process context switch or system call 3,000,000 3 us Intel E5-2620 (Sandy Bridge), 2GHz http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

Storage sequential read**, 4kB (SSD) 7,233,796 7 us SSD: Samsung 840 500GB http://www.samsung.com/global/business/semic

onductor/minisite/SSD/global/html/whitepaper/whitepaper01.html

Storage sequential read**, 4kB (HDD) 65,104,167 70 us HDD: 2.5" 500GB 7200RPM http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/whitepaper/whitepaper01.html

Storage random read, 4kB (SSD) 100,000,000 100 us SSD: Samsung 840 500GB http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/whitepaper/whitepaper01.html

Storage random read, 4kB (HDD) 10,000,000,000 10 ms HDD: 2.5" 500GB 7200RPM http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/whitepaper/whitepaper01.html

Internet latency, Raleigh home to NCSU (3 mi)

21,000,000,000 20 ms courses.ncsu.edu Ping

Internet latency, Raleigh home to Chicago ISP (639 mi)

48,000,000,000 50 ms dls.net Ping

Internet latency, Raleigh home to Luxembourg ISP (4182 mi)

108,000,000,000 100 ms eurodns.com Ping

Time for light to travel to the moon (average)

1,348,333,333,333 1 s The moon http://www.wolframalpha.com/input/?i=distance+to+the+moon

* Based on Dhrystone, single core only, average time per instruction

** Based on sequential throughput, average time per block

Page 39: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

40

Waiting on disk is bad

• If we frequently have to go to disk, the memory latency becomes like disk latency (1000x worse!)

• Need the vast majority of memory accesses to be in memory.

• Remember when our working set didn’t fit in cache?

• It’s like that, but now our working set isn’t fitting into RAM!

• When this happens, it’s called thrashing.

• Makes CPU appear under-utilized, as most time is spent waiting on a disk.

Page 40: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

41

Thrashing

• If a process does not have “enough” pages, the page-fault rate is very high

• Page fault to get page

• Replace existing frame

• But quickly need replaced frame back

• Thrashing a process is busy swapping pages in and out

Adapted from Operating System Concepts by Silberschatz, Galvin, and Gagne

Page 41: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

42

Error correction in DRAM

Page 42: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

43

Error Detection and Correction

• One last thing about DRAM technology: errors • DRAM fails at a higher rate than SRAM or CPU logic

• Capacitor wear

• Bit flips from energetic α-particle strikes

• Many more bits

• Modern DRAM systems: built-in error detection/correction

• Key idea: checksum-style redundancy • Main DRAM chips store data, additional chips store f(data)

• |f(data)| < |data|

• On read: re-compute f(data), compare with stored f(data)

• Different ? Error…

• Option I (detect): kill program

• Option II (correct): enough information to fix error? fix and go on

Page 43: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

44

Error Detection and Correction

• Error detection/correction schemes distinguished by… • How many (simultaneous) errors they can detect

• How many (simultaneous) errors they can correct

4M

x

2B

4M

x

2B

4M

x

2B

4M

x

2B

0 1 2 3

4M

x

2B

f

f

==

error data address

Page 44: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

45

Error Detection Example: Parity

• Parity: simplest scheme • f(dataN–1…0) = XOR(dataN–1, …, data1, data0)

+ Single-error detect: detects a single bit flip (common case)

• Will miss two simultaneous bit flips…

• But what are the odds of that happening?

– Zero-error correct: no way to tell which bit flipped

– Many other schemes exist for detecting/correcting errors – Take ECE 554 (Fault Tolerant Computing) for more info

Page 45: ECE 550D Fundamentals of Computer Systems and …people.duke.edu/~tkb13/courses/ece550-2016fa/slides/10-virtual... · ECE 550D Fundamentals of Computer Systems and Engineering

46

Summary

• Virtual memory • Page tables and address translation

• Page faults and handling

• Virtual, physical, and virtual-physical caches and TLBs

• Error correction

Next part of course: I/O