OS, Base & Bounds, Virtual Memory Intro
Instructor: Steven Ho
Review
• Warehouse Scale Computers
– Supports many of the applications we have come to depend on
– Software must cope with failure, load variation, and latency/bandwidth limitations
– Hardware sensitive to cost and energy efficiency
• Request Level Parallelism
– High request volume, each largely independent
– Replication for better throughput, availability
• MapReduce
– Convenient data-level parallelism on large dataset across large number of machines
– Spark is a framework for executing MapReduce algorithms
7/30/2018 CS61C Su18 - Lecture 22 2
Word Count in Spark’s Python API
# RDD: primary abstraction of a distributed collection of items
file = sc.textFile("hdfs://…")
# Two kinds of operations:
# Actions: RDD → Value
# Transformations: RDD → RDD
# e.g. flatMap, map, reduceByKey
file.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
MapReduce Word Count Example
• Map phase: (doc name, doc contents) → list(word, count)
// "I do I learn" → [("I",1),("do",1),("I",1),("learn",1)]
map(key, value):
    for each word w in value:
        emit(w, 1)
• Reduce phase: (word, list(count)) → (word, count_sum)
// ("I", [1,1]) → ("I",2)
reduce(key, values):
    result = 0
    for each v in values:
        result += v
    emit(key, result)
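The two phases can be tried out in plain Python on one machine, with no cluster or Spark installation. This is only a minimal sketch: `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical helper names (not part of any MapReduce or Spark API), and the explicit shuffle step stands in for the grouping by key that a real framework performs between the phases.

```python
from collections import defaultdict

def map_phase(doc_name, doc_contents):
    """Map: (doc name, doc contents) -> list of (word, 1) pairs."""
    return [(word, 1) for word in doc_contents.split()]

def shuffle(pairs):
    """Group counts by word; a real framework does this between the phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(word, counts):
    """Reduce: (word, list(count)) -> (word, count_sum)."""
    result = 0
    for v in counts:
        result += v
    return (word, result)

def word_count(docs):
    """Run map, shuffle, and reduce over a dict of {doc name: doc contents}."""
    pairs = []
    for name, contents in docs.items():
        pairs.extend(map_phase(name, contents))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

counts = word_count({"doc1": "I do I learn"})
print(counts)
```

Each call to `map_phase` or `reduce_phase` depends only on its own arguments, which is what lets a framework spread those calls across many machines.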
Agenda
• OS Intro
• Administrivia
• OS Boot Sequence and Operation
• Multiprogramming/time-sharing
• Introduction to Virtual Memory
CS61C so far…
[Figure: the layers covered so far, alongside Project 1, Project 2, Labs, and Project 3]
• C Programs
    #include <stdlib.h>
    int fib(int n) { return fib(n-1) + fib(n-2); }
• RISC-V Assembly
    foo: lw t0, 4(s0)
         addi t1, t0, 3
         beq t1, t2, foo
         nop
• CPU
• Caches
• Memory
So how is this any different?
[Figure: a computer with its Keyboard, Screen, and Storage]
Adding I/O
[Figure: the same stack (C Programs, RISC-V Assembly, CPU, Caches, Memory) with an I/O (Input/Output) layer added, connecting the Screen, Keyboard, and Storage]
Raspberry Pi ($40 on Amazon)
[Figure: annotated Raspberry Pi board]
• CPU + $s, etc.
• Memory
• Storage I/O (Micro SD Card)
• Serial I/O (USB)
• Network I/O (Ethernet)
• Screen I/O (HDMI)
It’s a real computer!
But wait…
• That’s not the same! When we run Venus, it only executes one program and then stops.
• When I switch on my computer, I get this:
[Figure: image not recovered]
Yes, but that’s just software! The Operating System (OS)
Well, “just software”
• The biggest piece of software on your machine?
• How many lines of code? These are guesstimates:
[Figure: Codebases (in millions of lines of code). CC BY-NC 3.0, David McCandless © 2015]
http://www.informationisbeautiful.net/visualizations/million-lines-of-code/
84 million lines of code!
What does the OS do?
• One of the first things that runs when your computer starts (right after firmware/bootloader)
• Loads, runs and manages programs:
– Multiple programs at the same time (time-sharing)
– Isolate programs from each other (isolation)
– Multiplex resources between applications (e.g., devices)
• Services: File System, Network stack, etc.
• Finds and controls all the devices in the machine in a general way (using “device drivers”)
Administrivia
• Proj4 due on Friday (8/03)
– Project Party tonight, Soda 405/411, 4-6pm
• HW6 due tonight, HW7 released
• Guerilla Session on Wed. @Cory 540AB
• The final will be 8/09 7-10PM @VLSB 2040/2060!
• We’re almost there! :D
What happens at boot?
• When the computer switches on, it does the same as Venus: the CPU executes instructions from some start address (stored in Flash ROM)
[Figure: the CPU starts with PC = 0x2000 (some default value); that location in the address space is memory mapped to code that copies the firmware into regular memory and jumps into it]
    0x2000: addi t0, x0, 0x1000
            lw t0, 4(s0)
            …
1. BIOS: Find a storage device and load first sector (block of data)
2. Bootloader (stored on, e.g., disk): Load the OS kernel from disk into a location in memory and jump into it.
3. OS Boot: Initialize services, drivers, etc.
4. Init: Launch an application that waits for input in loop (e.g., Terminal/Desktop/...)
Launching Applications
• Applications are called “processes” in most OSs.
• Created by another process calling into an OS routine (using a “syscall”, more details later).
– Depends on OS, but Linux uses fork (see OpenMP threads) to create a new process, and execve to load application.
• Loads executable file from disk (using the file system service) and puts instructions & data into memory (.text, .data sections), prepare stack and heap.
• Set argc and argv, jump into the main function.
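The fork-then-exec pattern can be tried from Python, whose `os.fork`, `os.execv`, and `os.waitpid` functions are thin wrappers around the corresponding OS routines. This is a minimal sketch that assumes a POSIX system such as Linux; `launch` is a hypothetical helper name, and the child program used in the demo (a Python one-liner that exits with status 7) is only an illustration.

```python
import os
import sys

def launch(argv):
    """Create a child process, load a new program into it, and wait for it.

    argv[0] must be the full path of the executable; argv is passed through
    as the new program's argument vector (what C sees as argc/argv).
    """
    pid = os.fork()              # the OS duplicates the calling process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if execv failed; never fall back into parent code.
            os._exit(127)
    # Parent: block until the child exits, then decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None                  # child was killed by a signal

if __name__ == "__main__":
    code = launch([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

The parent and child run the same code after `fork`; only the return value tells them apart, which is why the child branch must end in either a successful `execv` or an explicit exit.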
Supervisor Mode
• If something goes wrong in an application, it can crash the entire machine. What about malware, etc.?
• The OS may need to enforce resource constraints on applications (e.g., access to devices).
• To protect the OS from the application, CPUs have a supervisor mode bit (also need isolation, more later).
– You can only access a subset of instructions and (physical) memory when not in supervisor mode (user mode).
– You can change out of supervisor mode using a special instruction, but not into it (unless there is an interrupt).
Syscalls
• How to switch back to OS? The OS sets a timer interrupt; when the interrupt triggers, the CPU drops into supervisor mode.
• What if we want to call into an OS routine? (e.g., to read a file, launch a new process, send data, etc.)
– Need to perform a syscall: set up function arguments in registers, and then raise a software interrupt
– OS will perform the operation and return to user mode
• This way, the OS can mediate access to all resources, including devices, the CPU itself, etc.
Syscalls in Venus
• Venus provides many simple syscalls using the ecall RISC-V instruction
• How to issue a syscall?
– Place the syscall number in a0
– Place arguments to the syscall in the a1 register
– Issue the ecall instruction
• This is how your RISC-V code has been able to produce output all along
• ecall details depend on the ABI (Application Binary Interface)
Example Syscall
• Let’s say we want to print an integer stored in s3:
Print integer is syscall #1
    li a0, 1          # syscall number 1 = print integer
    add a1, s3, x0    # argument: copy s3 into a1
    ecall             # hand control to the environment
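What the environment does when it sees `ecall` can be modeled as a table lookup on the number in a0. The following is a toy Python model, not Venus's actual implementation: `SYSCALLS`, `print_int`, and `ecall` are hypothetical names, the register file is a plain dict, and only syscall #1 (the one named on the slide) is filled in.

```python
def print_int(regs, output):
    """Syscall #1 on the slide: print the integer held in a1."""
    output.append(str(regs["a1"]))

# Syscall table: number -> handler. Only #1 (print integer) comes from the
# slide; a real environment defines many more.
SYSCALLS = {1: print_int}

def ecall(regs, output):
    """Model of the ecall instruction: dispatch on the number in a0."""
    number = regs["a0"]
    handler = SYSCALLS.get(number)
    if handler is None:
        raise ValueError(f"unknown syscall number {number}")
    handler(regs, output)

# Mirror of the three-instruction example: li a0, 1 / add a1, s3, x0 / ecall
regs = {"x0": 0, "s3": 42, "a0": 0, "a1": 0}
output = []
regs["a0"] = 1                        # li a0, 1
regs["a1"] = regs["s3"] + regs["x0"]  # add a1, s3, x0
ecall(regs, output)                   # ecall
print("".join(output))
```

The user program never calls `print_int` directly; it can only name the service by number, which is what lets the OS or environment mediate every request.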
Venus’s Environmental Calls
[Table: list of Venus environmental calls; contents not recovered]
Multiprogramming
• OS runs multiple applications at the same time.
• But not really (unless there is a core per process)
• Switches between processes very quickly. This is called a “context switch”.
• When jumping into a process, set a timer interrupt.
– When it expires, store PC, registers, etc. (process state).
– Pick a different process to run and load its state.
– Set timer, change to user mode, jump to the new PC.
• Deciding what process to run is called scheduling.
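The timer-driven loop above can be simulated in a few lines. This is a sketch under simplifying assumptions, not how a real kernel is written: the saved process state is reduced to a single PC counter, the scheduling policy is plain round-robin (one of many possible policies), and `run_round_robin` and `quantum` are hypothetical names.

```python
from collections import deque

def run_round_robin(programs, quantum):
    """Simulate time-sharing on one core.

    programs maps a process name to the number of instructions it needs.
    quantum is the number of instructions that run before the timer fires.
    Returns the order in which the processes were given the CPU.
    """
    # Saved process state: here just a PC (count of instructions executed).
    ready = deque({"name": n, "pc": 0, "length": l} for n, l in programs.items())
    trace = []
    while ready:
        proc = ready.popleft()           # scheduling: pick the next process
        trace.append(proc["name"])       # load its state and jump to its PC
        proc["pc"] = min(proc["pc"] + quantum, proc["length"])  # run until timer
        if proc["pc"] < proc["length"]:
            ready.append(proc)           # timer expired: save state, requeue
    return trace

print(run_round_robin({"A": 3, "B": 1, "C": 2}, quantum=1))
```

With a small quantum the three processes interleave, which is what creates the appearance of running at the same time on a single core.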
Protection, Translation, Paging
• Supervisor mode does not fully isolate applications from each other or from the OS.
– Application could overwrite another application’s memory.
– Remember the linker in CALL: application assumes that code is in certain location. How to prevent overlaps?
– May want to address more memory than we actually have (e.g., for sparse data structures).
• Solution: Virtual Memory. Give each process the illusion of a full memory address space that it has completely to itself.
Meet the Staff
                        Emaan               Sruthi                  Sean
Trash TV Show           Judge Judy          Bachelor in Paradise    Dr. Phil
Favorite Plot Twist     Breaking Bad        Arrival                 Memento
Best Bathroom in Cal    7th Floor Soda      Stanley Hall            Bottom Floor of Cory
Best Study Spot         1st Floor Moffitt   The VLSB Library        Upper Floor of East Asian
Adding Disks to Hierarchy
• Use VM as a mechanism to “connect” memory and disk in the memory hierarchy
[Figure: memory hierarchy from Upper Level (Faster) to Lower Level (Larger), with the unit transferred between adjacent levels]
• Regs ↔ L1 Cache: Instr Operands
• L1 Cache ↔ L2 Cache: Blocks
• L2 Cache ↔ Memory: Blocks
• Memory ↔ Disk: Pages
• Disk ↔ Tape: Files
Memory Hierarchy
• Earlier: Caches
• Now: Virtual Memory
Virtual Memory Goals
• Allow multiple processes to simultaneously occupy memory and provide protection: Don’t let programs read/write each other’s memory
• Give each program the illusion that it has its own private address space
– Suppose code starts at address 0x00400000, then different processes each think their code resides at that same address!
– Each program must have a different view of memory
Virtual Memory Goals
• Next level in the memory hierarchy:
– Provides program with illusion of a very large main memory:
– Working set of “pages” reside in main memory - others reside on disk.
• Also allows OS to share memory, protect programs from each other
• Today, more important for protection vs. just another level of memory hierarchy
• Each process thinks it has all the memory to itself
• (Historically, it predates caches)
Virtual to Physical Address Translation
• Each program operates in its own virtual address space, as if it were the only program running
• Each is protected from the other
• OS can decide where each goes in memory
• Hardware gives virtual → physical mapping
Dynamic Address Translation
• Location-independent programs: programming and storage management ease ⇒ need for a base register
• Protection: independent programs should not affect each other inadvertently ⇒ need for a bound register
• Multiprogramming drives requirement for resident supervisor (OS) software to manage context switches between multiple programs
[Figure: Physical Memory holding prog1, prog2, and the OS]
Modern Virtual Memory Systems
Illusion of a large, private, uniform store
• Protection: several programs, each with their private address space and one or more shared address spaces
• Demand Paging: provides the ability to run programs larger than the primary memory; hides differences in machine configurations
• The price is address translation on each memory reference
[Figure: the OS and prog i in Primary Memory, backed by a Swapping Store; a TLB holds the VA → PA mapping]
Analogy: Shopping at IKEA
• Furniture name like virtual address
• Actual warehouse location like physical address
• IKEA shopping list like page table: you write mappings from furniture name to location
• On shopping list, whether the item is sold out like valid bit indicating in main memory vs. on disk
• Whether you can flop on a bed, draw on chalk board, turn lights off/on like access rights
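The pieces of the analogy map onto a small data structure. The sketch below is illustrative only: `PageTableEntry`, `PageFault`, and `lookup` are hypothetical names, the page numbers are made up, and the page table is a plain dict rather than whatever layout real hardware uses.

```python
from dataclasses import dataclass

class PageFault(Exception):
    """The page is mapped but currently on disk, not in main memory."""

@dataclass
class PageTableEntry:
    physical_page: int      # the "warehouse location"
    valid: bool             # True: in main memory; False: on disk ("sold out")
    rights: frozenset       # allowed kinds of access, e.g. {"r", "w"}

def lookup(page_table, virtual_page, access):
    """Return the physical page for virtual_page, or raise if not allowed."""
    entry = page_table.get(virtual_page)    # the "shopping list" lookup
    if entry is None:
        raise KeyError(f"virtual page {virtual_page} is not mapped")
    if access not in entry.rights:
        raise PermissionError(f"{access!r} access not allowed")
    if not entry.valid:
        raise PageFault(f"virtual page {virtual_page} is on disk")
    return entry.physical_page

# Hypothetical table for one process: virtual page number -> entry.
page_table = {
    0: PageTableEntry(7, True, frozenset({"r", "x"})),   # code: read/execute
    1: PageTableEntry(3, True, frozenset({"r", "w"})),   # data: read/write
    2: PageTableEntry(0, False, frozenset({"r", "w"})),  # data, swapped out
}
print(lookup(page_table, 1, "w"))
```

Each process would get its own table, so the same virtual page number can lead to different physical pages in different processes.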
Simple Example: Base and Bound Reg
Simple Base and Bound Translation
[Figure: a Load X in the Program Address Space issues a Logical Address, which follows two paths]
• Bound Register (Segment Length): is the logical address ≤ the bound? If not, Bounds Violation
• Base Register (Base Physical Address): base + logical address = Physical Address, which falls in the current segment of Physical Memory
Base and bounds registers are visible/accessible only when processor is running in supervisor mode
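The translation in the figure is one comparison and one addition. A minimal sketch follows, with the caveat that `translate` and `BoundsViolation` are hypothetical names and that the bound register is assumed to hold the segment length in bytes, so valid logical addresses run from 0 through bound - 1 (the figure draws the comparator as ≤ without spelling out the exact convention).

```python
class BoundsViolation(Exception):
    """Raised when a logical address falls outside the current segment."""

def translate(logical_address, base, bound):
    """Base-and-bound translation of one memory reference.

    base  : physical address where the segment starts (base register)
    bound : segment length in bytes (bound register)
    """
    if not (0 <= logical_address < bound):
        raise BoundsViolation(hex(logical_address))
    return base + logical_address

# Two processes use the same logical address but have different base
# registers, so they touch different physical locations.
print(hex(translate(0x100, base=0x4000, bound=0x1000)))
print(hex(translate(0x100, base=0x8000, bound=0x1000)))
```

On a context switch the OS only has to reload the two registers for the incoming process; the program itself never sees physical addresses.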
Separate Areas for Program and Data
[Figure: two base-and-bound pairs translating from the Program Address Space into Main Memory]
• Data: the Mem. Address Register (set by Load X) supplies the logical address; it is checked against the Data Bound Register (≤, else Bounds Violation) and added to the Data Base Register to form the physical address in the data segment
• Program: the Program Counter supplies the logical address; it is checked against the Program Bound Register (≤, else Bounds Violation) and added to the Program Base Register to form the physical address in the program segment
What is an advantage of this separation?
(Scheme used on all Cray vector supercomputers prior to X1, 2002)
“Bare” 5-Stage Pipeline
• In a bare machine, the only kind of address is a physical address
[Figure: 5-stage pipeline (PC, Inst. Cache, D, Decode, E, +, M, Data Cache, W) connected through the Memory Controller to Main Memory (DRAM); every address along the way is labeled Physical Address]
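Base and Bound Machine
[Figure: the same 5-stage pipeline with base-and-bound hardware in front of each cache]
• Instruction fetch: the logical address from the PC is checked against the Prog. Bound Register (≤, else Bounds Violation) and added to the Program Base Register; the resulting physical address goes to the Inst. Cache
• Data access: the logical address is checked against the Data Bound Register (≤, else Bounds Violation) and added to the Data Base Register; the resulting physical address goes to the Data Cache
• The caches, Memory Controller, and Main Memory (DRAM) see only physical addresses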
Memory Fragmentation
[Figure: three snapshots of physical memory above the OS Space, with regions of 8K to 32K. First, progs 1, 2, and 3 are resident with free regions around them. Then programs 4 & 5 start and take part of the free space. Then programs 2 & 5 end, leaving the free space split into separate chunks.]
What if we want to run process 6, and we need 32K of space?!
As programs start and end, memory becomes fragmented. Therefore, if we ever need a large chunk of memory, programs will need to be moved.
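The problem can be reproduced with a toy free list. In this sketch the layout is hypothetical (the chunk positions and sizes below are made up for illustration and are not the ones in the figure), `first_fit` and `compact` are hypothetical names, and compaction is modeled only by its effect on the free list, not by actually copying programs.

```python
def first_fit(free_chunks, size):
    """Return the index of the first free chunk that can hold size, else None.

    free_chunks is a list of (start, length) pairs, in the same units as size.
    """
    for i, (_start, length) in enumerate(free_chunks):
        if length >= size:
            return i
    return None

def compact(free_chunks, memory_size):
    """Model moving every program toward address 0: all free space ends up
    as one chunk at the top of memory."""
    total = sum(length for _start, length in free_chunks)
    return [(memory_size - total, total)]

# Hypothetical layout in K: three separate free chunks in a 128K memory.
free = [(16, 16), (56, 24), (112, 16)]
total_free = sum(length for _start, length in free)

print(total_free)                            # total free space in K
print(first_fit(free, 32))                   # no single chunk is large enough
print(first_fit(compact(free, 128), 32))     # fits once programs are moved
```

There is more than enough free memory in total, but a base-and-bound segment must be contiguous, so the request fails until programs are moved.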
BIG IDEA: USE RAM AS $$ FOR DISK
Question: What kind of cache should memory be?
(A) Direct Mapped, write back/write allocate
(B) Direct Mapped, write through/no write allocate
(C) Fully Associative, write back/write allocate
(D) Fully Associative, write through/no write allocate
Summary
• The role of the Operating System
– Booting a computer: BIOS, bootloader, OS boot, initialization
• Base and bounds for multiple processes
– Simple, but doesn’t give us everything we want
• Virtual memory bridges memory and disk
– Provides illusion of independent address spaces to processes and protects them from each other