UCDavis, ecs150 Spring 2006: Operating System
#7: mbuf (Chapter 11)
05/31/2006

Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/

Post on 21-Dec-2015



IPC

Uniform communication for distributed processes
– “socket”: network programming
– operating system kernel issues

Semaphores, message queues, and shared memory for local processes

Socket
an IPC Abstraction Layer

Mbufs
Memory Buffers

The main data structure for network processing in the kernel

Why can’t we directly use “kernel memory management” facilities such as kernel malloc (a power-of-2 style allocator), pages, or VM objects?

“Packet”

Ethernet or 802.11 header
IP header
– IPsec header
Transport headers (TCP/UDP/…)
– SSL header
Others???

Properties
Network Packet Processing

Variable sizes
Prepend or remove
Fragment/divide or defragment/combine
Can we avoid COPYING as much as possible???
Queue
Parallel processing for high speed
– E.g., Juniper routers are running FreeBSD

sys/mbuf.h
kern/kern_mbuf.c
kern/uipc_mbuf.c
kern/uipc_mbuf2.c

[Figure labels: 256 bytes, 24, 4]

M_EXT
M_PKTHDR
M_EOR
M_BCAST
M_MCAST

[Figure labels: the same packet, next packet]

#define M_EXT           0x0001
#define M_PKTHDR        0x0002
#define M_EOR           0x0004
#define M_RDONLY        0x0008
#define M_PROTO1        0x0010
#define M_PROTO2        0x0020
#define M_PROTO3        0x0040
#define M_PROTO4        0x0080
#define M_PROTO5        0x0100
#define M_SKIP_FIREWALL 0x4000
#define M_FREELIST      0x8000
#define M_BCAST         0x0200
#define M_MCAST         0x0400
#define M_FRAG          0x0800
#define M_FIRSTFRAG     0x1000
#define M_LASTFRAG      0x2000

struct mbuf {
    struct m_hdr m_hdr;
    union {
        struct {
            struct pkthdr MH_pkthdr;
            union {
                struct m_ext MH_ext;
                char MH_databuf[MHLEN];
            } MH_dat;
        } MH;
        char M_databuf[MLEN];
    } M_dat;
};

[Figure label: 24 bytes]

IPsec_IN_DONE
IPsec_OUT_DONE
IPsec_IN_CRYPTO_DONE
IPsec_OUT_CRYPTO_DONE

mbuf

Current: 256 bytes
Old: 128 bytes (shown in the following slides)

[Figure: A Typical UDP Packet]

m_devget: When an IP packet comes in…

mtod & dtom

mbuf ptr, data region (e.g. struct ip)

mtod: from the mbuf pointer to its data region
dtom: from a data pointer back to the enclosing mbuf

#define dtom(x) ((struct mbuf *)((intptr_t)(x) & ~(MSIZE - 1)))

netstat -m

Check for mbuf statistics

mbuf

IP input/output/forward
IPsec
IP fragmentation/defragmentation
Device, IP, Socket

Memory Management for IPC

Why do we need something like MBUF?

I/O Architecture

[Figure labels: CPU, Memory, Device Controller, I/O Device, Control bus, Data and I/O buses, Internal buffer, IRQ]

Initialization
Input
Output
Configuration
Interrupt

Direct Memory Access

Used to avoid programmed I/O for large data movement
Requires DMA controller
Bypasses CPU to transfer data directly between I/O device and memory

DMA Requests

Disk address to start copying
Destination memory address
Number of bytes to copy

Is DMA a good idea?

CPU is a lot faster
Controllers/devices have larger internal buffers
DMA might be much slower than CPU
Controllers become more and more intelligent

USB doesn’t have DMA.

Network Processor

File System Mounting

A file system must be mounted before it can be accessed.

An unmounted file system is mounted at a mount point.

Mount Point

logical disks

fs0: /dev/hd0a
/
usr sys dev etc bin

fs1: /dev/hd0e
/
local adm home lib bin

mount -t ufs /dev/hd0e /usr

mount -t nfs 152.1.23.12:/export/cdrom /mnt/cdrom

Distributed FS

Distributed File System
– NFS (Network File System)
– AFS (Andrew File System)
– CODA

Distributed FS

ftp.cs.ucdavis.edu fs0: /dev/hd0a
/
usr sys dev etc bin

Server.yahoo.com fs0: /dev/hd0e
/
local adm home lib bin

Distributed File System

Transparency and Location Independence
Reliability and Crash Recovery
Scalability and Efficiency
Correctness and Consistency
Security and Safety

Correctness

One-copy Unix Semantics
– every modification to every byte of a file has to be immediately and permanently visible to every client.
– Conceptually FS sequential access
  Makes sense in a local file system
  Single processor versus shared memory

Is this necessary?

DFS Architecture

Server
– storage for the distributed/shared files.
– provides an access interface for the clients.

Client
– consumer of the files.
– runs applications in a distributed environment.

[Figure labels: applications; open, close, read, write, opendir, stat, readdir]

NFS (SUN, 1985)

Based on RPC (Remote Procedure Call) and XDR (External Data Representation)
Server maintains no state
– a READ on the server opens, seeks, reads, and closes
– a WRITE is similar, but the buffer is flushed to disk before closing
Server crash: client continues to try until server reboots – no loss
Client crashes: client must rebuild its own state – no effect on server

RPC - XDR

RPC: standard protocol for calling procedures on another machine
Procedure is packaged with authorization and admin info
XDR: standard format for data, because computer manufacturers cannot agree on byte ordering.

rpcgen

[Figure: RPC program -> rpcgen -> RPC client.c, RPC server.c, RPC.h; a "data structure" label appears on both the client and the server side]

NFS Operations

Every operation is independent: server opens file for every operation
File identified by handle -- no state information retained by server
Client maintains mount table, v-node, offset in file table, etc.

What do these imply???

[Figure: NFS architecture. On the client computer, application programs make UNIX system calls; the virtual file system in the UNIX kernel sends operations on local files to the UNIX file system (or another file system) and operations on remote files to the NFS client. The NFS client talks to the NFS server on the server computer over the NFS protocol (remote operations); the server sits on top of its own virtual file system and UNIX file system.]

mount -t nfs home.yahoo.com:/pub/linux /mnt/linux

Final – 06/15/2006 8~10 am

1062 Bainer
Midterm plus
– 5.1~5.8, 5.11~5.12
– 6.1, 6.5~6.7
– 8.1~8.9
– 9.1~9.3
– 11.3
Notes/PPT, Homeworks, Brainstorming

State-ful vs. State-less

State-ful: a server is fully aware of its clients
– does the client have the newest copy?
– what is the offset of an opened file?
– “a session” between a client and a server!

State-less: a server is completely unaware of its clients
– memory-less: I do not remember you!!
– Just tell me what you want to get (and where).
– I am not responsible for your offset values (the client needs to maintain the state).

The State

[Figure labels: applications; open, read, stat, lseek (shown for two clients); offset]

Unix file semantics

NFS:
– open a file with read-write mode
– later, the server’s copy becomes read-only mode
– now, the application tries to write it!!

Problems with NFS

Performance is not scalable:
– maybe it is OK for a local office.
– will be horrible with large scale systems.

Similar to UNIX file caching for local files:
– pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages. Read-ahead and delayed-write optimisations.
– For local files, writes are deferred to the next sync event (30-second intervals)
– Works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients.

NFS v3 servers offer two strategies for updating the disk:
– write-through - altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk.
– delayed commit - pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.

Server caching does nothing to reduce RPC traffic between client and server
– further optimisation is essential to reduce server load in large networks
– NFS client module caches the results of read, write, getattr, lookup and readdir operations
– synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file.

Timestamp-based validity check
– reduces inconsistency, but doesn't eliminate it
– validity condition for cache entries at the client:
  (T - Tc < t) ∨ (Tm_client = Tm_server)
– t is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories
– it remains difficult to write distributed applications that share files with NFS

t: freshness guarantee
Tc: time when cache entry was last validated
Tm: time when block was last updated at server
T: current time

AFS

State-ful clients and servers.
Caching the files to clients.
– File close ==> check-in the changes.
How to maintain consistency?
– Using “Callback” in v2/3 (Valid or Cancelled)

[Figure labels: applications; open, read; invalidate and re-cache]

Why AFS?

Shared files are infrequently updated
Local cache of a few hundred megabytes
– Now 50~100 gigabytes
Unix workload:
– Files are small, read operations dominate, sequential access is common, files are read/written by one user, references come in bursts.
– Are these still true?

Fault Tolerance in AFS

a server crashes

a client crashes
– check for call-back tokens first.

Problems with AFS

Availability
What happens if the call-back itself is lost??

GFS – Google File System

“failures” are the norm
Multiple-GB files are common
Append rather than overwrite
– Random writes are rare
Can we relax the consistency?

CODA

Server Replication:
– if one server goes down, I can get another.

Disconnected Operation:
– if all go down, I will use my own cache.

Consistency

If John updates file X on server A and Mary reads file X on server B….

Read-one & Write-all

Read x & Write (N-x+1)

[Figure labels: read, write]

Example: R3W4 (6+1)

Initial  0 0 0 0 0 0
Alice-W  2 2 0 2 2 0
Bob-W    2 3 3 3 3 0
Alice-R  2 3 3 3 3 0
Chris-W  2 1 1 1 1 0
Dan-R    2 1 1 1 1 0
Emily-W  7 7 1 1 1 7
Frank-R  7 7 1 1 1 7

[Figure: file service architecture. Client computer: application programs and client module. Server computer: directory service and flat file service.]

Directory service: Lookup, AddName, UnName, GetNames
Flat file service: Read, Write, Create, Delete, GetAttributes, SetAttributes