Computer Science 213 © 2006 Donald Acton

The Role of Unix I/O
• File system works at the block level
• Applications work at the byte level
• Unix I/O converts byte-level access into block-level operations
[Figure: File system layering, top to bottom: Application, Unix I/O, File System, Disk Drive]

Unix I/O API
• Some of the most common Unix I/O API functions used by applications are:
  – open()
  – close()
  – read()
  – write()
  – lseek()

Opening Files
• Opening a file informs the kernel that an application wants to access a file
• Allows the kernel to set aside resources
int source_fd;
if ((source_fd = open(argv[1], O_RDONLY)) < 0) {
    perror("Open source failed");   // perror() appends ": <reason>" itself
    exit(2);
}

Opening cont’d
• open() returns a small integer called a file descriptor
• The application passes this value back to the kernel in subsequent requests to work with the file
• Each process starts with three open files:
  – 0: standard input (stdin)
  – 1: standard output (stdout)
  – 2: standard error (stderr)

Closing Files
• Closing a file tells the kernel it may free resources associated with managing the file
int rc;
if ((rc = close(source_fd)) < 0) {
    perror("close");
    exit(10);
}

Reading Files
• Each open file has a notion of a current position in the stream of bytes
• read() copies bytes from the current file position to memory and updates the file position
• read() returns the number of bytes read
  – If bytes read < 0, an error occurred
  – If bytes read == 0, the end of the file has been reached
  – read may return fewer bytes than requested (short reads)
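Because read() may return fewer bytes than requested, code that needs a fixed amount of data usually loops. The sketch below is one way to do that, assuming we want to keep reading until the buffer is full or end of file is reached; the helper name read_full is our own, not a standard call. It also retries when read() fails with EINTR (interrupted by a signal), which is a common convention rather than something the slides require.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes, looping over short reads.
 * Returns the number of bytes read (less than len only at end of file),
 * or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t total = 0;

    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;             /* genuine error */
        }
        if (n == 0)
            break;                 /* end of file */
        total += (size_t)n;        /* short read: loop for the rest */
    }
    return (ssize_t)total;
}
```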

Read Example
char buf[512];
int chars_read;
chars_read = read(source_fd, buf, sizeof(buf));
while (chars_read > 0) {
    // Do something with the chars_read bytes in buf
    chars_read = read(source_fd, buf, sizeof(buf));
}
if (chars_read < 0) {
    perror("Reading error");
    exit(5);
}

Writing Files
• write() copies bytes from memory to the current file position and updates the position
• write() returns the number of bytes written
  – If bytes written < 0, an error occurred
  – Fewer bytes than requested may be written (short writes); this is not an error, but it is certainly a challenge to deal with
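Rather than treating a short write as a failure, a program can keep calling write() on whatever bytes remain. This is a minimal sketch; the name write_all is hypothetical, and retrying on EINTR is a convention we assume rather than one the slides prescribe.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, continuing after short writes.
 * Returns 0 on success or -1 on error with errno set by write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing anything */
            return -1;
        }
        p += n;                    /* short write: skip what was written */
        len -= (size_t)n;
    }
    return 0;
}
```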

Writing Example
while (chars_read > 0) {
    // stdout is a FILE *, so write() needs the descriptor STDOUT_FILENO;
    // a short write is treated as a failure here for simplicity
    if (write(STDOUT_FILENO, buf, chars_read) < chars_read) {
        perror("Write problems");
        exit(4);
    }
    // Do another read and work
}

Seek
• Changes the logical position in the file (i.e. where the next read or write will start)
• The position can be changed:
  – To an absolute offset in the file
  – Relative to the current location
  – Relative to the end of the file
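
Seek example
off_t new_offset;   // lseek() returns the new offset from the start, or -1
new_offset = lseek(fd, 2346, SEEK_CUR);   // 2346 bytes past the current position
new_offset = lseek(fd, 10, SEEK_SET);     // absolute offset 10
new_offset = lseek(fd, 25, SEEK_END);     // 25 bytes past the end of the file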
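The three whence values combine in handy ways: an offset of 0 with SEEK_CUR reports the current position, and an offset of 0 with SEEK_END moves to the end and returns the file size. A small sketch built on that idea; file_size and path_size are illustrative names, not standard calls. Note that lseek() fails on pipes and terminals, so file_size() returns -1 for them.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size of the file open on fd without disturbing its current
 * position, or -1 on error (e.g. fd is a pipe, which cannot seek). */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);   /* offset 0 from current: where am I? */
    if (here < 0)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);    /* offset 0 from the end: the size */
    if (end < 0)
        return -1;
    if (lseek(fd, here, SEEK_SET) < 0)     /* put the position back */
        return -1;
    return end;
}

/* Convenience wrapper: open path, measure it, close it. */
off_t path_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t size = file_size(fd);
    close(fd);
    return size;
}
```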

Unix I/O Example
• Simple program that copies the contents of the file named by argument 1 to the file named by argument 2 (i.e. the cp command)
cs213copy fname1 [fname2]

Pseudo Code
open argument 1 for input
open argument 2 for output (if present)
if arg 2 present then connect stdout to this file
read from input
while read succeeds
    write to stdout
    read from input

Unix I/O Copy Command
// Includes
int main(int argc, char **argv) {
    // Check arguments
    int source_fd;
    if ((source_fd = open(argv[1], O_RDONLY)) < 0) {
        perror("Open source failed");
        exit(2);
    }

    int dest_fd;
    if (argc > 2) {
        // O_TRUNC discards old contents, so a shorter source leaves no stale bytes
        if ((dest_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0) {
            perror("Destination open failed");
            int rc;
            if ((rc = close(source_fd)) < 0) {
                perror("close");
                exit(10);
            }
            exit(3);
        }
        dup2(dest_fd, STDOUT_FILENO);   // stdout now refers to the destination
    }

    char buf[512];
    int chars_read;
    chars_read = read(source_fd, buf, sizeof(buf));
    while (chars_read > 0) {
        if (write(STDOUT_FILENO, buf, chars_read) < chars_read) {
            perror("Write problems");
            exit(4);
        }
        chars_read = read(source_fd, buf, sizeof(buf));
    }
    if (chars_read < 0) {
        perror("Reading error");
        exit(5);
    }
    return 0;
}

1) Unix I/O
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc <= 1) {
        printf("Usage: cs213cp source_file [destination_file]\n");
        exit(1);
    }

2) Unix I/O
    int source_fd;
    if ((source_fd = open(argv[1], O_RDONLY)) < 0) {
        perror("Open source failed");
        exit(2);
    }

3) Unix I/O
    int dest_fd;
    if (argc > 2) {
        // O_TRUNC discards old contents, so a shorter source leaves no stale bytes
        if ((dest_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0) {
            perror("Destination open failed");
            int rc;
            if ((rc = close(source_fd)) < 0) {
                perror("close");
                exit(10);
            }
            exit(3);
        }
        dup2(dest_fd, STDOUT_FILENO);   // stdout now refers to the destination
    }

4) Unix I/O
    char buf[512];
    int chars_read;
    chars_read = read(source_fd, buf, sizeof(buf));
    while (chars_read > 0) {
        if (write(STDOUT_FILENO, buf, chars_read) < chars_read) {
            perror("Write problems");
            exit(4);
        }
        chars_read = read(source_fd, buf, sizeof(buf));
    }
    if (chars_read < 0) {
        perror("Reading error");
        exit(5);
    }
    return 0;
}

Unix I/O
• By making everything appear to be a file, the kernel can provide a single simple interface for performing I/O to a variety of devices
• Recall the basic operations are:
  – Opening and closing files: open() and close()
  – Changing the current file position: lseek()
  – Reading and writing files: read() and write()

Adding Other Devices
• Most devices tend to be producers or consumers of streams of data, and so fit the UNIX I/O API model described above
  – Mouse: producer
  – Joystick: producer
  – Keyboard: producer
  – Display: consumer
  – Audio device: consumer
  – Tape: both

New Devices
[Figure: Application over UNIX I/O; beneath UNIX I/O sit the File System (over the Disk Drive) alongside the Keyboard, Terminal, Tape, and Audio devices]

Getting data to/from the hardware
• There are two main issues to deal with:
  – buffering of data going to and from the disk
  – I/O requests that are not block-aligned or not a multiple of the block size
[Figure: File system layering, top to bottom: Application, Unix I/O, File System, Disk Drive]

File Descriptors
• Calls to routines like open(), socket(), accept() and pipe() return file descriptors
• A file descriptor is just a small integer
• When this “integer” is passed back to the kernel via calls like read() or write() the kernel manipulates the opened “file” the descriptor corresponds to

The Kernel’s View of a File Descriptor
• Each process has associated with it a fixed size file descriptor table
• The file descriptor is just the index into this table!
• Each active entry in the table identifies an entry in a shared system wide open file table
• Entries are created in the open file table each time open() succeeds

Open File Table
• Entries in the open file table identify the I/O target in a v-node table
• Each open file table entry keeps the current file position and a reference count of its usage
• v-node: virtual inode, basically a cache of an inode
  – may contain pointers to buffers/caches for the file/device
  – identifies legal operations on a file/device
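The three tables can be mimicked in user space to see how descriptors, reference counts and dup2() interact. This is a toy model only: every struct and function name below is hypothetical, real kernels use different and far richer structures, and the caller supplies storage for each open-file entry to keep the sketch free of malloc().

```c
#include <stddef.h>

/* Toy model of the kernel tables (hypothetical names, not a real kernel API). */
#define TOY_MAX_FD 8

struct toy_vnode {              /* one per file: type, size, ... */
    const char *name;
    long size;
};

struct toy_open_file {          /* one per successful open() */
    struct toy_vnode *vnode;    /* which file/device this entry refers to */
    long pos;                   /* current file position */
    int refcnt;                 /* how many descriptors point here */
};

struct toy_process {            /* per-process descriptor table */
    struct toy_open_file *fd[TOY_MAX_FD];
};

/* Model open(): fill in a fresh entry and use the lowest free slot. */
int toy_open(struct toy_process *p, struct toy_open_file *entry,
             struct toy_vnode *v)
{
    for (int i = 0; i < TOY_MAX_FD; i++) {
        if (p->fd[i] == NULL) {
            entry->vnode = v;
            entry->pos = 0;
            entry->refcnt = 1;
            p->fd[i] = entry;
            return i;           /* the descriptor is just the index */
        }
    }
    return -1;                  /* descriptor table is full */
}

/* Model close(): drop one reference; refcnt 0 means the entry is free. */
int toy_close(struct toy_process *p, int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FD || p->fd[fd] == NULL)
        return -1;
    p->fd[fd]->refcnt--;
    p->fd[fd] = NULL;
    return 0;
}

/* Model dup2(oldfd, newfd): newfd now shares oldfd's open-file entry. */
int toy_dup2(struct toy_process *p, int oldfd, int newfd)
{
    if (oldfd < 0 || oldfd >= TOY_MAX_FD || p->fd[oldfd] == NULL ||
        newfd < 0 || newfd >= TOY_MAX_FD)
        return -1;
    if (oldfd == newfd)
        return newfd;
    if (p->fd[newfd] != NULL)
        toy_close(p, newfd);    /* whatever newfd referred to is closed */
    p->fd[newfd] = p->fd[oldfd];
    p->fd[newfd]->refcnt++;
    return newfd;
}
```

With entries installed at fd 0 and fd 1, toy_dup2(p, 1, 0) drops the first entry's refcnt to 0 and raises the second's to 2, mirroring what dup2() does to the real tables.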

The Kernel View
[Figure: A per-process descriptor table (fd 0 to fd 4; fd 0, 1 and 2 are stdin, stdout and stderr) points into the open file table shared by all processes. Each open file table entry is one struct holding the file position and a reference count (refcnt=1), and points to a v-node table entry for File A holding its file access, file size, and file type.]
Adapted from: Computer Systems: A Programmer’s Perspective

v-node role
[Figure: the same layering as in New Devices: Application over UNIX I/O, over the File System (Disk Drive) and the Keyboard, Terminal, Tape, and Audio devices]

To the Device
• Unix I/O uses the open file table and v-node table to determine the “device”-specific code for the standard operations (open, close, read, write, …)
• These routines use buffers identified by the v-node table
• Buffers are caches of on-disk blocks
• Changes to buffers result in writes being scheduled

write()
• lseek(fd, 931, SEEK_SET);
  – Change file position in open file table to 931
• write(fd, buff, 128);
  – If block #1 (bytes 512 – 1023) not cached, read it
  – If block #2 (bytes 1024 – 1535) not cached, read it
  – Change bytes 931 – 1023 and 1024 – 1058
  – Have blocks 1 and 2 scheduled for writing to disk

read()
• lseek(fd, 500, SEEK_SET);
  – Change file position in open file table to 500
• read(fd, buff, 1024);
  – If any of blocks 0 (0 – 511), 1 (512 – 1023) or 2 (1024 – 1535) are not cached, schedule reads for them
  – Transfer bytes 500 – 511, 512 – 1023, and 1024 – 1523 to buff as the blocks become available
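The block arithmetic in the write() and read() walkthroughs follows from dividing byte offsets by the block size: the first block is offset / 512 and the last is (offset + length − 1) / 512. A sketch assuming 512-byte blocks, as in the examples; blocks_touched and show_request are illustrative names.

```c
#include <stdio.h>

#define BLOCK_SIZE 512L   /* block size used in the slides' examples */

/* First and last block touched by a request of len > 0 bytes at offset off. */
void blocks_touched(long off, long len, long *first, long *last)
{
    *first = off / BLOCK_SIZE;
    *last = (off + len - 1) / BLOCK_SIZE;   /* block holding the final byte */
}

/* Print the range of bytes the request covers inside each touched block. */
void show_request(long off, long len)
{
    long first, last;
    blocks_touched(off, len, &first, &last);
    for (long b = first; b <= last; b++) {
        long start = b * BLOCK_SIZE;
        long end = start + BLOCK_SIZE - 1;
        if (start < off)
            start = off;                    /* request starts mid-block */
        if (end > off + len - 1)
            end = off + len - 1;            /* request ends mid-block */
        printf("block %ld: bytes %ld-%ld\n", b, start, end);
    }
}
```

show_request(931, 128) prints "block 1: bytes 931-1023" and "block 2: bytes 1024-1058", matching the write() example; show_request(500, 1024) reports blocks 0 to 2 with bytes 500-511, 512-1023 and 1024-1523, matching the read() example.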

Sharing Files
• At this point we have
  – File descriptors
  – The open file table
  – V-nodes
• It is relatively easy to explain what happens when file sharing results from:
  – Opens in the same process
  – Opens in different processes
  – Forks
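Whether two descriptors share a file position depends on whether they share an open file table entry. Two separate open() calls on the same file get separate entries (and separate positions), while dup() gives a second descriptor for the same entry; a child created by fork() likewise inherits descriptors that point at the parent's entries. The sketch below checks the first two cases; position_after_read is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read 4 bytes through fd1, then report the position fd2 would use next
 * (or -1 on error). If fd1 and fd2 share an open file table entry, fd2's
 * position has moved too; if they have separate entries, it has not. */
off_t position_after_read(int fd1, int fd2)
{
    char buf[4];
    if (read(fd1, buf, sizeof buf) != (ssize_t)sizeof buf)
        return -1;
    return lseek(fd2, 0, SEEK_CUR);
}
```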

Actions on open()
[Figure: After fd = open("B",…), the process’s descriptor table (fd 0 to fd 4; fd 0, 1 and 2 are stdin, stdout and stderr) gains an entry pointing to a new open file table entry (file pos, refcnt=1), which points to the v-node for File B alongside the existing entries for File A. The descriptor table is one per process; the open file table is shared by all processes.]
Adapted from: Computer Systems: A Programmer’s Perspective

Same File Different Process
[Figure: Two processes, each with its own descriptor table (fd 0 to fd 4). When the second process calls fd = open("A",…), it gets its own open file table entry (its own file pos, refcnt=1), but both entries point to the same v-node for File A.]
Adapted from: Computer Systems: A Programmer’s Perspective

Same File Same Process
[Figure: One process opens File A twice. The second fd = open("A",…); creates another descriptor and another open file table entry (its own file pos, refcnt=1); both entries point to the same v-node for File A.]
Adapted from: Computer Systems: A Programmer’s Perspective

Close()
[Figure: Descriptor table (one per process), open file table and v-node table (both shared by all processes), with entries for File A and File B. close(4); empties descriptor-table entry fd 4 and drops the reference count of the open file table entry it pointed to to refcnt=0.]

I/O Redirection
COMOX(114): ls > /tmp/out
• The above causes standard output (file descriptor 1) to be set to /tmp/out
[Figure: Process file descriptor table (fd 0 to fd 4). fd 0 and fd 2 still refer to the terminal’s open file table entry (its refcnt shown dropping from 4 to 3), while fd 1 now refers to a separate entry (refcnt=1) for /tmp/out; each entry points to its own v-node (file access, file size, file type).]
Adapted from: Computer Systems: A Programmer’s Perspective

dup2
• The Unix system call dup2, which has the form dup2(fd, newfd), copies descriptor-table entry fd to entry newfd
• If newfd was already open, it is closed first
[Figure: before and after dup2(4,1). Before, fd 1 refers to open file a and fd 4 to open file b; afterwards both fd 1 and fd 4 refer to b.]
Adapted from: Computer Systems: A Programmer’s Perspective

dup2 example
open("/tmp/foo",…);
dup2(4,1);
close(4);
[Figure: the process file descriptor table, open file table (file pos, refcnt) and v-nodes for the terminal and /tmp/out. Before the dup2, fd 1 and fd 4 each refer to their own open file table entry (refcnt=1 each). After dup2(4,1), fd 1’s old entry drops to refcnt=0 and the entry now shared by fd 1 and fd 4 has refcnt=2; close(4) returns it to refcnt=1.]
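The open/dup2/close sequence is essentially what a shell does in the child process before running `ls > /tmp/out`. A minimal sketch; redirect_stdout is a hypothetical helper, and the O_TRUNC flag and 0600 permissions are choices made for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, make it the process's standard output, and drop the extra
 * descriptor. Returns 0 on success, -1 on failure. */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (fd != STDOUT_FILENO) {
        if (dup2(fd, STDOUT_FILENO) < 0) {   /* fd 1 now refers to path */
            close(fd);
            return -1;
        }
        close(fd);   /* refcnt back to 1: only fd 1 refers to the entry */
    }
    return 0;
}
```

After redirect_stdout("/tmp/out"), anything the program writes to descriptor 1, including printf() output once flushed, lands in /tmp/out.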

Application
• Given what we know, are there interesting things we can do at the application layer to speed things up?
• Making a system call is several orders of magnitude more expensive than a function call
[Figure: File system layering, top to bottom: Application, Unix I/O, File System, Disk Drive]

Caching in the Application
• Applications can use caching to improve performance, just like the kernel
• Most I/O has both
  – Spatial locality
  – Temporal locality
• An application-level cache in the form of the Standard I/O library attempts to take advantage of this
[Figure: File system layering with a Buffered I/O layer: Application, Buffered I/O, Unix I/O, File System, Disk Drive]

STDIO (Caching)
• Each Unix I/O call has a corresponding stdio call
  – open(): fopen(); close(): fclose()
  – read(): fread(); write(): fwrite()
• Instead of returning a file descriptor, fopen() returns a FILE *
• The FILE struct contains:
  – the actual file descriptor
  – a pointer to a buffer
  – the position in the buffer
  – other bookkeeping information

How it works - writes
• When fwrite() is called, bytes are copied to the stream buffer
• If the stream buffer fills during the fwrite()
  – write() is called to “write” the stream buffer
  – the stream buffer is cleared
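To make the write path concrete, here is a toy buffered writer in the spirit of FILE. It is not how any particular C library lays out FILE (that struct is private to the implementation); the names bufwriter, bw_write and bw_flush are hypothetical, and the 8-byte buffer is deliberately tiny so the flushes are easy to count.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define BUF_CAP 8   /* tiny on purpose so flushes are easy to see */

struct bufwriter {
    int fd;              /* underlying file descriptor */
    char buf[BUF_CAP];   /* stream buffer */
    size_t used;         /* position in the buffer */
    int syscalls;        /* write() calls made so far */
};

/* Hand buffered bytes to the kernel (one write() unless it is short). */
int bw_flush(struct bufwriter *w)
{
    size_t off = 0;
    while (off < w->used) {
        ssize_t n = write(w->fd, w->buf + off, w->used - off);
        w->syscalls++;
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    w->used = 0;         /* stream buffer cleared */
    return 0;
}

/* Copy bytes into the stream buffer, flushing whenever it fills. */
int bw_write(struct bufwriter *w, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        size_t room = BUF_CAP - w->used;
        size_t chunk = len < room ? len : room;
        memcpy(w->buf + w->used, p, chunk);
        w->used += chunk;
        p += chunk;
        len -= chunk;
        if (w->used == BUF_CAP && bw_flush(w) < 0)
            return -1;
    }
    return 0;
}
```

Writing 20 bytes as ten 2-byte pieces costs only two write() calls before the final bw_flush(), instead of ten.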

fwrite()
[Figure: fwrite() copies into the stream (buffer, buffer offset, fd); when it fills, write() carries the buffer across the kernel boundary into cached file blocks]

How it works - reads
• When fread() is called, bytes are copied from the stream buffer to the location designated by the application
• If the stream buffer empties during the fread()
  – read() is called to refill the stream buffer
  – the position in the stream buffer is reset
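The read side can be sketched the same way: bytes come out of the stream buffer, and the kernel is asked for more only when the buffer is empty. Again a toy with hypothetical names (bufreader, br_getc) and a deliberately tiny buffer.

```c
#include <stddef.h>
#include <unistd.h>

#define RBUF_CAP 8   /* tiny on purpose */

struct bufreader {
    int fd;              /* underlying file descriptor */
    char buf[RBUF_CAP];  /* stream buffer */
    size_t pos;          /* next byte to hand out */
    size_t len;          /* bytes currently valid in buf */
    int syscalls;        /* read() calls made so far */
};

/* Return the next byte, or -1 at end of file or on error. */
int br_getc(struct bufreader *r)
{
    if (r->pos == r->len) {                 /* stream buffer is empty */
        ssize_t n = read(r->fd, r->buf, RBUF_CAP);
        r->syscalls++;
        if (n <= 0)
            return -1;
        r->pos = 0;                         /* position in buffer reset */
        r->len = (size_t)n;
    }
    return (unsigned char)r->buf[r->pos++];
}
```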

fread()
[Figure: read() crosses the kernel boundary to refill the stream (buffer, buffer offset, fd) from a cached file block; fread() then copies out of the stream buffer]

Analysis
• Costs compared with making the system calls directly
  – Need extra buffer space
  – One extra set of copies
  – Bookkeeping to ensure the stream buffer exactly matches the real file location
  – I/O to random locations can be inefficient
• Advantage over making the system calls directly
  – If the application’s I/O requests are much smaller than the underlying buffer, the number of system calls is greatly reduced
  – System calls are very expensive

What are files good for?
• A bulk storage mechanism
• A more permanent form of storing information
• A form of interprocess communication
  – The mere existence of a file can mean something
  – Data in a file can be a message to a process that doesn’t exist yet

Sharing data on disk
[Figure: Application 1 write()s “Hi” to a file; Application 2 read()s the same file, and whether it sees “Hi” is marked “?”]

Two processes, same time
• As the gap between the two processes’ file accesses narrows, what one process sees of the other’s actions becomes unpredictable
• Two common problems
  – Lost update
  – Inconsistent retrievals

The Lost Update
• Process 1: Withdraw(A, 4); Deposit(B, 4);
  – Bal = A.read();     (100)
  – A.write(Bal – 4);   (96)
  – Bal = B.read();     (200)
  – B.write(Bal + 4);   (204)
• Process 2: Withdraw(C, 3); Deposit(B, 3);
  – Bal = C.read();     (300)
  – C.write(Bal – 3);   (297)
  – Bal = B.read();     (200)
  – B.write(Bal + 3);   (203)
• Both processes read B as 200 before either writes it back, so B ends up as 204 or 203 (whichever write lands last) instead of 207: one deposit is lost

Aside - cache consistency
• The previous problem illustrates the issue of cache consistency
• The values read from disk and then used are cached
• Multiple programs cache and change the same data simultaneously without regard for one another
• Result

Inconsistent Retrievals
• Process 1: Withdraw(A, 5); Deposit(B, 5);
  – Bal = A.read();     (200)
  – A.write(Bal – 5);   (195)
  – Bal = B.read();     (100)
  – B.write(Bal + 5);   (105)
• Process 2: TotalAccounts();
  – Bal = A.read();     (195)
  – Bal += B.read();    (295)
  – Bal += C.read();    (…)
• TotalAccounts() sees A after the withdrawal but B before the deposit, so the total it computes is 5 too low

Are these familiar types of problems?
• How was it solved?
• Would the same solution work here?
• Would it scale?

Lock a file region
int sharedData;
Lock aLock;
…
aLock.acquire();
    read or write sharedData   // shared region
aLock.release();
…
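
lockf()
lockf(fd, function, size)
• fd: an open file descriptor that allows writing
• function: one of
  – F_ULOCK: unlock the region
  – F_LOCK: lock the region, blocking until it is available
  – F_TLOCK: lock the region, failing immediately if it is already locked
  – F_TEST: test whether the region is locked by another process
• size: starting from the current file position, the number of bytes to lock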
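
Using lockf()
int main(int argc, char **argv) {
    int status;
    int fd = open(argv[1], O_RDWR);      // lockf() needs write access
    if (fd < 0) { perror("open"); exit(1); }
    if ((status = lockf(fd, F_TLOCK, 60)) < 0) {
        printf("locked\n");              // another process holds the region
        lockf(fd, F_LOCK, 60);           // so block until it is released
    }
    // ... work on the first 60 bytes, then unlock them ...
    lockf(fd, F_ULOCK, 60);
}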

The Lost Update (2)
• Process 1: Withdraw(A, 4); Deposit(B, 4);
  – Bal = A.read();     (100)
  – A.write(Bal – 4);   (96)
  – Bal = B.read();     (200)
  – B.write(Bal + 4);   (204)
• Process 2: Withdraw(C, 3); Deposit(B, 3);
  – Bal = C.read();     (300)
  – C.write(Bal – 3);   (297)
  – Bal = B.read();     (200)
  – B.write(Bal + 3);   (203)

Types of lock requests
• Regular lock (really a writer lock)
  – Only one acquisition allowed at a time
• Read lock
  – Allows multiple readers to hold the lock at the same time: increased concurrency
  – Basically prevents a writer from making changes
• Write lock
  – Only one acquisition allowed at a time
  – Prevents a read lock from being acquired
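lockf() only offers exclusive locks. POSIX record locks set through fcntl() distinguish read (shared, F_RDLCK) and write (exclusive, F_WRLCK) locks over a byte range, which matches the lock types above. A sketch; lock_region is a hypothetical wrapper name. A read lock needs a descriptor opened for reading and a write lock one opened for writing, and locks held by the same process never conflict with each other.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Lock [start, start+len) of fd for reading (F_RDLCK) or writing (F_WRLCK),
 * or release it (F_UNLCK). If wait is nonzero, block until the lock can be
 * granted (F_SETLKW); otherwise fail at once (F_SETLK). */
int lock_region(int fd, short type, off_t start, off_t len, int wait)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,   /* start is an absolute offset */
        .l_start = start,
        .l_len = len,           /* 0 would mean "through end of file" */
    };
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}
```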

Reader – Writer locks
int sharedData;
Lock aLock;
…
aLock.acquireWrite();
    write sharedData   // shared region
aLock.release();
…
aLock.acquireRead();
    read sharedData    // shared region
aLock.release();
…

Implementing Locks
• Each lock requires
  – Lists of process IDs
    • Process with the lock
    • Processes waiting for the lock
  – Regions: what part of the file is being locked and how (read/write)

Where are locks implemented?
• Requirements
  – Must be (potentially) 1 per file
  – All processes must be able to locate the lock
  – Created on demand (sort of)
• What kernel data structure associated with file management has these properties?

Locking and Vnodes
[Figure: Two processes, each with its own descriptor table (fd 0 to fd 4), open File A through separate open file table entries (each with its own file pos, refcnt=1) that point to the same v-node for File A.]
Adapted from: Computer Systems: A Programmer’s Perspective

Are locks enough?
• Locks can control concurrency
• Sometimes a collection of actions needs to be atomic
  – Locks can’t ensure this in the face of failures
  – Undoing (rolling back) things can be a challenge

Transactions - Definition
• A transaction is a sequence of data operations with the following properties:
  – A: Atomic: all or nothing
  – C: Consistent: consistent state in => consistent state out
  – I: Isolated (independent): partial results are not visible to concurrent transactions
  – D: Durable: once completed, the new state survives crashes

Transaction Operations
• tid = beginTX()
  – Start a new transaction and return a transaction identifier
• status = commitTX(tid)
  – Cause the transaction to commit
  – Return a success indication if the transaction committed, otherwise return a failure indication

Transaction Operations cont’d
• abortTX(tid)
  – Abort the transaction and cause all files to take on the values they had before the transaction started
• readTX(tid, file values)
  – Read the given “values” from a file and associate the read with the indicated transaction

Transaction Operations cont’d
• writeTX(tid, values)
  – Write the given values to the file and associate the write with the indicated transaction

Example transaction
tid = beginTX();
readTX(tid, &a, file_to_read_from, …);
readTX(tid, &b, file_to_read_from, …);
// perform computations
writeTX(tid, &a, file_to_write_to, ...);
readTX(tid, &c, file_to_read_from, …);
if (error reading) { abortTX(tid); return; }
// perform computations
writeTX(tid, &c, file_to_write_to, …);
commitTX(tid);

Ensuring Atomicity
• Problem
  – Ensure all changes get made or none get made
• If no failure occurs, it’s easy
  – just do the updates
• If a failure occurs while updates are being performed, we must either
  – Go back to the initial state, or
  – Go to the final state

Strategy
• Use another file, called a log file, to record our intentions
• Write information to indicate
  – That a transaction has started
  – The new values a file is to have
  – That a transaction has committed
  – That a transaction has aborted
  – That the transaction can be truncated

Logging
• Persistent (on-disk) log
  – records information to support recovery and abort
• Types of log records
  – begin, update, abort, commit, and truncate
• Atomic update
  – the atomic operation is the write of the commit record to disk
  – a transaction is committed iff its commit record is in the log

Ways to log the “values”
• Value logging
  – write the new value of modified data to the log
  – simple, but not always space efficient or easy
    • hard for some things, such as malloc and system calls
• Operation logging
  – write the name of the operation and its arguments
  – usually used for roll forward logging

Logging for Roll Forward
• For each transactional update
  – Change the in-memory copy
  – Write the new value to the log
  – Do not change the on-disk copy until commit
• Commit
  – Write a commit record to the log
  – Write the changed data to disk
  – Write a truncate record to the log
• Abort
  – Write an abort record to the log
  – Invalidate the in-memory data
  – Nothing to do with on-disk copies

Roll forward recovery
• When the system restarts after a failure
  – use the log to roll forward committed transactions
  – normal access is stopped until recovery is completed

Recovery Continued
• Complete committed, but un-truncated, transactions
  – for every transaction with a commit record but no truncate record
  – read the new values from the log and update the disk values
  – write a truncate record to the log
• Abort all uncommitted transactions
  – for every transaction with no commit or abort record
  – write an abort record to the log
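The recovery rules above can be expressed over an in-memory copy of the log. This sketch simplifies under stated assumptions: records live in an array of a hypothetical struct log_rec, transaction ids are below MAX_TID, the “disk” is an array of longs indexed by variable number, and instead of appending TRUNC and ABORT records the function reports which transactions need them.

```c
#include <stddef.h>

#define MAX_TID 16   /* assumption: transaction ids are 0..MAX_TID-1 */

enum rec_type { REC_BEGIN, REC_NVAL, REC_COMMIT, REC_ABORT, REC_TRUNC };

struct log_rec {        /* hypothetical in-memory log record */
    enum rec_type type;
    int tid;
    int var;            /* variable updated (REC_NVAL only) */
    long value;         /* its new value (REC_NVAL only) */
};

/* Roll-forward recovery. data[] stands in for the on-disk values.
 * On return needs_abort[t] is 1 for each transaction that must get an
 * ABORT record; the result is the number of transactions redone (each
 * of which would then get a TRUNC record). */
int redo_recover(const struct log_rec *recs, size_t n, long *data,
                 int needs_abort[MAX_TID])
{
    int begun[MAX_TID] = {0}, committed[MAX_TID] = {0};
    int aborted[MAX_TID] = {0}, truncated[MAX_TID] = {0};
    int redone = 0;

    /* Pass 1: learn the fate of every transaction. */
    for (size_t i = 0; i < n; i++) {
        int t = recs[i].tid;
        if (recs[i].type == REC_BEGIN)  begun[t] = 1;
        if (recs[i].type == REC_COMMIT) committed[t] = 1;
        if (recs[i].type == REC_ABORT)  aborted[t] = 1;
        if (recs[i].type == REC_TRUNC)  truncated[t] = 1;
    }

    /* Pass 2: reapply new values of committed, un-truncated transactions
     * in log order, so the last logged value for a variable wins. */
    for (size_t i = 0; i < n; i++) {
        int t = recs[i].tid;
        if (recs[i].type == REC_NVAL && committed[t] && !truncated[t])
            data[recs[i].var] = recs[i].value;
    }

    /* Under redo logging an uncommitted transaction never touched the
     * disk, so aborting it only needs the ABORT record. */
    for (int t = 0; t < MAX_TID; t++) {
        needs_abort[t] = begun[t] && !committed[t] && !aborted[t];
        if (committed[t] && !truncated[t])
            redone++;
    }
    return redone;
}
```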

Logging/Recovery Example
• Application actions
  – tid = beginTX
  – readTX(tid, &a, …)
  – readTX(tid, &b, …)
  – writeTX(tid, &b, …)
  – writeTX(tid, &a, …)
  – commitTX(tid)
• Write out a and b to the real file
• Write truncate to the log
• Log file records
  – BEGIN<1>
  – NVAL<1, b, newval>
  – NVAL<1, a, newval>
  – COMMIT<1>
  – TRUNC<1>

Role of Locking
• Locks must still be acquired to prevent inconsistent retrievals and lost updates
• The first time a value is accessed, its source must be locked
• Locks are released after all writes to the real file have completed (or after the reads, if no writes are being done)
• Locks are also used on the log file

Log File
• The log file can be shared by different processes
• Writes are always done to the end
• Before doing a write, a lock is acquired; it is released when the write completes
• A write consists of one or more log records

Roll backwards logging
• This is the opposite of redo or roll-forward logging
• Instead of writing new values to the log file old values are written
• Real files are updated before commit is written
• On abort, log is used to restore old values

Undo logging - roll backward: Normal operation
• For each transactional update
  – write the old value to the log
  – modify the data and write it to disk at any time after the old value has been logged
• Commit
  – ensure that all updates have been written to disk
  – write a commit record to the log
• Abort
  – use the log to restore the disk to the old values

Undo logging - roll backward: Recovery
• When the system restarts after a failure
  – use the log to roll back uncommitted transactions
  – normal access is stopped until recovery is completed
• Undo the effect of any uncommitted transactions
  – for every transaction with no commit or abort record, use the log to restore the disk to the old values
  – write an abort record to the log
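Undo recovery can be sketched the same way over an in-memory log. The assumptions mirror the redo sketch: a hypothetical struct undo_rec, transaction ids below MAX_TID, and an array of longs standing in for the disk, which may already hold updates from transactions that never committed. Scanning the log newest to oldest matters when a variable was updated twice: the oldest saved value must be the one left in place.

```c
#include <stddef.h>

#define MAX_TID 16   /* assumption: transaction ids are 0..MAX_TID-1 */

enum undo_type { U_BEGIN, U_OVAL, U_COMMIT, U_ABORT };

struct undo_rec {       /* hypothetical in-memory log record */
    enum undo_type type;
    int tid;
    int var;            /* variable about to change (U_OVAL only) */
    long value;         /* its old value (U_OVAL only) */
};

/* Roll-backward recovery. Returns the number of transactions rolled
 * back (each of which would then get an ABORT record). */
int undo_recover(const struct undo_rec *recs, size_t n, long *data)
{
    int begun[MAX_TID] = {0}, finished[MAX_TID] = {0};
    int undone = 0;

    for (size_t i = 0; i < n; i++) {
        if (recs[i].type == U_BEGIN)
            begun[recs[i].tid] = 1;
        else if (recs[i].type == U_COMMIT || recs[i].type == U_ABORT)
            finished[recs[i].tid] = 1;
    }

    /* Newest record first, so the earliest old value is restored last. */
    for (size_t i = n; i-- > 0; ) {
        if (recs[i].type == U_OVAL && !finished[recs[i].tid])
            data[recs[i].var] = recs[i].value;
    }

    for (int t = 0; t < MAX_TID; t++)
        if (begun[t] && !finished[t])
            undone++;
    return undone;
}
```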

Logging/Recovery Example
• Application actions
  – tid = beginTX
  – readTX(tid, &a, …)
  – readTX(tid, &b, …)
  – writeTX(tid, &b, …)
  – writeTX(tid, &a, …)
  – commitTX(tid)
• Ensure the updated a and b are written to the real file
• Write commit to the log
• Log file records
  – BEGIN<1>
  – OVAL<1, b, oldval>
  – OVAL<1, a, oldval>
  – COMMIT<1>

Outstanding problems?
• What about disk write order?
  – When an application writes to disk, the operating system decides the write time and order
  – This is a problem for transactions
• Keeping the log file from growing infinitely large
  – Log file truncation

fsync()
• The order of writes is important
• For example, in redo logging
  – All new values must be written to the log file before the commit is written
  – All updates to the “real” files need to be on disk before the truncate is written
• fsync(fd) does not return until all modified data of the file referred to by fd has been written to the storage device

fsync() cont’d
• fsync() does not guarantee that writes go to the disk in program order
• If disk write order is important (e.g. when the commit is written), then
  – Call fsync() before writing the commit
  – Write the commit
  – Call fsync() again
• The file could also be opened with the O_SYNC option
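The fsync()/commit/fsync() recipe can be sketched for the redo log. This is a minimal sketch under assumptions: log_open, log_append and redo_commit are hypothetical names, the text record format copies the slides’ notation (for example "NVAL<1, a, 5>"), and the log-file lock described earlier is omitted. Opening with O_APPEND makes every write go to the end of the log.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open (or create) the log; O_APPEND sends every write to the end. */
int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Append one text record (format assumed from the slides' notation). */
int log_append(int logfd, const char *rec)
{
    size_t len = strlen(rec);
    return write(logfd, rec, len) == (ssize_t)len ? 0 : -1;
}

/* Commit transaction tid, whose NVAL records were already appended. */
int redo_commit(int logfd, int tid)
{
    char rec[32];

    if (fsync(logfd) < 0)               /* 1: NVAL records reach the disk */
        return -1;
    snprintf(rec, sizeof rec, "COMMIT<%d>\n", tid);
    if (log_append(logfd, rec) < 0)     /* 2: the commit point */
        return -1;
    return fsync(logfd);                /* 3: commit record reaches the disk */
}
```

Only after redo_commit() returns 0 would the changed data be written to the real files, followed (after another fsync()) by the TRUNC record.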

Shrinking the Log File (Truncation)
• Truncation is the process of removing unneeded records from the transaction log
• For redo logging
  – remove transactions with truncate or abort records
• For undo logging
  – remove transactions with commit or abort records

Layering - revisited
• STDIO and transaction systems are layers within the application layer
• Notice that layers don’t have to extend completely across the level they are in
• When using a layer, don’t circumvent it
  – Example: when using STDIO, don’t get the file descriptor, do your own reads or writes, and then continue to use the f*() calls

Application Layering
[Figure: STDIO and the Transaction System as partial layers inside the Application, above UNIX I/O, which sits over the File System (Disk Drive) and the Keyboard, Terminal, Tape, and Audio devices]

Layering in the File System
• Disks present very similar interfaces, but the precise way to control different disk types differs
• To simplify the task of dealing with different disk types, the notion of a virtual disk interface is used
• Each time a new type of drive is introduced, one simply implements the virtual interface
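In C, a virtual interface like this is commonly written as a struct of function pointers that each driver fills in. The sketch below is illustrative only: vdisk, vdisk_ops, the RAM-disk driver and vdisk_copy_block are hypothetical names, and a real SCSI or IDE driver would supply its own read and write routines behind the same ops table.

```c
#include <string.h>

#define VBLOCK 512

struct vdisk;   /* forward declaration for the ops signatures */

/* The "virtual disk interface": every drive type implements these. */
struct vdisk_ops {
    int (*read_block)(struct vdisk *d, long blockno, void *buf);
    int (*write_block)(struct vdisk *d, long blockno, const void *buf);
};

struct vdisk {
    const struct vdisk_ops *ops;   /* driver-specific implementation */
    void *state;                   /* driver-private data */
    long nblocks;
};

/* One driver: a RAM disk whose state is a plain byte array. */
static int ram_read(struct vdisk *d, long b, void *buf)
{
    if (b < 0 || b >= d->nblocks)
        return -1;
    memcpy(buf, (char *)d->state + b * VBLOCK, VBLOCK);
    return 0;
}

static int ram_write(struct vdisk *d, long b, const void *buf)
{
    if (b < 0 || b >= d->nblocks)
        return -1;
    memcpy((char *)d->state + b * VBLOCK, buf, VBLOCK);
    return 0;
}

const struct vdisk_ops ram_ops = { ram_read, ram_write };

/* File-system-side code only ever calls through d->ops. */
int vdisk_copy_block(struct vdisk *d, long from, long to)
{
    char tmp[VBLOCK];
    if (d->ops->read_block(d, from, tmp) < 0)
        return -1;
    return d->ops->write_block(d, to, tmp);
}
```

Adding a new drive type means writing a new ops table; vdisk_copy_block() and the rest of the file system stay unchanged.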

Yet Another Layer
[Figure: Application (with STDIO and the Transaction System) over UNIX I/O, over the File System and other devices; beneath the File System, a Virtual Disk Interface with SCSI, ESDI, and IDE implementations above the Disk Drive]

Extending the File System
• Layering makes it “easy” to extend the file system architecture, provided the various boundaries are well defined
• Examples:
  – Journaling/logging file systems
  – Network File Systems (NFS)
  – iSCSI
• Just insert the new service at the appropriate layer

Inserting New Functionality
[Figure: Application over UNIX I/O over the File System layer, which now contains Unix FFS, a Logging FS, and an NFS Client (over the Network Protocol Stack); beneath, the Virtual Disk Interface with SCSI, IDE, and iSCSI implementations, alongside Other Devices]