01/13/2020
File System
Presentation B
CSE 2431/5421: Introduction to Operating Systems
Gojko Babić
Study: 13.1–13.4, 2.1–2.4, 12.3.4
g. babic Presentation B 2
Name – this information kept in human-readable form.
Type – needed for systems that support different types.
Location – pointers to file location on device.
Size – current file size.
Protection – controls who can do reading, writing,
executing.
Time and date – data for protection, security, and usage
monitoring.
Information about files (i.e., file attributes) is kept in the
directory structure, which is maintained on the disk.
File Attributes
A collection of nodes containing information (i.e. file attributes) about a set of files.
[Figure: a directory whose entries refer to the files F1, F2, F3, F4, ..., Fn]
• Both the directory structure and the files reside on disk.
Directory Structure
A Typical File-System Organization
File Operations:
• Create
• Write
• Read
• Reposition within file
• Delete
• Open(file_name) – search the directory structure on disk for entry file_name, and move the content of the entry to memory.
• Close
Access Methods:
Sequential Access (based on a tape model of a file):
– read next
– write next
– reset
Direct (or Relative) Access:
– position to n (n = relative block number)
– read next
– write next
File Operations & Access Methods
Modes of file access: read, write, execute
Three classes of users:
                                            R W X
  ─ owner access (3 bits used), e.g. 7  =   1 1 1
  ─ group access (3 bits used), e.g. 5  =   1 0 1
  ─ public access (3 bits used), e.g. 1 =   0 0 1
The file owner is able to control what can be done and by whom.
Unix command to set the above access rights (i.e., owner: read, write & execute; group: read & execute; world: execute) for the file game:
  chmod 751 game
The system administrator creates a group (with a system-unique name), say G, and adds some users to the group.
Unix command to attach the group G to the file game:
  chgrp G game
Unix/Linux: File Protection
In order to use a file, you first need to ask for it by name. This is
called opening the file. The open system call creates an operating
system object called an open file. The open file is logically
connected to the file you named in the open system call. An open
file has a file location associated with it and that is the offset in
the file where the next read or write will start.
After you open a file, you can use the read or write system calls
to read or write a number of characters. Each read or write
system call advances the file location by the number of characters
read or written.
Unix/Linux File System
Thus, the file is read (or written) sequentially by default.
The lseek system call is used to achieve random access into
the file since it changes the file location that will be used for
the next read or write.
To create a new file, you use the creat system call.
You close the open file using the close system call, when you
are done using it.
You delete the file from a directory using the unlink system
call.
Unix/Linux File System (cont.)
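The life cycle described above (create, write, reposition, read, close, unlink) can be sketched in one small C function. This is an illustration under stated assumptions: char_at_offset is a hypothetical helper, the ten-byte content is made up, and open() with O_CREAT is used in place of creat() because creat() opens the file for writing only and the sketch also needs to read.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Creates the file `name`, writes "0123456789" sequentially, then uses
   lseek to jump to byte `offset` and returns the character stored there.
   Returns -1 on any error (including an offset at or past end of file).
   The file is removed before returning. */
int char_at_offset(const char *name, off_t offset)
{
    char c;
    int result = -1;
    assert(name != NULL);

    int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The write advances the file location by the 10 bytes written. */
    if (write(fd, "0123456789", 10) == 10 &&
        lseek(fd, offset, SEEK_SET) == offset &&   /* random access */
        read(fd, &c, 1) == 1)
        result = (unsigned char)c;

    close(fd);
    unlink(name);   /* delete the directory entry */
    return result;
}
```

For example, char_at_offset("demo.tmp", 7) repositions past the first seven bytes and reads the eighth; without the lseek call the read would start at offset 10, where the write left the file location.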
System Call   Parameters            Returns   Notes
open          name, flags           fid       Connect to open file
creat         name, mode            fid       Create file and connect to open file
read          fid, buffer, count    count     Read bytes from open file
write         fid, buffer, count    count     Write bytes to open file
lseek         fid, offset, mode     offset    Move to position of next read or write
close         fid                   code      Disconnect open file
unlink        name                  code      Delete named file
Unix/Linux File System Calls
/* This program reads the first 10 characters from the existing file ABC into the array buffer. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int y, x, z;
    char buffer[100];

    y = open("ABC", O_RDONLY);   /* O_RDONLY is the flag value 0 */
    if (y < 0) { printf("error with open\n"); return 1; }
    x = read(y, buffer, 10);     /* x is the number of bytes actually read */
    if (x < 0) { printf("error with read\n"); return 1; }
    z = close(y);
    if (z < 0) { printf("error with close\n"); return 1; }
    printf("done\n");
    return 0;
}
A Simple Program with File System Calls
Note: Processing in kernel mode may initiate some disk I/O
Processing of Open System Call
C program invoking printf() library call, which calls write() system call
Standard C Library Example
#include <stdio.h>
int main()
{
    printf("hello World");
    return 0;
}

        .file   "hello.c"
.LC0:
        .string "hello World"
        .text
        .globl  main
        .type   main, @function
main:
        subq    $8, %rsp
        movq    $.LC0, %rdi
        movq    $0, %rax
        call    printf
        movq    $0, %rax
        addq    $8, %rsp
        ret
Compiling hello.c
Compilation command: gcc -O1 -S hello.c
generated this x86-64 assembly code (with minor changes in red):
Source: Bryant & O’Hallaron: “Computer Systems, 2nd edition”
System calls are provided on IA32 via an exception-causing instruction int n, where n can be 0-255; historically, system calls are provided through exception 128 (0x80).
By convention, register %eax contains the system call number, and registers %ebx, %ecx, %edx, %esi, %edi and %ebp contain up to 6 arguments.
Examples of system call numbers:
exit: 1
fork: 2
read: 3
write: 4
open: 5
close: 6
wait: 7
creat: 8
unlink: 10
execve: 11
lseek: 19
getpid: 20
kill: 37
pipe: 42
umask: 60
dup2: 63
gettimeofday: 78
Linux/IA32 System Calls
.section .data
string: .ascii "hello world"
string_end:
.equ len, string_end - string
.section .text
.globl main
main:
# system call: write(1, "hello world", 11)
movl $4, %eax # System call number 4 (write)
movl $1, %ebx # stdout has descriptor 1
movl $string, %ecx # "hello world" string
movl $len, %edx # String length (11)
int $0x80 # System call code
movl $0, %eax
ret
int main() { write(1, "hello world", 11); return 0; }
/* This is a version of the familiar hello program */
On the right, we have an implementation of the hello program directly with Linux system calls.
Linux/IA32 System Calls
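The same idea, selecting a system call by number, can also be tried from C without writing assembly. The sketch below is Linux-specific and assumes glibc's syscall() wrapper and the SYS_write constant from <sys/syscall.h>; raw_write is a hypothetical name. SYS_write expands to the number for the build target, which is 4 on 32-bit x86 Linux as in the table above and a different value on other architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the write system call by number instead of through the
   write() library wrapper. Returns the number of bytes written,
   or -1 on error. */
long raw_write(int fd, const void *buf, unsigned long count)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, count);
}
```

A call such as raw_write(1, "hello world", 11) has the same effect as the assembly version: the library places the call number and the three arguments in registers and executes the trapping instruction for the platform.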
• Implementing system calls requires a control transfer which involves some sort of architecture-specific feature. A typical way to implement this is to use a software interrupt or trap. Interrupts transfer control to the operating system kernel so software simply needs to set up some register with the system call number needed, and execute the software interrupt.
• For many RISC processors this is the only technique provided, but CISC architectures such as x86 support additional techniques. Examples are SYSCALL/SYSRET and SYSENTER/SYSEXIT. These are "fast" control transfer instructions that are designed to quickly transfer control to the OS for a system call without the overhead of an interrupt.
• Linux 2.5 began using this on the x86, where available; formerly it used the INT instruction, where the system call number was placed in the EAX register before interrupt 0x80 was executed.
Wikipedia: System Call Typical Implementation
For each process, a set of open file identifiers numbered 0, 1, 2, and so on can be issued for I/O transactions between the process and the Unix/Linux operating system. The first three open file identifiers are automatically assigned I/O channels for a new process when it is created:
– Open file identifier 0 (called the standard input) is connected to the keyboard for input,
– Open file identifier 1 (called the standard output) is connected to the terminal for output,
– Open file identifier 2 (called the standard error) is connected to the terminal for output.
Most Unix/Linux commands receive input from the standard input, produce output to the standard output, and send error messages through the standard error. However, the standard I/O channels of a command can be redirected.
Unix/Linux Standard Input and Output
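Redirection works because identifier 1 is just an entry that can be pointed at a different open file. The following C sketch shows one way to do that with dup and dup2; write_stdout_redirected is a hypothetical helper and the file path is whatever the caller supplies. A shell performs a similar step before starting a command whose output is redirected with >.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Temporarily connects open file identifier 1 (standard output) to the
   file `path`, writes `len` bytes of `msg` to identifier 1, then restores
   the original standard output. Returns bytes written, or -1 on error. */
ssize_t write_stdout_redirected(const char *path, const char *msg, size_t len)
{
    assert(path != NULL && msg != NULL);

    int saved = dup(1);                  /* remember the old standard output */
    if (saved < 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    ssize_t n = -1;
    if (dup2(fd, 1) >= 0)                /* identifier 1 now names the file */
        n = write(1, msg, len);

    dup2(saved, 1);                      /* undo the redirection */
    close(saved);
    close(fd);
    return n;
}
```

The code that calls write(1, ...) does not change at all; only the object behind identifier 1 does, which is why most commands can be redirected without knowing about it.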
Disks hold an enormous amount of data: on the order of hundreds to thousands of gigabytes, compared to hundreds to thousands of megabytes in memory.
Disks are slower than RAM-based memory – on the order of milliseconds to read information on a disk.
[Figure: disk drive with labeled parts: spindle, arm, actuator, platters, SCSI connector, and electronics (including a processor and memory)]
Disc Storage
• Disks consist of platters, each with two surfaces.
• Each surface consists of concentric rings called tracks.
• Each track consists of sectors separated by gaps.
[Figure: disk geometry with labels for the spindle, a surface, tracks, track k, sectors, and gaps]
Disc Geometry
Capacity is defined to be the maximum number of bits that can be recorded on a disk. It is determined by the following factors:
• Recording density (bits/in): The number of bits on a 1-inch segment of a track.
• Track density (tracks/in): The number of tracks on a 1-inch segment of radius extending from the center of the platter.
• Areal density (bits/in²): product of recording density and track density.
Determination of areal density:
• Original disks partitioned every track into the same number of sectors, which was determined by the innermost track. This resulted in sectors being spaced further apart on outer tracks.
• Modern disks partition the tracks into disjoint subsets called recording zones.
• Each track within a zone has the same number of sectors, determined by the innermost track of the zone.
• Each zone has a different number of sectors/track.
Disc Capacity
Capacity = (#bytes/sector) x (avg #sectors/track) x (#tracks/surface) x (#surfaces/platter) x (#platters/disk)
Example:
• 512 bytes/sector
• Average of 300 sectors/track
• 20,000 tracks/surface
• 2 surfaces/platter
• 5 platters/disk
Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB.
Computing Disc Capacity
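The capacity formula is a plain product of five factors, so it can be checked with a few lines of C. This is only a sketch: disk_capacity_bytes is a hypothetical name, and 64-bit arithmetic is used because the example result does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes = bytes/sector x avg sectors/track x tracks/surface
   x surfaces/platter x platters/disk. */
uint64_t disk_capacity_bytes(uint64_t bytes_per_sector,
                             uint64_t avg_sectors_per_track,
                             uint64_t tracks_per_surface,
                             uint64_t surfaces_per_platter,
                             uint64_t platters_per_disk)
{
    assert(bytes_per_sector > 0);
    return bytes_per_sector * avg_sectors_per_track * tracks_per_surface *
           surfaces_per_platter * platters_per_disk;
}
```

With the example values, disk_capacity_bytes(512, 300, 20000, 2, 5) gives 30,720,000,000 bytes, which is 30.72 GB when 1 GB is taken as 10^9 bytes.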
The disk surface spins at a fixed rotational rate. Rotation is counter-clockwise.
By moving radially, the arm can position the read/write head over any track.
The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
Disc Operations (Single-Platter View)
[Figure: disk access sequence: after BLUE read, seek for RED, rotational latency, data transfer, after RED read (complete read of red sector)]
Disc Access – Service Time Components
Average access time for a sector:
Taccess = Tavg seek + Tavg rotation + Tavg transfer
Seek time (Tavg seek):
• Time to position the heads over the cylinder.
• Typical Tavg seek is 3–9 ms; the maximum can be as high as 20 ms.
Rotational latency (Tavg rotation):
• Once the head is positioned over the track, the time it takes for the first bit of the sector to pass under the head.
• In the worst case, the head just misses the sector and waits for the disk to make a full rotation:
  Tmax rotation = (1/RPM) x (60 secs/1 min)
• The average case is ½ of the worst case:
  Tavg rotation = (1/2) x (1/RPM) x (60 secs/1 min)
• Typical rotation speed = 7200 RPM.
Calculating Access Time
Transfer time (Tavg transfer):
• Time to read the bits in the sector.
• Time depends on the rotational speed and the number of sectors per track.
• Estimate of the average transfer time:
  Tavg transfer = (1/RPM) x (1/(avg #sectors/track)) x (60 secs/1 min)
Example:
• Rotational rate = 7200 RPM
• Average seek time = 9 ms
• Avg #sectors/track = 400
Tavg rotation = 1/2 x (60 secs/7200 RPM) x (1000 ms/sec) ≈ 4 ms
Tavg transfer = (60 secs/7200 RPM) x (1/400 sectors/track) x (1000 ms/sec) ≈ 0.02 ms
Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms
Calculating Access Time (cont.)
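The three components can be evaluated directly from the formulas. The helper names below are hypothetical, and all results are in milliseconds. Note that the code does not round the rotational latency to 4 ms, so its total for the example is about 13.19 ms.

```c
#include <assert.h>

/* Average rotational latency: half of one full rotation, in ms. */
double avg_rotation_ms(double rpm)
{
    assert(rpm > 0);
    return 0.5 * (60.0 / rpm) * 1000.0;
}

/* Average transfer time for one sector: one rotation divided by the
   average number of sectors per track, in ms. */
double avg_transfer_ms(double rpm, double avg_sectors_per_track)
{
    assert(rpm > 0 && avg_sectors_per_track > 0);
    return (60.0 / rpm) * (1.0 / avg_sectors_per_track) * 1000.0;
}

/* Taccess = Tavg seek + Tavg rotation + Tavg transfer, in ms. */
double avg_access_ms(double avg_seek_ms, double rpm,
                     double avg_sectors_per_track)
{
    return avg_seek_ms + avg_rotation_ms(rpm) +
           avg_transfer_ms(rpm, avg_sectors_per_track);
}
```

For 7200 RPM, a 9 ms seek, and 400 sectors/track, the rotation term is about 4.17 ms and the transfer term about 0.021 ms, so seek and rotation account for nearly all of the access time.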
Time to access the 512 bytes in a disk sector is dominated by the seek time (9 ms) and rotational latency (4 ms).
Accessing the sector takes a long time, but transferring the bits is basically free.
Since seek time and rotational latency are roughly the same, at least same order of magnitude, doubling the seek time is a reasonable estimate for access time.
Comparison of access times of various storage devices when reading a comparable 512-byte sector sized block:
• SRAM 256 ns
• DRAM 5000 ns
• Disk 10 ms
• Disk is about 40,000 times slower than SRAM and about 2,000 times slower than DRAM (10 ms / 5000 ns).
Access Time
After receiving a read system call from a given process (program), e.g. to read 50 characters from an already open file, the operating system maps the read request to the appropriate block number.
Here are the steps in performing a read disc operation:
1. The operating system provides the disc I/O controller with: the memory buffer address, the block number, and the type of operation: read (or write).
2. Now, while the CPU executes code of some other process (and while the read-issuing process is blocked, i.e. it cannot run), the disc I/O controller maps the block number to a sector address (a surface #, a track #, and a sector #) and:
a. positions a head over the appropriate track (seek time)
b. waits for the desired sector to rotate to the head (rotational latency)
Processing of Read File System Call
c. transfers 512 bytes of data (i.e. one sector) from the disc to the
local controller memory; those bytes are then copied by DMA
into the appropriate buffer in main memory.
3. When done, the disc controller sends a hardware interrupt to
the CPU.
The typical time to perform a disc I/O operation is 10-20 milliseconds.
The CPU is interrupted from running some process, and the operating
system takes the 50 characters from the memory buffer and copies
them into the address space of the read-issuing process. The
read-issuing process is now eligible to run.
Processing of Typical File System Call
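The two mapping steps in this walkthrough (file offset to block number, then block number to a surface, track, and sector) can be sketched as follows. This is an assumption-heavy illustration, not how any particular controller works: it supposes one block equals one 512-byte sector, every track holds the same number of sectors, and blocks are numbered sector by sector, then surface by surface, then track by track. Disks with recording zones do not have such a uniform layout. All names are hypothetical.

```c
#include <assert.h>

#define SECTOR_SIZE 512   /* bytes per block, one sector per block (assumed) */

struct sector_address {
    long surface;   /* 0-based surface number */
    long track;     /* 0-based track number   */
    long sector;    /* 0-based sector number  */
};

/* Operating system side: which block of the file holds this byte offset? */
long block_of_offset(long byte_offset)
{
    assert(byte_offset >= 0);
    return byte_offset / SECTOR_SIZE;
}

/* Controller side, under the idealized uniform geometry described above. */
struct sector_address map_block(long block, long surfaces,
                                long sectors_per_track)
{
    struct sector_address a;
    assert(block >= 0 && surfaces > 0 && sectors_per_track > 0);
    a.sector  = block % sectors_per_track;
    a.surface = (block / sectors_per_track) % surfaces;
    a.track   = block / (sectors_per_track * surfaces);
    return a;
}
```

With 10 surfaces and 400 sectors per track, block 4005 maps to track 1, surface 0, sector 5 under these assumptions; a real file system would also translate the file-relative block number to a disk block before this step.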
Assume that it takes 20 milliseconds and 25 milliseconds to perform a read and a write disc operation, respectively, and that it takes 1 millisecond and 0.5 millisecond for the operating system to process a system call and a hardware interrupt, respectively. Also, assume that the given file is open, that the given process is the only active process in the system, and that no error happens during the execution of system calls or I/O operations.
Then, try to estimate the duration of the time period that starts when the process issues the following system call:
x = read (fd, ch, 20)
and ends when the process starts the execution of the instruction that follows. Since there are no other processes in the system, this is the period during which the issuing process is blocked.
Estimating Time Process Is Blocked
The usual (normal) time = 1 + 20 + .5 = 21.5 milliseconds
The maximum time = 1 + 20 + .5 + 20 + .5 = 42 milliseconds
The minimum time = 1 millisecond
Estimating Time Process Is Blocked (cont.)
• The period we are calculating includes times for:
– operating system processing of hardware interrupts
and the system call
– the I/O controller performing I/O operation(s)
– hardware processing of interrupts (exceptions) and
execution of the interrupt-causing instruction (both
assumed to be negligible for this calculation)
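The three read estimates follow one pattern: a single system call cost plus, for each disc read that is needed, one disc operation and one interrupt. The C sketch below encodes that pattern. The interpretation of the cases is an assumption, since the slides give only the totals: zero disc reads when the requested characters are already in a memory buffer, one in the usual case, and two when the 20 characters straddle two blocks. read_blocked_ms is a hypothetical name.

```c
#include <assert.h>

/* Estimated time (ms) the process is blocked in a read system call that
   needs `disk_reads` disc read operations. Each disc read is followed by
   one hardware interrupt that the operating system must process. */
double read_blocked_ms(double syscall_ms, double interrupt_ms,
                       double disk_read_ms, int disk_reads)
{
    assert(disk_reads >= 0);
    return syscall_ms + disk_reads * (disk_read_ms + interrupt_ms);
}
```

With the stated costs (1 ms per system call, 0.5 ms per interrupt, 20 ms per disc read), the function yields 1, 21.5, and 42 ms for zero, one, and two disc reads.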
• Let us try to estimate the duration of the time period needed for the O.S. to process this system call:
x = write (fd, ch, 20)
Estimating Time For Processing Write Sys Call
The usual (normal) time = 1 + 20 + .5 + 25 + .5 = 47 milliseconds
The maximum time = 1 + 20 + .5 + 25 + .5 + 20 + .5 + 25 + .5 = 93 milliseconds
• How long will the issuing process be blocked?
• Note: This time is much more difficult to estimate than the time for a read system call, since the O.S. may not wait for the write disc operation to finish before it unblocks the issuing process.
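The write totals fit a similar pattern: one system call cost plus, for each affected block, a disc read, a disc write, and the interrupt that follows each of them. Reading the block before writing it is an interpretation of the arithmetic (20 characters replace only part of a block, so the rest must be preserved), not something the slides state; write_processing_ms is a hypothetical name.

```c
#include <assert.h>

/* Estimated time (ms) for the operating system to process a write system
   call that touches `blocks` disc blocks. Each block is first read and
   then written back, and each disc operation ends with one interrupt. */
double write_processing_ms(double syscall_ms, double interrupt_ms,
                           double disk_read_ms, double disk_write_ms,
                           int blocks)
{
    assert(blocks >= 0);
    return syscall_ms +
           blocks * (disk_read_ms + interrupt_ms +
                     disk_write_ms + interrupt_ms);
}
```

With 1 ms per system call, 0.5 ms per interrupt, 20 ms per read, and 25 ms per write, one block gives 47 ms and two blocks give 93 ms. This is processing time, not blocked time; as the note says, the process may be unblocked before the write reaches the disc.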
Search for a file
Delete a file
List a directory
Rename a file
Traverse the file system
Organize the directory (logically) for:
– Efficiency: locating a file quickly.
– Naming: to be convenient to users.
  1. Two users can have the same name for different files.
  2. The same file can have several different names.
– Grouping: logical grouping of files by properties (e.g., all Java programs, all games, …)
Operations Performed on Directory
A single-level directory for all users.
• Naming problem
• Grouping problem
Single-Level Directory
Separate directory for each user.
• Path name
• Can have the same file name for different users
• Efficient searching
• But no grouping capability
Two-Level Directory
Tree-Structured Directories
Efficient searching
Grouping Capability
Concept of current directory (working directory)
Have shared subdirectories and files.
Acyclic-Graph Directories
Two different names (aliasing) for the same file.
Unlink file = Delete file entry from a directory
File is deleted when a reference count reaches zero
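On Unix/Linux the reference count is visible as the link count reported by stat(). The sketch below creates a file, gives it a second name with link(), and records the count as the names are removed one at a time. It assumes a file system that supports hard links; link_count and alias_demo are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the reference (hard link) count of `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Creates `name`, adds the alias `alias`, then unlinks both names,
   recording the count seen at each stage. Returns 0 on success. */
int alias_demo(const char *name, const char *alias, long counts[3])
{
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (link(name, alias) != 0) {       /* second name for the same file */
        unlink(name);
        return -1;
    }
    counts[0] = link_count(name);       /* two names refer to the file */
    unlink(name);                       /* removes one directory entry only */
    counts[1] = link_count(alias);      /* file is still reachable */
    unlink(alias);                      /* count reaches zero: file deleted */
    counts[2] = link_count(alias);      /* stat fails now, so -1 */
    return 0;
}
```

The middle step is the important one: after the first unlink the data is untouched and still readable through the alias, because unlink removes a directory entry rather than the file itself.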
General Graph Directories
• Unix avoids cycles by prohibiting multiple references to directories.