Unit 7
File Systems
Reading: Text: 6.1.2 & 6.1.3; File Systems section from any recent book on Operating Systems (like Tanenbaum's, in the course reference books)
Original slides by Patrice Belleville ; Changes by George Tsiknis
Unit Outline
• Disk characteristics
  – Rotating disks (6.1.2)
  – Solid state disks (6.1.3)
• Files and Directories
• File System implementation and layout
  – ISO 9660 (CD-ROMs)
  – MS-DOS
  – Linux
• Virtual File Systems
• Robustness and recovery
Unit 7 2
Main Memory vs Disk
Differences between memory and disk:
Access
• Memory locations are accessible individually.
• Data on disk can only be accessed one chunk (block) at a time.
  – Block size is typically between 512 bytes and 8 KB.
Naming
• Variables are accessed using their address.
• Data on disk is normally accessed through a file name.
• The OS translates the name and offset into a logical block number.
• The disk controller maps that number to a location on the disk.
A Disk Drive
[Figure: a disk drive with its spindle, arm, actuator, platters, electronics (including a processor and memory!) and SCSI connector labelled. Image courtesy of Seagate Technology]
Disk Structure
Hard disks: platter view (from the side)
[Figure: three platters (Platter 0, 1, 2) mounted on a common spindle, giving six recording surfaces (Surface 0 to 5); the tracks at the same arm position on all surfaces form cylinder k]
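Disk Operation
What affects the time needed to retrieve data from a hard disk?
• Seek time: time to position the arm over the right track.
  $T_{\text{avg seek}} \approx 9\ \text{ms}$
• Rotational latency: time until the right sector rotates under the head.
  $T_{\text{avg rotation}} = \frac{1}{2} \times \frac{1}{\text{RPM}} \times 60\ \text{s}$
• Average transfer time: time to transfer one sector.
  $T_{\text{avg transfer}} = \frac{1}{\text{RPM}} \times \frac{1}{\text{avg \# sectors per track}} \times 60\ \text{s}$
Then $T_{\text{access}} = T_{\text{avg seek}} + T_{\text{avg rotation}} + T_{\text{avg transfer}}$
Example: a disk with 15,000 RPM, 10 ms average seek time and 500 sectors/track:
$T_{\text{access}} = 10\ \text{ms} + 2\ \text{ms} + 0.008\ \text{ms} \approx 12\ \text{ms}$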
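The access-time formulas above can be checked with a few lines of Python. This is a minimal sketch: the function name `access_time_ms` is hypothetical, and it assumes, as the slide does, that seek, rotation and transfer times simply add up.

```python
def access_time_ms(rpm: float, avg_seek_ms: float, sectors_per_track: int) -> float:
    """Average time (ms) to read one sector, using the slide's simple model."""
    ms_per_rotation = 60_000 / rpm                          # 60 s = 60,000 ms per minute
    avg_rotation_ms = 0.5 * ms_per_rotation                 # on average, wait half a turn
    avg_transfer_ms = ms_per_rotation / sectors_per_track   # one sector passes under the head
    return avg_seek_ms + avg_rotation_ms + avg_transfer_ms


# The slide's example: 15,000 RPM, 10 ms average seek, 500 sectors/track.
print(access_time_ms(15_000, 10, 500))  # about 12.008 ms
```

Seek time dominates: transferring the sector itself takes only 8 microseconds.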
Logical Disk Blocks
Modern disks present a simpler, abstract view of the complex sector geometry:
• The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).
Mapping between logical blocks and actual (physical) sectors:
• Maintained by a hardware/firmware device called the disk controller.
• Converts requests for logical blocks into (surface, track, sector) triples.
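To make the controller's mapping concrete, here is a sketch for a hypothetical, idealized disk: every track holds the same number of sectors, one logical block is one sector, and there are no spare or remapped sectors. Real controllers use zoned recording and spare sectors, so their mapping is more involved; the function name is also an illustration only.

```python
def logical_to_physical(lba: int, surfaces: int, sectors_per_track: int):
    """Map a logical block number to a (surface, track, sector) triple.

    Hypothetical idealized geometry: uniform tracks, no spare sectors.
    """
    sectors_per_cylinder = surfaces * sectors_per_track
    track = lba // sectors_per_cylinder               # which cylinder (arm position)
    surface = (lba // sectors_per_track) % surfaces   # which head within that cylinder
    sector = lba % sectors_per_track                  # position along the track
    return surface, track, sector


# Six surfaces (three platters), 500 sectors per track:
print(logical_to_physical(1234, 6, 500))  # (2, 0, 234)
```

This numbering fills a whole cylinder before the arm moves, so reading consecutive logical blocks needs few seeks.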
Accessing Disk: Direct Memory Access (DMA)
• The disk controller transfers data to/from main memory independently of the CPU.
• The process is initiated by the CPU using programmed I/O (PIO):
  – the CPU sends a request to the controller with addresses and sizes.
• Data is transferred to memory without CPU involvement.
• The controller signals the CPU with an interrupt when the transfer is complete.
• Large amounts of data can be transferred with one request.
[Figure: 1: PIO data transfer, CPU -> controller, initiated by the CPU; 2: DMA data transfer, controller <-> memory, initiated by the controller; 3: interrupt, control transfer controller -> CPU, initiated by the controller]
Solid State Disks (SSDs)
• Used in USB sticks, digital cameras, iPods, etc.
• Pages: 512 bytes to 4 KB; blocks: 32 to 128 pages.
• Data is read/written in units of pages.
• A page can be written only after its block has been erased.
• A block wears out after roughly 100,000 repeated writes.
[Figure: requests to read and write logical disk blocks arrive over the I/O bus at the solid state disk (SSD); its flash translation layer maps them onto flash memory organized as blocks 0 to B-1, each holding pages 0 to P-1]
SSD Performance Characteristics
Why are random writes so slow?
• Need to erase a block (takes around 1 ms).
• Must copy all useful pages in the block:
  – Find a used block (the new block) and erase it.
  – Write the page into the new block.
  – Copy the other pages from the old block to the new block.

                        Read        Write
Sequential throughput   250 MB/s    170 MB/s
Random throughput       140 MB/s    14 MB/s
Random access time      30 µs       300 µs
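The erase-and-copy sequence above can be simulated to count the hidden work behind one random write. This is a toy model, not how any particular flash translation layer works: `PAGES_PER_BLOCK = 4` is a hypothetical value chosen to keep the example small, and `FlashBlock` and `write_page` are illustrative names.

```python
PAGES_PER_BLOCK = 4  # hypothetical; the slide quotes 32 to 128 pages per block


class FlashBlock:
    def __init__(self):
        self.erase()

    def erase(self):
        # An erased page (None) can be programmed exactly once.
        self.pages = [None] * PAGES_PER_BLOCK


def write_page(block, spare, index, data, stats):
    """Write one page; return the block that now holds the data."""
    if block.pages[index] is None:           # page still erased: program it directly
        block.pages[index] = data
        stats["page_writes"] += 1
        return block
    # Page already written: erase a spare block and rebuild the block there.
    spare.erase()
    stats["erases"] += 1
    for i, page in enumerate(block.pages):
        if i == index:
            spare.pages[i] = data            # the new version of the page
            stats["page_writes"] += 1
        elif page is not None:
            spare.pages[i] = page            # copy every other useful page
            stats["copies"] += 1
    return spare                             # the old block can be erased and reused later


stats = {"erases": 0, "copies": 0, "page_writes": 0}
blk, spare = FlashBlock(), FlashBlock()
for i in range(PAGES_PER_BLOCK):
    blk = write_page(blk, spare, i, f"v{i}", stats)   # sequential writes: cheap
blk = write_page(blk, spare, 1, "new", stats)         # random rewrite: erase + 3 copies
print(stats)  # {'erases': 1, 'copies': 3, 'page_writes': 5}
```

One logical page write turned into an erase plus three page copies, which is why random write throughput is so much lower than sequential.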
SSD Tradeoffs vs Rotating Disks
Advantages
• No moving parts, so faster and lower power consumption.
Disadvantages
• Have the potential to wear out.
  – Mitigated by “wear leveling logic” in the flash translation layer.
  – E.g. the Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before it wears out.
• In 2010, about 100 times more expensive per byte.
Applications
• MP3 players, smartphones, laptops.
• Beginning to appear in desktops and servers.
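The slide does not say how a flash translation layer implements wear leveling. One simple policy, often called dynamic wear leveling, always reuses the free block that has been erased the fewest times; the sketch below shows only that idea (the `WearLeveler` class is hypothetical, and real FTLs also relocate rarely changing data).

```python
import heapq


class WearLeveler:
    """Dynamic wear leveling sketch: reuse the free block with the fewest erases."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase count, block number) for the free blocks.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self) -> int:
        count, block = heapq.heappop(self.free)  # least-worn free block
        self.erase_counts[block] = count + 1     # it is erased before being written
        return block

    def release(self, block: int) -> None:
        heapq.heappush(self.free, (self.erase_counts[block], block))


# Rewrite the same logical data 12 times; each rewrite moves it to a fresh block.
wl = WearLeveler(4)
current = wl.allocate()
for _ in range(11):
    new = wl.allocate()
    wl.release(current)
    current = new
print(wl.erase_counts)  # [3, 3, 3, 3]
```

Without such a policy, repeatedly rewriting the same data could wear out one block while the others stay almost unused.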
File System Issues
What issues are relevant to the design of a file system?
• How files are named.
• Where information about a file is stored.
• How to find a file's data, given its name.
• How space for new files is allocated.
• How to recover from hardware and software failures.
Files
In both Windows and Unix, a file is a sequence of bytes.
• Very flexible.
• These bytes are given meaning by user programs.
How do we determine the type of data in a file?
• Using the file name (e.g. the file extension in Windows).
• By looking at the first few bytes (e.g. Unix).
Attributes are associated with each file:
• These vary depending on the operating system.
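Looking at the first few bytes works because many formats start with a fixed signature ("magic number"). The sketch below checks a handful of well-known signatures; the Unix `file` command uses a far larger database of such patterns, and the table and function names here are illustrative only.

```python
# A few well-known signatures ("magic numbers").
SIGNATURES = [
    (b"\x7fELF", "ELF executable"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"#!", "script (interpreter named on first line)"),
]


def guess_type(first_bytes: bytes) -> str:
    """Guess a file's type from its first few bytes."""
    for signature, kind in SIGNATURES:
        if first_bytes.startswith(signature):
            return kind
    return "unknown"


def guess_file_type(path: str) -> str:
    with open(path, "rb") as f:
        return guess_type(f.read(16))   # 16 bytes covers all signatures above


print(guess_type(b"%PDF-1.4\n"))  # PDF document
```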
Common File Attributes
• File size
• File owner and group
• Location of the file's data
• Time of creation/last access/last update
• File permissions (who can read/write/execute it)
• Assorted flags (hidden/system/archive/lock/etc.)
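Most of these attributes can be read through the `stat()` system call. The sketch below uses Python's `os.stat`; the helper name `attributes` is hypothetical. Creation time is left out because it is not reported portably (on Unix, `st_ctime` is the time of the last inode change, not creation), and flags such as hidden/archive are Windows-specific.

```python
import os
import stat
import tempfile
import time


def attributes(path: str) -> dict:
    """A few common attributes, as reported by the stat() system call."""
    st = os.stat(path)
    return {
        "size": st.st_size,                        # in bytes
        "owner": st.st_uid,                        # numeric user id (always 0 on Windows)
        "group": st.st_gid,                        # numeric group id
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "last access": time.ctime(st.st_atime),
        "last update": time.ctime(st.st_mtime),
    }


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "myfile.txt")
    with open(path, "w") as f:
        f.write("hello")
    print(attributes(path)["size"])   # 5
```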
File Names
A file is accessed using its name. Rules for names depend on the operating system.
MS-DOS/Windows up to Windows ME (1981)
• Up to 8 ASCII characters, followed by “.” and a 3-character extension.
• Case insensitive (that is, MYFILE.DOC is the same as myfile.doc).
ISO 9660 CD-ROM (1988)
• Same as for MS-DOS.
• The design goal was to support the lowest common denominator.
• Extensions (Joliet for Windows, Rock Ridge for Unix/Linux) allow the file names of Windows NT to 7 and of Unix/Linux.
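The MS-DOS naming rule can be expressed as a small check. This is a simplified sketch: it only checks the 8.3 shape and ASCII-ness, while real MS-DOS also forbids spaces and characters such as `*`, `?` and `/`; both function names are hypothetical.

```python
def is_valid_83(name: str) -> bool:
    """Check the MS-DOS 8.3 shape: 1 to 8 chars, optional '.', up to 3 chars.

    Simplified: real MS-DOS also forbids spaces and characters such as * ? / \\ : < > |.
    """
    base, _, ext = name.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        return False
    return (base + ext).isascii()


def same_dos_name(a: str, b: str) -> bool:
    """MS-DOS names are case insensitive (they are stored in upper case)."""
    return a.upper() == b.upper()


print(is_valid_83("MYFILE.DOC"), is_valid_83("LONGFILENAME.TXT"))  # True False
print(same_dos_name("MYFILE.DOC", "myfile.doc"))                   # True
```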
File Names (cont')
Windows NT to 7 (1993)
• Up to 255 Unicode characters; case sensitive (this can be switched off).
• Many Windows tools are case insensitive!
Unix/Linux
• Up to 255 bytes, with any character except NUL and /; case sensitive.
• UTF-8 can be used with recent versions of Linux.
Directories
A directory is just a file whose data contains a list of entries.
Each entry contains information about one file or directory.
Each file or directory is an entry in some directory, except for the top-level directory.
ISO 9660 CD-ROM File System
• CD-ROMs are read-only.
• A CD-ROM consists of a sequence of blocks of 2048 data bytes.
• This makes the file system layout simpler: files are stored in contiguous blocks.
A CD-ROM contains:
• 16 blocks with various info set by the manufacturer.
• A primary volume descriptor block containing the root directory.
Directory entry (field sizes in bytes):
Field                                 Bytes
Directory entry length                1
Extended attributes record length     1
Location (first block of the file)    8
Size                                  8
Date/time                             7
Flags                                 1
Interleave (unit size, gap)           2
CD #                                  4
Name length                           1
Name;version                          4 – 15
Two further fields                    ?, ?
Location, Size and CD # are each stored twice, in little-endian and in big-endian byte order.
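Given the field sizes of a directory entry, it can be decoded with Python's `struct` module. This sketch only reads the little-endian copy of each double-stored field and decodes just the year from the 7-byte date; the function name and the sample record are hypothetical, built by hand to match the layout.

```python
import struct


def parse_iso9660_entry(rec: bytes) -> dict:
    """Decode the main fields of one ISO 9660 directory entry."""
    name_len = rec[32]
    return {
        "entry_length": rec[0],
        "ext_attr_length": rec[1],
        # 8-byte fields hold the same value twice: little-endian, then big-endian.
        "location": struct.unpack_from("<I", rec, 2)[0],    # first block of the file
        "size": struct.unpack_from("<I", rec, 10)[0],       # in bytes
        "year": 1900 + rec[18],                             # date/time starts at offset 18
        "is_directory": bool(rec[25] & 0x02),               # flags byte, bit 1
        "cd_number": struct.unpack_from("<H", rec, 28)[0],  # volume sequence number
        "name": rec[33:33 + name_len].decode("ascii"),      # e.g. 'README.TXT;1'
    }


name = b"README.TXT;1"
rec = (bytes([34 + len(name), 0])
       + struct.pack("<I", 20) + struct.pack(">I", 20)       # location: block 20
       + struct.pack("<I", 5000) + struct.pack(">I", 5000)   # size: 5000 bytes
       + bytes([110, 9, 11, 6, 48, 0, 0])                   # 2010-09-11 06:48:00
       + bytes([0, 0, 0])                                    # flags, interleave (2 bytes)
       + struct.pack("<H", 1) + struct.pack(">H", 1)         # CD #1
       + bytes([len(name)]) + name + b"\x00")                # name + padding byte
print(parse_iso9660_entry(rec)["name"])  # README.TXT;1
```

Because files are contiguous, `location` and `size` alone are enough to find all of a file's data.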
MS-DOS File System
No longer normally used on computers, but still used in:
• most digital cameras
• MP3 players
• iPods (unless reformatted differently).
Directory entry (32 bytes):
Field             Bytes
File name         8
Extension         3
Attributes        1
Unused            10
Time              2
Date              2
First cluster #   2
Size              4
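The 32-byte layout maps directly onto a `struct` format string. This sketch follows the layout shown on the slide (FAT12/FAT16 style; FAT32 reuses part of the "unused" bytes, e.g. for the high half of the first cluster number), and the packed time/date decoding follows the standard FAT bit layout. Names are illustrative.

```python
import struct

# 8 + 3 + 1 + 10 + 2 + 2 + 2 + 4 = 32 bytes, little-endian.
FAT_DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def parse_fat_entry(raw: bytes) -> dict:
    name, ext, attrs, _unused, mtime, mdate, first_cluster, size = FAT_DIR_ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii").rstrip(),   # names are padded with spaces
        "ext": ext.decode("ascii").rstrip(),
        "attributes": attrs,                     # bit flags: read-only, hidden, system, ...
        # time bits: hhhhh mmmmmm sssss (seconds stored divided by 2)
        "time": (mtime >> 11, (mtime >> 5) & 0x3F, (mtime & 0x1F) * 2),
        # date bits: yyyyyyy mmmm ddddd (years counted from 1980)
        "date": (1980 + (mdate >> 9), (mdate >> 5) & 0x0F, mdate & 0x1F),
        "first_cluster": first_cluster,          # head of the file's FAT chain
        "size": size,                            # in bytes
    }


raw = FAT_DIR_ENTRY.pack(b"MYFILE  ", b"DOC", 0x20, bytes(10),
                         (14 << 11) | (30 << 5) | (20 // 2),
                         ((2010 - 1980) << 9) | (9 << 5) | 11,
                         7, 1234)
print(parse_fat_entry(raw))
```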
MS-DOS File System (cont')
Space is managed using a File Allocation Table (FAT).
• Each block (called a cluster) is represented by a 12-, 16- or 32-bit word.
• A word contains the number of the next block in the file.
  – In other words: each file is a linked list of blocks.
• Usually two copies of the FAT are stored on disk.
• A copy is always kept in memory.
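Because each FAT word holds the number of the next cluster, reading a file means following a linked list. In this sketch the FAT is a Python list and `END_OF_CHAIN = -1` is a hypothetical marker (real FATs reserve a range of values, e.g. 0xFF8 to 0xFFF in FAT12); the function names are illustrative.

```python
END_OF_CHAIN = -1  # hypothetical end-of-file marker


def cluster_chain(fat: list, first_cluster: int) -> list:
    """Follow the linked list of clusters that makes up one file."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if len(chain) >= len(fat):
            raise ValueError("cycle in FAT chain")   # only possible if the FAT is corrupted
        chain.append(cluster)
        cluster = fat[cluster]                      # FAT entry = next cluster number
    return chain


def cluster_at(fat: list, first_cluster: int, offset: int, cluster_size: int) -> int:
    """Find the cluster holding byte `offset`: needs offset // cluster_size hops."""
    cluster = first_cluster
    for _ in range(offset // cluster_size):
        cluster = fat[cluster]
    return cluster


# A file starting at cluster 2 and occupying clusters 2 -> 5 -> 3.
fat = [0, 0, 5, END_OF_CHAIN, 0, 3, 0, 0]
print(cluster_chain(fat, 2))             # [2, 5, 3]
print(cluster_at(fat, 2, 9000, 4096))    # 3
```

Note that `cluster_at` must walk the chain from the start every time, so the cost of reaching a position grows with the offset.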
MS-DOS File System (cont')
Pros: ______________________________
Cons:
• the FAT takes a lot of memory space
• random access to large files is __________
• fragmentation can occur frequently
  – the blocks of a file are scattered all over the disk
MS-DOS File System (cont')
Fragmentation example
[Figure: a disk of 48 blocks (0 to 47) drawn in rows of 12, showing the block layout after each step below]
Initial state: 5 files (sizes 6, 6, 12, 12 and 7 blocks).
Step 1: deleting the green file.
Step 2: creating a new 7-block file.
Step 3: creating a new 8-block file.
Step 4: creating a new 1-block file.
Step 5: deleting the blue file.
Step 6: appending 4 blocks to the gray file.
Linux File System
Overall disk structure:
• The superblock contains information about the file system.
• Each group block contains a copy of the superblock, so if one copy is damaged the information can be recovered.
• Information about free/occupied blocks is kept separate from the information used to locate data.
[Figure: the disk is divided into Group Block 0, Group Block 1, ..., Group Block n-1, Group Block n; each group block consists of: Super Block | Group Attributes | Block Bitmap | Inode Bitmap | Inode Table | Data Blocks]
Linux File System (cont')
Example of a superblock:
Filesystem OS type:       Linux
Inode count:              8060928
Block count:              16113187
Reserved block count:     805659
Free blocks:              15164036
Free inodes:              8021502
First block:              0
Block size:               4096
Blocks per group:         32768
Inodes per group:         16384
Inode blocks per group:   512
First inode:              11
Inode size:               128
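The superblock fields are consistent with each other, which is easy to check with a little arithmetic. The sketch below copies the values from the example and derives the number of block groups, the inode table size per group and the total size of the file system.

```python
# Values copied from the example superblock.
sb = {
    "inode_count": 8060928,
    "block_count": 16113187,
    "block_size": 4096,
    "blocks_per_group": 32768,
    "inodes_per_group": 16384,
    "inode_size": 128,
}

groups = -(-sb["block_count"] // sb["blocks_per_group"])   # ceiling division
inode_table_blocks = sb["inodes_per_group"] * sb["inode_size"] // sb["block_size"]
size_gib = sb["block_count"] * sb["block_size"] / 2**30

print(groups)                                  # 492 block groups (the last one partial)
print(groups * sb["inodes_per_group"])         # 8060928, matches "Inode count"
print(inode_table_blocks)                      # 512, matches "Inode blocks per group"
print(round(size_gib, 1))                      # 61.5 (GiB)
```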
Linux File Structure
A file consists of:
• An inode
  – Contains the file's attributes (but not its name).
  – Contains direct and indirect pointers to data blocks.
  – A disk block contains multiple inodes.
• Indirect blocks
  – These contain pointers to data blocks, or to other indirect blocks.
• Data blocks
Linux File Structure
inode:
[Figure: an inode holds Type/Permissions, Owner info, File size, Timestamps, 12 direct data block numbers, an indirect block #, a 2-indirect block # and a 3-indirect block #. The direct numbers point to data blocks; the indirect block points to data blocks; the 2-indirect block points to indirect blocks, which point to data blocks; the 3-indirect block points to 2-indirect blocks, and so on down to the data blocks]
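The pointer structure determines which blocks must be read to reach any part of a file. This sketch assumes 4096-byte blocks (as in the example superblock) and 4-byte block numbers, so each indirect block holds 1024 pointers; `locate` is a hypothetical helper. The computed maximum ignores other limits, so real ext2/ext3 maximum file sizes are smaller.

```python
BLOCK_SIZE = 4096                        # from the example superblock
POINTER_SIZE = 4                         # assumption: 32-bit block numbers
N_DIRECT = 12                            # direct pointers in the inode
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 1024 pointers per indirect block


def locate(n: int):
    """Say how the inode reaches block n of a file: (level, indices to follow)."""
    if n < N_DIRECT:
        return "direct", [n]
    n -= N_DIRECT
    if n < PER_BLOCK:
        return "indirect", [n]
    n -= PER_BLOCK
    if n < PER_BLOCK ** 2:
        return "2-indirect", [n // PER_BLOCK, n % PER_BLOCK]
    n -= PER_BLOCK ** 2
    if n < PER_BLOCK ** 3:
        return "3-indirect", [n // PER_BLOCK ** 2, (n // PER_BLOCK) % PER_BLOCK, n % PER_BLOCK]
    raise ValueError("block number beyond the largest possible file")


MAX_BLOCKS = N_DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3

print(locate(11))      # ('direct', [11])
print(locate(12))      # ('indirect', [0])
print(locate(5000))    # ('2-indirect', [3, 892])
print(MAX_BLOCKS * BLOCK_SIZE / 2**40)   # roughly 4 (TiB)
```

Small files need only the direct pointers, and even the largest files need at most three extra block reads to find any data block.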
Linux Directories
• A directory contains entries for other directories or files.
• A directory entry consists of
  – the file name, and
  – the inode number for the file.
• The directory contains no other information.
• The first entry of every directory is . : a reference to the directory itself.
• The second entry of every directory is .. : a reference to the parent directory.
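The (name, inode number) pairs stored in a directory can be listed from Python with `os.scandir`, whose entries expose `name` and `inode()`. Note that `os.scandir` skips the `.` and `..` entries; the helper name `list_directory` is hypothetical.

```python
import os
import tempfile


def list_directory(path: str) -> list:
    """Return the (name, inode number) pairs of a directory, sorted by name."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


with tempfile.TemporaryDirectory() as d:
    for name in ("myfile.txt", "notes"):
        open(os.path.join(d, name), "w").close()
    print(list_directory(d))   # [('myfile.txt', <inode>), ('notes', <inode>)]
```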
Sharing Files in Linux
• Several directory entries can refer to the same inode. This is called a hard link.
  – This is the case for . and ..
• Hard links can be used to give a program several names. Example:
    % ls -ali /bin
    1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bunzip2*
    1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bzcat*
    1308482 -rwxr-xr-x 3 root root 31112 2010-09-11 06:48 bzip2*
  – All three entries refer to the same inode, 1308482 (note the link count of 3).
• Hard links can be used to share files.
  – All files must belong to the same file system. Why?
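Hard links can be created and inspected from Python with `os.link` and `os.stat`. The sketch below (the helper name `hard_link_demo` is hypothetical) shows that both names share one inode and that removing one name only decrements the link count; on Linux, linking across file systems fails with the error EXDEV.

```python
import os
import tempfile


def hard_link_demo(directory: str):
    original = os.path.join(directory, "bzip2")
    alias = os.path.join(directory, "bunzip2")
    with open(original, "wb") as f:
        f.write(b"same data, two names")
    os.link(original, alias)          # a second directory entry for the same inode
    st_orig, st_alias = os.stat(original), os.stat(alias)
    links_before = st_orig.st_nlink   # the inode's reference count: 2
    os.remove(original)               # removes one entry; the data survives
    links_after = os.stat(alias).st_nlink
    return st_orig.st_ino == st_alias.st_ino, links_before, links_after


with tempfile.TemporaryDirectory() as d:
    print(hard_link_demo(d))   # (True, 2, 1)
```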
Sharing Files in Linux (cont')
• Unix/Linux also support symbolic (soft) links.
  – A symbolic link is a file f whose contents is the name of another file. Example:
    % ls -al /lib
    -rw-r--r-- 1 root root 534832 2010-10-21 19:02 libm-2.12.1.so
    lrwxrwxrwx 1 root root     14 2010-10-22 18:42 libm.so.6 -> libm-2.12.1.so
  – The second file may be on a different file system.
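The example can be reproduced with `os.symlink`. In the sketch below (hypothetical helper `symlink_demo`), the link stores only the target's name, so on Linux its own size, as reported by `os.lstat`, is the length of that name: 14 bytes, matching the listing, while `os.stat` follows the link to the real file.

```python
import os
import tempfile


def symlink_demo(directory: str):
    target = os.path.join(directory, "libm-2.12.1.so")
    link = os.path.join(directory, "libm.so.6")
    with open(target, "wb") as f:
        f.write(b"contents")                 # 8 bytes of stand-in data
    # Relative target, as in the listing: the link stores just this name.
    os.symlink("libm-2.12.1.so", link)
    return (
        os.path.islink(link),                # True: the entry itself is a symbolic link
        os.readlink(link),                   # the name stored in the link file
        os.lstat(link).st_size,              # size of the link = length of that name
        os.stat(link).st_size,               # stat() follows the link to the real file
    )


with tempfile.TemporaryDirectory() as d:
    print(symlink_demo(d))   # (True, 'libm-2.12.1.so', 14, 8)
```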
Reading Data
To read data from a file myfile.txt:
• Find the directory containing myfile.txt.
• Read the inode for myfile.txt.
• Read the data
  – either by accessing the direct blocks,
  – or by going through up to 3 layers of indirect blocks.
Random access to large files is much faster than in the MS-DOS file system.
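The first two steps amount to walking the path one directory at a time, each directory mapping names to inode numbers. This sketch uses a hypothetical in-memory table of inodes (`INODES`) instead of a disk; inode 2 as the root directory matches the ext2/ext3/ext4 convention.

```python
ROOT_INODE = 2  # ext2/ext3/ext4 reserve inode 2 for the root directory

# Hypothetical in-memory file system: inode number -> inode contents.
INODES = {
    2:  {"type": "dir", "entries": {".": 2, "..": 2, "home": 11}},
    11: {"type": "dir", "entries": {".": 11, "..": 2, "myfile.txt": 12}},
    12: {"type": "file", "data": b"hello"},
}


def lookup(path: str, inodes: dict) -> int:
    """Translate an absolute path into an inode number, one directory at a time."""
    ino = ROOT_INODE
    for name in path.split("/"):
        if not name:                       # skip empty parts from leading or double '/'
            continue
        inode = inodes[ino]
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in inode["entries"]:
            raise FileNotFoundError(path)
        ino = inode["entries"][name]       # directory entry: name -> inode number
    return ino


def read_file(path: str, inodes: dict) -> bytes:
    inode = inodes[lookup(path, inodes)]   # read the inode
    return inode["data"]                   # stands in for following the block pointers


print(lookup("/home/myfile.txt", INODES))     # 12
print(read_file("/home/myfile.txt", INODES))  # b'hello'
```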
Fragmentation
• Unlike the MS-DOS file system, modern file systems (NTFS, etc.) try to keep files together.
• For Linux:
  – files are kept within a block group if possible;
  – large files are written to large free areas, whereas small files are stored in smaller free areas.
• Fragmentation still happens, but much more slowly, and normally only becomes a problem when the file system is very full.
Virtual File Systems
How do we handle multiple disks, devices, or partitions of one disk, possibly with different file systems (e.g. NTFS, FAT32, CD-ROM)?
MS-DOS, Windows
• Each disk is assigned a letter name:
  – A:\ : floppy disk
  – C:\ : primary hard disk
  – Z:\ : a drive on a server somewhere on the network
• This letter is used to decide which file system to pass the request to.
• Hence the user must know which file system contains the file he/she wants to access.
Virtual File Systems
Unix/Linux
• There is a root file system / at the top of the hierarchy.
• Every other file system is mounted on a directory, and so appears as a subdirectory of that file system.
  – Example:
    # ls /mnt/cdrom
    # mount -t iso9660 /dev/cdrom /mnt/cdrom
    mount: block device /dev/sr0 is write-protected, mounting read-only
    # ls /mnt/cdrom
    Autorun.arn  Autorun.exe  Autorun.inf  docs  forms  ReadMe.txt
• The user need not even be aware that multiple file systems are involved.
Virtual File Systems
How this is done:
• User programs make system calls to request various operations.
• A layer called the Virtual File System (VFS) performs the parts of the operations that are common to all file systems.
• The VFS calls low-level functions to accomplish file-system-specific tasks.
• Each file system must implement these low-level functions appropriately.
Virtual File Systems
Pictorially:
[Figure: user programs 1, 2, ... sit on top of the Virtual File System layer, which calls into the individual file system implementations: ISO 9660 F.S., Ext4 F.S., VFAT F.S., ...]
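The split between common VFS work and file-system-specific functions can be sketched with an abstract interface. Everything here is hypothetical and much simpler than a real kernel (loosely analogous to the tables of function pointers, such as `struct file_operations`, that Linux file systems register): the VFS picks the longest mount point that prefixes the path, then delegates to that file system's `read`.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Low-level operations each concrete file system must implement."""

    @abstractmethod
    def read(self, relpath: str) -> bytes: ...


class DictFileSystem(FileSystem):
    """Stand-in for a real driver (ISO 9660, ext4, VFAT, ...): files kept in a dict."""

    def __init__(self, files: dict):
        self.files = files

    def read(self, relpath: str) -> bytes:
        return self.files[relpath]


class VFS:
    def __init__(self):
        self.mounts = {}                       # mount point -> FileSystem

    def mount(self, mount_point: str, fs: FileSystem) -> None:
        self.mounts[mount_point] = fs

    def read(self, path: str) -> bytes:
        # Common part: pick the longest mount point that prefixes the path...
        best = max((mp for mp in self.mounts
                    if path == mp or path.startswith(mp.rstrip("/") + "/")),
                   key=len, default=None)
        if best is None:
            raise FileNotFoundError(path)
        relpath = path[len(best):].lstrip("/")
        # ...then delegate the file-system-specific part.
        return self.mounts[best].read(relpath)


vfs = VFS()
vfs.mount("/", DictFileSystem({"etc/hostname": b"myhost"}))
vfs.mount("/mnt/cdrom", DictFileSystem({"ReadMe.txt": b"Welcome"}))
print(vfs.read("/mnt/cdrom/ReadMe.txt"))   # b'Welcome'
print(vfs.read("/etc/hostname"))           # b'myhost'
```

User code calls only `vfs.read`, so it never needs to know which file system holds the file.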
Robustness and Recovery
File systems contain critical information. Events that may cause updates to fail:
• Operating system crash (caused by a bug).
• Mechanical/electrical failure of the disk.
• Power failure.
Consequences:
• The information about to be written may be lost.
• The file system may become inconsistent.
  – There is a risk of losing other information as well.
File System Consistency Check
When the operating system shuts down:
• It saves a file-system-is-clean bit to disk.
During the boot process:
• The operating system checks this bit.
• If it is not set, the file system may be in an inconsistent state, so it needs to be checked and fixed.
File System Consistency Check
Example: the Linux file system check (e2fsck) works in 5 stages.
Stage 1: reads the inodes and determines
• which inodes are in use
• the type of file each inode is used for
• whether blocks are in use or free
• which blocks contain directories
• which blocks are used by no inode or by more than one inode.
Stage 2: verifies that directory entries are valid
• all of the fields must have sensible values
• entries for . and .. should be present.
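The block bookkeeping of stage 1 can be sketched as a single pass that records which inodes claim each block. This is only an illustration of the idea, not e2fsck's code: the input is a hypothetical dictionary from inode number to the block numbers it points to.

```python
from collections import defaultdict


def scan_block_usage(inode_blocks: dict, total_blocks: int):
    """inode_blocks: inode number -> block numbers it points to."""
    owners = defaultdict(list)                 # block number -> inodes claiming it
    for inode, blocks in inode_blocks.items():
        for block in blocks:
            owners[block].append(inode)
    unused = [b for b in range(total_blocks) if b not in owners]
    shared = {b: inodes for b, inodes in owners.items() if len(inodes) > 1}
    return unused, shared


unused, shared = scan_block_usage({12: [5, 6], 13: [6, 7]}, 9)
print(unused)   # [0, 1, 2, 3, 4, 8]
print(shared)   # {6: [12, 13]}
```

The list of unused blocks is what a later stage can compare against the on-disk block bitmap.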
File System Consistency Check
Stage 3: checks the directory structure
• It must form a tree.
• So reconnect disconnected pieces (e2fsck places them in lost+found) and break any loops.
Stage 4: checks and corrects reference counts
• Multiple directory entries can point to the same inode.
• The inode keeps track of this number.
  – Why?
• Make sure the reference count in the inode is correct.
Stage 5: checks bitmaps
• Compare the block and inode bitmaps computed in the earlier stages against the on-disk bitmaps.
• Update the on-disk bitmaps if necessary.
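Stage 4 can be sketched by counting how many directory entries (including `.` and `..`) point to each inode and comparing the result with the count stored in the inode. The data layout and function name are hypothetical. A stored count that is too high means the inode is never freed; one that is too low means it could be freed while another entry still refers to it.

```python
from collections import Counter


def check_link_counts(directories: dict, stored_counts: dict) -> dict:
    """Return {inode: correct count} for every inode whose stored count is wrong.

    directories: directory inode -> {name: inode}, including '.' and '..'
    stored_counts: inode -> reference (link) count recorded in the inode
    """
    actual = Counter()
    for entries in directories.values():
        for inode in entries.values():
            actual[inode] += 1            # each directory entry is one reference
    return {inode: actual[inode]
            for inode, stored in stored_counts.items()
            if stored != actual[inode]}


directories = {
    2:  {".": 2, "..": 2, "home": 11, "a": 20, "b": 20},
    11: {".": 11, "..": 2},
}
stored = {2: 3, 11: 2, 20: 1}   # inode 20 records 1 link but has 2
print(check_link_counts(directories, stored))   # {20: 2}
```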
File System Recovery
• File system recovery takes a long time for large file systems.
• It does not always restore the file system perfectly.
How do databases handle this problem?
• They log the transactions being performed.
• If a transaction is interrupted, it can be undone or redone by executing the logged operations.
Some file systems do the same thing.
Journaling File Systems
• A journaling file system has a hidden file called a journal (e.g. NTFS, Linux ext3).
• Each operation is broken down into atomic steps. Example: to delete a file
  – Remove the directory entry for the file.
  – Decrement the inode's reference count.
  – If the count becomes 0, free the inode and each of its data blocks.
• Before performing the operation:
  – Write the sequence of steps to the journal.
  – Add an end-of-operation indicator to the journal.
• After the operation completes:
  – The steps can be deleted from the journal.
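The journaling protocol can be sketched with an in-memory log. Everything here is hypothetical (a real journal lives in a reserved area of the disk): steps are logged before an end-of-operation marker, and recovery replays only operations whose marker made it into the log. The reference count is logged as an absolute value ("set to 0") rather than "decrement", so replaying a step twice is harmless.

```python
class Journal:
    """Hypothetical in-memory journal."""

    def __init__(self):
        self.records = []   # ("step", txid, step) or ("end", txid, None)

    def log(self, txid: int, steps: list) -> None:
        for step in steps:
            self.records.append(("step", txid, step))
        self.records.append(("end", txid, None))   # end-of-operation indicator

    def complete_steps(self) -> list:
        done = {txid for kind, txid, _ in self.records if kind == "end"}
        return [s for kind, txid, s in self.records if kind == "step" and txid in done]


def apply(state: dict, step: tuple) -> None:
    op, *args = step
    if op == "remove_entry":
        directory, name = args
        state["dirs"][directory].pop(name, None)
    elif op == "set_refcount":
        inode, count = args
        state["refcount"][inode] = count
    elif op == "free_block":
        (block,) = args
        state["free"].add(block)


def replay(state: dict, journal: Journal) -> None:
    """After an unclean shutdown: redo every operation that has an end marker."""
    for step in journal.complete_steps():
        apply(state, step)


# Deleting myfile.txt (inode 12, blocks 40 and 41) was fully logged...
state = {"dirs": {2: {"myfile.txt": 12}}, "refcount": {12: 1}, "free": set()}
journal = Journal()
journal.log(1, [("remove_entry", 2, "myfile.txt"),
                ("set_refcount", 12, 0),
                ("free_block", 40), ("free_block", 41)])
# ...but a second operation was cut off before its end marker.
journal.records.append(("step", 2, ("free_block", 99)))
replay(state, journal)
print(state)
```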
Journaling File Systems
Each step must be idempotent.
• That is, executing the step multiple times has the same effect as executing it only once.
• Why? Examples (good or bad?):
  – Increment reference count for inode #786453
  – Set reference count for inode #786453 to 2
  – Mark block #9168734 free
When the file system isn't clean on reboot:
• Replay every operation from the journal that has an end-of-operation indicator.
• This is much faster than a full check.
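A crash can also happen during recovery, in which case the same journal entries are replayed again. The sketch below (hypothetical function names) runs each of the three example steps twice on the same starting state to show which ones tolerate this.

```python
def replay_twice(state: dict, step) -> dict:
    """Simulate a crash during recovery: the same logged step runs twice."""
    for _ in range(2):
        step(state)
    return state


def increment_refcount(state):
    state["refcount"][786453] += 1


def set_refcount_to_2(state):
    state["refcount"][786453] = 2


def mark_block_free(state):
    state["free"].add(9168734)


for step in (increment_refcount, set_refcount_to_2, mark_block_free):
    start = {"refcount": {786453: 1}, "free": set()}
    print(step.__name__, replay_twice(start, step))
```

Running "increment" twice leaves the count at 3 instead of 2, while "set to 2" and "mark free" give the same result however often they run.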