file systems

15
File Systems Topics Topics Design criteria History of file systems Berkeley Fast File System Effect of file systems on programs CS 105 “Tour of the Black Holes of Computing”

Upload: boone

Post on 05-Jan-2016

43 views

Category:

Documents


0 download

DESCRIPTION

File Systems. CS 105 “Tour of the Black Holes of Computing”. Topics Design criteria History of file systems Berkeley Fast File System Effect of file systems on programs. File Systems: Disk Organization. A disk is a sequence of 512-byte sectors - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: File Systems

File SystemsFile Systems

TopicsTopics Design criteria History of file systems Berkeley Fast File System Effect of file systems on programs

CS 105“Tour of the Black Holes of Computing”

Page 2: File Systems

– 2 – CS 105

File Systems: Disk OrganizationFile Systems: Disk Organization

A disk is a sequence of 512-byte sectorsA disk is a sequence of 512-byte sectors Unfortunate historical fact; now we’re stuck with it

First comes First comes boot blockboot block and and partition tablepartition table

Partition table divides the rest of disk into partitionsPartition table divides the rest of disk into partitions May appear to operating system as logical “disks” Useful for multiple OSes, etc. Otherwise bad idea; hangover from earlier days

File systemFile system: partition structured to hold : partition structured to hold filesfiles (data) (data) Usually aggregates sectors into blocks or clusters Typical size: 2-16 sectors (1K-8K bytes) Increases efficiency by reducing overhead

Page 3: File Systems

– 3 – CS 105

Design ProblemsDesign Problems

As seen before, disks have mechanical delaysAs seen before, disks have mechanical delays Seek time: move heads to where data is (2-20 ms) Rotational delay: wait for data to get under heads (2-8 ms) Transfer time: move data into memory (1-15 μs)

Fundamental problem in file-system design: how to Fundamental problem in file-system design: how to hide (or at least minimize) these delays?hide (or at least minimize) these delays?

Side problems also important:Side problems also important: Making things reliable (in face of s/w and h/w failures) Organizing data (e.g., in directories or databases) Enforcing security

Page 4: File Systems

– 4 – CS 105

Important File SystemsImportant File Systems

FAT: old Windows and MSDOS standardFAT: old Windows and MSDOS standard

NTFS: Windows current standardNTFS: Windows current standard

FFS: Unix standard since 80’sFFS: Unix standard since 80’s

AFS: distributed system developed at CMUAFS: distributed system developed at CMU

LFS: Berkeley redesign for high performanceLFS: Berkeley redesign for high performance

ZFS: redesigned Unix system, recently released by SunZFS: redesigned Unix system, recently released by Sun

ISO 9660: CD-ROM standardISO 9660: CD-ROM standard

EXT2/EXT3: Linux standards, variants of FFSEXT2/EXT3: Linux standards, variants of FFS

ReiserFS: redesigned Linux systemReiserFS: redesigned Linux system

Other OS’s have own file organization: VMS, MVS, Other OS’s have own file organization: VMS, MVS,

Page 5: File Systems

– 5 – CS 105

Typical SimilaritiesAmong File SystemsTypical SimilaritiesAmong File SystemsA (secondary) boot recordA (secondary) boot record

A top-level directoryA top-level directory

Support for hierarchical directoriesSupport for hierarchical directories

Management of free and used spaceManagement of free and used space

Metadata about files (e.g., creation date)Metadata about files (e.g., creation date)

Protection and securityProtection and security

Page 6: File Systems

– 6 – CS 105

Typical DifferencesBetween File SystemsTypical DifferencesBetween File SystemsNaming conventions: case, length, special symbolsNaming conventions: case, length, special symbols

File size and placementFile size and placement

SpeedSpeed

Error recoveryError recovery

Metadata detailsMetadata details

Support for special filesSupport for special files

Page 7: File Systems

– 7 – CS 105

Case Study: Berkeley Fast File System (FFS)Case Study: Berkeley Fast File System (FFS)First public Unix (Unix V7) introduced many important First public Unix (Unix V7) introduced many important

concepts in Unix File System (UFS)concepts in Unix File System (UFS) I-nodes Indirect blocks Unix directory structure and permissions system

UFS was simple, elegant, and slowUFS was simple, elegant, and slow

Berkeley initiated project to solve the slownessBerkeley initiated project to solve the slowness

Many modern file systems are direct or indirect Many modern file systems are direct or indirect descendants of FFSdescendants of FFS In particular, EXT2 and EXT3

Page 8: File Systems

– 8 – CS 105

FFS HeadersFFS Headers

Boot block: first few sectorsBoot block: first few sectors Typically all of cylinder 0 is reserved for boot blocks,

partition tables, etc.

Superblock: file system parameters, includingSuperblock: file system parameters, including Size of partition (note that this is dangerously redundant) Location of root directory Block size

Cylinder groups, includingCylinder groups, including Data blocks List of inodes Bitmap of used blocks and fragments in the group Replica of superblock (not always at start of group)

Page 9: File Systems

– 9 – CS 105

FFS File TrackingFFS File TrackingDirectory: file containing variable-length recordsDirectory: file containing variable-length records

File name I-node number

Inode: holds metadata for one fileInode: holds metadata for one file Located by number, using info from superblock Integral number of inodes in a block Includes

Owner and group File type (regular, directory, pipe, symbolic link, …) Access permissions Time of last i-node change, last modification, last access Number of links (reference count) Size of file (for directories and regular files) Pointers to data blocks

Page 10: File Systems

– 10 – CS 105

FFS InodesFFS Inodes

Inode has 15 pointers to data blocksInode has 15 pointers to data blocks 12 point directly to data blocks 13th points to an indirect block, containing pointers to data blocks 14th points to a double indirect block 15th points to a triple indirect block

With 4K blocks and 4-byte pointers, triple indirect With 4K blocks and 4-byte pointers, triple indirect block can address 4 terabytes (2block can address 4 terabytes (24242 bytes) of data bytes) of data

Data blocks might not be contiguous on diskData blocks might not be contiguous on disk

But OS tries to But OS tries to clustercluster related items in cylinder group: related items in cylinder group: Directory entries Corresponding inodes Their data blocks

Page 11: File Systems

– 11 – CS 105

FFS Free-Space ManagementFFS Free-Space Management

Free space managed by bitmapsFree space managed by bitmaps One bit per block Makes it easy to find groups of contiguous blocks

Each cylinder group has own bitmapEach cylinder group has own bitmap Can find blocks that are physically nearby Prevents long scans on full disks

Prefer to allocate block in cylinder group of last Prefer to allocate block in cylinder group of last previous blockprevious block If can’t, pick group that has most space Heuristic tries to maximize number of blocks close to each

other

Page 12: File Systems

– 12 – CS 105

FFS FragmentationFFS Fragmentation

Blocks are typically 4K or 8KBlocks are typically 4K or 8K Amortizes overhead of reading or writing block On average, wastes 1/2 block (total) per file

FFS divides blocks into 4-16 FFS divides blocks into 4-16 fragmentsfragments Free space bitmap manages fragments Small files, or tails of files are placed in fragments This turned out to be terrible idea

Greatly complicates OS code Didn’t foresee how big disks would get

Linux EXT2 uses smaller block size (typically 1K) Linux EXT2 uses smaller block size (typically 1K) instead of fragmentsinstead of fragments

Page 13: File Systems

– 13 – CS 105

File Systems and Data StructuresFile Systems and Data Structures

Almost every data structure you can think of is present Almost every data structure you can think of is present in FFS:in FFS: Arrays (of blocks and i-nodes) Variable-length records (in directories) Heterogeneous records Indirection (directories and inodes) Reference counting Lists (inodes in different cylinder groups) Trees (indirect data blocks) Bitmaps (free space management) Caching

Page 14: File Systems

– 14 – CS 105

Effect of File Systemson ProgramsEffect of File Systemson ProgramsSoftware can take advantage of FFS designSoftware can take advantage of FFS design

Small files are cheap: spread data across many files Directories are cheap: use as key/value database where file

name is the key Large files well supported: don’t worry about file-size limits Random access adds little overhead: OK to store database

inside large fileBut don’t forget you’re still paying for disk latencies and

indirect blocks!

FFS design also suggests optimizationsFFS design also suggests optimizations Put related files in single directory

Example of serious violators: CVS, SVN Keep directories relatively small Recognize that single large file will eat all remaining free

space in cylinder groupCreate small files before large ones

Page 15: File Systems

– 15 – CS 105

Summary: Goals of UnixFile SystemsSummary: Goals of UnixFile SystemsSimple data modelSimple data model

Hierarchical directory tree Uninterpreted (by OS) sequences of bytes Extensions are just strings in long filename

Multiuser protection modelMultiuser protection model

High speedHigh speed Reduce disk latencies by careful layout Hide latencies with caching Amortize overhead with large transfers Sometimes trade off reliability for speed