
Post on 31-Mar-2015


Page 1: File System Implementations Presented by: Gaurav Gupta Department of CSEE University Of Maryland Baltimore County


Page 2

Introduction

Local file systems and remote file systems

Two general file systems in modern UNIX:
• System V file system (s5fs): the original
• Berkeley Fast File System (FFS), introduced in 4.2BSD: an improved design

The vnode/vfs interface supports multiple file systems.

This chapter summarizes and compares the two file systems.

Page 3

System V File System (s5fs)

A single logical disk or partition holds one file system; one file system per partition

Each file system has its own root directory, subdirectories, files, data and metadata

Disk block = 512 × n bytes: the granularity of disk allocation for a file

The disk driver translates block numbers into tracks, sectors and cylinders

[Figure: s5fs on-disk layout: boot area (B), superblock (S), inode list, data blocks]

Page 4

Layout:
• Boot area: bootstrap code
• Superblock: attributes and metadata of the file system
• Inode list: one 64-byte inode per file; its length is fixed when the file system is created, which limits the number of files
• Data area: files, directories and indirect blocks, which hold pointers to other data blocks of a file
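To make the fixed layout concrete, the sketch below locates an on-disk inode from its number. It assumes a 1024-byte logical block, a one-block boot area, a one-block superblock and inode numbers starting at 1; these constants and the function names are hypothetical, and only the 64-byte inode size comes from the slide.

```c
#include <stdint.h>

/* Assumed geometry for illustration: block 0 = boot area,
 * block 1 = superblock, inode list starts at block 2. */
#define S5_BLOCK_SIZE       1024u  /* assumed logical block size (512 * n, n = 2) */
#define S5_INODE_SIZE       64u    /* on-disk inode size, from the slide */
#define S5_INODES_PER_BLOCK (S5_BLOCK_SIZE / S5_INODE_SIZE)
#define S5_ILIST_START      2u     /* first block of the inode list (assumption) */

/* Disk block holding on-disk inode 'ino' (inode numbers start at 1, assumed). */
uint32_t s5_inode_block(uint32_t ino)
{
    return S5_ILIST_START + (ino - 1) / S5_INODES_PER_BLOCK;
}

/* Byte offset of inode 'ino' inside that block. */
uint32_t s5_inode_offset(uint32_t ino)
{
    return ((ino - 1) % S5_INODES_PER_BLOCK) * S5_INODE_SIZE;
}
```

Because the inode list is a fixed array, finding an inode is pure arithmetic; the price is that its size, and therefore the maximum number of files, cannot change after creation.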

Directories:
• A directory is a file containing a list of files and subdirectories
• Fixed-size records of 16 bytes

Page 5

• Each record holds a 2-byte inode number and a 14-byte file name; 2 bytes allow 2^16 = 65,536 values, and since 0 is reserved, at most 65,535 files

• An inode number of 0 means the entry is free: the file no longer exists

• In the root directory, both . and .. have inode number 2

Example directory:

Inode   Name
73      .
38      ..
9       File1
0       Deleted file
110     Subdirectory1
65      File2
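A minimal C sketch of the 16-byte directory record and the linear search s5fs has to perform over it. The struct and function names are hypothetical; the field sizes (2-byte inode number, 14-byte name) and the meaning of inode number 0 come from the slides.

```c
#include <stdint.h>
#include <string.h>

#define S5_DIRSIZ 14   /* maximum file-name length in s5fs */

/* One 16-byte directory record: 2-byte inode number + 14-byte name.
 * The name is NUL-padded and has no terminating NUL when it is 14 chars. */
struct s5_direct {
    uint16_t d_ino;               /* 0 means the entry is unused (deleted) */
    char     d_name[S5_DIRSIZ];
};

/* Linear scan of a directory's records (hypothetical helper).
 * Returns the inode number, or 0 if the name is not present. */
uint16_t s5_dir_lookup(const struct s5_direct *ents, size_t n, const char *name)
{
    if (strlen(name) > S5_DIRSIZ)
        return 0;                 /* simplification: too long to be stored */
    for (size_t i = 0; i < n; i++) {
        if (ents[i].d_ino == 0)
            continue;             /* skip deleted entries */
        if (strncmp(ents[i].d_name, name, S5_DIRSIZ) == 0)
            return ents[i].d_ino;
    }
    return 0;
}
```

A 14-character name fills d_name completely with no terminating NUL, which is why the comparison is bounded by S5_DIRSIZ instead of using strcmp().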

Page 6

Inodes (index nodes):
• Each file has exactly one inode
• The inode contains the metadata of the file
• Two forms: the on-disk inode and the in-core inode

Field     Size (bytes)  Description
di_mode   2             File type, permissions
di_uid    2             Owner UID
di_gid    2             Owner GID
di_size   4             Size in bytes
di_addr   39            Array of block addresses
...       ...           ...

Page 7

di_addr:
• A file is not stored in contiguous blocks; this avoids fragmentation
• An array of block addresses is therefore required; it is stored in the inode, so locating a block needs no extra read
• How many entries are used, and how many levels of indirection, depends on the size of the file

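[Figure: the di_addr array, entries 0 to 12: entries 0 to 9 are direct block addresses; entry 10 points to an indirect block, entry 11 to a double indirect block and entry 12 to a triple indirect block]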
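The following sketch shows which di_addr slot covers a given logical block. It assumes 10 direct entries followed by single, double and triple indirect entries (as in the figure) and 1024-byte indirect blocks holding 256 four-byte block numbers; those sizes and the function name are assumptions, not taken from the slides.

```c
#include <stdint.h>

#define NDIRECT 10u    /* di_addr[0..9]: direct block addresses */
#define NINDIR  256u   /* assumed: 1024-byte blocks of 4-byte block numbers */

/* Which di_addr slot serves logical block 'lbn'?
 * 0..9 = direct, 10 = single, 11 = double, 12 = triple indirect,
 * -1 = beyond the largest file this scheme can describe. */
int di_addr_slot(uint64_t lbn)
{
    if (lbn < NDIRECT)
        return (int)lbn;
    lbn -= NDIRECT;
    if (lbn < NINDIR)
        return 10;
    lbn -= NINDIR;
    if (lbn < (uint64_t)NINDIR * NINDIR)
        return 11;
    lbn -= (uint64_t)NINDIR * NINDIR;
    if (lbn < (uint64_t)NINDIR * NINDIR * NINDIR)
        return 12;
    return -1;
}
```

Under these assumptions a file can span at most 10 + 256 + 256^2 + 256^3 blocks, and a small file never needs an indirect block at all.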

Page 8

Superblock:
• Metadata about the file system
• One superblock per file system
• The kernel reads the superblock when mounting the file system
• The superblock contains the following information:
  - Size in blocks of the file system
  - Size in blocks of the inode list
  - Number of free blocks and free inodes
  - Free block list (partial; the rest is chained through free blocks on disk)
  - Free inode list (partial; refilled by scanning the inode list when it empties)

Page 9

Kernel Organization

In-core inodes:
• Represented by struct inode
• Hold all fields of the on-disk inode, plus the following extra fields:
  - vnode: the vnode of the file
  - Device ID of the partition containing the file
  - Inode number of the file
  - Flags for synchronization and cache management
  - Pointers to keep the inode on a free list
  - Pointers to keep the inode on a hash queue
  - Block number of the last block read

Page 10

Inode lookup:
• lookuppn(), a file-system-independent function, performs pathname parsing
• When it searches an s5fs directory, the lookup translates into a call to s5lookup()
• s5lookup() first checks the directory name lookup cache
• On a miss, it reads the directory one block at a time
• If the directory contains a valid entry for the filename, s5lookup() obtains the inode number of the file
• iget() is called to locate the inode
• iget() searches the appropriate hash queue to find the inode
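The cache-hit path of iget() can be sketched as a hash-queue search keyed on (device ID, inode number), two of the in-core inode fields listed above. Everything here (struct icore, NHASH, the hash function, ihash_insert(), iget_cached()) is a hypothetical simplification; a real iget() also handles misses, locking and the free list.

```c
#include <stddef.h>
#include <stdint.h>

#define NHASH 64   /* number of hash queues (arbitrary for this sketch) */

/* Cut-down in-core inode with only the fields this sketch needs. */
struct icore {
    uint32_t      i_dev;     /* device ID of the partition */
    uint32_t      i_number;  /* inode number */
    int           i_count;   /* reference count */
    struct icore *i_hnext;   /* next inode on the same hash queue */
};

static struct icore *ihash[NHASH];

/* Hypothetical hash function over (device, inode number). */
static unsigned ihashfn(uint32_t dev, uint32_t ino)
{
    return (dev + ino) % NHASH;
}

void ihash_insert(struct icore *ip)
{
    unsigned h = ihashfn(ip->i_dev, ip->i_number);
    ip->i_hnext = ihash[h];
    ihash[h] = ip;
}

/* Find the in-core inode for (dev, ino) and take a reference.
 * Returns NULL on a miss, where a real kernel would take an inode
 * from the free list and read the on-disk inode into it. */
struct icore *iget_cached(uint32_t dev, uint32_t ino)
{
    for (struct icore *ip = ihash[ihashfn(dev, ino)]; ip != NULL; ip = ip->i_hnext) {
        if (ip->i_dev == dev && ip->i_number == ino) {
            ip->i_count++;
            return ip;
        }
    }
    return NULL;
}
```

Keying on the device as well as the inode number matters because inode numbers are only unique within one file system.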

Page 11

File I/O:
• The read and write system calls accept a file descriptor, a user buffer address, and a count of the number of bytes to transfer
• The offset is obtained from the open file object
• The offset is advanced by the number of bytes transferred
• For random I/O, lseek() is used to set the offset to the desired location
• The kernel verifies the file mode and puts an exclusive lock on the inode to serialize access
• In read, s5read() translates the starting offset to a logical block number in the file
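As a rough illustration of the offset translation in s5read(), the helper below splits a byte offset into a logical block number and an offset within that block, and caps the transfer at the end of the block. The struct and function names are hypothetical, and the block size is a parameter rather than a fixed s5fs value.

```c
#include <stdint.h>

/* Where a transfer starts and how much of it fits in the first block. */
struct blkpos {
    uint64_t lbn;    /* logical block number within the file */
    uint32_t boff;   /* byte offset inside that block */
    uint32_t n;      /* bytes to transfer from this block */
};

/* offset: current file offset; bsize: block size; resid: bytes still requested. */
struct blkpos file_blkpos(uint64_t offset, uint32_t bsize, uint64_t resid)
{
    struct blkpos p;
    p.lbn  = offset / bsize;
    p.boff = (uint32_t)(offset % bsize);
    uint64_t room = bsize - p.boff;           /* bytes left in this block */
    p.n = (uint32_t)(resid < room ? resid : room);
    return p;
}
```

For example, reading 5000 bytes at offset 3000 with 1024-byte blocks starts in logical block 2 at byte 952, and only 72 bytes come from that first block; a read loop would then advance the offset and continue with block 3.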

Page 12

Allocating and reclaiming inodes:
• An inode remains active as long as its vnode has a non-zero reference count
• Newer implementations put the inactive inode on a free list
• Inode caching uses an LRU replacement algorithm (suboptimal)
• While a file is actively used, its inode is pinned (ineligible for freeing)
  - When a file becomes inactive, some of its pages may still be in memory
• An inode is truly free only when none of its pages remain in memory
• New inodes are allocated from the head of the free list
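An LRU free list like the one described above can be sketched as a doubly linked list: inodes are appended at the tail when their last reference is released, reused from the head, and unlinked from the middle if they are referenced again while still cached. The types and function names are hypothetical, and the sketch ignores the pinning of inodes that still have pages in memory.

```c
#include <stddef.h>

struct finode {
    int            ino;
    struct finode *prev, *next;
};

struct freelist {
    struct finode *head;   /* least recently used: reused first */
    struct finode *tail;   /* most recently released */
};

/* Last reference dropped: append the inode at the tail. */
void ifree_release(struct freelist *fl, struct finode *ip)
{
    ip->next = NULL;
    ip->prev = fl->tail;
    if (fl->tail)
        fl->tail->next = ip;
    else
        fl->head = ip;
    fl->tail = ip;
}

/* Need an inode for a new file: take the least recently used one. */
struct finode *ifree_alloc(struct freelist *fl)
{
    struct finode *ip = fl->head;
    if (ip == NULL)
        return NULL;
    fl->head = ip->next;
    if (fl->head)
        fl->head->prev = NULL;
    else
        fl->tail = NULL;
    ip->next = ip->prev = NULL;
    return ip;
}

/* Cache hit on an inactive inode: pull it back off the free list. */
void ifree_remove(struct freelist *fl, struct finode *ip)
{
    if (ip->prev) ip->prev->next = ip->next; else fl->head = ip->next;
    if (ip->next) ip->next->prev = ip->prev; else fl->tail = ip->prev;
    ip->next = ip->prev = NULL;
}
```

The slides call plain LRU suboptimal; one visible weakness in this sketch is that an inode whose pages are still cached is recycled just as readily as one with nothing in memory.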

Page 13

Analysis:
• Simple design
• Only one superblock: if it is corrupted, the file system is lost
• Grouping all inodes at the beginning of the disk requires long seeks between reading an inode and accessing the file's data
• Fixed block size wastes space
• File names are limited to 14 characters
• The number of inodes is limited to 65,535

Page 14

The Berkeley Fast File System

Improves performance, reliability and functionality

Provides all the functionality of s5fs, and uses the same system call handling algorithms and kernel data structures

Differs in disk layout, on-disk structures and free block allocation methods

Page 15

Data layout on hard diskData layout on hard disk

[Figure: hard disk geometry showing platters, heads 0 to 2, tracks 0 to 2, cylinders 0 and 1, and sectors 0 and 1]

Page 16

Sector size is 512 bytes

UNIX views the disk as a linear array of blocks

Number of sectors per block = 2^n, where n is a small number

The device driver translates the block number to a logical sector number, and then to the physical track, head and sector number

Each cylinder contains a sequential set of block numbers

Access cost: head seek time and rotational latency
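The driver's translation can be sketched with a fixed, hypothetical geometry: the block number is scaled to a logical sector number, which is then split into cylinder, head and sector. Real drives hide their physical geometry, so treat the struct, its fields and the numbers below as assumptions; the sketch only shows why consecutive block numbers fill one cylinder before the heads must seek.

```c
#include <stdint.h>

/* Hypothetical fixed disk geometry. */
struct geometry {
    uint32_t heads;              /* tracks per cylinder (one per surface) */
    uint32_t sectors_per_track;
    uint32_t sectors_per_block;  /* 2^n, n small */
};

struct chs { uint32_t cyl, head, sector; };

/* Block number -> logical sector number -> (cylinder, head, sector).
 * Sectors are numbered track by track within a cylinder, so consecutive
 * blocks fill one cylinder before moving to the next. Sector is 0-based. */
struct chs block_to_chs(const struct geometry *g, uint32_t blkno)
{
    uint32_t lsn = blkno * g->sectors_per_block;   /* logical sector number */
    uint32_t per_cyl = g->heads * g->sectors_per_track;
    struct chs p;
    p.cyl    = lsn / per_cyl;
    p.head   = (lsn % per_cyl) / g->sectors_per_track;
    p.sector = lsn % g->sectors_per_track;
    return p;
}
```

With 4 heads, 32 sectors per track and 2 sectors per block, block 20 maps to cylinder 0, head 1, sector 8, and block 64 is the first block of cylinder 1.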

Page 17

On-disk organization:
• A disk partition comprises a set of consecutive cylinders on the disk
• FFS further divides the partition into one or more cylinder groups, each a set of consecutive cylinders

• The traditional superblock is divided into two structures: the FFS superblock and per-cylinder-group information

• The FFS superblock contains information such as the number, size and location of cylinder groups, the block size, the number of inodes, etc.

• The superblock does not change unless the file system is rebuilt

• Every cylinder group has information about the group, including free inodes, free block lists, etc.

• Each group has a copy of the superblock

Page 18

Blocks and fragments:
• Block size is a trade-off: larger blocks transfer more data per I/O but waste more space in partially filled blocks
• FFS divides blocks into fragments
• Block size is 2^n bytes, minimum 4096, much larger than in s5fs (512 or 1024 bytes)
• Fragments are useful for small files
• Lower bound on fragment size = 512 bytes
• A file consists of complete disk blocks, except for the last
• The first block should be a single block, not a set of fragments
• Data must occasionally be recopied when the file grows
• FFS limits this by allowing only direct blocks to contain fragments
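A sketch of FFS-style space accounting for one file: whole blocks for everything except the tail, which gets just enough fragments, but only if the tail falls within the direct blocks. The count of 12 direct pointers and all names here are assumptions for illustration, not taken from the slides.

```c
#include <stdint.h>

#define FFS_NDIRECT 12u   /* number of direct block pointers (assumption) */

struct ffs_alloc {
    uint64_t full_blocks;   /* whole blocks allocated */
    uint64_t frags;         /* fragments allocated for the tail */
    uint64_t wasted;        /* allocated but unused bytes */
};

/* size: file size in bytes; bsize: block size; fsize: fragment size. */
struct ffs_alloc ffs_space(uint64_t size, uint32_t bsize, uint32_t fsize)
{
    struct ffs_alloc a = {size / bsize, 0, 0};
    uint64_t tail = size % bsize;
    if (tail != 0) {
        if (a.full_blocks < FFS_NDIRECT)
            a.frags = (tail + fsize - 1) / fsize;   /* round up to whole fragments */
        else
            a.full_blocks++;                        /* beyond direct blocks: whole block */
    }
    a.wasted = a.full_blocks * bsize + a.frags * fsize - size;
    return a;
}
```

With 4096-byte blocks and 1024-byte fragments, a 5000-byte file uses one block plus one fragment and wastes 120 bytes, whereas allocating two whole blocks would waste 3192.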

Page 19

Allocation policies:
• In s5fs, the order of the free inode and free block lists is random, except right after the file system is created

• FFS aims to colocate related information on the disk to optimize sequential access

• FFS places the inodes of all files in a single directory in the same cylinder group (this speeds up commands such as ls -l)

• A new directory is created in a different cylinder group from its parent (for uniform distribution)
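One way to implement the directory-spreading rule is sketched below: skip the parent's cylinder group and, among the remaining groups with at least the average number of free inodes, pick the one holding the fewest directories. The free-inode and directory-count criteria are an assumed heuristic rather than a statement of exactly what FFS does, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-cylinder-group counters (a small subset of the group's information). */
struct cg_summary {
    uint32_t nifree;   /* free inodes in the group */
    uint32_t ndir;     /* directories already in the group */
};

/* Choose a cylinder group for a new directory whose parent is in group
 * 'parent'. Returns the group index, or -1 if no group qualifies. */
int pick_dir_cg(const struct cg_summary *cg, size_t ncg, size_t parent)
{
    if (ncg == 0)
        return -1;
    uint64_t total = 0;
    for (size_t i = 0; i < ncg; i++)
        total += cg[i].nifree;
    uint64_t avg = total / ncg;
    int best = -1;
    for (size_t i = 0; i < ncg; i++) {
        if (ncg > 1 && i == parent)
            continue;          /* spread directories: avoid the parent's group */
        if (cg[i].nifree == 0 || cg[i].nifree < avg)
            continue;          /* too few free inodes for the directory's files */
        if (best < 0 || cg[i].ndir < cg[best].ndir)
            best = (int)i;
    }
    return best;
}
```

Returning -1 leaves the fallback, for example reusing the parent's group, to the caller. Requiring above-average free inodes matters because the files later created in the new directory will want their inodes in the same group.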

Page 20

• Place the data blocks of a file in the same cylinder group as its inode

• Change cylinder group when the file reaches 48 KB, and again at every megabyte

• Allocate sequential blocks of a file at rotationally optimal positions

FFS functional enhancements:
• Long file names: up to 255 characters, with variable-length directory entries
• Symbolic links: a symbolic link is a file that points to another file; the type field of the inode identifies the file as a symbolic link
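Variable-length entries mean each record stores its own length. The sketch below computes the minimum record size for a name, assuming an 8-byte header (inode number, record length, name length) loosely modeled on 4.2BSD and a name followed by a NUL and padded to a 4-byte boundary; the exact field widths and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define FFS_MAXNAMLEN 255

/* Fixed header of a variable-length directory entry (assumed layout).
 * The name follows the header, NUL-terminated and padded to 4 bytes. */
struct ffs_dirhdr {
    uint32_t d_ino;      /* inode number (0 = unused) */
    uint16_t d_reclen;   /* bytes from this entry to the next */
    uint16_t d_namlen;   /* name length, excluding the NUL */
};

/* Minimum record length for a name of 'namlen' characters,
 * or 0 if the name is longer than FFS_MAXNAMLEN. */
uint16_t ffs_dirsiz(size_t namlen)
{
    if (namlen > FFS_MAXNAMLEN)
        return 0;
    return (uint16_t)(sizeof(struct ffs_dirhdr) + ((namlen + 1 + 3) & ~(size_t)3));
}
```

A 1-character name needs 12 bytes and a 255-character name needs 264. Because d_reclen may exceed this minimum, a record can carry slack space left over from neighboring entries.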

Page 21

Analysis:
• Read throughput increases from 29 KB/s in s5fs to 221 KB/s in FFS
• CPU utilization increases from 11% to 43%
• Write throughput increases from 48 KB/s to 142 KB/s
• Average wasted space in data blocks is half a block per file in s5fs and half a fragment per file in FFS
  - The same when the fragment size equals the block size
  - Cost: overhead to maintain fragments
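The half-block and half-fragment figures follow from a simple assumption: if the number of bytes used in a file's last allocation unit is roughly uniformly distributed, then on average half of that unit is unused. With block size b for s5fs and fragment size f for FFS:

```latex
E[W_{\mathrm{s5fs}}] \approx \frac{b}{2}, \qquad E[W_{\mathrm{FFS}}] \approx \frac{f}{2}
```

For example, assuming s5fs with b = 1024 bytes and FFS with f = 512 bytes, this gives about 512 and 256 wasted bytes per file; when f equals the FFS block size, FFS wastes half a (large) block per file.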
