Disks and Files
Vivek Pai, Princeton University
Post on 21-Dec-2015
2
Gedankyou
Imagine the following: a disk scheduling policy says “handle the request that is closest to where the disk head currently is.”
On a system with lots of disk-intensive jobs, what problem can arise?
What tweaks can avoid this problem?
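To make the question concrete, here is a minimal sketch of the closest-first policy under a steady stream of nearby requests. The cylinder numbers, the one-arrival-per-step pattern, and the name `serve_closest_first` are assumptions for illustration only.

```python
def serve_closest_first(head, pending, arrivals):
    """Serve one request per step, always the one closest to the head.

    arrivals[i] lists the cylinders that join the queue before step i.
    Returns (cylinders served in order, cylinders still waiting).
    """
    pending = list(pending)
    served = []
    for new_requests in arrivals:
        pending.extend(new_requests)
        if not pending:
            continue
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        served.append(nearest)
        head = nearest  # the head is now where the last request was
    return served, pending


# One far request (cylinder 190) waits while requests near cylinder 50
# keep arriving, one per step (assumed workload).
arrivals = [[48 + (i % 5)] for i in range(20)]
served, waiting = serve_closest_first(head=50, pending=[190], arrivals=arrivals)
```

In this run the request for cylinder 190 is still in `waiting` after 20 steps, because every new arrival is closer to the head. Sweeping the head in one direction (elevator/SCAN scheduling) or aging old requests are common ways to bound the wait.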
3
Why Files
Physical reality
  Block oriented
  Physical sector #s
  No protection among users of the system
  Data might be corrupted if machine crashes
Filesystem model
  Byte oriented
  Named files
  Users protected from each other
  Robust to machine failures
4
File Structures
Byte sequence
  Read or write a number of bytes
  Unstructured or linear
Record sequence
  Fixed or variable length
  Read or write a number of records
Tree
  Records with keys
  Read, insert, delete a record (typically using B-tree)
5
File Structures Today
Stream of bytes
  Simplest to implement in kernel
  Easy to manipulate in other forms
  Little performance loss
More complicated structures
  Hardware assist fell out of favor
  Special-purpose hardware slower, costly
6
File Types
ASCII – plain text
A Unix executable file
  Header: magic number, sizes, entry point, flags
  Text (code)
  Data
  Relocation bits
  Symbol table
Devices
Everything else in the system
7
So What Makes Filesystems Hard?
Files grow and shrink in pieces
Little a priori knowledge
6 orders of magnitude in file sizes
Overcoming disk performance behavior
Desire for efficiency
Coping with failure
8
File System Components
Disk management
  Arrange collection of disk blocks into files
Naming
  User gives file name, not track or sector number, to locate data
Security
  Keep information secure
Reliability/durability
  When system crashes, lose stuff in memory, but want files to be durable
[Figure: layers, top to bottom: User, File naming, File access, Disk management, Disk drivers]
9
Some Definitions
File descriptor (fd) – an integer used to represent a file – easier than using names
Metadata – data about data – bookkeeping data used to eventually access the “real” data
Open file table – system-wide list of descriptors in use
10
Kinds of Metadata
inode – index node, or a specific set of information kept about each file
  Two forms – on disk and in memory
Directory – names and location information for files and subdirectories
  Note: stored in files in Unix
Superblock – contains information to describe the file system, disk layout
Information about free blocks/inodes on disk
11
Contents of an Inode
Disk inode:
  File type, size, blocks on disk
  Owner, group, permissions (r/w/x)
  Reference count
  Times: creation, last access, last mod
  Inode generation number
  Padding & other stuff
128 bytes on classic Unix
12
Directories in Unix
Stored like regular files
  Contents are file names and inode #s
  Names are nul-terminated strings
Logic
  Separates file from location in tree
  File can appear in multiple places
What are the drawbacks?
13
Effects of Corruption
inode – file gets “damaged”
  Maybe some “free” block gets viewed
Directory – “lose” files/directories
  Might get to read deleted files
Superblock – can’t figure out anything
  This is why we replicate the superblock
14
Data Structures for A Typical File System
[Figure: process control block, open file pointer array, open file table (system-wide), memory inode, disk inode]
15
Opening A File
File name lookup and authentication
Copy the file metadata into the in-memory data structure, if it is not already there
Create an entry in the open file table (system wide) if there isn’t one
Create an entry in PCB
Link up the data structures
Return a pointer to user
[Figure: fd = open(FileName, access); file name lookup & authenticate against the file system on disk; allocate & link up data structures: PCB, open file table, metadata]
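From user space, what comes back from an open is just the small integer descriptor. A minimal sketch using Python's thin wrappers over the Unix calls follows; the file name `example.txt` and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")  # hypothetical file name

    # Name lookup, permission check, and table setup all happen inside open().
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    fd_is_small_int = isinstance(fd, int) and fd >= 0

    # Every later call names the file by descriptor, not by path.
    os.write(fd, b"hello, disk")
    os.lseek(fd, 0, os.SEEK_SET)
    first_five = os.read(fd, 5)

    # close() releases the per-process entry.
    os.close(fd)
```

The descriptor is only an index into the per-process table; the kernel follows it to the open file table entry and from there to the in-memory metadata.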
16
Reading And Writing
What happens when you…
  read 10 bytes from a file?
  write 10 bytes into an existing file?
  write 4096 bytes into a file?
Disk works on blocks (sectors)
  Can have temporary (ephemeral) buffers
  Longer lasting buffers = disk cache
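One way to think about the write questions is a toy model in which the disk only transfers whole blocks. This is a sketch under assumptions: a 512-byte block size, no buffer cache, and a hypothetical `ToyDisk` class that just counts transfers.

```python
BLOCK_SIZE = 512  # assumed sector size


class ToyDisk:
    """In-memory stand-in for a disk that only moves whole blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.writes += 1
        self.blocks[n] = bytes(data)


def write_bytes(disk, offset, data):
    """Write data at a byte offset using only whole-block transfers."""
    pos = 0
    while pos < len(data):
        block_no, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = data[pos:pos + BLOCK_SIZE - start]
        if len(chunk) == BLOCK_SIZE:
            buf = bytearray(chunk)  # whole block replaced: no read needed
        else:
            buf = bytearray(disk.read_block(block_no))  # partial: read first
            buf[start:start + len(chunk)] = chunk
        disk.write_block(block_no, buf)
        pos += len(chunk)


disk = ToyDisk(num_blocks=16)
write_bytes(disk, 100, b"x" * 10)      # 10 bytes inside block 0
after_small = (disk.reads, disk.writes)
write_bytes(disk, 1024, b"y" * 4096)   # 8 aligned, full blocks
after_large = (disk.reads, disk.writes)
```

In this model the 10-byte write costs one block read plus one block write, while the aligned 4096-byte write needs no reads at all. A real system would also consult the buffer cache before touching the disk.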
17
Reading A Block
[Figure: read(fd, userBuf, size) goes through the PCB, open file table, and metadata; a logical-to-physical mapping yields read(device, phyBlock, size); the disk device driver gets the physical block into sysBuf in the buffer cache, and it is copied to userBuf]
18
A Disk Layout for A File System
Superblock defines a file system
  Size of the file system
  Size of the file descriptor area
  Free list pointer, or pointer to bitmap
  Location of the file descriptor of the root directory
  Other meta-data such as permission and various times
For reliability, replicate the superblock
[Figure: disk layout with boot block, superblock, file metadata (i-node in Unix), and file data blocks]
19
File Usage Patterns
How do users access files?
  Sequential: bytes read in order
  Random: read/write element out of middle of arrays
  Whole file or partial file
How are files used?
  Most files are small
  Large files use up most of the disk space
  Large files account for most of the bytes transferred
Bad news
  Need everything to be efficient
20
Data Structures for Disk Management
A “header” for each file (part of the file meta-data)
  Disk sectors associated with each file
A data structure to represent free space on disk
  Bit map
    1 bit per block (sector)
    Blocks numbered in cylinder-major order, why?
  Linked list
  Others?
How much space does a bit map need for a 4G disk?
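The bitmap question is plain arithmetic once a block size is chosen. The sketch below assumes “4G” means 4 GiB and tries two block sizes, 512-byte sectors and 4 KiB blocks; both sizes are assumptions, since the slide does not fix one.

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a free-space bitmap with one bit per block."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8  # round up to whole bytes


GIB = 1 << 30
disk_size = 4 * GIB  # assumed meaning of "4G"

per_sector = bitmap_bytes(disk_size, 512)    # 1,048,576 bytes = 1 MiB
per_4k_block = bitmap_bytes(disk_size, 4096)  # 131,072 bytes = 128 KiB
```

Either way the bitmap is a tiny fraction of the disk: 1 bit per 512-byte sector is 1/4096 of the capacity.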
21
Linked Files (Alto)
File header points to 1st block on disk
Each block points to next
Pros
  Can grow files dynamically
  Free list is similar to a file
Cons
  Random access: horrible
  Unreliable: losing a block means losing the rest
[Figure: file header pointing to a chain of blocks that ends in null]
22
Contiguous Allocation
Request in advance for the size of the file
Search bit map or linked list to locate a space
File header
  First sector in file
  Number of sectors
Pros
  Fast sequential access
  Easy random access
Cons
  External fragmentation
  Hard to grow files
23
Single-Level Indexed Files or Extent-based Filesystems
A user declares max size
A file header holds an array of pointers to point to disk blocks
Pros
  Can grow up to a limit
  Random access is fast
Cons
  Clumsy to grow beyond limit
  Periodic cleanup of new files
  Up-front declaration a real pain
[Figure: file header with pointers to disk blocks]
24
File Allocation Table (FAT) Approach
A section of disk for each partition is reserved
One entry for each block
A file is a linked list of blocks
A directory entry points to the 1st block of the file
Pros
  Simple
Cons
  Always go to FAT
  Wasting space
[Figure: directory entry foo pointing to block 217, and a FAT showing the values 0, 217, 399, 619, and EOF]
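Following a file through the FAT is a linked-list walk. The sketch below assumes the figure's values form the chain 217, 619, 399, EOF for the file foo; that reading of the figure, the dictionary representation (a real FAT is an array indexed by block number), and the sentinel value are all assumptions.

```python
EOF = -1  # assumed end-of-chain marker

# FAT entry for block b holds the number of the file's next block.
fat = {217: 619, 619: 399, 399: EOF}

# The directory maps a name to the file's first block.
directory = {"foo": 217}


def file_blocks(name):
    """Return the file's block numbers in order by walking the FAT chain."""
    blocks = []
    block = directory[name]
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # every hop is another FAT lookup
    return blocks


foo_blocks = file_blocks("foo")
```

Reaching the k-th block takes k lookups in the table, which is the “always go to FAT” cost: tolerable when the table is cached in memory, slow when each lookup is a disk access.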
25
Multi-Level Indexed Files (Unix)
13 pointers in a header
  10 direct pointers
  11: 1-level indirect
  12: 2-level indirect
  13: 3-level indirect
Pros & Cons
  In favor of small files
  Can grow
  Limit is 16G and lots of seeks
What happens to reach block 23, 5, 340?
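[Figure: header with pointers 1, 2, ..., 11, 12, 13; the direct pointers lead to data blocks, and pointers 11, 12, and 13 lead to data through one, two, and three levels of indirect blocks]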
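The block-lookup question can be worked through with a small sketch. It assumes 1 KiB blocks and 4-byte block pointers (256 pointers per indirect block), which is consistent with the 16G limit on the slide but is not stated there; blocks are numbered from 0, and `locate` is a hypothetical helper.

```python
BLOCK_SIZE = 1024   # assumed block size in bytes
POINTER_SIZE = 4    # assumed size of a block pointer in bytes
PTRS = BLOCK_SIZE // POINTER_SIZE  # 256 pointers per indirect block
NUM_DIRECT = 10


def locate(n):
    """Map logical block n (from 0) to its pointer level and indices."""
    if n < NUM_DIRECT:
        return ("direct", n)
    n -= NUM_DIRECT
    if n < PTRS:
        return ("single indirect", n)
    n -= PTRS
    if n < PTRS ** 2:
        return ("double indirect", n // PTRS, n % PTRS)
    n -= PTRS ** 2
    if n < PTRS ** 3:
        return ("triple indirect", n // PTRS ** 2, (n // PTRS) % PTRS, n % PTRS)
    raise ValueError("block number beyond maximum file size")


block_5 = locate(5)      # ("direct", 5)
block_23 = locate(23)    # ("single indirect", 13)
block_340 = locate(340)  # ("double indirect", 0, 74)

max_blocks = NUM_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3
max_file_gib = max_blocks * BLOCK_SIZE / 2 ** 30  # a little over 16
```

Under these assumptions block 5 needs no extra disk reads, block 23 needs one indirect block read, and block 340 needs two, each a potential seek before the data block itself is read.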
26
Reliability In Disk Systems
Make sure certain actions have occurred before function completes
  Known as “synchronous” operation
  Ex: make sure new inode is on disk & that the directory has been modified before declaring a file creation is complete
Drawback: speed
  Some ops easily asynchronous: access time
  Some filesystems don’t care: Linux ext2fs
27
Recovery After Failure
Need to ensure consistency
  Does free bitmap match tree walk?
  Do reference counts in inodes match directory entries?
  Do blocks appear in multiple inodes?
Time for this kind of recovery grows with disk size
Clean shutdown – mark as such, no recovery
28
Reducing Synchronous Times
Write to a faster storage
  Nonvolatile memory – expensive, requires some additional OS/firmware support
Write to a special disk or section – logging
  Only have to examine log when recovering
  Eventually have to put information in place
  Some information dies in the log itself
Write in a special order
  Write metadata in a way that is consistent but possibly recovers less
29
Challenges
Unix filesystem has great flexibility
Extent-based filesystems have speed
Seeks kill performance – locality
Bitmaps show contiguous free space
Linked lists easy to search
How do you perform backup/restore?
30
A Quick XOR Overview
XOR = eXclusive OR
  a XOR a = 0
  a XOR 0 = a
  a XOR b = b XOR a
  (a XOR b) XOR c = a XOR (b XOR c)
In other words, count the 1 bits: even = 0, odd = 1
31
More Fun With XOR
Result = XOR (a1, a2, a3, a4,…)
a2 goes bad
  Can we reconstruct a2?
  a2 = XOR (a1, Result, a3, a4,…)
What does this imply for disks?
What kinds of failures does it handle?
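The reconstruction identity can be checked directly on byte strings. This is a small sketch; the four 4-byte “blocks” and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four data blocks (a1..a4) and their parity block (Result).
data = [b"disk", b"file", b"raid", b"xor!"]
parity = xor_blocks(data)

# a2 goes bad: XOR everything that survived, parity included.
survivors = [data[0], data[2], data[3]]
rebuilt_a2 = xor_blocks(survivors + [parity])
```

The rebuild works only because we know which block is missing, and it recovers exactly one lost block per parity group. A block that silently returns wrong data is not located by this scheme.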
32
Bigger, Faster, Stronger
Making individual disks larger is hard
Throw more disks at the problem
  Capacity increases
  Effective access speed may increase
  Probability of failure also increases
Use some disks to provide redundancy
  Generally assume a fail-stop model
  Fail-stop versus Byzantine failures
33
RAID (Redundant Array of Inexpensive Disks)
Main idea
  Store the error correcting codes on other disks
  General error correcting codes are too powerful
  Use XORs or single parity
  Upon any failure, one can recover the entire block from the spare disk (or any disk) using XORs
Pros
  Reliability
  High bandwidth
Cons
  The controller is complex
[Figure: RAID controller with XOR]
34
Synopsis of RAID Levels
RAID Level 0: Non-redundant (JBOD)
RAID Level 1: Mirroring
RAID Level 2: Byte-interleaved, ECC
RAID Level 3: Byte-interleaved, parity
RAID Level 4: Block-interleaved, parity
RAID Level 5: Block-interleaved, distributed parity
35
Did RAID Work?
Performance: yes
Reliability: yes
Cost: no
  Controller design complicated
  Fewer economies of scale
  High-reliability environments don’t care
Now also software implementations
36
RAID’s Real Benefit
Partly addresses the failure problem
  Backup/restore less of an issue
  Failed disk “rebuilt” at sector level
  Lower performance during rebuild, but system still on-line
Still not perfect
  Geographic problems
  Failure during rebuild