Mass-Storage Structure
CS 540 – Chapter 12
Kate Dehbashi, Anna Deghdzunyan
Fall 2010, Dr. Parviz
Agenda
Review: File System, Parts of File System
Overview: Magnetic Disks, Magnetic Tapes
Disk Structure
Disk Attachment: Host-attached, Network-attached, Storage-Area Networks
Disk Scheduling: Scheduling Algorithms, Selection of an algorithm
Agenda (Cont)
Disk Management: Disk Formatting, Boot Block, Bad-Block Recovery
Swap-Space Management: How is it used? Where is it located? How is it managed?
RAID Structure: RAID levels, The implementation of RAID, Problems with RAID
Stable-Storage Implementation: Disk write result, Recoverable Write, Failure detection and recovery
Tertiary-Storage Structure: Tertiary-Storage Devices (Removable disks, Tapes), OS support, Tape Drives, File Naming, HSM, Speed, Reliability, Cost
Review
File System: a method of storing and organizing computer files and their data
Storage, organization, manipulation, retrieval
Maintains physical location
Review (Cont)
Parts of File System
Interface: the user and programmer interface to the file system
Implementation: the internal data structures and algorithms used to implement the interface
Storage Structure: physical structure, disk-scheduling algorithms, disk formatting, disk reliability, stable-storage implementation
Overview
Magnetic Disks, Magnetic Tape
Magnetic Disks: Structure
Platter, track, sector, cylinder, disk arm, read-write head
Rotates 60–200 times/second
Disk speed: transfer rate and positioning time (seek time + rotational latency)
Magnetic Disks (Cont)
Head crash: the disk head making contact with the disk surface; causes permanent damage
Removable magnetic disks, e.g., the floppy: the head sits directly on the surface, so rotation is slower and capacity lower
I/O bus: the drive is attached to the computer via a set of wires; buses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI, and FireWire
Controllers: a host controller at the computer end of the bus, and a disk controller (with cache memory) built into each drive
First HDD – IBM RAMAC, 1956
1.5 square meters (16 sq ft)
Leased for $3,200 per month
Magnetic Tape
An early secondary-storage medium; first used in 1951 as computer storage
Holds large quantities of data: LTO-5 (2010) holds 1.5 TB of uncompressed data (the book cites 20–200 GB)
Access time is slow: random access is ~1000 times slower than disk, but once the data are under the head, transfer rates are comparable to disk
Modern usage: backup and archive
For large amounts of data, tape can be substantially less expensive than disk
Common technologies: 4 mm, 8 mm, 19 mm, ¼-inch, and ½-inch tape; LTO (Linear Tape-Open) and SDLT (Super Digital Linear Tape), e.g., LTO-2 and SDLT cartridges
Disk Structure
Addressing: a one-dimensional array of logical blocks
Logical block: the smallest unit of transfer, usually 512 bytes
Blocks map to sectors sequentially; sector 0 is the first sector of the first track on the outermost cylinder
Mapping order: through that track, then the rest of the tracks in the same cylinder, then the rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number → (cylinder, track, sector); see the sketch below
In practice, the translation is difficult to perform: there are defective sectors, and sectors/track is not constant
CLV (Constant Linear Velocity): constant density of bits per track, variable rotational speed
CAV (Constant Angular Velocity): constant rotational speed, variable density of bits per track
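As an illustration of the ideal mapping, here is a minimal Python sketch that translates a logical block number into (cylinder, track, sector). It assumes the constant sectors-per-track geometry that real zoned disks do not actually have; the head and sector counts are hypothetical.

```python
# Idealized logical-block-to-CHS translation: assumes a constant number
# of sectors per track, which real (zoned) disks do not have.
def lba_to_chs(lba, heads=16, sectors_per_track=63):
    sector   = lba % sectors_per_track             # position within the track
    track    = (lba // sectors_per_track) % heads  # surface within the cylinder
    cylinder = lba // (sectors_per_track * heads)  # outermost cylinder is 0
    return cylinder, track, sector

print(lba_to_chs(0))     # (0, 0, 0): sector 0, first track, outermost cylinder
print(lba_to_chs(1008))  # one full cylinder (16 * 63 sectors) later: (1, 0, 0)
```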
Disk Attachment
Host-Attached Storage (DAS): accessed through local I/O ports: IDE, ATA, SATA, SCSI, FC
Wide variety of storage devices: HDDs, RAID arrays, CD/DVD drives, and tape
Network-Attached Storage (NAS): NAS, iSCSI
Storage-Area Network (SAN): SAN, InfiniBand
Host-Attached Storage: SCSI
SCSI (Small Computer System Interface): supports a large variety of devices; up to 16 devices per cable; a controller card (the SCSI initiator) addresses SCSI targets, each with up to 8 logical units
Host-Attached Storage: FC
FC (Fibre Channel): a high-speed serial architecture over optical cable or four-conductor copper cable
Switched fabric (FC-SW): all devices are connected to Fibre Channel switches; a 24-bit address space lets multiple hosts and storage devices attach; expected to dominate in the future; the basis of SANs
Arbitrated Loop (FC-AL): up to 126 devices, all connected in a loop or ring; historically lower cost, but rarely used now
FC Topologies
Network-Attached Storage (NAS)
A storage system accessed remotely over a data network
Clients access it via a remote-procedure-call interface: NFS on UNIX, CIFS on Windows
RPCs are carried via TCP/UDP
A convenient way for all clients to share a pool of storage
NAS vs. locally attached storage: same ease of naming and access, but less efficient and lower performance
Network-Attached Storage (Cont)
iSCSI – Internet Small Computer Systems Interface
The latest NAS protocol: an IP-based storage networking protocol that uses an IP network to carry the SCSI protocol
Clients are able to send SCSI commands to remote targets
TCP ports 860 and 3260
Storage-Area Networks
SAN: a private network connecting servers and storage devices
Uses storage protocols instead of networking protocols
Multiple hosts and storage devices can attach to the same SAN: flexibility
A SAN switch allows or prohibits client access (example)
FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand: a special-purpose bus architecture supporting a high-speed interconnection network
Up to 2.5 Gbps per link; 64,000 addressable devices
Supports QoS and failover
Disk Scheduling
Disk Drive Efficiency
Access time: seek time + rotational latency
Bandwidth = bytes transferred / Δt, where Δt = completion time of the last transfer – time of the first request for service
Both can be improved by scheduling the servicing of I/O requests in a good order
Disk Scheduling (Cont)
I/O request procedure: the process issues a system call to the OS
System-call information: input or output, disk address, memory address, number of sectors to be transferred
If the drive and controller are available, the request is serviced immediately; otherwise it is placed in the queue of pending requests (see the sketch below)
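A minimal Python sketch of that admission logic; the request fields mirror the system-call information on the slide, and all names (DiskRequest, service, pending) are hypothetical.

```python
from collections import deque

class DiskRequest:
    """Carries the system-call information: direction, disk address,
    memory address, and number of sectors to transfer."""
    def __init__(self, is_write, disk_addr, mem_addr, n_sectors):
        self.is_write, self.disk_addr = is_write, disk_addr
        self.mem_addr, self.n_sectors = mem_addr, n_sectors

pending = deque()        # queue of requests waiting for the drive
disk_busy = False

def service(request):
    pass                 # stub: issue the transfer to the controller

def submit(request):
    global disk_busy
    if not disk_busy:    # drive and controller idle: service immediately
        disk_busy = True
        service(request)
    else:                # busy: the request joins the pending queue
        pending.append(request)
```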
Disk Scheduling (Cont)
Algorithms: FCFS, SSTF, SCAN, C-SCAN, LOOK/C-LOOK
Disk Scheduling (Cont)
FCFS (First-Come, First-Served)
640 cylinder moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First): service the requests closest to the current head position; may cause starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Algorithm): the head starts at one end and moves to the other end, servicing requests as it reaches each cylinder
236 cylinder moves
Disk Scheduling (Cont)
C-SCAN: a variant of SCAN with more uniform wait time; when the head reaches the end, it immediately returns to the beginning without servicing any requests on the return trip
360 cylinder moves
Disk Scheduling (Cont)
LOOK/C-LOOK: the head goes only as far as the final request in each direction, then reverses immediately; the simulation below reproduces these counts
322 cylinder moves (C-LOOK)
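The cylinder-move counts above can be checked with a short simulation. The request queue (98, 183, 37, 122, 14, 124, 65, 67) with the head at cylinder 53 is the textbook example these slide figures appear to be computed from; the sketch computes the FCFS, SSTF, and SCAN totals.

```python
QUEUE, HEAD = [98, 183, 37, 122, 14, 124, 65, 67], 53

def fcfs(queue, head):
    moves = 0
    for cyl in queue:                     # service strictly in arrival order
        moves += abs(cyl - head)
        head = cyl
    return moves

def sstf(queue, head):
    pending, moves = list(queue), 0
    while pending:                        # always pick the closest pending request
        nxt = min(pending, key=lambda c: abs(c - head))
        pending.remove(nxt)
        moves += abs(nxt - head)
        head = nxt
    return moves

def scan(queue, head):
    # head sweeps toward cylinder 0 first (as in the slide's figure),
    # then reverses and runs up to the farthest request
    return head + max(queue)

print(fcfs(QUEUE, HEAD))  # 640
print(sstf(QUEUE, HEAD))  # 236
print(scan(QUEUE, HEAD))  # 236
```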
Disk Scheduling (Cont)
Selection of an algorithm: factors
SSTF is common and performs better than FCFS
SCAN and C-SCAN perform better for systems that place a heavy load on the disk, and avoid starvation
Scheduling-algorithm performance depends on the number and types of requests (example 1)
Requests for disk service can be influenced by the file-allocation method (example 2) and by the location of directories and index blocks
Caching directories and index blocks in main memory reduces arm movement (example 3)
Disk Scheduling (Cont): Selection of an algorithm
The scheduler should be a separate module of the OS, so it can be replaced if necessary
Default: SSTF or LOOK
Rotational-delay perspective: modern disks do not disclose the physical location of logical blocks, so the disk controller takes over from the OS in choosing the algorithm
Problem: if I/O were the only consideration this would be fine, but the OS has other constraints, for example requests for demand paging (example)
Disk Management
Disk Formatting: low-level, logical
Boot Block: bootstrap
Bad-Block Recovery: manually; sector sparing (forwarding); sector slipping
Disk Formatting
Low-Level Formatting (Physical Formatting): each sector has a header, a data area (usually 512 bytes), and a trailer
The header and trailer hold the sector number and an ECC, used for error detection and soft-error recovery
Logical Formatting
Partition: one or more groups of cylinders; each partition is treated as a separate disk (example)
Logical formatting: storing an initial file system on the disk, including a map of allocated and free space and an initial empty directory
Cluster: blocks are grouped together to increase efficiency; disk I/O is done in blocks, file I/O in clusters
Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing the file-system services
Boot Block: Bootstrap Program
The initial program that starts a computer system; it initializes aspects of the system: CPU registers, device controllers, contents of main memory
It then starts the OS: finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution
Stored in ROM: needs no initialization and cannot be infected by a virus
Problem: hard to update. Solution: keep a tiny bootstrap loader in ROM and store the full bootstrap in the boot blocks, at a fixed location on the HDD
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete disk failure: replace the disk
Bad-sector handling:
Manually: e.g., IDE: format or chkdsk places a special entry in the FAT
Sector Sparing (Forwarding): e.g., SCSI: the controller maintains a bad-sector list, initialized during low-level formatting; the controller sets aside spare sectors and logically replaces bad sectors with them (example; see the sketch below). Problem: this invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder
Sector Slipping: move each sector down one spot, so the sector next to the bad one becomes free (example)
Soft error: repairable by the disk controller through the ECC. Hard error: data are lost; restore from backup
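A toy sketch of sector sparing, assuming a controller-style remap table; the sector numbers and names are hypothetical. Note how a forwarded request can land far from its neighbors, which is exactly the scheduling problem mentioned above.

```python
# Minimal sketch of sector sparing (forwarding): the controller keeps a
# remap table from bad sectors to spares.
SPARES = [5000, 5001, 5002]   # spare sectors set aside at low-level format
remap = {}                    # bad logical sector -> substitute spare

def mark_bad(sector):
    remap[sector] = SPARES.pop(0)     # transparently forward future accesses

def resolve(sector):
    return remap.get(sector, sector)  # OS still asks for the original number

mark_bad(17)
print(resolve(17))  # 5000 -- the request is forwarded to the far-away spare
print(resolve(18))  # 18   -- healthy sectors are untouched
```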
Swap-Space Management
In modern operating systems, "paging" and "swapping" are often used interchangeably
Virtual memory uses disk space as an extension of main memory; performance decreases. Why?
Swap-space management goal: to get the best throughput for the virtual memory system
Swap-Space Management (Cont)
How is it used? Depends on the memory-management algorithm
Swapping: holds entire process images. Paging: holds pages pushed out of main memory
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to GB
Better to overestimate. Why? So that no process is aborted
Solaris: swap space = the amount by which VM exceeds pageable physical memory
Linux: swap space = double the amount of physical memory
Multiple swap spaces are possible
Swap-Space Management (Cont)
Where is it located on disk?
In the normal file system: a large file within the file system; file-system routines can be used, so it is easy to implement, but inefficient (it takes time to traverse the directory structure)
Separate disk partition (raw partition): a swap-space storage manager uses algorithms optimized for speed rather than storage efficiency. Why? The trade-off between speed and fragmentation is acceptable because the data's life is short. A fixed amount of space is set aside during partitioning; adding more space requires re-partitioning
Linux supports both. Who decides?
Swap-Space Management (Cont)
How is it managed?
UNIX: traditionally copied entire processes; newer versions use a combination of swapping & paging
Solaris 1: text-segment pages (code) come from the file system; swap space holds pages of anonymous memory, such as stack or heap. Modern versions allocate swap space only when a page is forced out of main memory
Swapping on Linux System
RAID Structure
A large number of disks in a system can improve the data-transfer rate if the disks are operated in parallel
Storing redundant information on multiple disks improves the reliability of data storage
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability
Improvement of Reliability via Redundancy
Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information
Mirroring: duplicating every disk
• A logical disk consists of two physical disks
• Every write is carried out on both disks
RAID Structure
Improvement in Performance via Parallelism: Striping
Splitting the bits of each byte across multiple disks
Bit-level striping: write bit i of each byte to disk i
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below)
Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses via load balancing, and reduce the response time of large accesses
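The block-to-disk rule is easy to see in code; this minimal sketch just evaluates the slide's formula (disks numbered from 1; n = 4 chosen arbitrarily).

```python
# Block-level striping: with n disks, block i goes to disk (i mod n) + 1.
def disk_for_block(i, n):
    return (i % n) + 1

n = 4
for i in range(8):
    print(f"block {i} -> disk {disk_for_block(i, n)}")
# Blocks 0..7 land on disks 1,2,3,4,1,2,3,4: consecutive blocks sit on
# different disks, so they can be read in parallel.
```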
RAID Levels
RAID Level 0: striping. Disk arrays with striping at the level of blocks, but without any redundancy
RAID Level 1: mirroring. Refers to disk mirroring
RAID Level 2: memory-style error-correcting code
Parity bit: records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1)
If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity
Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
RAID Level 3: bit-interleaved parity organization
Data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks
Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1
Problem: the expense of computing and writing parity; addressed by a hardware controller with dedicated parity hardware
RAID Level 4: block-interleaved parity organization
Uses block-level striping and, in addition, keeps a parity block on a separate disk
High transfer rates for large reads and writes; small independent writes cannot be performed in parallel (read-modify-write cycle)
RAID Levels
RAID Level 5: block-interleaved distributed parity
Parity blocks are interleaved and distributed across all disks; since parity blocks no longer reside on a single disk, this avoids potential overuse of a single parity disk (see the parity sketch below)
RAID Level 6: P+Q redundancy scheme
Stores extra redundant information to guard against multiple disk failures
2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures
Reed-Solomon codes are used as the error-correcting codes
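For the parity-based levels (3, 4, 5), the parity block is in practice the XOR of the corresponding data blocks, so any one lost block can be rebuilt from the survivors. A small sketch, with one-byte "blocks" for brevity:

```python
# Parity sketch for RAID 3/4/5: the parity block is the XOR of the data
# blocks, so any single lost block can be rebuilt from the others.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"\x0f", b"\x33", b"\x55"]  # one block per data disk
parity = xor_blocks(data)           # stored on the parity disk

lost = data[1]                      # disk 1 fails
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == lost              # XOR of the survivors restores the block
```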
RAID Levels
RAID 0+1: stripe first, then mirror the stripe
RAID 1+0: mirror first, then stripe the mirrors
If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks
The Implementation of RAID
RAID can be implemented:
By volume-management software, within the kernel or at the system-software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer, by disk-virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Can take long hours for RAID 5 with large disks
Selecting a RAID Level
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications requiring high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data
If more disks are in an array, data-transfer rates are higher, but the system is more expensive
If more bits are protected by a parity bit, the space overhead due to parity is lower, but the chance of a second disk failure before repair is greater
Extensions
The concepts of RAID have been generalized to other storage devices: arrays of tapes, and even the broadcast of data over wireless systems
Problems with RAID
1. RAID protects against physical errors, but not against other hardware and software errors
Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data
Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it changed
Checksumming provides error detection and correction (see the toy illustration below)
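A toy illustration of the checksum-with-pointer idea (not ZFS code; CRC-32 stands in for ZFS's checksums, and all names are hypothetical):

```python
import zlib

# The checksum is stored alongside the *pointer* to a block, not with
# the block itself, so a wrong or silently corrupted block is caught.
class BlockPointer:
    def __init__(self, addr, checksum):
        self.addr, self.checksum = addr, checksum

storage = {}                                   # addr -> raw block bytes

def write(addr, data):
    storage[addr] = data
    return BlockPointer(addr, zlib.crc32(data))  # pointer remembers checksum

def read(ptr):
    data = storage[ptr.addr]
    if zlib.crc32(data) != ptr.checksum:       # detects corruption and the
        raise IOError("wrong or damaged block")  # wrong-block case
    return data

ptr = write(7, b"hello")
storage[7] = b"hellO"     # simulate silent corruption on the medium
# read(ptr) now raises: the checksum kept with the pointer catches it
```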
Problems with RAID (cont)
2. RAID implementations lack flexibility
What if we have a five-disk RAID level 5 set and a file system is too large to fit on it?
In ZFS, partitions of disks are gathered together via RAID sets into pools of storage
So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes
Coordinate the writing of updates so that a failure during an update will not leave all the copies in a damaged state, ensuring that we can recover the stable data after any failure during data transfer or recovery
Disk Write Result
Successful completion: all the data were written correctly
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted
Total failure: the failure occurred before the disk write started, so the previous data value on the disk remains intact
Recoverable Write
The system must maintain (at least) two physical blocks for each logical block, for detecting and recovering from failure
Recoverable write:
Write the information onto the first physical block
When the first write completes successfully, write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
Failure Detection and Recovery
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists: OK
If one contains a detectable error: replace its contents with the value of the other block
If both contain no detectable error but they differ in content: replace the content of the first block with the value of the second
This ensures that a write to stable storage either succeeds completely or results in no change (see the sketch below)
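A minimal sketch of the recoverable write and the recovery pass, using two files as the two physical blocks and a CRC as a stand-in for the ECC; the file names are hypothetical, and the recovery assumes both replica files exist.

```python
import os
import zlib

def write_block(path, data):
    with open(path, "wb") as f:
        f.write(zlib.crc32(data).to_bytes(4, "big") + data)
        f.flush(); os.fsync(f.fileno())   # force the write to the medium

def read_block(path):
    raw = open(path, "rb").read()
    crc, data = int.from_bytes(raw[:4], "big"), raw[4:]
    return data if zlib.crc32(data) == crc else None  # None = detectable error

def recoverable_write(data):
    write_block("block1", data)   # second write starts only after the first
    write_block("block2", data)   # completes; operation done after this one

def recover():
    b1, b2 = read_block("block1"), read_block("block2")
    if b1 == b2 and b1 is not None:
        return                        # both good and identical: nothing to do
    if b1 is None:
        write_block("block1", b2)     # error in first: copy from second
    elif b2 is None:
        write_block("block2", b1)     # error in second: copy from first
    else:
        write_block("block1", b2)     # both readable but differing: first
                                      # takes the second's value (as above)

recoverable_write(b"new value")
recover()                             # after a crash, run the recovery pass
```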
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally, tertiary storage is built using removable media
Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs
Tertiary-Storage Devices: Removable Disks
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure
Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes
The head flashes a laser beam at the disk surface, aimed at the tiny spot where a bit is to be written
Laser light is also used to read data (the Kerr effect)
Optical disks: employ special materials that are altered by laser light
Example: the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state; it uses laser light at three different powers: low to read data, medium to erase the disk, high to write to the disk. Common examples: CD-RW and DVD-RW
WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum, so information can be destroyed but not altered
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications
Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on it
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it
Tape Drives
The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape to a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block
An EOT mark is placed after a block that is written (see the sketch below)
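A toy model of these append-only semantics, mirroring locate(), read_position(), and the erase-on-write behavior described above (a Python list stands in for the medium):

```python
# Sketch of the append-only tape model: writing a block logically erases
# everything after it, and an EOT mark follows the last written block.
class Tape:
    def __init__(self):
        self.blocks, self.pos = [], 0

    def locate(self, n):        # like locate(): position to logical block n
        self.pos = n

    def read_position(self):    # logical block number under the head
        return self.pos

    def write(self, data):
        del self.blocks[self.pos:]  # old data past this point is gone
        self.blocks.append(data)    # the EOT mark now follows this block
        self.pos += 1

t = Tape()
for d in (b"a", b"b", b"c"):
    t.write(d)
t.locate(1)
t.write(b"B")       # updating the middle erases block 2 as well
print(t.blocks)     # [b'a', b'B']
```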
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be great to have all data available on high-speed devices all the time, but this is expensive, so HSM systems store the data on slower devices and copy it to faster disk drives when needed
Small and frequently used files remain on disk (see the policy sketch below)
Large, old, inactive files are archived to the jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
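A toy tiering policy in the spirit of the description above; the size and age thresholds are invented for illustration.

```python
import time

# Toy HSM policy sketch (thresholds are hypothetical): small, recently
# used files stay on disk; large, old, inactive files go to the jukebox.
def tier_for(size_bytes, last_access):
    age_days = (time.time() - last_access) / 86400
    if size_bytes > 100 * 2**20 and age_days > 90:
        return "jukebox"   # archived; staged back to disk when referenced
    return "disk"

print(tier_for(2 * 2**20, time.time()))                  # disk
print(tier_for(500 * 2**20, time.time() - 200 * 86400))  # jukebox
```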
Speed
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; the drive's overall data rate (see the worked example below)
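A worked example with invented numbers: a drive that streams at 6 MB/s but spends 30 seconds locating data has a sustained bandwidth of 6 MB/s and a noticeably lower effective bandwidth.

```python
# Sustained vs. effective bandwidth (hypothetical numbers).
transfer_bytes = 600 * 2**20                          # 600 MB transferred
stream_time    = transfer_bytes / (6 * 2**20)         # 100 s of actual flow
locate_time    = 30                                   # locate + cartridge switch

sustained = transfer_bytes / stream_time              # 6.0 MB/s
effective = transfer_bytes / (stream_time + locate_time)  # ~4.6 MB/s

print(sustained / 2**20, effective / 2**20)
```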
Speed (Cont)
Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; < 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed
Cost
Main memory is much more expensive than disk storage
The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
[Figures: price per megabyte of DRAM, 1981 to 2000; price per megabyte of magnetic hard disk, 1981 to 2000; price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A., et al. Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions

Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Agenda Review
File System Parts of File System
Overview Magnetic Disks Magnetic Tapes
Disk Structure Disk Attachment
Host-attached Network-attached Storage-Area Networks
Disk Scheduling Scheduling Algorithms Selection of an algorithm
Agenda (Cont)
Disk Management Disk Formatting Boot Block Bad-Block Recovery
Swap-Space Management How is it used Where is it located How is it managed
Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID
Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery
Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost
Review
File System Method of storing and organizing computer
files and their data Storage Organization Manipulation Retrieval
Maintain physical location
Review (Cont)
Parts of File System Interface
User and programmer interface to the file system Implementation
Internal data structure and algorithms used to implement the interface
Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Agenda (Cont)
Disk Management Disk Formatting Boot Block Bad-Block Recovery
Swap-Space Management How is it used Where is it located How is it managed
Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID
Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery
Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost
Review
File System Method of storing and organizing computer
files and their data Storage Organization Manipulation Retrieval
Maintain physical location
Review (Cont)
Parts of File System Interface
User and programmer interface to the file system Implementation
Internal data structure and algorithms used to implement the interface
Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont.)
Selection of an algorithm. Factors: SSTF is common and performs better than FCFS; SCAN and C-SCAN perform better for systems that place a heavy load on the disk, and they avoid starvation. Scheduling-algorithm performance depends on the number and types of requests (example 1). Requests for disk service can be influenced by the file-allocation method (example 2) and by the location of directories and index blocks. Caching directories and index blocks in main memory reduces arm movement (example 3).
Disk Scheduling (Cont.)
Selection of an algorithm: written as a separate module of the OS, so it can be replaced if necessary. Default: SSTF or LOOK.
Rotational-delay perspective: modern disks do not disclose the physical location of logical blocks, so the disk controller takes over from the OS in choosing the algorithm. Problem: if I/O were the only constraint this would be fine, but there are other constraints, for example a request for paging (example).
Disk Management
Disk formatting: low-level and logical
Boot block: bootstrap
Bad-block recovery: manual handling, sector sparing (forwarding), sector slipping
Disk Formatting
Low-level formatting (physical formatting): each sector has a header and trailer containing the sector number and an ECC (for error detection and soft-error recovery), plus a data area, usually 512 bytes.
Partitioning: one or more groups of cylinders; each partition is treated as a separate disk (example).
Logical formatting: storing an initial file system, i.e., a map of allocated and free space and an initial empty directory.
Clusters: blocks are grouped together to increase efficiency; disk I/O is done in blocks, file I/O in clusters.
Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing file-system services.
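A minimal sketch of raw-disk access in C: reading one 512-byte logical block straight from a block device, bypassing the file system. The device path and block number are illustrative, and on Linux this requires root:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char buf[512];
        int fd = open("/dev/sda", O_RDONLY);   /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        off_t block = 2048;                    /* arbitrary block number */
        if (pread(fd, buf, sizeof buf, block * 512) != sizeof buf) {
            perror("pread");
            close(fd);
            return 1;
        }
        printf("first byte of block %lld: 0x%02x\n", (long long)block, buf[0]);
        close(fd);
        return 0;
    }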
Boot Block
Bootstrap program: the initial program that starts a computer system. It initializes aspects of the system (CPU registers, device controllers, contents of main memory), then starts the OS: it finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution.
Stored in ROM: needs no initialization and cannot be infected by a virus. Problem: hard to update. Solution: keep only a small bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the disk.
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete disk failure: replace the disk.
Bad-sector handling, manual (IDE): format or chkdsk makes a special entry in the FAT.
Sector sparing (forwarding), SCSI: the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically (example). Problem: this invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder.
Sector slipping: move every sector down by one to free the sector next to the bad sector (example).
Soft error: repairable by the disk controller through the ECC. Hard error: the data are lost; restore from backup.
Swap-Space Management
In modern operating systems, "paging" and "swapping" are used interchangeably.
Virtual memory uses disk space as an extension of main memory; performance decreases (why?).
Swap-space management goal: to get the best throughput for the virtual-memory system.
Swap-Space Management (Cont.)
How is it used? Depends on the memory-management algorithm: swapping holds entire process images; paging stores individual pages.
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to GB. Better to overestimate (why? no process is then aborted). Solaris: swap space = the amount by which VM exceeds pageable physical memory. Linux: swap space = double the amount of physical memory. Multiple swap spaces are possible.
Swap-Space Management (Cont.)
Where is it located on disk?
In the normal file system: a large file within the file system; file-system routines can be used; easy to implement but inefficient, since it takes time to traverse the directory structure.
In a separate disk partition: a raw partition managed by a swap-space storage manager, which uses algorithms optimized for speed rather than storage efficiency (why? the trade-off between speed and fragmentation is acceptable, because data life is short). A fixed amount of space is set aside during partitioning; adding more space requires re-partitioning.
Linux supports both (who decides?).
Swap-Space Management (Cont.)
How is it managed? UNIX: traditionally copied entire processes; newer versions use a combination of swapping and paging.
Solaris 1: text-segment pages (code) are paged in from the file system; swap space holds only pages of anonymous memory, such as stack or heap. Modern versions allocate the swap space only when a page is forced out of main memory.
Swapping on a Linux System
RAID Structure
A large number of disks in a system improves the data transfer rate if the disks are operated in parallel, and improves the reliability of data storage when redundant information is stored on multiple disks.
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability.
Improvement of reliability via redundancy. Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information.
Mirroring: duplicating every disk. A logical disk consists of two physical disks, and every write is carried out on both.
RAID Structure
Improvement in performance via parallelism: striping.
Bit-level striping: split the bits of each byte across multiple disks, writing bit i of each byte to disk i.
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below).
Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses.
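A sketch of the block-level striping arithmetic, assuming a hypothetical 4-disk array; disk numbering follows the slide's (i mod n) + 1 convention:

    #include <stdio.h>

    /* Block-level striping: block i of a file goes to disk (i mod n) + 1,
       with disks numbered 1..n as in the slide. */
    static int block_to_disk(long i, int n) { return (int)(i % n) + 1; }

    /* Offset of block i within its disk (its stripe row). */
    static long block_to_stripe(long i, int n) { return i / n; }

    int main(void)
    {
        int n = 4;  /* a hypothetical 4-disk array */
        for (long i = 0; i < 8; i++)
            printf("block %ld -> disk %d, stripe %ld\n",
                   i, block_to_disk(i, n), block_to_stripe(i, n));
        return 0;
    }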
RAID Structure
RAID Levels
RAID level 0 (striping): disk arrays with striping at the block level, but without any redundancy.
RAID level 1 (mirroring): disk mirroring.
RAID level 2 (memory-style error-correcting codes): a parity bit records whether the number of 1 bits in a byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity. Error-correction bits are stored in disks labeled P and are used to reconstruct the damaged data. RAID 2 is not used in practice.
RAID Levels
RAID level 3 (bit-interleaved parity): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information.
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.
Advantage: with N-way striping of data, data transfer is N times faster than RAID level 1.
Problem: the expense of computing and writing parity; usually addressed with a hardware controller with dedicated parity hardware.
RAID level 4 (block-interleaved parity): uses block-level striping and, in addition, keeps a parity block on a separate disk. High transfer rates for large reads and writes; small independent writes cannot be performed in parallel (read-modify-write cycle; see the sketch below).
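A toy sketch of block-interleaved parity: XOR of the data blocks yields the parity block, the same XOR rebuilds a lost block, and a small write updates parity as new parity = old parity XOR old data XOR new data, which is the read-modify-write cycle just mentioned. Block size and contents are illustrative:

    #include <stdio.h>
    #include <string.h>

    enum { BLK = 8 };  /* toy block size; real blocks are KBs */

    /* parity = XOR of all data blocks (RAID 4/5 style). */
    static void compute_parity(unsigned char p[BLK],
                               unsigned char d[][BLK], int ndisks)
    {
        memset(p, 0, BLK);
        for (int i = 0; i < ndisks; i++)
            for (int j = 0; j < BLK; j++)
                p[j] ^= d[i][j];
    }

    int main(void)
    {
        unsigned char d[3][BLK] = {"dataAAA", "dataBBB", "dataCCC"};
        unsigned char parity[BLK], rebuilt[BLK];

        compute_parity(parity, d, 3);

        /* Disk 1 fails: rebuild its block by XOR-ing parity with the
           surviving blocks. */
        memcpy(rebuilt, parity, BLK);
        for (int j = 0; j < BLK; j++)
            rebuilt[j] ^= d[0][j] ^ d[2][j];
        printf("rebuilt: %.7s\n", (char *)rebuilt);  /* dataBBB */

        /* Small write: update parity without reading all disks.
           new_parity = old_parity ^ old_data ^ new_data */
        unsigned char newblk[BLK] = "dataXXX";
        for (int j = 0; j < BLK; j++)
            parity[j] ^= d[1][j] ^ newblk[j];
        memcpy(d[1], newblk, BLK);
        return 0;
    }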
RAID Levels
RAID level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids potential overuse of a single parity disk.
RAID level 6 (P+Q redundancy): stores extra redundant information to guard against multiple disk failures. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.
RAID Levels
RAID 0+1: stripe first, then mirror the stripe.
RAID 1+0: mirror first, then stripe the mirrors.
If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available. With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.
RAID Levels
The Implementation of RAID
RAID can be implemented by volume-management software within the kernel or at the system-software layer; in the host bus-adapter (HBA) hardware; in the hardware of the storage array; or in the SAN interconnect layer, by disk-virtualization devices.
Rebuild performance: easiest for RAID level 1; can take long hours for RAID 5 on large disks.
Selecting a RAID Level
RAID level 0 is used in high-performance applications where data loss is not critical.
RAID level 1 is used for applications that need high reliability with fast recovery.
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases.
RAID level 5 is often preferred over RAID 1 for storing large volumes of data.
With more disks in an array, data-transfer rates are higher, but the system is more expensive. With more bits protected by each parity bit, the space overhead due to parity is lower, but the chance of a second disk failure before repair is greater.
Extensions
The concepts of RAID have been generalized to other storage devices: arrays of tapes, and even the broadcast of data over wireless systems.
Problems with RAID
1. RAID protects against physical errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data. Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it changed. Checksumming provides error detection and correction.
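A toy sketch of the checksum idea: a block pointer carries the expected checksum of the block it references, so corruption or a misdirected read is caught on access. The Fletcher-style sum here is illustrative only; ZFS itself uses fletcher checksums or SHA-256:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy Fletcher-style checksum over a block (illustrative only). */
    static uint64_t cksum(const uint8_t *p, size_t n)
    {
        uint64_t a = 0, b = 0;
        for (size_t i = 0; i < n; i++) { a += p[i]; b += a; }
        return (b << 32) | (a & 0xffffffffu);
    }

    /* A "block pointer" that carries the expected checksum of the
       block it points to, as in the slide. */
    struct blkptr { const uint8_t *block; size_t len; uint64_t expected; };

    int main(void)
    {
        uint8_t block[16] = "important data";
        struct blkptr bp = { block, sizeof block, 0 };
        bp.expected = cksum(block, sizeof block);    /* set on write */

        block[3] ^= 0x40;                            /* silent corruption */

        if (cksum(bp.block, bp.len) != bp.expected)  /* verified on read */
            puts("checksum mismatch: wrong or corrupted block");
        return 0;
    }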
Problems with RAID (Cont.)
2. RAID implementations lack flexibility. What if we have a five-disk RAID level 5 set and a file system is too large to fit on it?
Instead, partitions of disks can be gathered together via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.
Stable-Storage Implementation
Information residing in stable storage is never lost.
To implement stable storage: replicate information on multiple storage devices with independent failure modes, and coordinate the writing of updates so that a failure during an update cannot leave all the copies in a damaged state, ensuring that we can recover the stable data after any failure during data transfer or recovery.
Disk Write Result
A disk write results in one of three outcomes.
Successful completion: all of the data were written correctly.
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted.
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.
Recoverable Write
The system must maintain (at least) two physical blocks for each logical block, for detecting and recovering from failure.
Recoverable write: (1) write the information onto the first physical block; (2) when the first write completes successfully, write the same information onto the second physical block; (3) declare the operation complete only after the second write completes successfully. A sketch follows.
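A minimal user-level sketch of the recoverable write, with two ordinary files standing in for the two physical blocks; file names and block size are illustrative, and fsync() stands in for waiting until the data reach the medium:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int stable_write(const char *copy1, const char *copy2,
                            const void *buf, size_t len)
    {
        const char *paths[2] = { copy1, copy2 };
        for (int i = 0; i < 2; i++) {
            int fd = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) return -1;
            /* The second write starts only after the first has been
               forced out to the medium. */
            if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
                close(fd);
                return -1;
            }
            close(fd);
        }
        return 0;  /* declared complete only after both copies succeed */
    }

    int main(void)
    {
        char block[512] = "logical block contents";
        if (stable_write("block.copy1", "block.copy2", block, sizeof block))
            perror("stable_write");
        return 0;
    }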
Failure Detection and Recovery
During recovery, each pair of physical blocks is examined. If both are the same and no detectable error exists, nothing more is done. If one contains a detectable error, its contents are replaced with the value of the other block. If neither contains a detectable error but they differ in content, the content of the first block is replaced with the value of the second.
This ensures that a write to stable storage either succeeds completely or results in no change.
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage.
Generally, tertiary storage is built using removable media.
Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB. Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.
Magneto-optic disk: records data on a rigid platter coated with magnetic material. The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, so it is resistant to head crashes. To write a bit, the head flashes a laser beam at the tiny spot on the disk surface where the bit is to be written. Laser light is also used to read data (the Kerr effect).
Removable Disks
Optical disks employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples: CD-RW and DVD-RW.
WORM ("Write Once, Read Many times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. The information can be destroyed, but not altered.
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use.
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications.
Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on it. Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device. Usually the tape drive is then reserved for the exclusive use of that application until it closes the tape device. Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks. And since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
Tape Drives
The basic operations for a tape drive differ from those of a disk drive.
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).
The read position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block. An EOT mark is placed after a block that is written.
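On Linux, the st tape driver exposes roughly these operations through ioctl(). A sketch, assuming a no-rewind tape device at /dev/nst0:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/nst0", O_RDONLY);  /* no-rewind tape device */
        if (fd < 0) { perror("open"); return 1; }

        /* locate(): position the tape at logical block 100. */
        struct mtop seek = { .mt_op = MTSEEK, .mt_count = 100 };
        if (ioctl(fd, MTIOCTOP, &seek) < 0) perror("MTSEEK");

        /* read position(): where is the head now? */
        struct mtpos pos;
        if (ioctl(fd, MTIOCPOS, &pos) == 0)
            printf("head at logical block %ld\n", pos.mt_blkno);

        /* space(): move forward 5 records relative to here. */
        struct mtop fsr = { .mt_op = MTFSR, .mt_count = 5 };
        if (ioctl(fd, MTIOCTOP, &fsr) < 0) perror("MTFSR");

        close(fd);
        return 0;
    }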
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be great to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate. A small numeric sketch follows.
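The difference with illustrative numbers only: a 1 GB transfer streaming at 40 MB/s after 30 seconds of locate and cartridge switching:

    #include <stdio.h>

    int main(void)
    {
        double bytes = 1e9;
        double stream_time = bytes / 40e6;  /* 25 s of actual data flow */
        double overhead    = 30.0;          /* locate + cartridge switch */

        printf("sustained: %.1f MB/s\n", bytes / stream_time / 1e6);
        printf("effective: %.1f MB/s\n",
               bytes / (stream_time + overhead) / 1e6);  /* ~18.2 */
        return 0;
    }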
Speed (Cont.)
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on a disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
Price per megabyte of DRAM, from 1981 to 2000
Price per megabyte of magnetic hard disk, from 1981 to 2000
Price per megabyte of a tape drive, from 1984 to 2000
References
Silberschatz, A., et al. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?
Thank you
Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID
Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery
Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost
Review
File System Method of storing and organizing computer
files and their data Storage Organization Manipulation Retrieval
Maintain physical location
Review (Cont)
Parts of File System Interface
User and programmer interface to the file system Implementation
Internal data structure and algorithms used to implement the interface
Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Review
File System Method of storing and organizing computer
files and their data Storage Organization Manipulation Retrieval
Maintain physical location
Review (Cont)
Parts of File System Interface
User and programmer interface to the file system Implementation
Internal data structure and algorithms used to implement the interface
Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild performance:
Easiest for RAID Level 1
Can take long hours for RAID 5 with large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications that require high reliability with fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data
With more disks in an array, data-transfer rates are higher, but the system is more expensive
With more bits protected by each parity bit, the space overhead due to parity is lower, but the chance of a second disk failure is greater
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices: arrays of tapes, and the broadcast of data over wireless systems
Problems with RAID
1. RAID protects against physical media errors, but not against other hardware and software errors
The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data
Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it changed (see the sketch below)
Checksumming provides error detection and correction
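A toy sketch of the checksum-kept-with-the-pointer idea; SHA-256 and all names here are my assumptions, and real ZFS uses its own checksum algorithms and on-disk layout:

```python
import hashlib

class BlockPointer:
    """The parent records the child block's checksum, so a wrong or
    corrupted block is caught when it is read back."""
    def __init__(self, data: bytes):
        self.data = data                                  # stands in for the on-disk block
        self.checksum = hashlib.sha256(data).hexdigest()  # kept with the pointer

    def read(self) -> bytes:
        if hashlib.sha256(self.data).hexdigest() != self.checksum:
            raise IOError("checksum mismatch: wrong or damaged block")
        return self.data

ptr = BlockPointer(b"file contents")
assert ptr.read() == b"file contents"
ptr.data = b"bit rot"   # simulate corruption; the next read() raises IOError
```

Only detection is shown here; correction would then come from reading a redundant copy (e.g., a mirror), as the last point above implies.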
Problems with RAID (cont)
2. RAID implementations lack flexibility
What if we have a five-disk RAID level 5 set and a file system too large to fit on it?
ZFS instead gathers disks and partitions via RAID sets into pools of storage
So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes
Coordinate the writing of updates so that a failure during an update will not leave all copies in a damaged state, ensuring that the stable data can be recovered after any failure during data transfer or recovery
Overview
Successful completion: all of the data were written correctly
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact
Disk Write Result
The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failure
Recoverable write:
1. Write the information onto the first physical block
2. When the first write completes successfully, write the same information onto the second physical block
3. Declare the operation complete only after the second write completes successfully
Recoverable write
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists: OK
If one contains a detectable error: replace its contents with the value of the other block
If both contain no detectable error but they differ in content: replace the content of the first block with the value of the second
This ensures that a write to stable storage either succeeds completely or results in no change (a sketch of both steps follows)
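A minimal sketch of both procedures, using two ordinary files to stand in for the two physical blocks; ECC-style detectable-error checks are omitted, and all names are my assumptions:

```python
import os

def recoverable_write(block_a: str, block_b: str, data: bytes) -> None:
    """Write to the first physical block, force it to disk, and only
    then write the second; the operation is complete after both."""
    for path in (block_a, block_b):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # first copy is durable before the second starts

def recover(block_a: str, block_b: str) -> None:
    """Crash recovery: if both copies are readable but differ, replace
    the first block's content with the second's (per the rules above)."""
    with open(block_a, "rb") as fa, open(block_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if a != b:
        recoverable_write(block_a, block_b, b)
```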
Failure detection and Recovery
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure
Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic-disk head does, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes
To write a bit, the head flashes a laser beam aimed at a tiny spot on the disk surface
Laser light is also used to read data (the Kerr effect)
Removable Disks
Optical disks: employ special materials that are altered by laser light
Example: the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state
Uses laser light at three different powers: low to read data, medium to erase the disk, high to write to the disk
Common examples: CD-RW and DVD-RW
WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum
Information can be destroyed but not altered
Removable Disks
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if needed, the computer can stage it back into disk storage for active use
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application interface:
For a disk, a new cartridge is formatted and an empty file system is generated on it
Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk)
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written (the sketch below models these semantics)
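A toy in-memory model of these semantics; the class and method names mirror the operations above, as an illustration rather than any real driver API:

```python
class TapeDrive:
    """In-memory model of the append-only tape semantics above."""
    def __init__(self) -> None:
        self.blocks: list[bytes] = []
        self.pos = 0                      # logical block under the head

    def locate(self, block: int) -> None:
        self.pos = block                  # absolute positioning, like seek()

    def read_position(self) -> int:
        return self.pos                   # logical block number at the head

    def space(self, count: int) -> None:
        self.pos += count                 # relative motion

    def write(self, data: bytes) -> None:
        del self.blocks[self.pos:]        # writing mid-tape erases everything after
        self.blocks.append(data)          # the EOT mark now follows this block
        self.pos += 1
```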
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (e.g., hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives)
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when needed (a policy sketch follows below)
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
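A minimal sketch of such a placement policy; the function, the 64 MB size cutoff, and the 180-day staleness threshold are purely illustrative assumptions:

```python
import time

def hsm_tier(size_bytes: int, last_access: float,
             small_limit: int = 64 * 2**20, stale_days: int = 180) -> str:
    """Toy placement rule: small or recently used files stay on disk;
    large, long-inactive files go to the jukebox."""
    idle_days = (time.time() - last_access) / 86400
    if size_bytes <= small_limit or idle_days < stale_days:
        return "disk"
    return "jukebox"

# Example: a 2 GB file untouched for a year is archived.
print(hsm_tier(2 * 2**30, time.time() - 365 * 86400))   # "jukebox"
```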
Hierarchical Storage Management (HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate (see the worked example below)
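A worked example of the two definitions; the numbers (1 GB transfer, 40 MB/s streaming, 60 s of locate and cartridge-switch overhead) are assumed for illustration:

```python
def sustained_bandwidth(nbytes: float, transfer_s: float) -> float:
    """Data rate while the stream is actually flowing (bytes/second)."""
    return nbytes / transfer_s

def effective_bandwidth(nbytes: float, transfer_s: float, overhead_s: float) -> float:
    """Average over the entire I/O, including locate and cartridge switching."""
    return nbytes / (transfer_s + overhead_s)

n = 1e9
t = n / 40e6                                   # 25 s of actual streaming
print(sustained_bandwidth(n, t) / 1e6)         # 40.0 MB/s
print(effective_bandwidth(n, t, 60.0) / 1e6)   # ~11.8 MB/s
```

The gap between the two figures is exactly the cost of positioning: the longer the locate and switch time relative to the transfer, the further effective bandwidth falls below sustained bandwidth.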
Speed
Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage results from having many cheap cartridges share a few expensive drives
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
Cost
Price per megabyte of DRAM from 1981 to 2000
Price per megabyte of magnetic hard disk from 1981 to 2000
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz, A., Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Review (Cont)
Parts of File System Interface
User and programmer interface to the file system Implementation
Internal data structure and algorithms used to implement the interface
Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Overview
Magnetic Disks Magnetic Tape
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (e.g., optical discs, magnetic tape drives). It would be great to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to the jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
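A toy sketch of the kind of policy such a system might apply; the 90-day and 100 MB thresholds and the migrate_to_jukebox() helper are hypothetical, invented only to illustrate the small/frequent vs. large/inactive split.

```c
/* Toy HSM-style policy: large files not accessed for 90 days are
 * candidates for the jukebox tier; everything else stays on disk.
 * Thresholds and migrate_to_jukebox() are hypothetical. */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define AGE_LIMIT  (90L * 24 * 3600)      /* 90 days, in seconds */
#define SIZE_LIMIT (100L * 1024 * 1024)   /* 100 MB              */

static void migrate_to_jukebox(const char *path) {
    printf("would archive %s to the tape/optical tier\n", path);
}

static void consider(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return;

    time_t idle = time(NULL) - st.st_atime;   /* time since last access */
    if (st.st_size > SIZE_LIMIT && idle > AGE_LIMIT)
        migrate_to_jukebox(path);             /* large, old, inactive   */
}

int main(int argc, char **argv) {
    for (int i = 1; i < argc; i++)
        consider(argv[i]);
    return 0;
}
```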
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth – the average data rate during a large transfer (number of bytes / transfer time); the data rate when the data stream is actually flowing.
Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate.
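The difference is easy to see numerically; in this small sketch every figure (8 GB transferred, 200 s of streaming, 60 s of positioning and cartridge switching) is a made-up example.

```c
/* Sustained vs. effective bandwidth for one hypothetical transfer. */
#include <stdio.h>

int main(void) {
    double bytes     = 8.0e9;   /* 8 GB moved                    */
    double streaming = 200.0;   /* seconds data actually flowed  */
    double locate    = 40.0;    /* seconds winding the tape      */
    double swap      = 20.0;    /* seconds switching cartridges  */

    double sustained = bytes / streaming;                    /* 40.0 MB/s  */
    double effective = bytes / (streaming + locate + swap);  /* ~30.8 MB/s */

    printf("sustained: %.1f MB/s\n", sustained / 1e6);
    printf("effective: %.1f MB/s\n", effective / 1e6);
    return 0;
}
```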
Speed (cont.)
Access latency – the amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: under 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
[Figure: price per megabyte of DRAM, 1981 to 2000]
[Figure: price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Magnetic Disks Structure
Platter Track Sector Cylinder Disk arm Read-write head
Rotates 60-200 timessecond Disk Speed
Transfer rate Positioning time
Seek time Rotational latency
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Magnetic Disks (Cont) Head Crash
Disk head making contact with the disk surface Permanent damage
Removable Magnetic Disks Floppy
Head sits directly on the surface Slow rotation and lower disk space
IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber
Channel SCSI Fire wire Disk Controller
Cache memory Host controller
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
First HDD ndash IBM RAMAC 1956
15 square meters (16 sq ft)
$320000
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting
Partition: one or more groups of cylinders; each partition is treated as a separate disk.
Logical formatting stores the initial file system on the partition: a map of allocated and free space and an initial empty directory.
Cluster: blocks are grouped together to increase efficiency; disk I/O is done in blocks, while file I/O is done in clusters.
Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing the file-system services.
Boot Block: Bootstrap Program
The initial program that starts a computer system. It initializes aspects of the system: CPU registers, device controllers, and the contents of main memory.
It then starts the OS: it finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution.
The bootstrap is stored in ROM, which needs no initialization and cannot be infected by a virus. Problem: ROM is hard to update. Solution: keep only a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the disk.
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete disk failure: replace the disk.
Bad-sector handling:
Manual handling (e.g., on IDE disks): commands such as format or chkdsk find bad blocks and mark them with a special entry in the FAT.
Sector sparing (forwarding), used by SCSI disks: the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically. Problem: remapping can invalidate the optimization done by the disk-scheduling algorithm. Solution: keep a few spare sectors on each cylinder.
Sector slipping: every sector from the one after the bad sector down to the next spare is moved one position toward the spare, freeing the slot immediately after the bad sector so the bad sector can be remapped to it.
A soft error is repairable by the disk controller through the ECC; a hard error means lost data, which must be restored from backup.
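A minimal sketch of the remapping idea behind sector sparing (the table layout and names are illustrative assumptions, not a real controller's data structures):

```c
#include <stdio.h>

#define MAX_REMAPS 16

/* One entry per bad sector: the bad sector number and its spare. */
struct remap { int bad, spare; };

static struct remap table[MAX_REMAPS] = {
    { 17, 203 },   /* sector 17 went bad; forwarded to spare sector 203 */
};
static int nremaps = 1;

/* Translate a requested sector number through the bad-sector list. */
static int translate(int sector) {
    for (int i = 0; i < nremaps; i++)
        if (table[i].bad == sector)
            return table[i].spare;
    return sector;   /* not remapped: use the sector as-is */
}

int main(void) {
    printf("sector 17 -> %d\n", translate(17));   /* forwarded to 203 */
    printf("sector 18 -> %d\n", translate(18));   /* unchanged        */
    return 0;
}
```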
Swap-Space Management
In modern operating systems, "paging" and "swapping" are often used interchangeably.
Virtual memory uses disk space as an extension of main memory. Performance decreases when swap space is in use, because disk access is much slower than memory access.
Swap-space management goal: to provide the best throughput for the virtual-memory system.
Swap-Space Management (Cont)
How is it used? That depends on the memory-management algorithm:
Swapping: the swap space holds entire process images.
Paging: the swap space stores pages pushed out of main memory.
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to many GB.
It is better to overestimate: if swap space never runs out, no process has to be aborted for lack of it.
Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory.
Linux: swap space = double the amount of physical memory (a historical rule of thumb); multiple swap spaces are supported.
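As an illustrative example (numbers assumed): under the traditional Linux rule of thumb, a machine with 4 GB of physical memory would be configured with about 8 GB of swap, possibly split across multiple swap spaces on separate disks to spread the paging load.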
Swap-Space Management (Cont)
Where is it located on disk?
In the normal file system: swap space is a large file within the file system, so ordinary file-system routines can be used. This is easy to implement but inefficient, since it takes time to traverse the directory structure.
In a separate disk partition (raw partition): a separate swap-space storage manager allocates and deallocates blocks, using algorithms optimized for speed rather than storage efficiency. The trade-off of speed against fragmentation is acceptable because data in swap space lives for a short time. A fixed amount of space is set aside during partitioning; adding more swap space requires repartitioning.
Linux supports both approaches; the system administrator decides which to use.
Swap-Space Management (Cont)
How is it managed?
UNIX: traditionally copied entire processes to swap; newer versions use a combination of swapping and paging.
Solaris 1: text-segment pages (code) are brought in from the file system and simply discarded when paged out, so swap space is used only for pages of anonymous memory such as the stack or heap. Modern versions allocate swap space only when a page is actually forced out of main memory.
Swapping on a Linux system (figure)
RAID Structure
A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel, and improves the reliability of data storage when redundant information is stored on multiple disks.
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability.
Improvement of reliability via redundancy. Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information.
Mirroring: duplicating every disk.
A logical disk consists of two physical disks.
Every write is carried out on both disks.
RAID Structure
Improvement in performance via parallelism: striping.
Bit-level striping: split the bits of each byte across multiple disks; write bit i of each byte to disk i.
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1.
Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses.
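A small sketch of the block-level mapping rule just given (disk numbering starts at 1, matching the slide's formula):

```c
#include <stdio.h>

/* Block-level striping: with n disks, block i goes to disk (i mod n) + 1. */
static int disk_for_block(int block, int ndisks) {
    return (block % ndisks) + 1;
}

int main(void) {
    int ndisks = 4;
    for (int block = 0; block < 8; block++)
        printf("block %d -> disk %d\n", block, disk_for_block(block, ndisks));
    /* blocks 0..7 map to disks 1, 2, 3, 4, 1, 2, 3, 4 */
    return 0;
}
```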
RAID Structure (figure)
RAID Levels
RAID Level 0 (striping): disk arrays with striping at the level of blocks, but without any redundancy.
RAID Level 1 (mirroring): disk mirroring.
RAID Level 2 (memory-style error-correcting codes): a parity bit records whether the number of 1-bits in a byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity. Error-correction bits are stored on disks labeled P and are used to reconstruct the damaged data. RAID 2 is not used in practice.
RAID Levels
RAID Level 3 (bit-interleaved parity): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information.
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.
Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1.
Problem: the expense of computing and writing parity; a hardware controller with dedicated parity hardware helps.
RAID Level 4 (block-interleaved parity): uses block-level striping and, in addition, keeps a parity block on a separate disk. It offers high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel: each one requires a read-modify-write cycle.
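The parity here is simply the XOR of the corresponding data blocks, which is what makes the read-modify-write cycle cheap: the new parity can be computed from the old parity, the old data, and the new data alone. A minimal sketch (the byte values are arbitrary illustrations):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* One byte from each of three data disks; parity = XOR of the data. */
    uint8_t d0 = 0xA5, d1 = 0x3C, d2 = 0x0F;
    uint8_t parity = d0 ^ d1 ^ d2;

    /* Small write to d1: the read-modify-write cycle updates parity from
       the old parity, old data, and new data alone, without reading d0
       or d2: new_parity = old_parity ^ old_data ^ new_data. */
    uint8_t new_d1 = 0xFF;
    parity = parity ^ d1 ^ new_d1;
    d1 = new_d1;

    /* If the disk holding d2 fails, its contents are rebuilt by XOR-ing
       the surviving data with the parity. */
    uint8_t rebuilt = d0 ^ d1 ^ parity;
    printf("rebuilt d2 = 0x%02X (expected 0x%02X)\n", rebuilt, d2);
    return 0;
}
```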
RAID Levels
RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids potential overuse of one parity disk.
RAID Level 6 (P+Q redundancy): stores extra redundant information to guard against multiple disk failures. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.
RAID Levels
RAID 0+1: stripe first, then mirror the stripe.
RAID 1+0: mirror first, then stripe the mirrors.
If a single disk fails in RAID 0+1, an entire stripe becomes inaccessible, leaving only the other stripe available.
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1; rebuilding can take long hours for RAID Level 5 on large disks.
RAID Level 0 is used in high-performance applications where data loss is not critical.
RAID Level 1 is used for applications that require high reliability and fast recovery.
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important – for example, small databases.
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data.
If more disks are in an array, data-transfer rates are higher, but the system is more expensive.
If more bits are protected by a single parity bit, the space overhead due to parity is lower, but the chance of a second disk failure before the first failure is repaired is greater.
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices: arrays of tapes, and even the broadcast of data over wireless systems.
Problems with RAID
1. RAID protects against physical errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data.
Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it has changed.
Checksumming provides error detection and correction.
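A minimal sketch of the idea of keeping a checksum with the pointer rather than with the data it guards (the struct layout and the toy checksum are assumptions for illustration; ZFS itself uses stronger checksums):

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Toy checksum for illustration; ZFS uses stronger ones (e.g., SHA-256). */
static uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/* The pointer to a block carries the expected checksum of its target,
   so a bad block cannot vouch for itself. */
struct block_ptr {
    const uint8_t *target;
    size_t         len;
    uint32_t       expected;
};

static int read_verified(const struct block_ptr *p) {
    return checksum(p->target, p->len) == p->expected;   /* 1 = intact */
}

int main(void) {
    uint8_t block[17] = "file system data";
    struct block_ptr p = { block, sizeof block, checksum(block, sizeof block) };

    printf("before corruption: %s\n", read_verified(&p) ? "ok" : "BAD");
    block[3] ^= 0x40;   /* simulate silent corruption on disk */
    printf("after corruption:  %s\n", read_verified(&p) ? "ok" : "BAD");
    return 0;
}
```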
Problems with RAID (cont)
2. RAID implementations lack flexibility: what if we have a five-disk RAID level 5 set and the file system is too large to fit on it?
ZFS instead gathers partitions of disks via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.
Stable-Storage Implementation
Information residing in stable storage is never lost.
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes.
Coordinate the writing of updates so that a failure during an update will not leave all the copies in a damaged state, and so that the stable data can be recovered after any failure during data transfer or recovery.
Overview
A disk write results in one of three outcomes:
Successful completion: the data were written correctly.
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written at the time of the failure may have been corrupted.
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.
Disk Write Result
The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failures.
A recoverable write proceeds as follows:
Write the information onto the first physical block.
When the first write completes successfully, write the same information onto the second physical block.
Declare the operation complete only after the second write completes successfully.
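A minimal sketch of the two-block protocol (write_block is an illustrative stand-in for a real device write):

```c
#include <stdio.h>

/* Stand-in for a real device write; returns 0 on success. */
static int write_block(int physical_block, const char *data) {
    printf("writing physical block %d\n", physical_block);
    return 0;
}

/* Recoverable write: the second copy is written only after the first
   succeeds, so at most one copy can be mid-write when a crash hits. */
static int recoverable_write(int first, int second, const char *data) {
    if (write_block(first, data) != 0)
        return -1;              /* first copy failed; second still intact */
    if (write_block(second, data) != 0)
        return -1;              /* first copy is already safe on disk */
    return 0;                   /* only now is the operation complete */
}

int main(void) {
    return recoverable_write(100, 200, "payload");
}
```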
Recoverable write
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists, nothing further is done.
If one block contains a detectable error, its contents are replaced with the value of the other block.
If neither block contains a detectable error but they differ in content, the contents of the first block are replaced with the value of the second.
This ensures that a write to stable storage either succeeds completely or results in no change.
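And a matching sketch of the recovery pass over one block pair, using an in-memory stand-in for the two physical blocks:

```c
#include <stdio.h>
#include <string.h>

#define BLK 8   /* tiny blocks keep the demo readable */

/* Toy "disk": two physical copies of one logical block, plus a flag per
   copy simulating a detectable (ECC) error on that copy. */
static char disk[2][BLK];
static int  ecc_error[2];

/* Apply the slide's three recovery cases to the pair. */
static int recover_pair(void) {
    if (ecc_error[0] && ecc_error[1])
        return -1;                                      /* unrecoverable */
    if (ecc_error[0]) {
        memcpy(disk[0], disk[1], BLK); ecc_error[0] = 0;
    } else if (ecc_error[1]) {
        memcpy(disk[1], disk[0], BLK); ecc_error[1] = 0;
    } else if (memcmp(disk[0], disk[1], BLK) != 0) {
        memcpy(disk[0], disk[1], BLK);  /* differ: first takes second's value */
    }
    return 0;
}

int main(void) {
    /* Simulate a crash between the two writes of a recoverable write:
       the first copy holds the new data, the second still the old. */
    memcpy(disk[0], "NEWDATA", BLK);
    memcpy(disk[1], "OLDDATA", BLK);
    recover_pair();
    printf("first copy now: %s\n", disk[0]);  /* the write is rolled back */
    return 0;
}
```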
Failure detection and Recovery
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage.
Generally, tertiary storage is built using removable media.
Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs.
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.
Magneto-optic disk: records data on a rigid platter coated with magnetic material. The magneto-optic head flies much farther from the disk surface than a magnetic-disk head does, and the magnetic material is covered with a protective layer of plastic or glass that resists head crashes. To write, the head flashes a laser beam at a tiny spot on the disk surface where the bit is to be written. Laser light is also used to read data (via the Kerr effect).
Removable Disks
Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW.
WORM ("Write Once, Read Many times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit, the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed, but not altered.
Removable Disks
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.
A disk-resident file can be archived to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use.
Tapes
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications.
Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on it. Tapes, by contrast, are presented as a raw storage medium: an application does not open a file on the tape; it opens the whole tape drive as a raw device.
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device.
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).
The read_position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block.
An EOT (end-of-tape) mark is placed after a block that is written.
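A sketch of how an application might drive such an interface (the function names mirror the operations above, but the signatures and stub behavior are hypothetical, not a real tape-driver API):

```c
#include <stdio.h>

/* Hypothetical tape-drive interface mirroring the operations above,
   stubbed with an in-memory head position so the sketch runs. */
static int head = 0;                        /* block under the tape head */

static int tape_locate(int block)   { head = block; return 0; }
static int tape_read_position(void) { return head; }
static int tape_space(int nblocks)  { head += nblocks; return 0; }
static int tape_write(int nblocks)  { head += nblocks; return 0; }
    /* append-only: a real write also invalidates every block after it */

int main(void) {
    tape_locate(100);                 /* position at logical block 100 */
    int start = tape_read_position(); /* remember where the record begins */
    tape_write(4);                    /* append a 4-block record */
    tape_space(-4);                   /* relative motion back over it */
    printf("record written at block %d, head now at %d\n",
           start, tape_read_position());
    return 0;
}
```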
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (such as optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store most data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox, a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
Hierarchical Storage Management (HSM)
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; the drive's overall data rate.
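As a worked example with assumed numbers: a drive that streams at 40 MB/s sustained but spends 20 s locating a 400 MB file delivers the data in 10 s of streaming plus 20 s of positioning, for an effective bandwidth of 400 MB / 30 s ≈ 13 MB/s.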
Speed
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
Cost
Price per megabyte of DRAM, from 1981 to 2000 (figure)
Price per megabyte of magnetic hard disk, from 1981 to 2000 (figure)
Price per megabyte of a tape drive, from 1984 to 2000 (figure)
References
Silberschatz, A., et al. Operating System Concepts, 8th edition.
Wikipedia.com; PCTechGuide.com; USRobotics.com; allSAN.com; Xenon.com.au
Questions?
Thank you!
Magnetic Tape
Early secondary-storage medium First used in1951 as a computer storage
Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)
Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk
Modern Usage Backup archive
For large amount of data tape can be substantially less expensive than disk
Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
LTO-2
SDLT frac14 frac12 inch
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
Tertiary-Storage Devices
Removable Disks
Floppy disk – a thin flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure
Magneto-optic disk – records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass, resistant to head crashes
The head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
Removable Disks (Cont)
Optical disks – employ special materials that are altered by laser light
Example: the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state
Uses laser light at three different powers: low to read data, medium to erase the disk, high to write to the disk
Common examples: CD-RW and DVD-RW
WORM ("Write Once, Read Many Times") disks can be written only once
A thin aluminum film is sandwiched between two glass or plastic platters
To write a bit, the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if needed, the computer can stage it back into disk storage for active use
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface
For a disk, a new cartridge is formatted and an empty file system is generated on it
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device
Usually the tape drive is reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
Tape Drives
The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block (see the toy model below)
An EOT mark is placed after a block that is written
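A toy, in-memory model of these append-only semantics (the class and its behavior are illustrative assumptions, not a real tape driver):

```python
class TapeDrive:
    """Toy in-memory model of the append-only tape semantics above."""
    def __init__(self):
        self.blocks = []   # logical blocks written so far
        self.pos = 0       # current head position (logical block number)

    def locate(self, n):           # corresponds to seek() on a disk
        self.pos = n

    def read_position(self):
        return self.pos

    def space(self, count):        # relative motion, forward or backward
        self.pos += count

    def write(self, data):
        # Writing in the middle truncates everything after the head:
        del self.blocks[self.pos:]
        self.blocks.append(data)   # the EOT mark now follows this block
        self.pos += 1

t = TapeDrive()
for b in (b"a", b"b", b"c"):
    t.write(b)
t.locate(1)
t.write(b"B")          # erases old blocks 1 and 2, then appends
assert t.blocks == [b"a", b"B"]
```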
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (e.g., optical discs and magnetic tape drives)
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox, a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance (a simple placement policy is sketched below)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
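A simplified sketch of the placement decision, with hypothetical thresholds (the 100 MB size and 30-day idle cutoffs are invented for illustration):

```python
import time

DAY = 86400

def place(size_bytes, last_access, now=None):
    """Decide where a file should live under a toy HSM policy."""
    now = now if now is not None else time.time()
    idle_days = (now - last_access) / DAY
    # Hypothetical policy: large files idle for a month migrate to tape.
    if size_bytes > 100 * 1024**2 and idle_days > 30:
        return "jukebox"   # archived; staged back to disk on next access
    return "disk"          # small or recently used files stay on disk

assert place(1 * 1024**2, time.time()) == "disk"
assert place(500 * 1024**2, time.time() - 90 * DAY) == "jukebox"
```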
Speed
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth – the average data rate during a large transfer, i.e., number of bytes / transfer time; this is the data rate when the data stream is actually flowing
Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate (a worked example follows)
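For example (with illustrative numbers, not figures from the slides), the two rates can differ sharply once locate and cartridge-switch overhead is counted:

```python
# A 1 GB transfer streaming at 40 MB/s takes 25 s of pure data flow,
# but 60 s of cartridge switching and locating dominates the effective rate.
bytes_moved = 1_000_000_000
stream_rate = 40_000_000                       # bytes/s while data is flowing
transfer_time = bytes_moved / stream_rate      # 25.0 s

overhead = 60.0                                # locate + cartridge switch (s)
sustained = bytes_moved / transfer_time                 # 40.0 MB/s
effective = bytes_moved / (transfer_time + overhead)    # ~11.8 MB/s
print(f"sustained {sustained/1e6:.1f} MB/s, effective {effective/1e6:.1f} MB/s")
```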
Speed (cont)
Access latency – the amount of time needed to locate data
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on a disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
Cost
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
Figure: Price per megabyte of DRAM, 1981 to 2000
Figure: Price per megabyte of magnetic hard disk, 1981 to 2000
Figure: Price per megabyte of a tape drive, 1984 to 2000
References
Silberschatz, A. Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Structure
Addressing One-dimensional array of blocks Logical Block
Smallest unit of transfer 512 bytes
Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order
Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Structure (Cont)
Logical block number Cylinder track sector
In practice it is difficult to perform Defective sectors Sectorstrack is not constant
CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed
CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Attachment Host-Attached Storage (DAS)
Accessed through local IO ports IDE ATA SATA SCSI FC
Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape
Network-Attached Storage (NAS) NAS ISCSI
Storage-Area Network (SAN) SAN infiniBand
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont): Selection of an algorithm
The disk-scheduling algorithm should be written as a separate module of the OS, so it can be replaced if necessary
Default: SSTF or LOOK
Rotational-delay perspective: modern disks do not disclose the physical location of logical blocks, so the disk controller, rather than the OS, chooses the algorithm
Problem: if I/O were the only consideration this would be fine, but the OS has other constraints; for example, a request for paging may take priority over application I/O
Disk Management
Disk Formatting: low-level, logical
Boot Block: bootstrap
Bad-Block Recovery: manually; sector sparing (forwarding); sector slipping
Disk Formatting
Low-Level Formatting (Physical Formatting)
Each sector has a header and a trailer, which hold the sector number and an error-correcting code (ECC)
The ECC provides error detection and soft-error recovery
Data area: usually 512 bytes
Logical Formatting
Partition: one or more groups of cylinders; each partition is treated as a separate disk
Logical formatting stores the initial file system: a map of allocated and free space and an initial empty directory
Cluster: blocks are grouped together to increase efficiency; disk I/O is done in blocks, while file I/O is done in clusters
Raw disk: some programs use the disk partition as a large sequential array of logical blocks, bypassing the file-system services
Boot Block: Bootstrap Program
The initial program that starts a computer system; it initializes aspects of the system: CPU registers, device controllers, and the contents of main memory
It then starts the OS: it finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution
The bootstrap is stored in ROM: it needs no initialization and cannot be infected by a virus
Problem: ROM is hard to update. Solution: keep only a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the disk
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete disk failure: replace the disk
Bad-sector handling
Manually: e.g., on IDE disks, format or chkdsk places a special entry in the FAT
Sector sparing (forwarding): on SCSI disks, the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically. Problem: this invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder
Sector slipping: move every sector between the bad sector and the next spare down one slot, freeing the slot next to the bad sector
Soft error: repairable by the disk controller through the ECC. Hard error: the data are lost; restore from backup
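A minimal sketch of the sector-sparing idea in Python (the class and its methods are hypothetical; real controllers do this in firmware): the controller keeps a remap table from bad sectors to spares and consults it on every access.

    class SparingController:
        def __init__(self, spare_sectors):
            self.spares = list(spare_sectors)  # set aside during low-level formatting
            self.remap = {}                    # bad sector -> spare sector

        def mark_bad(self, sector):
            # Forward a bad sector to the next free spare.
            if not self.spares:
                raise RuntimeError("no spare sectors left")
            self.remap[sector] = self.spares.pop(0)

        def translate(self, sector):
            # Applied to every read/write before the media access.
            return self.remap.get(sector, sector)

    ctrl = SparingController(spare_sectors=[1000, 1001])
    ctrl.mark_bad(87)
    print(ctrl.translate(87))   # 1000 (forwarded to a spare)
    print(ctrl.translate(88))   # 88 (unchanged)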
Swap-Space Management
In modern operating systems, "paging" and "swapping" are used interchangeably
Virtual memory uses disk space as an extension of main memory; performance decreases, because disk access is much slower than memory access
Swap-space management goal: to get the best throughput for the virtual-memory system
Swap-Space Management (Cont)
How is it used? Depends on the memory-management algorithm
Swapping: the swap space holds entire process images. Paging: it stores pages that have been pushed out of main memory
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to several GB
Better to overestimate: if swap space runs out, a process may have to be aborted
Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory
Linux: swap space has traditionally been set to double the amount of physical memory; multiple swap spaces are supported
Swap-Space Management (Cont)
Where is it located on disk?
In the normal file system: swap space is a large file within the file system, so file-system routines can be used; easy to implement but inefficient, since it takes time to traverse the directory structure
In a separate disk partition: a raw partition, managed by a swap-space storage manager whose algorithms are optimized for speed rather than storage efficiency; the trade-off of fragmentation for speed is acceptable because the data's life is short
A fixed amount of space is set aside during partitioning; adding more swap space requires repartitioning
Linux supports both approaches; the system administrator decides
Swap-Space Management (Cont)
How is it managed?
UNIX: traditional implementations copied entire processes; newer versions use a combination of swapping and paging
Solaris: the file system holds text-segment pages (code), while swap space holds pages of anonymous memory, such as the stack or heap; modern versions allocate the swap space only when a page is forced out of main memory
Swapping on a Linux System
RAID Structure
A large number of disks in a system can improve the data-transfer rate, if the disks are operated in parallel
It can also improve the reliability of data storage, when redundant information is stored on multiple disks
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability
Improvement of reliability via redundancy
Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information
Mirroring: duplicating every disk; a logical disk consists of two physical disks, and every write is carried out on both disks
RAID Structure
Improvement in performance via parallelism: striping
Bit-level striping: splitting the bits of each byte across multiple disks; write bit i of each byte to disk i
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1
Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses
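A small sketch of the block-level striping formula, with an illustrative four-disk array (the disk numbering 1..n follows the slide):

    # Block-level striping: block i of a file goes to disk (i mod n) + 1.
    n = 4  # illustrative number of disks

    def disk_for_block(i, n):
        return (i % n) + 1

    for i in range(8):
        print(f"block {i} -> disk {disk_for_block(i, n)}")
    # blocks 0..7 land on disks 1 2 3 4 1 2 3 4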
RAID Structure
RAID Levels
RAID level 0: striping; disk arrays with striping at the level of blocks, but without any redundancy
RAID level 1: mirroring; refers to disk mirroring
RAID level 2: memory-style error-correcting code
Parity bit: records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1)
If one of the bits in the byte is damaged, the parity of the byte changes and thus no longer matches the stored parity
Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data
RAID 2 is not used in practice
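The parity idea behind levels 2 through 5 can be shown with XOR: the parity unit makes every bit position have even parity, so any single lost stripe unit can be rebuilt from the survivors. A minimal sketch with illustrative data values:

    from functools import reduce

    # Stripe units on three data disks, plus one parity unit.
    data = [0b10110100, 0b01101001, 0b11100011]
    parity = reduce(lambda a, b: a ^ b, data)  # even parity per bit position

    # Disk 1 fails: XOR the parity with the surviving units to rebuild it.
    survivors = [data[0], data[2]]
    rebuilt = reduce(lambda a, b: a ^ b, survivors, parity)
    assert rebuilt == data[1]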
RAID Levels
RAID level 3: bit-interleaved parity organization
Data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks
Advantage: with N-way striping of data, data transfer is N times faster than RAID level 1
Problem: the expense of computing and writing parity; addressed with a hardware controller that has dedicated parity hardware
RAID level 4: block-interleaved parity organization
Uses block-level striping and, in addition, keeps a parity block on a separate disk
High transfer rates for large reads and writes, but small independent writes cannot be performed in parallel: each requires a read-modify-write cycle
RAID Levels
RAID level 5: block-interleaved distributed parity
Parity blocks are interleaved and distributed over all the disks, so parity no longer resides on a single disk; this avoids potential overuse of a single parity disk
RAID level 6: P+Q redundancy scheme
Stores extra redundant information so that the array can survive multiple disk failures: 2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures
Reed-Solomon codes are used as the error-correcting codes
RAID Levels
RAID 0+1: stripe first, then mirror the stripe
RAID 1+0: mirror first, then stripe the mirrors
If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild performance
Easiest for RAID level 1; a RAID 5 rebuild can take long hours for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID level 5 is often preferred over RAID 1 for storing large volumes of data
With more disks in an array, data-transfer rates are higher, but the system is more expensive
With more bits protected by a single parity bit, the space overhead due to parity is lower, but the chance that a second disk fails before the first is repaired is greater
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices, such as arrays of tapes and the broadcast of data over wireless systems
Problems with RAID
1. RAID protects against physical errors, but not against other hardware and software errors
The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data
Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it has changed
Checksumming provides error detection and correction
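A sketch of the checksums-kept-with-the-pointer idea in Python (the structures are hypothetical, not the actual ZFS on-disk format): the checksum of a block is stored beside the pointer that references it, so a wrong or silently corrupted block is caught at read time.

    import hashlib

    store = {}  # block address -> bytes, standing in for the disk

    def write_block(addr, payload):
        store[addr] = payload
        # The referencing object keeps (addr, checksum) as its pointer.
        return addr, hashlib.sha256(payload).digest()

    def read_block(pointer):
        addr, checksum = pointer
        payload = store[addr]
        if hashlib.sha256(payload).digest() != checksum:
            raise IOError("checksum mismatch: wrong or corrupted block")
        return payload

    ptr = write_block(42, b"file data")
    store[42] = b"file dat?"    # simulate silent corruption on disk
    try:
        read_block(ptr)
    except IOError as e:
        print(e)                # the corruption is detected on read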
Problems with RAID (cont)
2. RAID implementations lack flexibility: what if we have a five-disk RAID level 5 set and a file system is too large to fit on it?
In ZFS, partitions of disks are gathered together via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that we can recover the stable data after any failure during data transfer or recovery
Overview
Successful completion: the data were written correctly on disk
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact
Disk Write Result
The system must maintain (at least) two physical blocks for each logical block, to detect and recover from failures
Recoverable write:
Write the information onto the first physical block
When the first write completes successfully, write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
Recoverable write
During recovery, each pair of physical blocks is examined
If both are the same and no detectable error exists: OK
If one contains a detectable error: replace its contents with the value of the other block
If both contain no detectable error but they differ in content: replace the content of the first block with the value of the second
Recovery thus ensures that a write to stable storage either succeeds completely or results in no change
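A minimal sketch of both procedures in Python, with the two physical copies modeled as files and a detectable error modeled as a checksum mismatch (an illustration under those assumptions, not a real stable-storage driver):

    import hashlib, os

    def _put(path, data):
        # Store a checksum with the data and force it to the medium.
        with open(path, "wb") as f:
            f.write(hashlib.sha256(data).digest() + data)
            f.flush()
            os.fsync(f.fileno())

    def _get(path):
        # Return the data, or None if it carries a detectable error.
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            blob = f.read()
        digest, data = blob[:32], blob[32:]
        return data if hashlib.sha256(data).digest() == digest else None

    def recoverable_write(data, first, second):
        _put(first, data)    # step 1: first physical block
        _put(second, data)   # step 2: only after step 1 succeeded;
                             # the operation is complete only now

    def recover(first, second):
        a, b = _get(first), _get(second)
        if a is None and b is not None:
            _put(first, b)   # copy the good block over the damaged one
        elif b is None and a is not None:
            _put(second, a)
        elif a is not None and b is not None and a != b:
            _put(first, b)   # both valid but different: the second wins

    recoverable_write(b"payload", "copy0.blk", "copy1.blk")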
Failure detection and Recovery
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material and enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure
Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes
The head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written
Laser light is also used to read data (the Kerr effect)
Removable Disks
Optical disks: employ special materials that are altered by laser light
An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state; it uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples: CD-RW and DVD-RW
WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application interface
A new cartridge is formatted, and an empty file system is generated on the disk
Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape to a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block
An EOT mark is placed after a block that is written
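A toy model of those semantics in Python (the class is hypothetical; it just mimics the locate()/read_position()/space() operations and the append-only rule): writing at position p discards every block after p.

    class TapeDrive:
        def __init__(self):
            self.blocks = []   # logical blocks on the tape
            self.pos = 0       # current logical block number

        def locate(self, block):      # corresponds to a disk seek()
            self.pos = block

        def read_position(self):      # block number under the head
            return self.pos

        def space(self, count):       # relative motion
            self.pos += count

        def write_block(self, data):
            # Writing in the middle erases everything after this block;
            # the EOT mark then follows the block just written.
            del self.blocks[self.pos:]
            self.blocks.append(data)
            self.pos += 1

    t = TapeDrive()
    for d in (b"a", b"b", b"c"):
        t.write_block(d)
    t.locate(1)
    t.write_block(b"B")
    print(t.blocks)  # [b'a', b'B'] -- the old blocks after position 1 are gone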
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (e.g., optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
Hierarchical Storage Management (HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate
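An illustrative comparison (the numbers are assumed, not from the slides): a tape drive that streams at 10 MB/s has a sustained bandwidth of 10 MB/s, but if a 120 MB transfer is preceded by a 48-second locate, the effective bandwidth is 120 MB / (12 s + 48 s) = 2 MB/s.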
Speed
Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; < 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
Generally, random access within a tape cartridge is about a thousand times slower than random access on a disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
Cost
Price per megabyte of DRAM, from 1981 to 2000
Price per megabyte of magnetic hard disk, from 1981 to 2000
Price per megabyte of a tape drive, from 1984 to 2000
References
Silberschatz, A., Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Host-Attached StorageSCSI
SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Host-Attached Storage FC
FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)
All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage
devices Dominate in future Basic of SANs
Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now
FC ndash Topologies
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use.
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications.
Application interface:
For disks, a new cartridge is formatted and an empty file system is generated on the disk.
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device.
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device.
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
Tape Drives
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk).
The read_position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices; updating a block in the middle of the tape also effectively erases everything after that block.
An EOT (end-of-tape) mark is placed after a block that is written.
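These abstract operations map onto real raw-tape interfaces. As one concrete illustration (not part of the original slides), Linux's SCSI tape driver exposes them through ioctl() calls on a raw, no-rewind device such as /dev/nst0; error handling is elided in this sketch:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open the whole drive as a raw device (no rewind on close). */
        int fd = open("/dev/nst0", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* locate(1000): seek to logical block 1000. */
        struct mtop op = { .mt_op = MTSEEK, .mt_count = 1000 };
        ioctl(fd, MTIOCTOP, &op);

        /* read_position(): where is the head now? */
        struct mtpos pos;
        if (ioctl(fd, MTIOCPOS, &pos) == 0)
            printf("head is at logical block %ld\n", pos.mt_blkno);

        /* space(10): relative motion, 10 blocks forward. */
        op.mt_op = MTFSR;
        op.mt_count = 10;
        ioctl(fd, MTIOCTOP, &op);

        close(fd);
        return 0;
    }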
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
Hierarchical Storage Management (HSM)
HSM is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (such as optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
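As a toy illustration of the migration policy, a periodic scan might look like the following sketch; the thresholds, the file_t record, and migrate_to_jukebox() are all hypothetical:

    #include <stdbool.h>
    #include <time.h>

    typedef struct {
        long   size_bytes;
        time_t last_access;
    } file_t;

    /* Hypothetical archival helper; a real HSM would leave a stub behind
     * so the file is staged back to disk transparently on next access. */
    void migrate_to_jukebox(const file_t *f);

    void hsm_scan(file_t *files, int n, time_t now)
    {
        const long   LARGE  = 100L * 1024 * 1024;  /* "large": over 100 MB */
        const double IDLE_S = 90.0 * 24 * 3600;    /* "inactive": 90 days idle */

        for (int i = 0; i < n; i++)
            if (files[i].size_bytes > LARGE &&
                difftime(now, files[i].last_access) > IDLE_S)
                migrate_to_jukebox(&files[i]);
        /* Small and recently used files are left untouched on disk. */
    }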
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate while the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate.
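A worked example with hypothetical numbers: suppose a tape drive streams at a sustained 40 MB/s, and a 2,400 MB transfer is preceded by 20 seconds of cartridge switching and locating. The sustained bandwidth is 2,400 MB / 60 s = 40 MB/s, but the effective bandwidth is 2,400 MB / (20 s + 60 s) = 30 MB/s, because the positioning overhead counts against the entire I/O time.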
Speed (Cont)
Access latency: the amount of time needed to locate data.
Access time for a disk (moving the arm to the selected cylinder and waiting for the rotational latency) is typically less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head, which can take tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
Figure: price per megabyte of DRAM, 1981 to 2000.
Figure: price per megabyte of magnetic hard disk, 1981 to 2000.
Figure: price per megabyte of a tape drive, 1984 to 2000.
References
Silberschatz, A. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?

Thank you
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Network-Attached Storage NAS
Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface
UNIX NFS Windows CIFS
RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached
Same ease of naming and access Less efficient and lower performance
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Network-Attached Storage (Cont)
ISCSI ndash Internet Small Computing System Interface
Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to
remote targets TCP ports 860 and 3260
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 (striping): disk arrays with striping at the level of blocks, but without any redundancy.
RAID Level 1 (mirroring): disk mirroring.
RAID Level 2 (memory-style error-correcting code): a parity bit records whether the number of 1 bits in the byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus no longer matches the stored parity. Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data. RAID 2 is not used in practice.
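A small sketch of the parity rule just stated: the parity bit is 0 when the byte holds an even number of 1 bits and 1 otherwise, so flipping any single bit makes the recomputed parity disagree with the stored one.

    #include <stdio.h>

    /* Even parity of a byte: 0 if the number of 1 bits is even, 1 if odd. */
    int parity(unsigned char b)
    {
        int p = 0;
        while (b) {
            p ^= b & 1;   /* flip parity for every 1 bit */
            b >>= 1;
        }
        return p;
    }

    int main(void)
    {
        unsigned char byte = 0xB6;   /* 1011 0110: five 1 bits */
        int stored = parity(byte);   /* stored parity = 1 */
        byte ^= 0x08;                /* a single damaged bit */
        printf("stored=%d recomputed=%d\n", stored, parity(byte)); /* mismatch */
        return 0;
    }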
RAID Levels
RAID Level 3 (bit-interleaved parity): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information.
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.
Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1.
Problem: the expense of computing and writing parity; a hardware RAID controller with dedicated parity hardware offloads this cost.
RAID Level 4 (block-interleaved parity): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel: each one requires a read-modify-write cycle (see the sketch below).
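A sketch of the read-modify-write cycle for a small write under XOR parity (names are illustrative): the new parity can be computed from the old data and old parity alone, but that requires reading both before writing, which is exactly the extra cost noted above.

    #define BLOCK_SIZE 512

    /* Small-write update in a parity-protected array:
     * new_parity = old_parity XOR old_data XOR new_data.
     * The old data and old parity blocks must be read first
     * (the read-modify-write cycle); then data and parity are written. */
    void update_parity(const unsigned char old_data[BLOCK_SIZE],
                       const unsigned char new_data[BLOCK_SIZE],
                       unsigned char parity[BLOCK_SIZE])
    {
        for (int i = 0; i < BLOCK_SIZE; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }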
RAID Levels
RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks. Since parity blocks no longer reside on a single disk, this avoids potential overuse of a dedicated parity disk (see the layout sketch below).
RAID Level 6 (P+Q redundancy scheme): stores extra redundant information so that multiple disk failures can be survived. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.
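A minimal sketch of one possible rotating-parity layout for RAID 5; the exact rotation varies between implementations, so the formula below is an assumed example, not a standard.

    /* RAID 5 placement sketch for n disks: in stripe s, the parity block
     * sits on disk (n - 1 - (s % n)), and the data blocks fill the rest. */
    struct raid5_loc {
        int disk;     /* disk holding this block (0..n-1) */
        long stripe;  /* stripe number on that disk */
    };

    struct raid5_loc raid5_map(long logical_block, int n)
    {
        struct raid5_loc loc;
        long stripe = logical_block / (n - 1);    /* n-1 data blocks per stripe */
        int offset  = (int)(logical_block % (n - 1));
        int pdisk   = n - 1 - (int)(stripe % n);  /* rotating parity disk */

        loc.stripe = stripe;
        loc.disk = (offset < pdisk) ? offset : offset + 1; /* skip parity disk */
        return loc;
    }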
RAID Levels
RAID 0+1: stripe first, then mirror the stripe.
RAID 1+0: mirror first, then stripe the mirrors.
If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available.
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.
Figure: RAID 0+1 versus RAID 1+0.
The Implementation of RAID
RAID can be implemented at several layers:
Volume-management software can implement RAID within the kernel or at the system-software layer.
In the host bus-adapter (HBA) hardware.
In the hardware of the storage array.
In the SAN interconnect layer, by disk-virtualization devices.
Rebuild Performance
Easiest for RAID Level 1: copy from the surviving mirror.
Can take long hours for RAID 5 with large disks, since every surviving disk must be read to reconstruct the lost data.
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases.
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data.
With more disks in an array, data-transfer rates are higher, but the system is more expensive.
With more bits protected by each parity bit, the space overhead for parity is lower, but the chance of a second disk failure before the first is repaired is greater.
Selecting a RAID Level
Extensions
The concepts of RAID have been generalized to other storage devices, including arrays of tapes, and even to the broadcast of data over wireless systems.
1. RAID protects against physical media errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data.
Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it has changed (see the sketch below).
Checksumming thus provides both error detection and correction.
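A sketch of the checksum-with-pointer idea, assuming a simple Fletcher-style checksum (in the spirit of, but not identical to, what ZFS actually uses) and illustrative names.

    #include <stdint.h>
    #include <string.h>

    /* Parent-held pointer: address of the child block plus its checksum. */
    struct blkptr {
        uint64_t addr;       /* where the child block lives */
        uint64_t checksum;   /* checksum of the child's contents */
    };

    /* Simple Fletcher-style checksum over a block. */
    uint64_t checksum(const uint8_t *data, size_t len)
    {
        uint64_t a = 0, b = 0;
        for (size_t i = 0; i < len; i++) {
            a += data[i];
            b += a;
        }
        return (b << 32) | (a & 0xffffffff);
    }

    /* On read: verify the block matches the checksum its parent recorded.
     * Returns 0 on success, -1 if the block is wrong or was changed. */
    int verify_block(const struct blkptr *bp, const uint8_t *data, size_t len)
    {
        return checksum(data, len) == bp->checksum ? 0 : -1;
    }

On a mismatch, a mirrored configuration can fetch the other copy and rewrite the bad block, which is how checksumming yields correction as well as detection.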
Problems with RAID (cont)
2. RAID implementations lack flexibility.
What if we have a five-disk RAID Level 5 set and a file system is too large to fit on it?
In ZFS, partitions of disks are gathered via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or resize volumes.
Stable-Storage Implementation
Information residing in stable storage is never lost.
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes.
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that the stable data can be recovered after any failure during data transfer or recovery.
Disk Write Result
A disk write results in one of three outcomes:
Successful completion: the data were written correctly.
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted.
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.
The system must maintain (at least) two physical blocks for each logical block, for detecting and recovering from failures.
Recoverable write:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second physical block.
3. Declare the operation complete only after the second write completes successfully.
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists: OK.
If one contains a detectable error: replace its contents with the value of the other block.
If neither contains a detectable error but they differ in content: replace the contents of the first block with the value of the second.
This ensures that a write to stable storage either succeeds completely or results in no change.
Failure detection and Recovery
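A compact sketch of the recoverable write and the recovery pass described above, assuming hypothetical helpers read_block()/write_block() that return 0 on success and -1 on a detectable (ECC) error.

    #include <string.h>

    #define BLK 512

    /* Hypothetical device helpers (assumed): return 0 on success,
     * -1 when the block has a detectable (ECC) error. */
    int read_block(int blockno, unsigned char buf[BLK]);
    int write_block(int blockno, const unsigned char buf[BLK]);

    /* Recoverable write: the second copy is written only after the first
     * succeeds; the operation is complete only after the second succeeds. */
    int stable_write(int first, int second, const unsigned char data[BLK])
    {
        if (write_block(first, data) != 0)
            return -1;
        return write_block(second, data);
    }

    /* Recovery: examine the pair and make the copies agree again. */
    void stable_recover(int first, int second)
    {
        unsigned char b1[BLK], b2[BLK];
        int ok1 = read_block(first, b1);
        int ok2 = read_block(second, b2);

        if (ok1 == 0 && ok2 == 0) {
            if (memcmp(b1, b2, BLK) != 0)
                write_block(first, b2);   /* both readable but differ: second wins */
        } else if (ok1 != 0 && ok2 == 0) {
            write_block(first, b2);       /* first damaged: copy from second */
        } else if (ok1 == 0) {
            write_block(second, b1);      /* second damaged: copy from first */
        }
    }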
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs.
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case.
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.
Magneto-optic disk: records data on a rigid platter coated with magnetic material.
The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, so the medium is resistant to head crashes.
To write, the head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written.
Laser light is also used to read data (via the Kerr effect).
Removable Disks
Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW.
WORM ("Write Once, Read Many times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed, but not altered.
Removable Disks
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use.
Tapes
The major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications.
Application interface:
For a disk, a new cartridge is formatted and an empty file system is generated on it.
Tapes are presented as a raw storage medium: an application does not open a file on the tape, it opens the whole tape drive as a raw device.
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device.
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).
The read position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block.
An EOT mark is placed after a block that is written.
(A Linux sketch of these operations follows below.)
Tape Drives
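On Linux, these operations map onto the magnetic-tape ioctl interface in <sys/mtio.h>; a brief sketch follows, with the device path /dev/nst0 (the non-rewinding tape device) as an assumption.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>

    int main(void)
    {
        int fd = open("/dev/nst0", O_RDONLY);  /* raw, non-rewinding tape device */
        if (fd < 0) { perror("open"); return 1; }

        /* locate(): position the tape at a specific logical block. */
        struct mtop op = { .mt_op = MTSEEK, .mt_count = 1000 };
        if (ioctl(fd, MTIOCTOP, &op) < 0) perror("MTSEEK");

        /* read position(): ask where the head is now. */
        struct mtpos pos;
        if (ioctl(fd, MTIOCPOS, &pos) == 0)
            printf("tape head at block %ld\n", pos.mt_blkno);

        /* space(): relative motion, e.g. skip forward one file mark. */
        struct mtop sp = { .mt_op = MTFSF, .mt_count = 1 };
        if (ioctl(fd, MTIOCTOP, &sp) < 0) perror("MTFSF");

        return 0;
    }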
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (such as optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store most data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox, a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
Hierarchical Storage Management (HSM)
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate. (A worked example follows below.)
Speed
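As a worked illustration with assumed numbers: suppose a tape drive streams at 40 MB/s once data are flowing, but a request first spends 30 seconds locating the block and switching cartridges. Transferring 1,200 MB then takes 30 s of streaming, so the sustained bandwidth is 40 MB/s, while the effective bandwidth is 1,200 MB / (30 s + 30 s) = 20 MB/s, half the sustained figure.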
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
Cost
Figure: price per megabyte of DRAM, 1981 to 2000.
Figure: price per megabyte of magnetic hard disk, 1981 to 2000.
Figure: price per megabyte of a tape drive, 1984 to 2000.
References
Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Storage-Area Networks
SAN Private network connecting servers and
storage devices Uses storage protocols instead of networking
protocols Multiple hosts and storage can attach to the
same SAN flexibility SAN Switch allowsprohibits client access
(exp) FC is the most common SAN interconnect
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Storage-Area Networks (Cont)
InfiniBand Special-purpose bus architecture Supports high-speed interconnection network
Up to 25 gbps 64000 addressable devices
Supports QoS and Failover
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling
Disk Drive Efficiency Access time
Seek time Rotational latency
Bandwidth Bytes transferred Δt
Δt Completion time of the last transfer ndash first request for service time
Improve Scheduling the servicing of IO requests in a good order
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting): each sector is written with a header and a trailer around the data area
The header and trailer hold the sector number and an ECC
The ECC provides error detection and recovery from soft errors
Data area: typically 512 bytes
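A toy layout of header, data area, and trailer, with a one-byte XOR checksum standing in for the real, drive-specific ECC:

def xor_ecc(data):
    # stand-in for the drive's real error-correcting code
    ecc = 0
    for b in data:
        ecc ^= b
    return ecc

def format_sector(number, data=bytes(512)):
    assert len(data) == 512                  # data area is 512 bytes
    header = number.to_bytes(4, "little")    # header holds the sector number
    trailer = bytes([xor_ecc(data)])         # trailer holds the ECC
    return header + data + trailer

def read_ok(sector):
    # a mismatch means the data area was damaged; with a real ECC the
    # controller may still be able to correct it (a soft error)
    return xor_ecc(sector[4:516]) == sector[516]

print(read_ok(format_sector(42)))   # True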
Logical Formatting: Partition
One or more groups of cylinders; each partition is treated as a separate disk (example)
Logical formatting then stores the initial file system:
a map of allocated and free space, and an initial empty directory
Cluster: blocks are grouped together to increase efficiency; disk I/O is done in blocks, file I/O in clusters
Raw disk: some programs use a disk partition as one large sequential array of logical blocks, bypassing the file-system services
Boot Block
Bootstrap program: the initial program that starts a computer system
Initializes aspects of the system: CPU registers, device controllers, contents of main memory
Starts the OS: finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution
Stored in ROM: it needs no initialization and cannot be infected by a virus. Problem: ROM is hard to update. Solution: keep a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the hard disk
[Figure: Booting from a disk in Windows 2000]
Bad-Block Recovery
Complete disk failure: replace the disk
Bad sector handling:
Manually (IDE): format or chkdsk places a special entry in the FAT
Sector sparing (forwarding, SCSI): the controller maintains a bad-sector list, initialized during low-level formatting. The controller sets aside spare sectors and uses them to replace bad sectors logically (example). Problem: this invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder
Sector slipping: shift every sector down by one to free the sector next to the bad one (example)
Soft error: repairable by the disk controller through the ECC. Hard error: the data are lost; restore from backup
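A sketch of the forwarding table a controller conceptually keeps for sector sparing; the sector numbers are invented:

spares = [5000, 5001, 5002]   # spare sectors set aside at low-level format
remap = {}                    # bad sector -> spare sector

def spare_out(bad):
    # controller logically replaces the bad sector with a spare
    remap[bad] = spares.pop(0)

def translate(sector):
    # every request is checked against the controller's remap table
    return remap.get(sector, sector)

spare_out(87)
print(translate(87), translate(88))   # -> 5000 88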
Swap-Space Management
In modern operating systems, "paging" and "swapping" are often used interchangeably
Virtual memory uses disk space as an extension of main memory. (Performance decreases. Why?)
Swap-space management goal: get the best throughput for the virtual memory system
Swap-Space Management (Cont)
How is it used? Depends on the memory-management algorithm
Swapping: hold entire process images. Paging: store pages that have been pushed out of main memory
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to GBs
Better to overestimate. Why? So that no process is aborted for lack of swap space
Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory
Linux (older rule of thumb): swap space = double the amount of physical memory; multiple swap spaces are supported
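The two sizing rules are simple arithmetic; a tiny sketch (the GB unit and sample values are illustrative):

def solaris_swap_gb(virtual_gb, pageable_physical_gb):
    # Solaris rule from the slide: only the VM that exceeds pageable RAM
    return max(0, virtual_gb - pageable_physical_gb)

def legacy_linux_swap_gb(physical_gb):
    # older Linux rule of thumb from the slide: twice physical memory
    return 2 * physical_gb

print(solaris_swap_gb(24, 16), legacy_linux_swap_gb(16))   # -> 8 32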
Swap-Space Management (Cont)
Where is it located? On disk, either:
In the normal file system: a large file within the file system; file-system routines can be used; easy to implement but inefficient, since traversing the directory structure takes time
In a separate disk partition (raw partition): a swap-space storage manager allocates blocks, using algorithms optimized for speed rather than storage efficiency. Why? The trade-off between speed and fragmentation is acceptable because swap data live only a short time. A fixed amount of space is set aside during partitioning; adding more space requires repartitioning
Linux supports both. Who decides?
Swap-Space Management (Cont)
How is it managed?
Unix: traditionally copied entire processes; newer versions use a combination of swapping and paging
Solaris 1: text-segment pages (containing code) are paged from the file system; swap space holds pages of anonymous memory, such as the stack or heap. Modern versions allocate swap space only when a page is forced out of main memory
[Figure: Swapping on a Linux system]
RAID Structure
A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel
It also improves the reliability of data storage when redundant information is stored on multiple disks
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability
Improvement of reliability via redundancy. Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information
Mirroring: duplicating every disk
A logical disk consists of two physical disks
Every write is carried out on both disks
RAID Structure
Improvement in performance via parallelism: striping
Bit-level striping: split the bits of each byte across multiple disks; write bit i of each byte to disk i
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1
Parallelism in a disk system, as achieved through striping, has two main goals:
increase the throughput of many small accesses by load balancing, and reduce the response time of large accesses
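The block-to-disk mapping from the slide, written out directly:

def bit_striping_disk(bit_index):
    # bit-level striping: bit i of each byte goes to disk i (8 disks)
    return bit_index

def block_striping_disk(block, n):
    # block-level striping: block i goes to disk (i mod n) + 1
    return (block % n) + 1

print([block_striping_disk(i, 4) for i in range(8)])
# -> [1, 2, 3, 4, 1, 2, 3, 4]: consecutive blocks rotate across the disks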
[Figure: RAID structure]
[Figure: RAID levels]
RAID Levels
RAID Level 0 (striping): disk arrays with striping at the level of blocks, but without any redundancy
RAID Level 1 (mirroring): disk mirroring
RAID Level 2 (memory-style error-correcting codes): a parity bit records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1)
If one of the bits in the byte is damaged, the parity of the byte changes and thus does not match the stored parity
Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
RAID Level 3 (bit-interleaved parity): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks
Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1
Problem: the expense of computing and writing parity; a hardware controller with dedicated parity hardware helps
RAID Level 4 (block-interleaved parity): uses block-level striping and, in addition, keeps a parity block on a separate disk
High transfer rates for large reads and writes, but small independent writes cannot be performed in parallel: each requires a read-modify-write cycle
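The parity arithmetic is plain XOR, which also explains the read-modify-write cost; a short sketch:

def xor_blocks(*blocks):
    # byte-wise XOR; with one parity disk, parity = XOR of all data blocks
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def small_write(old_data, new_data, old_parity):
    # the read-modify-write cycle: read old data and old parity, then
    # new parity = old parity XOR old data XOR new data
    return xor_blocks(old_parity, old_data, new_data)

d0, d1, d2 = b"aa", b"bb", b"cc"
p = xor_blocks(d0, d1, d2)
print(xor_blocks(d1, d2, p) == d0)  # True: a lost block is the XOR of the rest
print(small_write(d1, b"zz", p) == xor_blocks(d0, b"zz", d2))  # True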
RAID Levels
RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks
Since the parity blocks no longer reside on a single disk, this avoids the potential overuse of one parity disk
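One possible parity rotation, sketched below; real arrays differ in layout details, so this is illustrative only:

def raid5_stripe(stripe, n_disks):
    # the parity block moves one disk to the left on each stripe
    parity = (n_disks - 1 - stripe % n_disks) % n_disks
    data = [d for d in range(n_disks) if d != parity]
    return parity, data

for s in range(4):
    print("stripe", s, "-> parity on disk", raid5_stripe(s, 4)[0])
# stripe 0 -> disk 3, stripe 1 -> disk 2, stripe 2 -> disk 1, stripe 3 -> disk 0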
RAID Level 6 (P+Q redundancy): stores extra redundant information to guard against multiple disk failures
Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures
Reed-Solomon codes are used as the error-correcting codes
RAID Levels
RAID 0+1: stripe first, then mirror the stripe
RAID 1+0: mirror first, then stripe the mirrors
If a single disk fails in RAID 0+1, an entire stripe becomes inaccessible, leaving only the other stripe available
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks
[Figure: RAID 0+1 and RAID 1+0]
The Implementation of RAID
RAID can be implemented at several layers:
Volume-management software, within the kernel or at the system-software layer
In the host bus adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer, by disk-virtualization devices
Rebuild performance: easiest for RAID Level 1 (copy from the mirror); RAID 5 rebuilds can take long hours for large disks
Selecting a RAID Level
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data
If more disks are in an array, data-transfer rates are higher, but the system is more expensive
If more bits are protected by a single parity bit, the space overhead due to parity is lower, but the chance of a second disk failure before repair is greater
Extensions
The concepts of RAID have been generalized to other storage devices, such as arrays of tapes, and even to the broadcast of data over wireless systems
Problems with RAID
1. RAID protects against physical errors, but not against other hardware and software errors
The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data
Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it has changed
Checksumming provides error detection and correction
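A sketch of the checksum-with-the-pointer idea; this is not ZFS's actual on-disk format, and the names are illustrative:

import hashlib

disk = {7: b"file data"}          # toy block store: address -> contents

class BlockPointer:
    # the checksum lives with the pointer, not with the block it guards,
    # so a wrong, stale, or corrupted block is caught at read time
    def __init__(self, address):
        self.address = address
        self.checksum = hashlib.sha256(disk[address]).digest()

def checked_read(ptr):
    data = disk[ptr.address]
    if hashlib.sha256(data).digest() != ptr.checksum:
        raise IOError("checksum mismatch: wrong object or corrupted data")
    return data

p = BlockPointer(7)
print(checked_read(p))            # b'file data'
disk[7] = b"flipped bit"          # simulate silent corruption
# checked_read(p) would now raise IOError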
Problems with RAID (cont)
2. RAID implementations lack flexibility
What if we have a five-disk RAID Level 5 set and the file system is too large to fit on it?
In ZFS, partitions of disks are gathered together via RAID sets into pools of storage,
so there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes
Coordinate the writing of updates so that a failure during an update cannot leave all the copies in a damaged state, ensuring that the stable data can be recovered after any failure during data transfer or recovery
Disk Write Result
Successful completion: all sectors were written correctly
Partial failure: a failure occurred in the midst of the transfer, so only some of the sectors were written with the new data, and the sector being written at the time of the failure may have been corrupted
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact
Recoverable Write
The system must maintain (at least) two physical blocks for each logical block, to detect and recover from failures
Recoverable write:
Write the information onto the first physical block
When the first write completes successfully, write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
Failure Detection and Recovery
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists, nothing further is done
If one contains a detectable error, its contents are replaced with the value of the other block
If neither contains a detectable error but the contents differ, the content of the first block is replaced with the value of the second
This ensures that a write to stable storage either succeeds completely or results in no change
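A compact sketch of the recoverable write and the recovery pass, using a CRC as a stand-in for the block's real error-detection code:

import zlib

def make_block(data):
    # a block carries its own error-detecting code
    return (data, zlib.crc32(data))

def damaged(block):
    data, crc = block
    return zlib.crc32(data) != crc

def recoverable_write(pair, data):
    pair[0] = make_block(data)   # first physical block
    pair[1] = make_block(data)   # second one, only after the first succeeds

def recover(pair):
    b1, b2 = pair
    if damaged(b1) and not damaged(b2):
        pair[0] = b2             # copy the good block over the bad one
    elif damaged(b2) and not damaged(b1):
        pair[1] = b1
    elif b1 != b2:
        pair[0] = b2             # both intact but different: second wins

pair = [None, None]
recoverable_write(pair, b"new value")
pair[0] = (b"garbled", pair[0][1])   # simulate a partial failure
recover(pair)
print(pair[0] == pair[1])            # True: the copies agree again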
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold only about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure
Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes
To write, the head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written
Laser light is also used to read data (the Kerr effect)
Removable Disks
Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state
It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to it
Common examples: CD-RW and DVD-RW
WORM ("write once, read many times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed, but not altered
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications
Application interface:
For a disk, a new cartridge is formatted and an empty file system is generated on it
Tapes are presented as a raw storage medium: an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
Tape Drives
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after the last block that is written
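A sketch of such an interface; the class is hypothetical, with method names following the locate()/read_position()/space() operations above:

class TapeDrive:
    # raw, append-only tape interface
    def __init__(self):
        self.blocks = []          # logical blocks on the tape
        self.pos = 0              # head position (block number)

    def locate(self, n):          # position at a logical block (cf. seek())
        self.pos = n

    def read_position(self):      # block number under the head
        return self.pos

    def space(self, count):       # relative motion
        self.pos += count

    def write(self, data):
        # writing in the middle erases everything after it; conceptually,
        # an EOT mark follows the last block written
        del self.blocks[self.pos:]
        self.blocks.append(data)
        self.pos = len(self.blocks)

t = TapeDrive()
t.write(b"a"); t.write(b"b"); t.write(b"c")
t.locate(1); t.write(b"B")
print(t.blocks)   # [b'a', b'B']: everything past the rewrite is gone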
File Naming
The issue of naming files on removable media is especially difficult when we want to write data to a removable cartridge on one computer and then use the cartridge on another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
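A toy placement policy in the spirit of the description above; the size and age thresholds are invented purely for illustration, since real HSM policies are site-specific:

def hsm_placement(size_bytes, days_idle):
    # large, old, inactive files go to the jukebox; the rest stay on disk
    if size_bytes > 100 * 2**20 and days_idle > 90:
        return "jukebox"
    return "disk"

print(hsm_placement(500 * 2**20, 365))   # -> jukebox
print(hsm_placement(4 * 2**10, 1))       # -> disk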
Speed
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate while the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate operations and cartridge switching; this is the drive's overall data rate
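Worked numbers for the two definitions; the 1 GB, 10 s, and 60 s figures are invented for illustration:

def sustained_bandwidth(nbytes, transfer_time):
    # rate while the data stream is actually flowing
    return nbytes / transfer_time

def effective_bandwidth(nbytes, total_io_time):
    # averaged over the whole I/O, including locate and cartridge switching
    return nbytes / total_io_time

# e.g. 1 GB streamed in 10 s, but 60 s end to end with a cartridge switch
print(sustained_bandwidth(1e9, 10) / 1e6, "MB/s")   # 100.0
print(effective_bandwidth(1e9, 60) / 1e6, "MB/s")   # ~16.7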
Speed (Cont)
Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage comes from having many cheap cartridges share a few expensive drives
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
Cost
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
[Figure: Price per megabyte of DRAM, 1981 to 2000]
[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: Price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A., Galvin, P. B., and Gagne, G. Operating System Concepts, 8th Edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
IO request procedure System Call Sent by the process to the OS System Call information
Inputoutput Disk address Memory address Number of sectors to be transferred
If disk available access
else Queue
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK
Disk Scheduling (Cont)
FCFS (First come First Served)
640 cylinder
moves
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
Price-trend charts (figures omitted from this transcript):
Price per megabyte of DRAM, 1981 to 2000
Price per megabyte of magnetic hard disk, 1981 to 2000
Price per megabyte of a tape drive, 1984 to 2000
References
Silberschatz, A., Galvin, P. B., and Gagne, G. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?
Thank you
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation
236 cylinder moves
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track
236 cylinder moves
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but that is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data. A minimal sketch of such a migration policy follows.
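In this Python sketch the tier paths and both thresholds are invented for illustration; a production HSM would also leave a stub behind so the file can be staged back transparently:

    import os, shutil, time

    DISK_TIER = "/fast/disk"        # hypothetical high-cost tier
    ARCHIVE_TIER = "/slow/jukebox"  # hypothetical low-cost tier
    MAX_IDLE = 90 * 24 * 3600       # assumed cutoff: not read for 90 days
    MIN_SIZE = 100 * 1024 * 1024    # assumed cutoff: larger than 100 MB

    def migrate_inactive_files():
        now = time.time()
        for name in os.listdir(DISK_TIER):
            path = os.path.join(DISK_TIER, name)
            st = os.stat(path)
            # Large, old, inactive files are archived; small or
            # recently used files remain on the fast disk tier.
            if st.st_size >= MIN_SIZE and now - st.st_atime >= MAX_IDLE:
                shutil.move(path, os.path.join(ARCHIVE_TIER, name))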
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the rate while the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate. A worked example follows.
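To make the distinction concrete, here is a small worked example; every number in it is invented (a 1 GB transfer that streams at 100 MB/s but first pays 40 s of locate and cartridge switching):

    bytes_moved = 1_000_000_000  # assumed 1 GB transfer
    stream_rate = 100_000_000    # assumed 100 MB/s while data flows
    overhead = 40.0              # assumed seconds of locate/switching

    transfer_time = bytes_moved / stream_rate             # 10 s of flow
    sustained = bytes_moved / transfer_time               # 100 MB/s
    effective = bytes_moved / (transfer_time + overhead)  # 20 MB/s

    print(sustained, effective)  # effective bandwidth is 5x lower here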
Speed (Cont)
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this takes tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk (a rough check follows below).
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
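A quick order-of-magnitude check with assumed figures (5 ms per disk access, 10 s per tape locate; real tape locates can run much longer):

    disk_access = 0.005  # assumed: 5 ms average disk access time
    tape_access = 10.0   # assumed: 10 s tape locate time

    print(tape_access / disk_access)  # 2000.0 -- thousands of times slower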
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives (see the check below).
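That last point can be checked with invented prices (none of these figures come from the slides): the drive cost is amortized across cartridges, so the cost per gigabyte approaches the media cost only when cartridges greatly outnumber drives.

    DRIVE_COST = 3000.0    # assumed price of one tape drive
    CARTRIDGE_COST = 40.0  # assumed price of one cartridge
    CARTRIDGE_GB = 400.0   # assumed capacity per cartridge, in GB

    def cost_per_gb(drives, cartridges):
        total = drives * DRIVE_COST + cartridges * CARTRIDGE_COST
        return total / (cartridges * CARTRIDGE_GB)

    print(cost_per_gb(2, 10))    # 1.60  -- drive cost dominates
    print(cost_per_gb(2, 1000))  # 0.115 -- approaches media cost (0.10)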
Price per megabyte of DRAM, 1981 to 2000 (figure).
Price per megabyte of magnetic hard disk, 1981 to 2000 (figure).
Price per megabyte of a tape drive, 1984 to 2000 (figure).
References
Silberschatz, A., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
Disk Scheduling (Cont)
CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the
beginning without servicing any request
360 cylinder moves
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
LOOKCLOOK Head goes as far as the last request in each direction
322 cylinder moves
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont)
Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a
heavy load on the disk No starvation Scheduling alg Performance (example1)
Number of requests Types of requests
Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks
Caching directories and indexed blocks in the main memory reduces arm movement (example3)
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive (a hypothetical C rendering follows this list):
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).
The read_position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices; updating a block in the middle of the tape also effectively erases everything after that block.
An EOT (end-of-tape) mark is placed after a block that is written.
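A hypothetical C view of this interface, with names taken from the slide rather than from any real driver:

#include <stddef.h>

struct tape_drive;   /* opaque handle to an open raw tape device */

/* Positions the tape at a logical block (the tape analogue of seek()). */
int  locate(struct tape_drive *t, long logical_block);

/* Returns the logical block number currently under the tape head. */
long read_position(struct tape_drive *t);

/* Relative motion: skip forward (or backward) by nblocks. */
int  space(struct tape_drive *t, long nblocks);

/* Append-only write: writing block n implicitly invalidates every
 * block after n, and an EOT mark is placed after the written block. */
int  tape_write_block(struct tape_drive *t, const void *buf, size_t len);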
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (e.g., optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance), as in the policy sketch below.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
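As a toy illustration, an HSM migration policy could be as simple as the following sketch (the 64 MB and 90-day thresholds are invented for the example):

#include <stdbool.h>
#include <time.h>

#define LARGE_BYTES   (64L * 1024 * 1024)      /* "large": more than 64 MB */
#define INACTIVE_SECS (90L * 24 * 60 * 60)     /* "old": idle for 90 days  */

/* True if a file should be archived from disk to the jukebox. */
bool should_archive(long size_bytes, time_t last_access, time_t now)
{
    return size_bytes > LARGE_BYTES &&
           (now - last_access) > INACTIVE_SECS;
}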
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth – average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing.
Effective bandwidth – average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate.
The worked example below contrasts the two.
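A small worked example with assumed figures (not taken from the slides): a transfer that streams 8 GB in 100 seconds but also spends 100 seconds locating and switching cartridges has half the effective bandwidth of its sustained bandwidth.

#include <stdio.h>

int main(void)
{
    double bytes       = 8.0e9;   /* bytes moved in one large transfer    */
    double stream_time = 100.0;   /* seconds the data stream is flowing   */
    double overhead    = 100.0;   /* seconds of locate + cartridge switch */

    double sustained = bytes / stream_time;              /* 80 MB/s */
    double effective = bytes / (stream_time + overhead); /* 40 MB/s */

    printf("sustained: %.0f MB/s\n", sustained / 1e6);
    printf("effective: %.0f MB/s\n", effective / 1e6);
    return 0;
}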
Speed (Cont)
Access latency – the amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
[Figure: Price per megabyte of DRAM, 1981 to 2000]
[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: Price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A., Galvin, P. B., Gagne, G. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Scheduling (Cont) Selection of an algorithm
Separate module of the OS can be replaced if necessary
Default SSTFLOOK Rotational Delay Perspective
Modern disks do not disclose the physical location of logical blocks
Disk controller takes over OS to choose the alg Problem
If only IO OK But there are other constraints
Example request for paging (example)
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Management
Disk Formatting Low-level Logical
Boot Block Bootstrap
Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Disk Formatting
Low-Level Formatting (Physical Formatting) Header Trailer
Sector Number ECC
Error detection Soft error recovery
Data-Area 512 bytes
Logical Formatting Partition
One or more group of cylinders Each partition is treated as a separate disk (example)
Logical Formatting Storing of initial file system
Map of allocated and free space An initial empty Directory
Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters
Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services
Boot Block Bootstrap Program
Initial program to start a computer system Initializes aspects of the system
CPU registers Device controllers Contents of the main memory
Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec
Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)
Booting from a Disk in Windows 2000
Bad-Block Recovery
Complete Disk Failure Replace the disk
Bad Sector Handling Manually
IDE format chkdsk Special entry into FAT
Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder
Sector Slipping Move down every sector tom empty the next sector to the bad sector Example
Soft-error repairable by disk controller through ECC Hard-error lost data back up
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (e.g., optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
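A toy C sketch of the policy side of HSM: it only reports which files a real system would migrate to the slow tier. The directory path and the age and size thresholds are invented for illustration.

    /* Sketch: core of a toy HSM migration policy. Large, old,
     * inactive files are candidates for the jukebox; this version
     * only prints what it would migrate. */
    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    #define AGE_LIMIT  (30 * 24 * 3600L)     /* inactive for 30 days */
    #define SIZE_LIMIT (100L * 1024 * 1024)  /* larger than 100 MB   */

    int main(void)
    {
        const char *dir = "/data";     /* fast-tier directory (assumed) */
        DIR *d = opendir(dir);
        if (!d) { perror("opendir"); return 1; }

        time_t now = time(NULL);
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            char path[4096];
            struct stat st;
            snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
            if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
                continue;              /* skip non-files and errors */
            if (st.st_size > SIZE_LIMIT && now - st.st_atime > AGE_LIMIT)
                printf("migrate to slow tier: %s\n", path);
        }
        closedir(d);
        return 0;
    }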
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate.
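To see the difference, a small self-contained C sketch with invented numbers: the same transfer yields a much lower effective bandwidth once locate and cartridge-switch time are charged to it.

    /* Sketch: sustained vs. effective bandwidth for one hypothetical
     * tape transfer. All figures are made-up examples, not measurements. */
    #include <stdio.h>

    int main(void)
    {
        double bytes         = 4.0e9;  /* 4 GB transferred               */
        double transfer_time = 80.0;   /* seconds the data stream flowed */
        double locate_time   = 45.0;   /* seconds winding to the block   */
        double switch_time   = 15.0;   /* seconds of cartridge switching */

        double sustained = bytes / transfer_time;
        double effective = bytes / (locate_time + transfer_time + switch_time);

        printf("sustained bandwidth: %.1f MB/s\n", sustained / 1e6);
        printf("effective bandwidth: %.1f MB/s\n", effective / 1e6);
        return 0;
    }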
Speed (cont)
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this takes tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk (for example, 5 seconds versus 5 milliseconds).
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
[Figure: Price per megabyte of DRAM, 1981 to 2000]
[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: Price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
Swap-Space Management
In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably
Virtual memory uses disk space as an extension of the main memory Performance decreases why
Swap-space management goal To get the best throughput for the
virtual memory system
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed? In UNIX:
Traditional UNIX copies entire processes between disk and memory.
Newer versions use a combination of swapping and paging.
In Solaris 1, text-segment pages (those containing code) are paged in from the file system, while swap space holds only pages of anonymous memory, such as stack and heap. Modern versions allocate swap space only when a page is actually forced out of main memory.
Figure: swapping on a Linux system.
RAID Structure
A large number of disks in a system can improve the data-transfer rate if the disks are operated in parallel.
Redundant information stored on multiple disks can also improve the reliability of data storage.
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), are used to address both performance and reliability.
Improvement of reliability via redundancy:
Redundancy means storing extra information that is not normally needed but that can be used, in the event of a disk failure, to rebuild the lost information.
Mirroring duplicates every disk: a logical disk consists of two physical disks, and every write is carried out on both.
RAID Structure
Improvement in performance via parallelism: striping, i.e., splitting data across multiple disks.
Bit-level striping splits the bits of each byte across multiple disks: bit i of each byte goes to disk i.
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1.
Parallelism achieved through striping has two main goals: increasing the throughput of many small accesses by load balancing, and reducing the response time of large accesses. A sketch of the block-to-disk mapping follows.
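The block-level mapping is easy to state in code. A minimal sketch (function name is ours) that maps a logical block number to a disk and offset using the slide's formula disk (i mod n) + 1:

```python
# Minimal sketch of block-level striping: logical block i is stored on
# disk (i mod n) + 1 (disks numbered 1..n, as in the slides), at stripe
# row i // n on that disk.
def locate_block(i, n):
    disk = (i % n) + 1          # which physical disk holds the block
    offset = i // n             # block index within that disk
    return disk, offset

# Example: with n = 4 disks, blocks 0..7 round-robin across disks 1..4.
for i in range(8):
    print(i, locate_block(i, n=4))
```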
RAID Levels
RAID level 0: striping. Disk arrays with striping at the level of blocks, but without any redundancy.
RAID level 1: mirroring. Every disk is mirrored.
RAID level 2: memory-style error-correcting codes. A parity bit records whether the number of bits set to 1 in a byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity. Error-correction bits are stored on disks labeled P and are used to reconstruct damaged data. RAID 2 is not used in practice.
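As a quick illustration of the parity rule (a minimal sketch; the helper name is ours), the even-parity bit of a byte is just the count of its 1 bits modulo 2:

```python
# Even parity of a byte: 0 if the number of 1 bits is even, 1 if odd.
def parity(byte):
    return bin(byte & 0xFF).count("1") % 2

assert parity(0b10110100) == 0   # four 1 bits -> even, parity = 0
assert parity(0b10110101) == 1   # five 1 bits -> odd, parity = 1
```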
RAID level 3: bit-interleaved parity organization. Data are subdivided (striped) and written in parallel across two or more drives, and an additional drive stores parity information.
Advantage: storage overhead is reduced, since only one parity disk is needed for several data disks.
Advantage: with N-way striping, data transfer is N times faster than RAID level 1.
Problem: the expense of computing and writing parity, typically mitigated by a hardware controller with dedicated parity hardware.
RAID level 4: block-interleaved parity organization. Uses block-level striping and, in addition, keeps a parity block on a separate disk. This gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel, since each requires a read-modify-write cycle against the parity disk.
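The read-modify-write cycle can be made concrete with a short sketch (illustrative, not any particular controller's implementation): the new parity is the old parity XOR the old data XOR the new data, so only the affected data disk and the parity disk are touched:

```python
# Minimal sketch of the read-modify-write parity update used by RAID 4/5.
# Updating one data block requires reading the old data and old parity,
# then writing the new data and the new parity:
#   new_parity = old_parity XOR old_data XOR new_data
def update_parity(old_parity, old_data, new_data):
    return bytes(p ^ od ^ nd
                 for p, od, nd in zip(old_parity, old_data, new_data))

old_data   = bytes([0b1010, 0b1100])
new_data   = bytes([0b0110, 0b1100])
old_parity = bytes([0b0011, 0b0101])
print(update_parity(old_parity, old_data, new_data))
```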
RAID level 5: block-interleaved distributed parity. Parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids potential overuse of one parity disk.
RAID level 6: P+Q redundancy scheme. Stores extra redundant information to guard against multiple disk failures: 2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.
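For the single-parity levels (3 through 5), reconstruction after a failure is a pure XOR. A minimal sketch (assuming equal-sized blocks; names are ours) showing that a lost block can be rebuilt from the survivors plus the parity:

```python
from functools import reduce

# With single-parity RAID, the parity block is the XOR of the data blocks
# in a stripe, so any one lost block can be rebuilt by XOR-ing all the
# surviving blocks of that stripe.
def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]   # three data blocks
parity = xor_blocks(stripe)                        # stored on the parity disk

lost = stripe[1]                                   # pretend this disk failed
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == lost
```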
RAID 0+1: stripe first, then mirror the stripe.
RAID 1+0: mirror first, then stripe the mirrors.
If a single disk fails in RAID 0+1, the entire stripe containing it becomes inaccessible, leaving only the other stripe available.
With a failure in RAID 1+0, a single disk becomes unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.
The Implementation of RAID
RAID can be implemented at several layers:
Volume-management software can implement RAID within the kernel or at the system-software layer.
It can be implemented in the host bus adapter (HBA) hardware.
It can be implemented in the hardware of the storage array.
It can be implemented in the SAN interconnect layer by disk-virtualization devices.
Rebuild performance matters as well:
Rebuilding is easiest for RAID level 1, where data can simply be copied from the mirror.
Rebuilding a RAID 5 array can take long hours for large disks, since all the surviving disks must be read.
Selecting a RAID Level
RAID level 0 is used in high-performance applications where data loss is not critical.
RAID level 1 is used for applications that require high reliability with fast recovery.
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases.
RAID level 5 is often preferred over RAID 1 for storing large volumes of data.
With more disks in an array, data-transfer rates are higher, but the system is more expensive.
With more bits protected by each parity bit, the space overhead due to parity is lower, but the chance of a second disk failing before the first is repaired is greater.
Extensions
The concepts of RAID have been generalized to other storage devices, for example arrays of tapes, and even to the broadcast of data over wireless systems.
Problems with RAID
1. RAID protects against physical media errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) addresses this with checksums, which verify the integrity of data.
Each checksum is kept with the pointer to the object, not with the object itself, so the file system can detect whether the block it reads is the right one and whether it has changed.
In this way, checksumming provides error detection and correction.
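A minimal sketch of the idea (illustrative only; real ZFS stores checksums in block pointers forming a tree, not in this toy structure): keeping the checksum with the pointer means a corrupted block cannot vouch for itself:

```python
import hashlib

# Checksum-with-pointer: the pointer to a block carries the checksum of the
# block's contents, so corruption is detected when the block is read back.
def make_pointer(block):
    return {"addr": id(block),  # stand-in for a real disk address
            "checksum": hashlib.sha256(block).digest()}

def verify(pointer, block):
    return hashlib.sha256(block).digest() == pointer["checksum"]

data = b"file contents"
ptr = make_pointer(data)
assert verify(ptr, data)
assert not verify(ptr, b"file c0ntents")   # bit rot is detected on read
```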
Problems with RAID (cont)
2. RAID implementations often lack flexibility. What if we have a five-disk RAID level 5 set and a file system is too large to fit on it?
ZFS answers this by gathering partitions of disks into pools of storage via RAID sets; file systems draw space from a pool, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.
Stable-Storage Implementation
Information residing in stable storage is never lost.
To implement stable storage:
Replicate information on multiple storage devices with independent failure modes.
Coordinate the writing of updates so that a failure during an update will not leave all the copies in a damaged state, and so that the stable data can be recovered after any failure during data transfer or recovery.

Disk Write Result
A disk write can have one of three outcomes:
Successful completion: all of the data were written correctly.
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted.
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.

Recoverable Write
The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failures. A recoverable write proceeds as follows:
Write the information onto the first physical block.
When the first write completes successfully, write the same information onto the second physical block.
Declare the operation complete only after the second write completes successfully.

Failure Detection and Recovery
During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists, nothing further is done.
If one contains a detectable error, its contents are replaced with the value of the other block.
If neither contains a detectable error but their contents differ, the second block is overwritten with the value of the first, completing the interrupted write.
This ensures that a write to stable storage either succeeds completely or results in no change.
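The recoverable write and the recovery pass can be sketched together (a minimal, in-memory model, not a real disk driver; two list slots stand in for the two physical blocks, and a checksum stands in for the drive's error detection):

```python
import hashlib

def _cksum(data):
    return hashlib.sha256(data).digest()

class StableBlock:
    """Two physical copies of one logical block, written in a fixed order."""
    def __init__(self):
        self.copy = [None, None]   # each copy holds (data, checksum)

    def recoverable_write(self, data):
        self.copy[0] = (data, _cksum(data))   # step 1: write the first copy
        # (a crash here leaves copy 0 new and copy 1 old; recover() resolves it)
        self.copy[1] = (data, _cksum(data))   # step 2: then the second copy

    def _valid(self, i):
        c = self.copy[i]
        return c is not None and _cksum(c[0]) == c[1]

    def recover(self):
        if self.copy[0] is None and self.copy[1] is None:
            return None                        # nothing was ever written
        if self._valid(0) and self._valid(1):
            if self.copy[0][0] != self.copy[1][0]:
                self.copy[1] = self.copy[0]    # both valid but differ: finish the write
        elif self._valid(0):
            self.copy[1] = self.copy[0]        # copy 1 corrupted: repair from copy 0
        elif self._valid(1):
            self.copy[0] = self.copy[1]        # copy 0 corrupted: repair from copy 1
        return self.copy[0][0]
```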
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage.
Generally, tertiary storage is built using removable media.
Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.
Tertiary-Storage Devices
Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.
Magneto-optic disk: records data on a rigid platter coated with magnetic material. The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, which makes it resistant to head crashes. To write, the drive flashes a laser beam at a tiny spot on the disk surface where the bit is to be written; laser light is also used to read data (the Kerr effect).
Removable Disks
Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state; it uses laser light at three different powers: low to read data, medium to erase, and high to write. Common examples are CD-RW and DVD-RW.
WORM ("write once, read many times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed, but not altered.
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use.
Operating-System Support
Two major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications.
Application interface:
For a new disk cartridge, the OS formats it and generates an empty file system on it.
Tapes are presented as a raw storage medium: an application does not open a file on the tape, it opens the whole tape drive as a raw device.
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device.
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
Tape Drives
The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk).
read_position() returns the logical block number where the tape head is located.
space() enables relative motion.
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block.
An EOT (end-of-tape) mark is placed after a block that is written.
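A minimal sketch of this interface (method names follow the slide; the append-only truncation rule is the important part):

```python
# Toy model of a tape drive's block interface; not a real device driver.
class Tape:
    def __init__(self):
        self.blocks = []      # logical blocks written so far
        self.pos = 0          # current head position (block index)

    def locate(self, n):      # absolute positioning, like seek() on a disk
        self.pos = n

    def read_position(self):  # logical block number under the head
        return self.pos

    def space(self, delta):   # relative motion, forward or backward
        self.pos += delta

    def write_block(self, data):
        # Append-only: writing at pos discards everything after it, which
        # is why updating a block in the middle erases the rest of the tape.
        del self.blocks[self.pos:]
        self.blocks.append(data)
        self.pos += 1         # EOT is implicitly after the last block written
```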
File Naming
Naming files on removable media is especially difficult when we want to write data on a cartridge on one computer and then use the cartridge in another computer.
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk arrays) are more expensive per byte stored than slower devices (such as optical discs and magnetic tape drives). Keeping all data on high-speed devices all the time would be ideal but expensive, so HSM systems store the bulk of the data on slower devices and copy it to faster disk drives when it is needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to a jukebox, a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
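A minimal sketch of an HSM-style migration policy (thresholds and names are illustrative; a real HSM would also stage files back in on access): files that are large and long idle are candidates for the jukebox:

```python
import os
import time

# Illustrative policy: a file is a migration candidate if it is at least
# min_size bytes and has not been accessed for max_idle_days days.
def migration_candidates(root, min_size=100 * 2**20, max_idle_days=180):
    now = time.time()
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            idle_days = (now - st.st_atime) / 86400
            if st.st_size >= min_size and idle_days >= max_idle_days:
                yield path, st.st_size, idle_days
```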
Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate while the stream is actually flowing.
Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate.
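A small worked example (illustrative numbers) showing why effective bandwidth can be far below sustained bandwidth once locate and cartridge-switch time are included:

```python
# Effective bandwidth includes the time spent locating data and switching
# cartridges, not just the time the data stream is flowing.
transfer_bytes = 2 * 10**9        # 2 GB read from tape
sustained_mb_s = 80.0             # data rate while the stream is flowing
locate_s = 60.0                   # winding the tape to the selected block
switch_s = 30.0                   # robot swaps the cartridge in

transfer_s = transfer_bytes / (sustained_mb_s * 10**6)          # 25 s
effective_mb_s = transfer_bytes / (transfer_s + locate_s + switch_s) / 10**6
print(f"sustained {sustained_mb_s:.0f} MB/s, effective {effective_mb_s:.1f} MB/s")
```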
Speed (cont)
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; under 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
As a rule of thumb, random access within a tape cartridge is about a thousand times slower than random access on a disk.
The low cost of tertiary storage results from having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable-disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the platters, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard-disk storage is competitive with that of magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
Figure: price per megabyte of DRAM, 1981 to 2000.
Figure: price per megabyte of magnetic hard disk, 1981 to 2000.
Figure: price per megabyte of a tape drive, 1984 to 2000.
References
Silberschatz, A., Galvin, P. B., and Gagne, G. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Swap-Space Management (Cont)
How is it used Depends on memory management alg
Swapping Load entire process into disk Paging Stores pages
Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds
pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Swap-Space Management (Cont)
Where is it located on disk In the normal file system
Large file within the file system File-system routines can be used Easy to implement but inefficient
Takes time to traverse the directory structure Separate disk partition
Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning
Linux Supports both Who decides
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Swap-Space Management (Cont)
How is it managed Unix
Traditional copy the entire processes Newer combination of swapping amp paging
Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if
page is forced out of the main memory
Swapping on Linux System
52
RAID Structure
Large number of disks in a system improves data transfer rate if the disks are operated in parallel
Improves reliability of data storage when redundant information is stored on multiple disks
A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability
Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but
that can be used in the event of failure of a disk to rebuild the lost information
Mirroring duplicating every disk
bull A logical disk consists of two physical disks
bull Every write is carried out on both disks
53
RAID Structure
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications
Application Interface:
A new cartridge is formatted, and an empty file system is generated on the disk
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
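The append-only behavior is easiest to see in a toy model; the class below is illustrative only (real drives are driven through device-specific interfaces such as ioctls, not an API like this):

    class TapeDrive:
        def __init__(self):
            self.blocks = []           # logical blocks 0 .. EOT-1
            self.pos = 0               # head position (logical block number)

        def locate(self, block):       # corresponds to seek() on a disk
            self.pos = block

        def read_position(self):       # logical block under the head
            return self.pos

        def space(self, count):        # relative motion, forward or backward
            self.pos += count

        def write_block(self, data):
            # updating a block in the middle erases everything after it
            del self.blocks[self.pos:]
            self.blocks.append(data)
            self.pos += 1              # the EOT mark now sits at self.pos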
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (e.g. hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives)
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when it is needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management (HSM)
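A toy placement policy capturing the two migration rules above; the size and age thresholds are invented purely for illustration:

    def hsm_placement(size_bytes, days_since_last_access):
        # small, frequently used files remain on disk;
        # large, old, inactive files are archived to the jukebox
        if size_bytes > 100 * 2**20 and days_since_last_access > 90:
            return "jukebox"
        return "disk"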
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth – the average data rate during a large transfer, i.e. number of bytes / transfer time; the data rate when the data stream is actually flowing
Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate
79
Speed
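The distinction is simply which time goes in the denominator; with invented numbers (a 1 GB transfer that streams at 40 MB/s after 25 seconds of locate and cartridge-switch overhead):

    bytes_moved = 1_000_000_000
    streaming_time = bytes_moved / 40_000_000              # 25 s of actual data flow
    overhead = 25                                          # locate + cartridge switch (s)

    sustained = bytes_moved / streaming_time               # 40 MB/s
    effective = bytes_moved / (streaming_time + overhead)  # 20 MB/s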
Access latency – the amount of time needed to locate data
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981 to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
88
Thank you
Improvement in Performance via Parallelism Striping
Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to
disk(i mod n)+1
Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by
load balancing Reduce the response time of large accesses
54
RAID Structure
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk — a thin, flexible disk coated with magnetic material and enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure
Magneto-optic disk — records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head does, and the magnetic material is covered with a protective layer of plastic or glass that resists head crashes
To write a bit, the head flashes a laser beam aimed at the tiny spot on the disk surface where the bit is to be written; laser light is also used to read data (the Kerr effect)
Removable Disks
Optical disks employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state; it uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW
WORM ("Write Once, Read Many Times") disks can be written only once. A thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed but not altered
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use
Operating-System Support
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications
Application interface:
For a disk, a new cartridge is formatted and an empty file system is generated on it; a tape, in contrast, is presented as a raw storage medium, i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it
Tape Drives
The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk)
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block
An EOT (end-of-tape) mark is placed after a block that is written
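On Linux, this raw-device model is visible through the magnetic-tape ioctl interface in <sys/mtio.h>. Below is a sketch of the three operations above, assuming a SCSI tape on /dev/nst0 (the non-rewinding device node) and omitting most error handling:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

int main(void) {
    /* Open the whole drive as a raw device, as described above. */
    int fd = open("/dev/nst0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* locate(): position the tape at a logical block (MTSEEK). */
    struct mtop seek = { .mt_op = MTSEEK, .mt_count = 1000 };
    ioctl(fd, MTIOCTOP, &seek);

    /* read_position(): ask where the tape head is (MTIOCPOS). */
    struct mtpos pos;
    if (ioctl(fd, MTIOCPOS, &pos) == 0)
        printf("head at block %ld\n", pos.mt_blkno);

    /* space(): relative motion, here forward by 10 records (MTFSR). */
    struct mtop fwd = { .mt_op = MTFSR, .mt_count = 10 };
    ioctl(fd, MTIOCTOP, &fwd);

    close(fd);
    return 0;
}
```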
File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way
Hierarchical Storage Management (HSM)
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and then copy it to faster disk drives when it is needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
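A toy migration policy capturing the two rules above; the thresholds and the file_info structure are invented for illustration, not part of any real HSM product:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-file metadata tracked by the HSM layer. */
struct file_info {
    long   size_bytes;
    time_t last_access;
};

/* Illustrative thresholds: files over 100 MB untouched for 90 days are
 * candidates for the jukebox; small or recently used files stay on disk. */
static bool should_migrate(const struct file_info *f, time_t now) {
    const long   big   = 100L * 1024 * 1024;     /* 100 MB */
    const double stale = 90.0 * 24 * 60 * 60;    /* 90 days in seconds */
    return f->size_bytes > big && difftime(now, f->last_access) > stale;
}
```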
Speed
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth – the average data rate during a large transfer, i.e., number of bytes / transfer time; this is the data rate while the data stream is actually flowing
Effective bandwidth – the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate
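The difference between the two bandwidths is just arithmetic. A made-up example: a drive that streams at 40 MB/s but spends 25 seconds on locate and cartridge switching per 1 GB transfer:

```c
#include <stdio.h>

int main(void) {
    double bytes    = 1000.0 * 1000 * 1000;  /* 1 GB transfer (assumed)  */
    double stream   = 40.0 * 1000 * 1000;    /* 40 MB/s streaming rate   */
    double overhead = 25.0;                  /* locate + switch, seconds */

    double transfer_time = bytes / stream;   /* 25 s of actual streaming */
    double sustained_bw  = bytes / transfer_time;
    double effective_bw  = bytes / (transfer_time + overhead);

    printf("sustained: %.1f MB/s\n", sustained_bw / 1e6);  /* 40.0 */
    printf("effective: %.1f MB/s\n", effective_bw / 1e6);  /* 20.0 */
    return 0;
}
```

In this example the overhead halves the effective bandwidth, which is why effective bandwidth, not sustained bandwidth, is the honest figure for tertiary devices.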
Access latency – the amount of time needed to locate data
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
Cost
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
(Figure: price per megabyte of DRAM, 1981 to 2000)
(Figure: price per megabyte of magnetic hard disk, 1981 to 2000)
(Figure: price per megabyte of a tape drive, 1984 to 2000)
References
Silberschatz, A., Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
55
RAID Levels
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
56
RAID Level 0 Striping Refers to disk arrays with striping at the level of
blocks but without any redundancy
RAID Level 1 Mirroring Refers to disk mirroring
RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is
even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity
of the byte changes and thus does not match the stored parity
Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data
RAID 2 is not used in practice
RAID Levels
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
57
RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on
two or more drives An additional drive stores parity information
Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks
Adv ndash N way striping of data data transfer is N times faster than RAID Level 1
Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware
RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity
block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in
parallel read-modify- write cycle
RAID Levels
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks
Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk
RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple
disk failures
2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures
Reed-Solomon codes are used as error-correcting codes
58
RAID Levels
RAID 0+1 ndash stripe first then mirror the stripe
RAID 1+0 ndash mirror first then stripe the mirror
If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available
With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks
59
RAID Levels
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems keep the bulk of the data on slower devices and copy it to faster disk drives when needed.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance). A toy policy sketch follows this slide.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
Hierarchical Storage Management (HSM)
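As an illustration of such a policy (small, hot files stay on disk; large, cold files become migration candidates), here is a C sweep over a directory tree. The 100 MB and 90-day thresholds are invented for this sketch, and it only prints candidates rather than actually migrating anything.

    #define _XOPEN_SOURCE 500
    #include <ftw.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    #define MIN_SIZE (100L * 1024 * 1024)   /* "large": over 100 MB */
    #define MAX_IDLE (90L * 24 * 3600)      /* "inactive": 90 days unread */

    /* Flag large files that have not been read for a long time. */
    static int visit(const char *path, const struct stat *sb,
                     int type, struct FTW *ftw)
    {
        (void)ftw;
        if (type == FTW_F && sb->st_size > MIN_SIZE &&
            time(NULL) - sb->st_atime > MAX_IDLE)
            printf("migrate to jukebox: %s\n", path);
        return 0;                            /* keep walking */
    }

    int main(int argc, char **argv)
    {
        return nftw(argc > 1 ? argv[1] : ".", visit, 16, FTW_PHYS);
    }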
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth – the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing.
Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate. A worked example follows this slide.
Speed
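A hypothetical example makes the distinction concrete: assume a drive that streams at 40 MB per second once data is flowing but needs 30 seconds to switch cartridges and locate the data. For a 240 MB transfer,

    sustained bandwidth = 240 MB / 6 s = 40 MB/s
    effective bandwidth = 240 MB / (30 s + 6 s) ≈ 6.7 MB/s

so the effective figure, the drive's overall data rate, is far below the sustained one.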
Access latency – the amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
Reliability
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with that of magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
Cost
[Figure: Price per megabyte of DRAM, 1981 to 2000]
[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: Price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
The Implementation of RAID
Volume-management software can implement RAID within the kernel or at the system software layer
In the host bus-adapter (HBA) hardware
In the hardware of the storage array
In the SAN interconnect layer by disk virtualization devices
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Rebuild Performance
Easiest for RAID Level 1
Long hours for RAID 5 for large disks
RAID Level 0 is used in high-performance applications where data loss is not critical
RAID Level 1 is used for applications with high reliability and fast recovery
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases
RAID Level 5 is often preferred for storing large volumes of data over RAID 1
If more disks are in an array data-transfer rates are higher but the system is more expensive
If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater
61
Selecting a RAID Level
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Extensions
Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless
systems
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Problems with RAID11048708RAID protects against physical errors but not other hardware
and software errors
1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data
1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed
Checksumming provides error detection and correction
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency – the amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
Reliability
A fixed disk drive is likely to be more reliable than a removable-disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
[Figure: Price per megabyte of DRAM, 1981 to 2000]
[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]
[Figure: Price per megabyte of a tape drive, 1984 to 2000]
References
Silberschatz, A. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions?
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Problems with RAID (cont)
21048708RAID implementations are lack of flexibility
What if we have five disk RAID level 5 set and file system is too large to fit on it
Partitions of disks are gathered together via RAID sets into pools of storage
So No artificial limits on storage
use and no need to relocate file systems
between volumes or resize volumes
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
65
Stable-Storage Implementation
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Information residing in stable storage is never lost
To implement stable storage Replicate information on multiple storage
devices with independent failure modes
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery
66
Overview
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Successful Completion Partial Failure failure occurred in the
midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted
Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact
67
Disk Write Result
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure
Recoverable write Write the information onto the first physical
block
When the first write completes successfully write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully
68
Recoverable write
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Each pair of physical blocks is examined If both are the same and no detectable error
exists OK If one contains a detectable error then we
replace its contents with the value of the other block
If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second
Ensure that a write to stable storage either succeeds completely or results in no change
69
Failure detection and Recovery
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.
Cost

Main memory is much more expensive than disk storage.
The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives, as the sketch below illustrates.
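A toy model with illustrative prices (not real market data) shows the break-even effect: with one cartridge per drive the drive cost dominates and tape loses to disk, while with many cartridges per drive the media cost dominates and tape wins.

```c
#include <stdio.h>

/* Illustrative-only prices; the point is the shape of the curve, not
 * the numbers. Many cheap cartridges amortize a few expensive drives. */
int main(void) {
    double drive_cost  = 3000.0;  /* one tape drive               */
    double cart_cost   = 20.0;    /* one cartridge                */
    double cart_gb     = 800.0;   /* capacity per cartridge in GB */
    double disk_per_gb = 0.05;    /* competing disk $/GB          */

    for (int carts = 1; carts <= 256; carts *= 4) {
        double tape_per_gb =
            (drive_cost + carts * cart_cost) / (carts * cart_gb);
        printf("%3d cartridges/drive: tape $%.3f/GB vs disk $%.3f/GB\n",
               carts, tape_per_gb, disk_per_gb);
    }
    return 0;
}
```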
(Figure) Price per megabyte of DRAM, 1981 to 2000
(Figure) Price per megabyte of magnetic hard disk, 1981 to 2000
(Figure) Price per megabyte of a tape drive, 1984 to 2000
References

Silberschatz, A., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
70
Tertiary-Storage Structure
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Low cost is the defining characteristic of tertiary storage
Generally tertiary storage is built using removable media
Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc
71
Tertiary-Storage Devices
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case
Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure
Magneto-optic disk -- records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes
The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written
Laser light is also used to read data (Kerr effect)
72
Removable Disks
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an
amorphous state Uses laser light at three different powers low- read data medium-
erase the disk high-write to the disk Common examples CD-RW and DVD-RW
WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through
the aluminum information can be destroyed but not altered
Removable Disks
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Compared to a disk a tape is less expensive and holds more data but random access is much slower
Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use
74
Tapes
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application Interface A new cartridge is formatted and an empty file system is
generated on the disk Tapes are presented as a raw storage medium ie and
application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that
application until the application closes the tape device Since the OS does not provide file system services the application
must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize
a tape a tape full of data can generally only be used by the program that created it
75
Operating-System Support
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
The basic operations for a tape drive differ from those of a disk drive
Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))
The read position( ) operation returns the logical block number where the tape head is located
The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a
block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
76
Tape Drives
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way
77
File Naming
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the
selected cylinder and wait for the rotational latency lt 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds
Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
80
Speed (cont)
A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
81
Reliability
Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
82
Cost
83
Price per megabyte of DRAM from 1981 to 2000
84
Price per megabyte of magnetic hard disk from 1981
to 2000
85
Price per megabyte of a tape drive from 1984 to 2000
References
Silberschatz A Operating Systems Concepts 8th edition
Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau
Questions
88
Thank you
- Mass-Storage Structure
- Agenda
- Agenda (Cont)
- Slide 4
- Review
- Review (Cont)
- Overview
- Magnetic Disks
- Magnetic Disks (Cont)
- Slide 10
- Slide 11
- Slide 12
- Magnetic Tape
- Slide 14
- Disk Structure
- Disk Structure (Cont)
- Disk Attachment
- Host-Attached Storage SCSI
- Slide 19
- Host-Attached Storage FC
- Slide 21
- Network-Attached Storage
- Slide 23
- Slide 24
- Network-Attached Storage (Cont)
- Storage-Area Networks
- Slide 27
- Slide 28
- Storage-Area Networks (Cont)
- Slide 30
- Slide 31
- Disk Scheduling
- Disk Scheduling (Cont)
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Disk Management
- Disk Formatting
- Boot Block
- Slide 45
- Bad-Block Recovery
- Swap-Space Management
- Swap-Space Management (Cont)
- Slide 49
- Slide 50
- Slide 51
- RAID Structure
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- The Implementation of RAID
- Slide 61
- Extensions
- Problems with RAID
- Problems with RAID (cont)
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Slide 80
- Slide 81
- Slide 82
- Slide 83
- Slide 84
- Slide 85
- References
- Slide 87
- Slide 88
-
HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed
Small and frequently used files remain on disk
Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
78
Hierarchical Storage Management
(HSM)
Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate
during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing
Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching
Driversquos overall data rate
79
Speed
Speed (cont)
 Access latency – amount of time needed to locate data
 Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; < 5 milliseconds
 Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
 Generally, random access within a tape cartridge is about a thousand times slower than random access on disk
 The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
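A back-of-the-envelope calculation shows what the thousand-fold gap means for a random-access workload. The request count below is a hypothetical workload, not from the slides; only the ~5 ms disk figure and the 1000x ratio come from the text above.

```python
DISK_ACCESS_S = 0.005                   # < 5 milliseconds per random disk access
TAPE_ACCESS_S = DISK_ACCESS_S * 1000    # "about a thousand times slower"

def random_access_time(n_requests: int, per_access_s: float) -> float:
    """Total time to service n independent random accesses, in seconds."""
    return n_requests * per_access_s

if __name__ == "__main__":
    n = 100                             # hypothetical number of random reads
    print(f"disk: {random_access_time(n, DISK_ACCESS_S):.1f} s")        # 0.5 s
    print(f"tape: {random_access_time(n, TAPE_ACCESS_S) / 60:.1f} min")  # ~8.3 min
```

A hundred random reads finish in half a second on disk but take over eight minutes on tape, which is why tape is used for sequential backup and archive rather than random-access workloads.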
Reliability
 A fixed disk drive is likely to be more reliable than a removable disk or tape drive
 An optical cartridge is likely to be more reliable than a magnetic disk or tape
 A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed
Cost
 Main memory is much more expensive than disk storage
 The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
 The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
 Tertiary storage gives a cost saving only when the number of cartridges is considerably larger than the number of drives
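The cartridges-versus-drives point can be seen by amortizing the drive cost over the cartridges it serves. Below is a Python sketch of that amortization; all prices and capacities are made-up assumptions for illustration, not quotes for any real product.

```python
def tape_cost_per_gb(drive_cost: float, cartridge_cost: float,
                     cartridge_capacity_gb: float, cartridges_per_drive: int) -> float:
    """Amortized storage cost when one drive is shared by many cartridges."""
    total_cost = drive_cost + cartridges_per_drive * cartridge_cost
    total_gb = cartridges_per_drive * cartridge_capacity_gb
    return total_cost / total_gb

if __name__ == "__main__":
    # Hypothetical figures: a $2000 drive and $50 cartridges holding 800 GB each.
    for n in (1, 10, 100):
        c = tape_cost_per_gb(2000, 50, 800, n)
        print(f"{n:3d} cartridges/drive: ${c:.3f}/GB")
    # 1 cartridge:  ~$2.56/GB -- no better than disk
    # 100 cartridges: ~$0.09/GB -- the drive cost has become negligible
```

With one cartridge per drive the drive dominates the cost and tape is no cheaper than disk; only as cartridges greatly outnumber drives does the per-gigabyte cost fall toward the cartridge's own cost.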
(Figure) Price per megabyte of DRAM, 1981 to 2000
(Figure) Price per megabyte of magnetic hard disk, 1981 to 2000
(Figure) Price per megabyte of a tape drive, 1984 to 2000
References
 Silberschatz, A. Operating System Concepts, 8th edition
 Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au
Questions
Thank you