

Mass-Storage Structure

CS 540 – Chapter 12

Kate Dehbashi
Anna Deghdzunyan

Fall 2010
Dr. Parviz

Agenda
Review: File System, Parts of File System
Overview: Magnetic Disks, Magnetic Tapes
Disk Structure
Disk Attachment: Host-attached, Network-attached, Storage-Area Networks
Disk Scheduling: Scheduling Algorithms, Selection of an Algorithm

Agenda (Cont)
Disk Management: Disk Formatting, Boot Block, Bad-Block Recovery
Swap-Space Management: How is it used? Where is it located? How is it managed?

Agenda (Cont)
RAID Structure: RAID Levels, The Implementation of RAID, Problems with RAID
Stable-Storage Implementation: Disk Write Result, Recoverable Write, Failure Detection and Recovery
Tertiary-Storage Structure: Tertiary-Storage Devices, Removable Disks, Tapes, OS Support, Tape Drives, File Naming, HSM, Speed, Reliability, Cost

Review

File System: a method of storing and organizing computer files and their data
Storage, organization, manipulation, retrieval
Maintains physical location

Review (Cont)

Parts of File System:
Interface: the user and programmer interface to the file system
Implementation: internal data structures and algorithms used to implement the interface
Storage structure: physical structure, disk scheduling algorithms, disk formatting, disk reliability, stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks: Structure
Platter, track, sector, cylinder, disk arm, read-write head

Rotates 60–200 times per second
Disk speed: transfer rate and positioning time
Positioning time = seek time + rotational latency

Magnetic Disks (Cont) Head Crash

The disk head making contact with the disk surface; causes permanent damage
Removable magnetic disks: floppy disks
The head sits directly on the surface, so rotation is slower and capacity is lower

I/O bus: the drive is attached to the computer via a set of wires; buses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI, and FireWire
Disk controller and host controller; the controller has cache memory

First HDD – IBM RAMAC, 1956
1.5 square meters (16 sq ft)
$320,000

Magnetic Tape

Early secondary-storage medium; first used in 1951 for computer storage
Holds large quantities of data: LTO-5 (2010) holds 1.5 TB of uncompressed data (textbook figure: 20-200 GB)

Access time is slow: random access is about 1000 times slower than on disk; once the data is under the head, transfer rates are comparable to disk

Modern usage: backup, archive
For large amounts of data, tape can be substantially less expensive than disk

Common technologies: 4 mm, 8 mm, 19 mm, LTO (Linear Tape-Open), SDLT (Super DLT)
(Pictured: LTO-2 and SDLT cartridges; ¼-inch and ½-inch tape)

Disk Structure

Addressing: the disk is addressed as a one-dimensional array of logical blocks
Logical block: the smallest unit of transfer, usually 512 bytes
Blocks map to sectors sequentially; sector 0 is the first sector of the first track on the outermost cylinder
Mapping proceeds in order through that track, then through the rest of the tracks in the same cylinder, and then through the rest of the cylinders from outermost to innermost

Disk Structure (Cont)

In principle, a logical block number can be converted into a cylinder number, a track within that cylinder, and a sector within that track
In practice the conversion is difficult: some sectors are defective (and remapped), and the number of sectors per track is not constant
CLV (Constant Linear Velocity): constant density of bits per track, variable rotational speed
CAV (Constant Angular Velocity): constant rotational speed, variable density of bits per track (an idealized mapping sketch follows)
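Under the simplifying assumption of a fixed geometry (no defect remapping, a constant number of sectors per track), the conversion from a logical block number to a cylinder/track/sector triple is just integer arithmetic. A minimal Python sketch; the geometry constants are invented for illustration, and real controllers hide this mapping:

# Sketch: map a logical block address (LBA) to (cylinder, track, sector),
# assuming an idealized disk with a constant number of sectors per track.
def lba_to_chs(lba, tracks_per_cylinder=16, sectors_per_track=63):
    sectors_per_cylinder = tracks_per_cylinder * sectors_per_track
    cylinder = lba // sectors_per_cylinder
    track = (lba % sectors_per_cylinder) // sectors_per_track
    sector = lba % sectors_per_track
    return cylinder, track, sector

# Logical block 0 is the first sector of the first track of the outermost cylinder.
print(lba_to_chs(0))      # (0, 0, 0)
print(lba_to_chs(5000))   # (4, 15, 23) with the default geometry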

Disk Attachment
Host-Attached Storage (DAS): accessed through local I/O ports: IDE, ATA, SATA, SCSI, FC
Wide variety of storage devices: HDDs, RAID arrays, CD/DVD drives, and tape
Network-Attached Storage: NAS, iSCSI
Storage-Area Network: SAN, InfiniBand

Host-Attached Storage: SCSI
SCSI (Small Computer System Interface): supports a large variety of devices; up to 16 devices per cable; a controller card (the SCSI initiator) addresses SCSI targets, each with up to 8 logical units

Host-Attached Storage: FC
FC (Fibre Channel): a high-speed serial architecture over optical cable or four-conductor copper cable
Switched fabric (FC-SW): all devices are connected to Fibre Channel switches; a 24-bit address space supports multiple hosts and storage devices; expected to dominate in the future and is the basis of SANs
Arbitrated loop (FC-AL): up to 126 devices, all connected in a loop or ring; historically lower cost, but rarely used now

FC – Topologies

Network-Attached Storage (NAS)
A storage system accessed remotely over a data network; clients access it via a remote-procedure-call interface
UNIX: NFS; Windows: CIFS
The RPCs are carried via TCP or UDP
A convenient way for all clients to share a pool of storage
NAS vs. locally attached storage: the same ease of naming and access, but less efficient and lower performance

Network-Attached Storage (Cont)

iSCSI – Internet Small Computer System Interface
The latest NAS protocol: an IP-based storage networking protocol that uses an IP network to carry the SCSI protocol
Clients are able to send SCSI commands to remote targets
Uses TCP ports 860 and 3260

Storage-Area Networks

SAN: a private network connecting servers and storage devices
Uses storage protocols rather than networking protocols
Multiple hosts and storage devices can attach to the same SAN: flexibility
A SAN switch allows or prohibits access between hosts and storage (example)
FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand: a special-purpose bus architecture that supports a high-speed interconnection network
Up to 25 Gbps and 64,000 addressable devices
Supports QoS and failover

Disk Scheduling

Disk drive efficiency: access time and bandwidth
Access time = seek time + rotational latency
Bandwidth = bytes transferred / Δt
Δt = completion time of the last transfer - time of the first request for service
Improve both by scheduling the servicing of I/O requests in a good order

Disk Scheduling (Cont)

I/O request procedure: a system call is sent by the process to the OS
System-call information: whether the operation is input or output, the disk address, the memory address, and the number of sectors to be transferred
If the disk and controller are available, the request is serviced immediately; otherwise it is placed in the queue

Disk Scheduling (Cont)

Algorithms: FCFS, SSTF, SCAN, C-SCAN, LOOK/C-LOOK

Disk Scheduling (Cont)

FCFS (First-Come, First-Served)

640 cylinder moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First): services the request closest to the current head position; may cause starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (elevator algorithm): the head starts at one end of the disk and moves to the other end, servicing each request as the head passes its cylinder

236 cylinder moves

Disk Scheduling (Cont)

C-SCAN: a variant of SCAN with a more uniform wait time; when the head reaches the end, it immediately returns to the beginning without servicing any requests on the return trip

360 cylinder moves

Disk Scheduling (Cont)

LOOK/C-LOOK: the head goes only as far as the last request in each direction, then reverses immediately instead of continuing to the end of the disk (a simulation sketch follows)

322 cylinder moves
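To make the cylinder-move figures concrete, here is a small Python sketch (not from the slides) that simulates several of the algorithms on the classic textbook queue: head at cylinder 53, pending requests 98, 183, 37, 122, 14, 124, 65, 67 on a 0-199 disk. Totals depend on conventions (which direction the head moves first, whether SCAN runs to the physical end, whether a C-SCAN/C-LOOK wraparound is counted), so they may not match every figure above.

# Disk-scheduling sketch: total head movement in cylinder moves.
HEAD = 53
REQUESTS = [98, 183, 37, 122, 14, 124, 65, 67]

def fcfs(head, reqs):
    moves, pos = 0, head
    for r in reqs:                      # service strictly in arrival order
        moves += abs(r - pos)
        pos = r
    return moves

def sstf(head, reqs):
    pending, moves, pos = list(reqs), 0, head
    while pending:                      # always pick the closest pending request
        nearest = min(pending, key=lambda r: abs(r - pos))
        moves += abs(nearest - pos)
        pos = nearest
        pending.remove(nearest)
    return moves

def scan_toward_zero(head, reqs):
    # Sweep down to cylinder 0, then reverse up to the highest request.
    highest = max(reqs)
    return head + (highest if highest > head else 0)

def look_toward_zero(head, reqs):
    # Like SCAN, but only as far as the last request in each direction.
    lowest, highest = min(min(reqs), head), max(reqs)
    return (head - lowest) + (highest - lowest if highest > head else 0)

def c_look_upward(head, reqs):
    # Service upward to the highest request, jump back to the lowest request
    # (the jump is counted here), then continue upward through the rest.
    above = [r for r in reqs if r >= head]
    below = [r for r in reqs if r < head]
    moves = (max(above) - head) if above else 0
    if below:
        moves += (max(above) - min(below) if above else 0) + (max(below) - min(below))
    return moves

for name, fn in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan_toward_zero),
                 ("LOOK", look_toward_zero), ("C-LOOK", c_look_upward)]:
    print(name, fn(HEAD, REQUESTS))     # FCFS 640, SSTF 236, SCAN 236, ...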

Disk Scheduling (Cont)

Selection of an algorithm; factors:
SSTF is common and performs better than FCFS
SCAN and C-SCAN perform better for systems that place a heavy load on the disk, and they avoid starvation
Performance of a scheduling algorithm depends on the number and types of requests (example 1)
Requests for disk service can be influenced by the file-allocation method (example 2) and by the location of directories and index blocks
Caching directories and index blocks in main memory reduces arm movement (example 3)

Disk Scheduling (Cont) Selection of an algorithm

The disk-scheduling algorithm should be written as a separate module of the OS so that it can be replaced with a different algorithm if necessary
Either SSTF or LOOK is a reasonable default
Rotational-delay perspective: modern disks do not disclose the physical location of logical blocks, so the disk controller, rather than the OS, often chooses the scheduling algorithm
Problem: this would be fine if I/O performance were the only consideration, but the OS may have other constraints
Example: the OS may give priority to demand-paging requests (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting): fills the disk with a special data structure for each sector, consisting of a header, a data area, and a trailer
The header and trailer contain the sector number and an ECC
The ECC provides error detection and recovery from soft errors
Data area: usually 512 bytes

Logical formatting
Partition: one or more groups of cylinders; each partition is treated as a separate disk (example)
Logical formatting stores the initial file system: a map of allocated and free space and an initial empty directory
Cluster: blocks grouped together to increase efficiency; disk I/O is done in blocks, file I/O in clusters
Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing the file-system services

Boot Block Bootstrap Program

Initial program that runs to start a computer system; it initializes all aspects of the system: CPU registers, device controllers, contents of main memory
It then starts the OS: finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution
Stored in ROM: needs no initialization and cannot be infected by a virus
Problem: ROM is hard to update; solution: keep a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the disk

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad sector handling
Manually (IDE): format or chkdsk places a special entry in the FAT
Sector sparing, or forwarding (SCSI): the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically (example). Problem: this invalidates optimizations made by the disk-scheduling algorithm; solution: keep spare sectors on each cylinder
Sector slipping: move every sector down one position so that the sector next to the bad one becomes free (example); a rough illustration of sparing follows
Soft error: repairable by the disk controller through ECC; hard error: data is lost and must be restored from backup
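As a rough illustration of sector sparing (not any particular controller's firmware), the controller can keep a remap table from bad sectors to spares and consult it on every access; this hypothetical Python sketch reserves spares per cylinder so that a forwarded access stays near the original and most of the scheduling optimization survives:

# Hypothetical sketch of sector sparing: a remap table consulted on each access.
class SectorSparing:
    def __init__(self, spares_per_cylinder):
        self.spares = spares_per_cylinder     # {cylinder: [spare sector numbers]}
        self.remap = {}                       # (cylinder, bad sector) -> spare sector

    def mark_bad(self, cylinder, sector):
        if not self.spares.get(cylinder):
            raise RuntimeError("no spare sector left on cylinder %d" % cylinder)
        spare = self.spares[cylinder].pop(0)  # use a spare on the same cylinder
        self.remap[(cylinder, sector)] = spare
        return spare

    def resolve(self, cylinder, sector):
        # The physical sector actually used for this logical sector.
        return self.remap.get((cylinder, sector), sector)

sparing = SectorSparing({12: [610, 611]})     # two spares reserved on cylinder 12
sparing.mark_bad(12, 17)                      # sector 17 goes bad and is forwarded
print(sparing.resolve(12, 17))                # 610
print(sparing.resolve(12, 18))                # 18 (unchanged)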

Swap-Space Management

In modern operating systems, "paging" and "swapping" are often used interchangeably
Virtual memory uses disk space as an extension of main memory; performance decreases (why?)
Swap-space management goal: to provide the best throughput for the virtual-memory system

Swap-Space Management (Cont)

How is it used? Depends on the memory-management algorithm
Swapping: holds entire process images; paging: stores pages that have been pushed out of main memory
The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to GB
Better to overestimate (why?): no process is aborted for lack of swap space
Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory
Linux: swap space = double the amount of physical memory; multiple swap spaces are supported (a tiny sizing illustration follows)
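As a tiny illustration of the two sizing rules just mentioned (the numbers are invented, and "pageable physical memory" is approximated here by total physical memory):

# Illustrative swap-space sizing under the two rules above (made-up numbers).
physical_mem_gb = 8
virtual_mem_gb = 12                                           # total virtual memory in use
solaris_swap_gb = max(0, virtual_mem_gb - physical_mem_gb)    # VM beyond (pageable) physical memory
linux_swap_gb = 2 * physical_mem_gb                           # the "double physical memory" rule
print(solaris_swap_gb, linux_swap_gb)                         # 4 16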

Swap-Space Management (Cont)

Where is it located on disk?
In the normal file system: a large file within the file system; file-system routines can be used; easy to implement but inefficient, since it takes time to traverse the directory structure
In a separate disk partition (raw partition): a swap-space storage manager allocates blocks using algorithms optimized for speed rather than storage efficiency (why? the trade-off between speed and fragmentation is acceptable because data lifetimes are short); a fixed amount of space is set aside during partitioning, and adding more space requires repartitioning
Linux supports both (who decides?)

Swap-Space Management (Cont)

How is it managed?
UNIX: traditional implementations copied entire processes; newer versions use a combination of swapping and paging
Solaris 1: the file system backs text-segment pages (code), while swap space backs pages of anonymous memory such as stack or heap
Modern Solaris versions allocate swap space only if a page is forced out of main memory

Swapping on Linux System


RAID Structure

A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel

It also improves the reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy
Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information
Mirroring: duplicating every disk
• A logical disk consists of two physical disks
• Every write is carried out on both disks


RAID Structure

Improvement in Performance via Parallelism: Striping
Bit-level striping: splitting the bits of each byte across multiple disks; bit i of each byte goes to disk i
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below)
Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses
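A minimal sketch of the block-to-disk mapping described above, using the slide's formula disk = (i mod n) + 1 with disks numbered 1..n:

# Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1.
def disk_for_block(i, n):
    return (i % n) + 1

n_disks = 4
for block in range(8):
    print("block", block, "-> disk", disk_for_block(block, n_disks))
# Blocks 0..7 land on disks 1, 2, 3, 4, 1, 2, 3, 4: many small requests spread
# across all disks (load balancing), and one large request is served by all
# disks in parallel (lower response time).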


RAID Structure


RAID Levels


RAID Level 0 (striping): disk arrays with striping at the level of blocks but without any redundancy

RAID Level 1 (mirroring): disk mirroring

RAID Level 2 (memory-style error-correcting codes): a parity bit records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1)
If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity
Error-correction bits are stored on disks labeled P; these bits are used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels


RAID Level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information

Advantage: storage overhead is reduced because only one parity disk is needed for several regular disks

Advantage: N-way striping of data, so data transfer is N times faster than RAID Level 1

Problem: the expense of computing and writing parity; mitigated by a hardware controller with dedicated parity hardware

RAID Level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk
High transfer rates for large reads and writes; small independent writes cannot be performed in parallel and require a read-modify-write cycle (a parity sketch follows)
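For the block-interleaved parity organizations, the parity block is the bytewise XOR of the data blocks in a stripe: a lost block is recovered by XOR-ing the survivors, and a small write can update parity with new parity = old parity XOR old data XOR new data (the read-modify-write cycle). A hedged sketch, with in-memory byte strings standing in for disk blocks:

from functools import reduce

def xor_blocks(*blocks):
    # Bytewise XOR of equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, byte_tuple) for byte_tuple in zip(*blocks))

# A stripe of three data blocks plus one parity block (RAID 4/5 style).
d0, d1, d2 = b"\x01\x02\x03", b"\x10\x20\x30", b"\x0f\x0e\x0d"
parity = xor_blocks(d0, d1, d2)

# Recover d1 after losing its disk: XOR the surviving data blocks with parity.
assert xor_blocks(d0, d2, parity) == d1

# Small write (read-modify-write): update d2 without reading d0 or d1.
new_d2 = b"\xaa\xbb\xcc"
parity = xor_blocks(parity, d2, new_d2)   # new parity = old parity ^ old data ^ new data
d2 = new_d2
assert xor_blocks(d0, d1, d2) == parity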

RAID Levels

RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks
Hence parity blocks no longer reside on a single dedicated disk, which avoids potential overuse of a single parity disk

RAID Level 6 (P+Q redundancy scheme): stores extra redundant information to guard against multiple disk failures
2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes


RAID Levels

RAID 0+1 – stripe first, then mirror the stripes
RAID 1+0 – mirror first, then stripe the mirrors
If a single disk fails in RAID 0+1, an entire stripe becomes inaccessible, leaving only the other stripe available
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks (a toy comparison follows)
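A toy availability model of the comparison above, with four disks on each side; the helper functions are hypothetical, not any volume manager's API:

# Toy comparison of RAID 0+1 and RAID 1+0 after disk failures.
def raid01_alive(failed):
    # Two stripes (a*, b*) mirrored; a stripe is usable only if all of its disks work.
    stripe_a, stripe_b = {"a0", "a1", "a2", "a3"}, {"b0", "b1", "b2", "b3"}
    return stripe_a.isdisjoint(failed) or stripe_b.isdisjoint(failed)

def raid10_alive(failed):
    # Four mirrored pairs, striped; each pair needs at least one working disk.
    pairs = [("a0", "b0"), ("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    return all(not (x in failed and y in failed) for x, y in pairs)

print(raid01_alive({"a1"}), raid10_alive({"a1"}))              # True True
print(raid01_alive({"a1", "b2"}), raid10_alive({"a1", "b2"}))  # False True: 1+0 degrades more gracefully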


RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Can take long hours for RAID Level 5 on large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred over RAID 1 for storing large volumes of data

If more disks are in an array data-transfer rates are higher but the system is more expensive

If a single parity bit protects more disks, the space overhead due to parity is lower, but the chance of a second disk failure within the group is greater


Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices, such as arrays of tapes and the broadcast of data over wireless systems

Problems with RAID
1. RAID protects against physical media errors, but not against other hardware and software errors
The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data
Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it has changed
Checksumming provides error detection and correction

Problems with RAID (cont)

2. RAID implementations lack flexibility
What if we have a five-disk RAID Level 5 set and a file system that is too large to fit on it?
In ZFS, partitions of disks are gathered via RAID sets into pools of storage
So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes


Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage:
Replicate information on multiple storage devices with independent failure modes
Coordinate the writing of updates so that a failure during an update cannot leave all the copies in a damaged state, ensuring that the stable data can be recovered after any failure during data transfer or recovery


Overview

Successful completion: all of the data were written correctly
Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written at the time of the failure may have been corrupted
Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact


Disk Write Result

The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failures

Recoverable write:
Write the information onto the first physical block
When the first write completes successfully, write the same information onto the second physical block
Declare the operation complete only after the second write completes successfully


Recoverable write

During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists, nothing more is needed
If one block contains a detectable error, we replace its contents with the value of the other block
If neither contains a detectable error but they differ in content, we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change
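A toy sketch of the recoverable-write discipline: two in-memory "physical blocks" per logical block, with a per-copy checksum standing in for the ECC that detects a damaged copy. It only illustrates the ordering and recovery rules above, not a real device driver:

import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

class StableBlock:
    # Two physical copies of one logical block; each copy carries a checksum
    # so a partially written (damaged) copy can be detected during recovery.
    def __init__(self, initial=b""):
        self.copies = [(initial, checksum(initial)), (initial, checksum(initial))]

    def write(self, data):
        # Recoverable write: finish copy 0 before touching copy 1;
        # the operation counts as complete only after both writes succeed.
        self.copies[0] = (data, checksum(data))
        self.copies[1] = (data, checksum(data))

    def recover(self):
        (d0, c0), (d1, c1) = self.copies
        ok0, ok1 = checksum(d0) == c0, checksum(d1) == c1
        if ok0 and ok1 and d0 == d1:
            return d0                           # both good and identical: nothing to do
        if not ok0:
            self.copies[0] = self.copies[1]     # copy 0 damaged: take copy 1
        elif not ok1:
            self.copies[1] = self.copies[0]     # copy 1 damaged: take copy 0
        else:
            self.copies[0] = self.copies[1]     # both readable but different: second wins
        return self.copies[0][0]

blk = StableBlock(b"old")
blk.write(b"new value")
print(blk.recover())                            # b'new value'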


Failure detection and Recovery


Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs


Tertiary-Storage Devices

Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther above the disk surface than a magnetic-disk head does, and the magnetic material is covered with a protective layer of plastic or glass that resists head crashes
To write, the head flashes a laser beam aimed at a tiny spot on the disk surface where the bit is to be written
Laser light is also used to read data (the Kerr effect)


Removable Disks

Optical disks: employ special materials that are altered by laser light
Example: the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state
Uses laser light at three different powers: low to read data, medium to erase the disk, high to write to the disk
Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once
A thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum
Information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use


Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application interface:
For disks, a new cartridge is formatted and an empty file system is generated on the disk
Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it


Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk)
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block
An EOT mark is placed after a block that is written (a conceptual sketch of this interface follows)
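The append-only behavior can be illustrated with a hypothetical in-memory tape model; the method names mirror the operations above but are not any real driver's API:

# Hypothetical in-memory model of a tape drive's basic operations.
class TapeDrive:
    def __init__(self):
        self.blocks = []     # logical blocks currently on the tape
        self.pos = 0         # logical block number under the head

    def locate(self, block_number):     # ~ seek() on a disk
        self.pos = block_number

    def read_position(self):            # logical block number at the head
        return self.pos

    def space(self, count):             # relative motion, forward or backward
        self.pos = max(0, self.pos + count)

    def write_block(self, data):
        # Append-only: writing at pos effectively erases everything after it;
        # an EOT mark conceptually follows the last written block.
        self.blocks = self.blocks[:self.pos] + [data]
        self.pos += 1

    def read_block(self):
        if self.pos >= len(self.blocks):
            raise EOFError("past the EOT mark")
        data = self.blocks[self.pos]
        self.pos += 1
        return data

t = TapeDrive()
for d in (b"a", b"b", b"c"):
    t.write_block(d)
t.locate(1)
t.write_block(b"B")
print(len(t.blocks))   # 2: the block that followed the rewritten one is gone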


Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way


File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives)
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when it is needed

Small and frequently used files remain on disk

Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data


Hierarchical Storage Management (HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate while the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate (a small worked example follows)
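A small worked example of the distinction, with invented numbers: a drive that streams at 40 MB/s but spends 20 seconds locating the data before a 400 MB transfer has a much lower effective bandwidth:

# Sustained vs. effective bandwidth (illustrative, made-up numbers).
bytes_transferred = 400 * 10**6          # 400 MB
streaming_time_s = 10.0                  # time the data stream is actually flowing
locate_and_switch_s = 20.0               # seek/locate and cartridge-switch overhead

sustained_mb_s = bytes_transferred / streaming_time_s / 10**6                          # 40.0
effective_mb_s = bytes_transferred / (streaming_time_s + locate_and_switch_s) / 10**6  # ~13.3
print(sustained_mb_s, effective_mb_s)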


Speed

Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives


Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed


Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives


Cost


Price per megabyte of DRAM from 1981 to 2000


Price per megabyte of magnetic hard disk from 1981 to 2000


Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., et al. Operating System Concepts, 8th edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

Page 2: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Agenda Review

File System Parts of File System

Overview Magnetic Disks Magnetic Tapes

Disk Structure Disk Attachment

Host-attached Network-attached Storage-Area Networks

Disk Scheduling Scheduling Algorithms Selection of an algorithm

Agenda (Cont)

Disk Management Disk Formatting Boot Block Bad-Block Recovery

Swap-Space Management How is it used Where is it located How is it managed

Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID

Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery

Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost

Review

File System Method of storing and organizing computer

files and their data Storage Organization Manipulation Retrieval

Maintain physical location

Review (Cont)

Parts of File System Interface

User and programmer interface to the file system Implementation

Internal data structure and algorithms used to implement the interface

Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 3: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Agenda (Cont)

Disk Management Disk Formatting Boot Block Bad-Block Recovery

Swap-Space Management How is it used Where is it located How is it managed

Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID

Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery

Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost

Review

File System Method of storing and organizing computer

files and their data Storage Organization Manipulation Retrieval

Maintain physical location

Review (Cont)

Parts of File System Interface

User and programmer interface to the file system Implementation

Internal data structure and algorithms used to implement the interface

Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand: a special-purpose bus architecture that supports a high-speed interconnection network.

Up to 25 Gbps and 64,000 addressable devices.

Supports QoS and failover.

Disk Scheduling

Disk drive efficiency is measured by access time and bandwidth.

Access time: seek time + rotational latency

Bandwidth = bytes transferred / Δt

Δt = completion time of the last transfer – time of the first request for service

Both can be improved by scheduling the servicing of I/O requests in a good order.

Disk Scheduling (Cont)

I/O request procedure: the process issues a system call to the OS. The system call carries:

input or output, the disk address, the memory address, and the number of sectors to be transferred

If the disk and controller are available, the request is serviced immediately; otherwise it is placed in the queue of pending requests.
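A minimal sketch of this admission logic in Python (the DiskRequest fields and the deque-based pending queue are illustrative names, not part of any real OS interface):

from collections import deque
from dataclasses import dataclass

@dataclass
class DiskRequest:
    is_write: bool        # input or output
    disk_address: int     # target block on disk
    memory_address: int   # buffer address in main memory
    sector_count: int     # number of sectors to transfer

class DiskDriver:
    def __init__(self):
        self.busy = False
        self.pending = deque()          # queue of waiting requests

    def submit(self, req: DiskRequest):
        if not self.busy:               # disk and controller available:
            self.busy = True
            self.service(req)           # access the disk immediately
        else:
            self.pending.append(req)    # otherwise: place in the queue

    def service(self, req: DiskRequest):
        ...                             # hand the transfer to the controller

    def on_complete(self):
        # Controller interrupt: pick the next request (a scheduling
        # algorithm from the next slides would choose the order here).
        if self.pending:
            self.service(self.pending.popleft())
        else:
            self.busy = False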

Disk Scheduling (Cont)

Algorithms: FCFS, SSTF, SCAN, C-SCAN, LOOK/C-LOOK (a comparison sketch follows the LOOK/C-LOOK slide below)

Disk Scheduling (Cont)

FCFS (First-Come, First-Served)

640 cylinder moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First): services the requests closest to the current head position; can cause starvation.

236 cylinder moves

Disk Scheduling (Cont)

SCAN (the elevator algorithm): the head starts at one end of the disk and sweeps to the other end, servicing each request as it reaches that cylinder.

236 cylinder moves

Disk Scheduling (Cont)

C-SCAN: a variant of SCAN with a more uniform wait time. When the head reaches the end, it immediately returns to the beginning of the disk without servicing any requests on the return trip.

360 cylinder moves

Disk Scheduling (Cont)

LOOK/C-LOOK: the head goes only as far as the final request in each direction, then reverses (or, in C-LOOK, jumps back) immediately, without going all the way to the end of the disk.

322 cylinder moves
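A minimal Python sketch comparing the algorithms above. The request queue, the starting head position (cylinder 53), and the 0–199 cylinder range are assumed from the classic textbook example; the SCAN helper also assumes there are pending requests on both sides of the head.

def fcfs(head, queue):
    # Serve requests in arrival order and total the head movement.
    total = 0
    for cyl in queue:
        total += abs(cyl - head)
        head = cyl
    return total

def sstf(head, queue):
    # Always serve the pending request closest to the current head position.
    pending, total = list(queue), 0
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

def scan(head, queue, low=0, high=199, direction=-1):
    # Elevator: sweep to one end of the disk, then reverse to the farthest request.
    end = low if direction < 0 else high
    farthest = max(queue) if direction < 0 else min(queue)
    return abs(head - end) + abs(end - farthest)

def c_look(head, queue):
    # Serve upward to the last request, jump to the lowest request, continue upward.
    up = sorted(c for c in queue if c >= head)
    down = sorted(c for c in queue if c < head)
    total = (up[-1] - head) if up else 0
    if down:
        total += abs(up[-1] - down[0]) if up else abs(head - down[0])
        total += down[-1] - down[0]
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
for name, alg in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan), ("C-LOOK", c_look)]:
    print(name, alg(53, queue))   # 640, 236, 236, 322 cylinder moves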

Disk Scheduling (Cont)

Selection of an algorithm: factors

SSTF is common and gives better performance than FCFS. SCAN and C-SCAN perform better for systems that place a heavy load on the disk, and they avoid starvation.

A scheduling algorithm's performance depends on the number and types of requests (example 1).

Requests for disk service can be influenced by the file-allocation method (example 2) and by the location of directories and index blocks.

Caching directories and index blocks in main memory reduces arm movement (example 3).

Disk Scheduling (Cont): Selection of an algorithm

Disk scheduling should be written as a separate module of the OS so it can be replaced with a different algorithm if necessary. Default: SSTF or LOOK.

Rotational-delay perspective: modern disks do not disclose the physical location of logical blocks, so the disk controller takes over the choice of algorithm from the OS.

Problem: if I/O performance were the only consideration, this would be fine, but the OS has other constraints on the service order, e.g., requests issued for demand paging (example).

Disk Management

Disk Formatting: low-level, logical

Boot Block: bootstrap

Bad-Block Recovery: manually; sector sparing (forwarding); sector slipping

Disk Formatting

Low-Level Formatting (Physical Formatting): each sector gets a header and trailer holding the sector number and an ECC

ECC: error detection and soft-error recovery

Data area: usually 512 bytes

Logical Formatting: partitioning

A partition is one or more groups of cylinders; each partition is treated as a separate disk (example)

Logical formatting: storing the initial file system

A map of allocated and free space and an initial empty directory

Cluster: blocks are grouped together to increase efficiency. Disk I/O is done in blocks; file I/O is done in clusters.

Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing the file-system services

Boot Block: bootstrap program

The initial program that starts a computer system. It initializes aspects of the system:

CPU registers, device controllers, the contents of main memory

It then starts the OS: finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin OS execution.

Stored in ROM: needs no initialization and cannot be infected by a virus. Problem: hard to update. Solution: keep a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the disk.

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete disk failure: replace the disk

Bad-sector handling:

Manually, e.g., IDE: format or chkdsk makes a special entry in the FAT

Sector Sparing (Forwarding), e.g., SCSI: the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically (example). Problem: this invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder.

Sector Slipping: move every sector down one spot to free the sector next to the bad sector, which then takes over the bad sector's logical place (example)

Soft error: repairable by the disk controller through the ECC. Hard error: the data are lost; restore from backup.
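A minimal sketch of sector sparing (the class name, the spare-sector numbers, and the dictionary-based remap table are all illustrative, not a real controller interface):

class SparingController:
    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)   # spares set aside at low-level format time
        self.remap = {}                     # bad logical block -> spare sector

    def mark_bad(self, block):
        # Forward a newly detected bad block to the next available spare.
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[block] = self.spares.pop(0)

    def translate(self, block):
        # Physical sector actually used for this logical block.
        return self.remap.get(block, block)

ctrl = SparingController(spare_sectors=[1000, 1001, 1002])
ctrl.mark_bad(87)
print(ctrl.translate(87), ctrl.translate(88))   # 1000 88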

Swap-Space Management

In modern operating systems, "paging" and "swapping" are often used interchangeably.

Virtual memory uses disk space as an extension of main memory, but performance decreases (why?).

Swap-space management goal: to get the best throughput for the virtual memory system.

Swap-Space Management (Cont)

How is it used? That depends on the memory-management algorithm:

Swapping: holds entire process images. Paging: holds pages pushed out of main memory.

The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to several GB. It is better to overestimate (why? so that no process is aborted for lack of swap space).

Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory. Linux: swap space = double the amount of physical memory (a traditional guideline). Multiple swap spaces are possible.
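A tiny illustration of the two sizing rules quoted above, with hypothetical sizes in GB (these helpers are not official formulas from either OS):

def solaris_swap_gb(virtual_memory_gb, pageable_physical_gb):
    # Swap only needs to back the virtual memory that exceeds pageable RAM.
    return max(0, virtual_memory_gb - pageable_physical_gb)

def linux_rule_of_thumb_gb(physical_gb):
    # The traditional "double the physical memory" guideline from the slide.
    return 2 * physical_gb

print(solaris_swap_gb(virtual_memory_gb=24, pageable_physical_gb=16))  # 8
print(linux_rule_of_thumb_gb(physical_gb=16))                          # 32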

Swap-Space Management (Cont)

Where is it located on disk?

In the normal file system: a large file within the file system, so the file-system routines can be used. Easy to implement but inefficient: it takes time to traverse the directory structure.

In a separate disk partition: a raw partition managed by a swap-space storage manager, which uses algorithms optimized for speed rather than storage efficiency (why? the trade-off between speed and fragmentation is acceptable because swap data lives only briefly). A fixed amount of space is set aside during partitioning, and adding more swap space requires re-partitioning.

Linux supports both (who decides?).

Swap-Space Management (Cont)

How is it managed?

UNIX: traditionally copied entire processes; newer versions use a combination of swapping and paging.

Solaris 1: text-segment pages containing code are paged in from the file system, while pages of anonymous memory such as the stack or heap go to swap space. Modern versions allocate swap space only when a page is forced out of main memory.

Swapping on Linux System


RAID Structure

A large number of disks in a system can improve the data-transfer rate if the disks are operated in parallel.

It can also improve the reliability of data storage when redundant information is stored on multiple disks.

A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability.

Improvement of Reliability via Redundancy. Redundancy: storing extra information that is not normally needed but that can be used in the event of a disk failure to rebuild the lost information.

Mirroring: duplicating every disk

• A logical disk consists of two physical disks

• Every write is carried out on both disks


RAID Structure

Improvement in Performance via Parallelism: striping

Striping splits the bits of each byte across multiple disks. Bit-level striping: write bit i of each byte to disk i. Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1.

Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses.
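A minimal sketch of the block-level striping rule above, (i mod n) + 1, with disks numbered 1..n (the disk count and block numbers are arbitrary examples):

def disk_for_block(i: int, n: int) -> int:
    # Block i of a file goes to disk (i mod n) + 1.
    return (i % n) + 1

n_disks = 4
for block in range(8):
    print(f"block {block} -> disk {disk_for_block(block, n_disks)}")
# block 0 -> disk 1, block 1 -> disk 2, ..., block 4 -> disk 1, ...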

RAID Structure

RAID Levels

RAID Level 0: striping. Disk arrays with striping at the level of blocks but without any redundancy.

RAID Level 1: mirroring. Refers to disk mirroring.

RAID Level 2: memory-style error-correcting codes. Parity bit: records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus does not match the stored parity.

Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data.

RAID 2 is not used in practice.
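A minimal sketch of the parity idea used by the parity-based RAID levels: XOR across the data units, so any single lost unit can be rebuilt from the parity and the survivors (the byte values and the four-disk layout are arbitrary examples):

from functools import reduce

def parity(units):
    # Parity unit = XOR of all data units.
    return reduce(lambda a, b: a ^ b, units, 0)

data = [0b10110010, 0b01101100, 0b11110000, 0b00010111]   # four data disks
p = parity(data)

lost = 2                                                  # pretend disk 2 failed
survivors = [b for i, b in enumerate(data) if i != lost]
rebuilt = parity(survivors + [p])                         # XOR survivors with parity
assert rebuilt == data[lost]
print(f"rebuilt disk {lost}: {rebuilt:#010b}")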

RAID Levels


RAID Level 3: bit-interleaved parity organization. Data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information.

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.

Advantage: with N-way striping of data, the data transfer is N times faster than RAID Level 1.

Problem: the expense of computing and writing parity; a hardware controller with dedicated parity hardware helps.

RAID Level 4: block-interleaved parity organization. Uses block-level striping and, in addition, keeps a parity block on a separate disk. High transfer rates for large reads and writes, but small independent writes cannot be performed in parallel: each requires a read-modify-write cycle.
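A minimal sketch of that read-modify-write cycle for a small write in a block-interleaved parity array: the new parity is old_parity XOR old_data XOR new_data, so the other disks in the stripe need not be read (block values are arbitrary examples):

def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    return old_parity ^ old_data ^ new_data

# One stripe of 8-bit "blocks" on three data disks plus a parity block:
stripe = [0x3C, 0xA5, 0x0F]
parity = stripe[0] ^ stripe[1] ^ stripe[2]

new_block = 0x99                                      # small write to block 1
parity = update_parity(parity, stripe[1], new_block)  # read old data + parity, write both
stripe[1] = new_block

assert parity == stripe[0] ^ stripe[1] ^ stripe[2]    # parity is still consistent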

RAID Levels

RAID Level 5: block-interleaved distributed parity. Parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids potential overuse of a dedicated parity disk.

RAID Level 6: P+Q redundancy scheme. Stores extra redundant information so that multiple disk failures can be tolerated: 2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.


RAID Levels

RAID 0+1 – stripe first, then mirror the stripe.

RAID 1+0 – mirror first, then stripe the mirrors.

If a single disk fails in RAID 0+1, an entire stripe becomes inaccessible, leaving only the other stripe available.

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.


RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild performance: easiest for RAID Level 1; a RAID 5 rebuild can take long hours for large disks.

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important – for example, small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array, data-transfer rates are higher, but the system is more expensive.

If more bits are protected by a single parity, the space overhead due to parity is lower, but the chance of a second disk failure before the first is repaired is greater.


Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices: arrays of tapes, and even the broadcast of data over wireless systems.

Problems with RAID

1. RAID protects against physical media errors, but not against other hardware and software errors.

The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data.

Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it changed.

Checksumming provides error detection and correction.

Problems with RAID (cont)

2. RAID implementations lack flexibility.

What if we have a five-disk RAID level 5 set and a file system that is too large to fit on it?

In ZFS, partitions of disks are gathered together via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.


Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage: replicate the information on multiple storage devices with independent failure modes.

Coordinate the writing of updates so that a failure during an update cannot leave all the copies in a damaged state, and so that the stable data can be recovered after any failure during data transfer or recovery.


Overview

A disk write results in one of three outcomes:

Successful completion: the data were written correctly.

Partial failure: a failure occurred in the midst of the transfer, so only some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted.

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.


Disk Write Result

The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failures.

Recoverable write: write the information onto the first physical block.

When the first write completes successfully, write the same information onto the second physical block.

Declare the operation complete only after the second write completes successfully.


Recoverable write

During recovery, each pair of physical blocks is examined:

If both are the same and no detectable error exists, nothing more needs to be done.

If one contains a detectable error, we replace its contents with the value of the other block.

If neither contains a detectable error but they differ in content, we replace the contents of the first block with the value of the second.

This guarantees that a write to stable storage either succeeds completely or results in no change.
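A minimal Python sketch of the recoverable write and the recovery rules above, with two in-memory copies standing in for the two physical blocks and a CRC standing in for the ECC ("detectable error" = checksum mismatch); all names are illustrative:

import zlib

def stamp(data: bytes):
    return (data, zlib.crc32(data))          # block contents + checksum

def is_valid(copy):
    data, crc = copy
    return zlib.crc32(data) == crc

def recoverable_write(block, data: bytes):
    block["first"] = stamp(data)             # 1) write the first physical block
    block["second"] = stamp(data)            # 2) only then write the second
                                             # 3) declare the operation complete only now

def recover(block):
    first, second = block["first"], block["second"]
    if is_valid(first) and is_valid(second):
        if first[0] != second[0]:
            block["first"] = second          # both valid but differ: take the second
    elif is_valid(first):
        block["second"] = first              # second damaged: repair from the first
    elif is_valid(second):
        block["first"] = second              # first damaged: repair from the second

block = {}
recoverable_write(block, b"stable data")
block["second"] = (b"garbled", block["second"][1])   # simulate a partial failure
recover(block)
assert block["first"][0] == block["second"][0] == b"stable data"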


Failure detection and Recovery


Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.


Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface, aimed at a tiny spot where the bit is to be written.

Laser light is also used to read data (Kerr effect)


Removable Disks

Optical disks – employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples: CD-RW and DVD-RW.

WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed but not altered.

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage. If it is needed again, the computer can stage it back into disk storage for active use.


Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on it. Tapes, in contrast, are presented as a raw storage medium: an application does not open a file on the tape, it opens the whole tape drive as a raw device.

Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device. Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.

Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.


Operating-System Support

The basic operations for a tape drive differ from those of a disk drive:

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).

The read position() operation returns the logical block number where the tape head is located.

The space() operation enables relative motion.

Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block. An EOT mark is placed after a block that is written.
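A minimal in-memory model of the append-only behavior and the operations named above (locate, read position, space); this is an illustrative sketch, not a real tape-driver API:

class TapeDrive:
    # Toy model of a tape: a list of written blocks plus a head position.
    def __init__(self):
        self.blocks = []     # written blocks; index == logical block number
        self.head = 0        # current logical block number

    def locate(self, block_number: int):   # corresponds to seek() on a disk
        self.head = block_number

    def read_position(self) -> int:        # logical block under the head
        return self.head

    def space(self, count: int):           # relative motion, +/- blocks
        self.head += count

    def write_block(self, data: bytes):
        # Updating a block in the middle effectively erases everything after
        # it: the EOT mark now follows the block just written.
        del self.blocks[self.head:]
        self.blocks.append(data)
        self.head += 1

tape = TapeDrive()
for i in range(5):
    tape.write_block(f"block {i}".encode())
tape.locate(2)
tape.write_block(b"update")     # blocks 3 and 4 are now gone
print(len(tape.blocks))         # 3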


Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way


File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy data to faster disk drives when needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (which lets the computer change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
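A minimal sketch of the migration policy just described (small, recently used files stay on disk; large, old, inactive files go to the jukebox); the thresholds and the File record are invented for illustration:

from dataclasses import dataclass

@dataclass
class File:
    name: str
    size_mb: float
    days_since_access: int

ARCHIVE_IF_LARGER_MB = 500    # hypothetical policy thresholds,
ARCHIVE_IF_IDLE_DAYS = 90     # not taken from the text

def tier_for(f: File) -> str:
    # Decide where a file should live under this toy HSM policy.
    if f.size_mb > ARCHIVE_IF_LARGER_MB and f.days_since_access > ARCHIVE_IF_IDLE_DAYS:
        return "jukebox"      # large, old, inactive: archive to removable media
    return "disk"             # small or frequently used: keep on disk

for f in [File("results.dat", 1200, 400), File("notes.txt", 0.1, 2)]:
    print(f.name, "->", tier_for(f))   # results.dat -> jukebox, notes.txt -> disk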


Hierarchical Storage Management (HSM)

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth – the average data rate during a large transfer: number of bytes / transfer time. This is the data rate while the data stream is actually flowing.

Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching. This is the drive's overall data rate.
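A small worked example of the two definitions above, with invented numbers for one hypothetical tape transfer:

bytes_transferred = 2 * 10**9        # 2 GB read from the cartridge
streaming_time_s  = 40               # time the data stream is actually flowing
locate_time_s     = 25               # winding the tape to the selected block
switch_time_s     = 15               # robot swaps the cartridge

sustained = bytes_transferred / streaming_time_s
effective = bytes_transferred / (streaming_time_s + locate_time_s + switch_time_s)
print(f"sustained: {sustained/1e6:.0f} MB/s, effective: {effective/1e6:.0f} MB/s")
# sustained: 50 MB/s, effective: 25 MB/s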


Speed

Access latency – the amount of time needed to locate the data.

Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on a disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.


Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.


Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

Cost

Price per megabyte of DRAM, from 1981 to 2000 (figure)

Price per megabyte of magnetic hard disk, from 1981 to 2000 (figure)

Price per megabyte of a tape drive, from 1984 to 2000 (figure)

References

Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

Page 4: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Agenda (Cont) RAID structure RAID levels The implementation of RAID Problems with RAID

Stable Storage Implementation Disk write result Recoverable Write Failure detection and recovery

Tertiary-Storage Structure Tertiary-Storage Devices Removable disks Tapes OS support Tape Drives File Naming HSM Speed Reliability Cost

Review

File System Method of storing and organizing computer

files and their data Storage Organization Manipulation Retrieval

Maintain physical location

Review (Cont)

Parts of File System Interface

User and programmer interface to the file system Implementation

Internal data structure and algorithms used to implement the interface

Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 5: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Review

File System Method of storing and organizing computer

files and their data Storage Organization Manipulation Retrieval

Maintain physical location

Review (Cont)

Parts of File System Interface

User and programmer interface to the file system Implementation

Internal data structure and algorithms used to implement the interface

Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

A logical disk consists of two physical disks

Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit-level striping: write bit i of each byte to disk i Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1
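The block-to-disk mapping above in runnable form (a sketch; disks are numbered 1..n exactly as in the formula):

```python
def disk_for_block(i, n):
    # Block-level striping as defined above: block i goes to disk (i mod n) + 1.
    return (i % n) + 1

n = 4
print([(i, disk_for_block(i, n)) for i in range(8)])
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 2), (6, 3), (7, 4)]
```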

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory-style error-correcting code Parity bit: records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1) If one of the bits in the byte is damaged, the parity of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice
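The parity rule just described can be shown in a couple of lines (illustrative sketch only):

```python
def parity(byte):
    # Parity = 0 if the byte has an even number of 1 bits, 1 if odd.
    return bin(byte).count("1") % 2

b = 0b10110100              # four 1 bits -> stored parity 0
stored_parity = parity(b)
damaged = b ^ 0b00000100    # flip one bit to simulate damage
print(stored_parity, parity(damaged))  # 0 1 -> mismatch reveals the damage
```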

RAID Levels

57

RAID Level 3 bit-interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv - N-way striping of data: data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block-interleaved parity organization Uses block-level striping and, in addition, keeps a parity block on a separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in parallel; each requires a read-modify-write cycle
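To make the read-modify-write cycle concrete, here is a sketch using XOR parity (the usual choice for block-interleaved parity; the details are an illustration, not an exact on-disk format):

```python
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Three data blocks and their parity block (XOR of the data blocks).
d = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = xor_blocks(xor_blocks(d[0], d[1]), d[2])

# Small write to block 1: read old data + old parity, then write new data + new parity.
new_d1 = bytes([40, 50, 60])
parity = xor_blocks(xor_blocks(parity, d[1]), new_d1)   # new P = P ^ old ^ new
d[1] = new_d1

# Reconstruction check: any lost block is the XOR of the surviving blocks.
assert xor_blocks(xor_blocks(d[0], d[2]), parity) == d[1]
```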

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on the same disk, which avoids potential overuse of a single parity disk
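One common way to rotate parity across disks is the left-symmetric layout sketched below; other placements exist, so treat this as one possible illustration rather than the RAID 5 layout:

```python
def raid5_parity_disk(stripe, n_disks):
    # Rotate the parity block so that no single disk holds all parity blocks.
    return (n_disks - 1 - stripe) % n_disks

print([raid5_parity_disk(s, 5) for s in range(6)])
# [4, 3, 2, 1, 0, 4] -- parity moves to a different disk on each stripe
```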

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to guard against multiple disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 – stripe first, then mirror the stripe

RAID 1+0 – mirror first, then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks
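A small enumeration makes the asymmetry concrete. The sketch below is purely illustrative: four disks arranged either as two mirrored pairs that are then striped (1+0), or as two stripes that are then mirrored (0+1), checking which two-disk failures each layout survives under the description above:

```python
from itertools import combinations

disks = [0, 1, 2, 3]
mirror_pairs = [{0, 1}, {2, 3}]   # RAID 1+0: mirror first, then stripe
stripes      = [{0, 1}, {2, 3}]   # RAID 0+1: stripe first, then mirror

def survives_10(failed):
    # 1+0 survives as long as no mirror pair loses both of its disks.
    return all(not pair <= failed for pair in mirror_pairs)

def survives_01(failed):
    # 0+1 (as described above) survives only if at least one whole stripe is intact.
    return any(not (stripe & failed) for stripe in stripes)

for failed in map(set, combinations(disks, 2)):
    print(sorted(failed), "1+0:", survives_10(failed), "0+1:", survives_01(failed))
# 1+0 survives 4 of the 6 possible two-disk failures; 0+1 survives only 2.
```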

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important – for example, small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity bit, the space overhead due to parity is lower, but the chance of a second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID: (1) RAID protects against physical errors, but not against other hardware and software errors

Solaris ZFS (Zettabyte File System) uses checksums, which verify the integrity of the data

Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it changed

Checksumming provides error detection and correction
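A toy version of the checksum-with-pointer idea (not ZFS's actual on-disk layout; all names are invented): the checksum travels with the pointer rather than with the block, so a stale, misdirected, or corrupted block is caught on read:

```python
import hashlib

def write_block(storage, addr, data):
    storage[addr] = data
    # The pointer carries the checksum of the data it is supposed to reference.
    return {"addr": addr, "checksum": hashlib.sha256(data).hexdigest()}

def read_block(storage, pointer):
    data = storage[pointer["addr"]]
    if hashlib.sha256(data).hexdigest() != pointer["checksum"]:
        raise IOError("checksum mismatch: wrong or corrupted block")
    return data

disk = {}
ptr = write_block(disk, 42, b"important data")
disk[42] = b"silently corrupted"        # bit rot or a misdirected write
try:
    read_block(disk, ptr)
except IOError as e:
    print(e)                            # detected instead of returned as good data
```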

Problems with RAID (cont)

(2) RAID implementations lack flexibility

What if we have a five-disk RAID level 5 set and the file system is too large to fit on it?

Partitions of disks are gathered together via RAID sets into pools of storage

So: no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion: all of the data was written correctly

Partial Failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted

Total Failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change
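A minimal sketch of the recoverable write and the recovery pass just described, with each physical block modeled as a small dict and a "bad" flag standing in for a detectable ECC error (all names are invented for illustration):

```python
# Each logical block is backed by two physical copies: copies[0] and copies[1].

def recoverable_write(copies, data):
    copies[0]["data"], copies[0]["bad"] = data, False   # step 1: first physical block
    # (a crash here leaves the second copy untouched)
    copies[1]["data"], copies[1]["bad"] = data, False   # step 2: second physical block
    # only now is the operation declared complete

def recover(copies):
    a, b = copies
    if a["bad"] and not b["bad"]:
        a["data"], a["bad"] = b["data"], False          # repair from the good copy
    elif b["bad"] and not a["bad"]:
        b["data"], b["bad"] = a["data"], False
    elif not a["bad"] and not b["bad"] and a["data"] != b["data"]:
        a["data"] = b["data"]                           # rule above: first gets the second's value
    return a["data"]

copies = [{"data": b"", "bad": False}, {"data": b"", "bad": False}]
recoverable_write(copies, b"old")     # a complete, successful update
copies[0]["data"] = b"new"            # simulate a crash between the two writes of the next update
print(recover(copies))                # b'old' -- the interrupted update leaves no change,
                                      # so a write either succeeds completely or changes nothing
```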

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs, etc.

71

Tertiary-Storage Devices

Floppy disk – a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case

Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks – employ special materials that are altered by laser light An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state Uses laser light at three different powers: low to read data, medium to erase the disk, high to write to the disk Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once A thin aluminum film is sandwiched between two glass or plastic platters To write a bit, the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage If it is needed again, the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface For a removable disk: a new cartridge is formatted and an empty file system is

generated on the disk Tapes, in contrast, are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are "append-only" devices: updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written
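A toy model of these operations (an invented class, not a real driver API) showing the append-only behavior described above:

```python
class TapeDrive:
    def __init__(self):
        self.blocks = []      # data blocks written so far
        self.pos = 0          # current logical block number

    def locate(self, block):  # like seek(), but to a logical block
        self.pos = block

    def read_position(self):
        return self.pos

    def space(self, count):   # relative motion
        self.pos += count

    def write_block(self, data):
        # Append-only: writing at self.pos effectively erases everything after it.
        del self.blocks[self.pos:]
        self.blocks.append(data)
        self.pos += 1         # an EOT mark would follow the last written block

t = TapeDrive()
for d in (b"a", b"b", b"c"):
    t.write_block(d)
t.locate(1)
t.write_block(b"B")
print(t.blocks)               # [b'a', b'B'] -- block c was effectively erased
```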

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices (hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives) It would be great to have all data available on high-speed devices all the time, but this is expensive Instead, HSM systems store the data on slower devices and then copy it to faster disk drives when needed

Small and frequently used files remain on disk

Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
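In the spirit of the policy described above, a toy migration rule; all thresholds here are invented for illustration and are not part of any real HSM product:

```python
import time

DAY = 86400

def hsm_tier(size_bytes, last_access_ts, now=None,
             small_limit=64 * 1024**2, idle_limit=90 * DAY):
    # Invented thresholds: files over 64 MiB that have been idle for 90+ days
    # are candidates for migration to the jukebox; everything else stays on disk.
    now = time.time() if now is None else now
    if size_bytes > small_limit and now - last_access_ts > idle_limit:
        return "jukebox"
    return "disk"

print(hsm_tier(2 * 1024**3, time.time() - 200 * DAY))  # -> 'jukebox'
print(hsm_tier(10 * 1024, time.time()))                # -> 'disk'
```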

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth – average data rate during a large transfer = number of bytes / transfer time; this is the data rate when the data stream is actually flowing

Effective bandwidth – average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate
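A quick worked example of the two figures (all numbers are invented):

```python
bytes_transferred = 4 * 1024**3        # say, 4 GiB read from a tape
streaming_time_s  = 80.0               # time the data stream is actually flowing
locate_time_s     = 25.0               # winding the tape to the right block
switch_time_s     = 15.0               # robotic cartridge switch

sustained = bytes_transferred / streaming_time_s
effective = bytes_transferred / (streaming_time_s + locate_time_s + switch_time_s)

print(f"sustained: {sustained/1e6:.0f} MB/s, effective: {effective/1e6:.0f} MB/s")
# effective bandwidth (the drive's overall data rate) is always the lower figure
```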

79

Speed

Access latency – the amount of time needed to locate data Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: < 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A. Operating System Concepts, 8th edition

Wikipedia.com PCTechGuide.com USRobotics.com allSAN.com Xenon.com.au

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 6: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Review (Cont)

Parts of File System Interface

User and programmer interface to the file system Implementation

Internal data structure and algorithms used to implement the interface

Storage Structure Physical structure Disk scheduling algorithms Disk formatting Disk reliability Stable-storage implementation

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 7: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Overview

Magnetic Disks Magnetic Tape

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB
Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure

Magneto-optic disk: records data on a rigid platter coated with magnetic material
The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass, so it is resistant to head crashes
To write, the head flashes a laser beam aimed at the tiny spot on the disk surface where the bit is to be written
Laser light is also used to read data (the Kerr effect)

Removable Disks

Optical disks employ special materials that are altered by laser light
An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state
It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk
Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once
A thin aluminum film is sandwiched between two glass or plastic platters
To write a bit, the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered

Tapes

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use

Operating-System Support

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications
Application interface:
For a new cartridge, the cartridge is formatted and an empty file system is generated on the disk
Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device
Usually the tape drive is reserved for the exclusive use of that application until the application closes the tape device
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it

Tape Drives

The basic operations for a tape drive differ from those of a disk drive
locate() positions the tape to a specific logical block, not an entire track (it corresponds to seek())
The read_position() operation returns the logical block number where the tape head is located
The space() operation enables relative motion
Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block
An EOT mark is placed after a block that is written
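A toy model of this interface may help; the class below mirrors the operations named above, while the in-memory list, the method signatures, and the EOT handling are assumptions for illustration:

```python
class TapeDrive:
    """Toy append-only tape: a list of blocks plus a head position."""

    def __init__(self):
        self.blocks = []     # an implicit EOT mark sits after the last block
        self.head = 0

    def locate(self, blockno: int) -> None:      # like seek(), but to a logical block
        self.head = blockno

    def read_position(self) -> int:              # logical block under the head
        return self.head

    def space(self, count: int) -> None:         # relative motion (+/- blocks)
        self.head += count

    def write_block(self, data: bytes) -> None:
        # Writing in the middle effectively erases everything after that block:
        del self.blocks[self.head:]
        self.blocks.append(data)
        self.head += 1                           # EOT mark now follows this block

    def read_block(self) -> bytes:
        data = self.blocks[self.head]
        self.head += 1
        return data

tape = TapeDrive()
for i in range(5):
    tape.write_block(f"block {i}".encode())
tape.locate(2)
tape.write_block(b"update")                      # blocks 3 and 4 are gone now
print(len(tape.blocks), tape.read_position())    # -> 3 3
```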

File Naming

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer
Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media
HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives)
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when it is needed
Small and frequently used files remain on disk
Large, old, inactive files are archived to the jukebox, which enables the computer to change the removable cartridge in a tape or disk drive without human assistance
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
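A minimal sketch of such a migration policy, with invented thresholds and file metadata (real HSM products use far richer policies):

```python
import time

# Hypothetical catalogue: file name -> (size in bytes, last access time).
catalogue = {
    "results.csv":    (4_000,          time.time() - 3600),          # small, used an hour ago
    "sim_archive.h5": (50_000_000_000, time.time() - 400 * 86400),   # huge, untouched for a year
}

SIZE_LIMIT = 1_000_000_000        # files larger than ~1 GB are candidates...
AGE_LIMIT = 180 * 86400           # ...if not accessed for ~6 months

def choose_tier(size: int, last_access: float) -> str:
    idle = time.time() - last_access
    if size > SIZE_LIMIT and idle > AGE_LIMIT:
        return "jukebox"          # archive to tape/optical via the robotic changer
    return "disk"                 # small or recently used files stay on disk

for name, (size, last_access) in catalogue.items():
    print(f"{name}: keep on {choose_tier(size, last_access)}")
```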

Speed

Two aspects of speed in tertiary storage are bandwidth and latency
Bandwidth is measured in bytes per second
Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing
Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate
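The difference between the two measures is easiest to see with numbers; the transfer size and positioning times below are made-up values, not measurements:

```python
# Assumed values for one tape I/O: switch cartridge, locate the block, then stream.
bytes_transferred = 8 * 1024**3        # 8 GiB read
streaming_time_s  = 100.0              # time the data stream is actually flowing
locate_time_s     = 40.0               # winding the tape to the selected block
switch_time_s     = 60.0               # robotic cartridge exchange

sustained = bytes_transferred / streaming_time_s
effective = bytes_transferred / (streaming_time_s + locate_time_s + switch_time_s)

print(f"sustained bandwidth: {sustained / 1e6:7.1f} MB/s")
print(f"effective bandwidth: {effective / 1e6:7.1f} MB/s")   # the drive's overall data rate
```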

Speed (cont)

Access latency: the amount of time needed to locate data
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive
An optical cartridge is likely to be more reliable than a magnetic disk or tape
A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

Cost

Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
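The break-even point can be sketched with a rough calculation; every price and capacity below is an invented placeholder, not a quoted figure:

```python
def tape_cost_per_gb(n_cartridges: int, drive_price=3000.0,
                     cartridge_price=20.0, cartridge_gb=1500.0) -> float:
    """Amortized cost of a one-drive tape library as the cartridge count grows."""
    total_cost = drive_price + n_cartridges * cartridge_price
    return total_cost / (n_cartridges * cartridge_gb)

disk_cost_per_gb = 0.04   # assumed price of plain hard-disk storage

for n in (1, 10, 100, 1000):
    tape = tape_cost_per_gb(n)
    verdict = "cheaper than disk" if tape < disk_cost_per_gb else "not cheaper"
    print(f"{n:4d} cartridges: ${tape:.3f}/GB ({verdict})")
```

With few cartridges, the drive price dominates and tape loses; only when many cheap cartridges share the one expensive drive does the cost per gigabyte drop below disk.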

Price per megabyte of DRAM, from 1981 to 2000 (figure)

Price per megabyte of magnetic hard disk, from 1981 to 2000 (figure)

Price per megabyte of a tape drive, from 1984 to 2000 (figure)

References

Silberschatz, A., Operating System Concepts, 8th Edition
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 8: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Magnetic Disks Structure

Platter Track Sector Cylinder Disk arm Read-write head

Rotates 60-200 timessecond Disk Speed

Transfer rate Positioning time

Seek time Rotational latency

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 9: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Magnetic Disks (Cont) Head Crash

Disk head making contact with the disk surface Permanent damage

Removable Magnetic Disks Floppy

Head sits directly on the surface Slow rotation and lower disk space

IO bus Drive attached to computer via set of wires Busses vary including EIDE ATA SATA USB Fiber

Channel SCSI Fire wire Disk Controller

Cache memory Host controller

First HDD ndash IBM RAMAC 1956

15 square meters (16 sq ft)

$320000

Magnetic Tape

Early secondary-storage medium First used in1951 as a computer storage

Holds large quantities of data LTO-5 (2010) 15 TB uncompressed data (book 20-200GB)

Access time slow Random access ~1000 times slower than disk Once data under head transfer rates comparable to disk

Modern Usage Backup archive

For large amount of data tape can be substantially less expensive than disk

Common technologies 4mm 8mm 19mm LTO (Linear Tape-Open) SDLT (Digital Linear Tape)

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you


Page 12: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

LTO-2

SDLT frac14 frac12 inch

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel

It also improves the reliability of data storage when redundant information is stored on multiple disks

A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability

Improvement of Reliability via Redundancy. Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information

Mirroring: duplicating every disk

• A logical disk consists of two physical disks

• Every write is carried out on both disks


RAID Structure

Improvement in Performance via Parallelism: Striping

Bit-level striping: split the bits of each byte across multiple disks, i.e., write bit i of each byte to disk i. Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below)

Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses
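Here is a tiny sketch, mine rather than the slides', of the block-level mapping just stated: with n disks, logical block i lands on disk (i mod n) + 1, and i // n gives its offset within that disk:

```python
# Illustrative block-level striping: map a logical block number to
# (disk number, offset on that disk) using the formula from the slide.

def stripe(block_i, n_disks):
    disk = (block_i % n_disks) + 1      # disks numbered 1..n, as on the slide
    offset = block_i // n_disks         # how far down that disk the block sits
    return disk, offset

n = 4
for i in range(8):
    disk, offset = stripe(i, n)
    print(f"block {i} -> disk {disk}, offset {offset}")
# block 0 -> disk 1, offset 0
# block 1 -> disk 2, offset 0
# ...
# block 4 -> disk 1, offset 1
```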

RAID Structure

RAID Levels

RAID Level 0 (Striping): refers to disk arrays with striping at the level of blocks, but without any redundancy

RAID Level 1 (Mirroring): refers to disk mirroring

RAID Level 2 (memory-style error-correcting code): a parity bit records whether the number of bits set to 1 in the byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus no longer matches the stored parity

Error-correction bits are stored on disks labeled P; these bits are used to reconstruct the damaged data (a parity example follows below)

RAID 2 is not used in practice
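To show how parity lets an array rebuild lost data, the following sketch (an illustration of the general XOR-parity idea, not of any one RAID level's layout) computes a parity block as the XOR of the data blocks and reconstructs a missing block from the survivors:

```python
# Illustrative XOR parity: compute a parity block over data blocks and
# reconstruct one lost block from the survivors plus the parity block.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]      # three equal-sized data blocks
parity = xor_blocks(data)               # stored on the parity disk

# Disk holding data[1] fails: rebuild its block from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print(rebuilt)  # b'BBBB'
```

RAID levels 3, 4, and 5 all rely on this XOR relationship; they differ mainly in where the parity blocks are placed.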

RAID Levels


RAID Level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks

Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1

Problem: the expense of computing and writing parity; typically handled by a hardware controller with dedicated parity hardware

RAID Level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. High transfer rates for large reads and writes; small independent writes cannot be performed in parallel and require a read-modify-write cycle (sketched below)
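A minimal sketch of the small-write read-modify-write cycle, under the usual assumption that the new parity equals old parity XOR old data XOR new data; the helper names are invented:

```python
# Illustrative read-modify-write for a small write in a parity RAID:
# new_parity = old_parity XOR old_data XOR new_data

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data, old_parity, new_data):
    # 1. read old_data and old_parity (done by the caller here)
    # 2. compute the new parity without touching the other data disks
    new_parity = xor(xor(old_parity, old_data), new_data)
    # 3. write new_data and new_parity back
    return new_data, new_parity

old_data   = b"AAAA"
other_data = b"BBBB"
old_parity = xor(old_data, other_data)

new_data, new_parity = small_write(old_data, old_parity, b"ZZZZ")
assert new_parity == xor(new_data, other_data)   # parity stays consistent
```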

RAID Levels

RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks

Hence parity blocks no longer reside on a single disk, which avoids potential overuse of a single parity disk

RAID Level 6 (P+Q redundancy scheme): stores extra redundant information to guard against multiple disk failures

2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures

Reed-Solomon codes are used as the error-correcting codes


RAID Levels

RAID 0+1: stripe first, then mirror the stripe

RAID 1+0: mirror first, then stripe the mirrors

If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks


RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Can take long hours for RAID 5 with large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications that need high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important (for example, small databases)

RAID Level 5 is often preferred over RAID 1 for storing large volumes of data

If more disks are in an array, data-transfer rates are higher, but the system is more expensive

If more bits are protected by a parity bit, the space overhead due to parity is lower, but the chance that a second disk fails before the first is repaired is greater


Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices: arrays of tapes, and the broadcast of data over wireless systems

Problems with RAID

1. RAID protects against physical errors, but not against other hardware and software errors

The Solaris ZFS (Zettabyte File System) uses checksums, which verify the integrity of data

Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it has changed

Checksumming provides error detection and correction (a verification sketch follows below)
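As a rough illustration of the checksum-with-pointer idea (my own sketch; the slides do not describe ZFS's actual on-disk format or hash function), a parent block pointer can carry both the child's address and a checksum of its contents, verified on every read:

```python
# Illustrative only: a parent "block pointer" stores the child's address and a
# checksum of the child's contents, so a stale or corrupted child is detected.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

storage = {}                                  # address -> block contents

def write_block(address: int, data: bytes) -> dict:
    storage[address] = data
    return {"address": address, "checksum": checksum(data)}   # block pointer

def read_block(pointer: dict) -> bytes:
    data = storage[pointer["address"]]
    if checksum(data) != pointer["checksum"]:
        raise IOError("checksum mismatch: wrong or corrupted block")
    return data

ptr = write_block(7, b"important data")
print(read_block(ptr))                        # b'important data'
storage[7] = b"silently corrupted"            # simulate corruption on disk
# read_block(ptr) would now raise IOError
```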

Problems with RAID (cont)

2. RAID implementations lack flexibility

What if we have a five-disk RAID Level 5 set and a file system that is too large to fit on it?

Partitions of disks are gathered together via RAID sets into pools of storage

So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes


Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage: replicate information on multiple storage devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all copies in a damaged state, so that we can recover the stable data after any failure during data transfer or recovery


Overview

Successful completion: all the data were written correctly

Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact


Disk Write Result

The system must maintain (at least) two physical blocks for each logical block, to detect and recover from failures

Recoverable write: write the information onto the first physical block

When the first write completes successfully, write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully


Recoverable write

During recovery, each pair of physical blocks is examined. If both are the same and no detectable error exists, nothing needs to be done

If one contains a detectable error, we replace its contents with the value of the other block

If neither contains a detectable error but they differ in content, we replace the content of the first block with the value of the second

This ensures that a write to stable storage either succeeds completely or results in no change (a sketch follows below)
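The following simplified sketch, my own, mimics the recoverable-write and recovery rules above with two in-memory copies per logical block; real hardware would rely on ECC to detect a corrupted copy, which is only approximated here by a None value:

```python
# Simplified sketch of a recoverable write over two physical copies.
# "None" stands in for a copy whose ECC reports a detectable error.

class StableBlock:
    def __init__(self, value=b""):
        self.copy1 = value
        self.copy2 = value

    def recoverable_write(self, new_value, crash_after_first=False):
        self.copy1 = new_value          # step 1: write the first physical block
        if crash_after_first:
            return                      # simulate a failure between the writes
        self.copy2 = new_value          # step 2: write the second physical block
        # only now would the operation be declared complete

    def recover(self):
        # Apply the recovery rules from the slide.
        if self.copy1 is None and self.copy2 is not None:
            self.copy1 = self.copy2
        elif self.copy2 is None and self.copy1 is not None:
            self.copy2 = self.copy1
        elif self.copy1 != self.copy2:
            self.copy1 = self.copy2     # slide's rule: first takes the second's value
        return self.copy1

b = StableBlock(b"old")
b.recoverable_write(b"new", crash_after_first=True)   # failure mid-update
print(b.recover())   # b'old': the interrupted write leaves no partial change
```

Note that, under the slide's rule, an interrupted write may be rolled back; since the operation was never declared complete, this still satisfies the all-or-nothing guarantee.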

Failure detection and Recovery

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs


Tertiary-Storage Devices

Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

To write, the head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)


Removable Disks

Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters. To write a bit, the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered

Removable Disks

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage. If it is needed again, the computer can stage it back into disk storage for active use


Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface: for a disk, a new cartridge is formatted and an empty file system is generated on it. Tapes, in contrast, are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device

Usually the tape drive is reserved for the exclusive use of that application until the application closes the tape device

Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks

Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it


Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate() positions the tape at a specific logical block, not an entire track (it corresponds to a disk's seek())

The read position() operation returns the logical block number where the tape head is located

The space() operation enables relative motion. Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written (see the usage sketch below)
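The sketch below is purely hypothetical: it wraps the abstract operations named above (locate, read position, space) in an invented TapeDrive class to show how an application might drive an append-only tape; it is not a real driver API:

```python
# Hypothetical sketch of the abstract tape interface described above.
# The TapeDrive class and its block model are invented for illustration.

class TapeDrive:
    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks
        self.pos = 0                      # current logical block number

    def locate(self, block):              # like seek(), but to a logical block
        self.pos = block

    def read_position(self):              # logical block under the head
        return self.pos

    def space(self, count):               # relative motion by `count` blocks
        self.pos += count

    def write_block(self, data):
        self.blocks[self.pos] = data
        # append-only: everything after the written block is effectively erased
        for i in range(self.pos + 1, len(self.blocks)):
            self.blocks[i] = None
        self.pos += 1                     # the EOT mark now follows this block

tape = TapeDrive(num_blocks=100)
tape.locate(10)
tape.write_block(b"backup chunk 0")
print(tape.read_position())   # 11
```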


Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way


File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and then copy data to faster disk drives when it is needed

Small and frequently used files remain on disk

Large, old, inactive files are archived to the jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance); a simple migration policy is sketched below

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
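As an illustration only (the slides name no specific HSM product or policy), here is a toy migration pass that keeps small or recently used files on disk and archives large, old, inactive ones to the jukebox; the thresholds are arbitrary assumptions:

```python
# Toy HSM migration pass: archive large, old, inactive files to the jukebox
# and keep small or recently used files on disk. Thresholds are arbitrary.
import time

DAY = 24 * 3600
SIZE_LIMIT = 100 * 1024 * 1024        # 100 MB
AGE_LIMIT = 90 * DAY                  # unused for 90 days

def plan_migration(files, now=None):
    """files: list of dicts with 'name', 'size_bytes', 'last_access' (epoch)."""
    now = now or time.time()
    to_tape, on_disk = [], []
    for f in files:
        old = (now - f["last_access"]) > AGE_LIMIT
        big = f["size_bytes"] > SIZE_LIMIT
        (to_tape if (old and big) else on_disk).append(f["name"])
    return to_tape, on_disk

files = [
    {"name": "sim_results.dat", "size_bytes": 5_000_000_000,
     "last_access": time.time() - 200 * DAY},
    {"name": "notes.txt", "size_bytes": 4_096,
     "last_access": time.time() - 2 * DAY},
]
print(plan_migration(files))   # (['sim_results.dat'], ['notes.txt'])
```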


Hierarchical Storage Management (HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second

Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing

Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate (a worked example follows below)
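A small worked example of the distinction above, with invented numbers (a 2 GiB transfer streaming at 40 MiB/s after 90 seconds of locate and cartridge-switch overhead):

```python
# Invented numbers: a 2 GiB transfer that streams at 40 MiB/s but first pays
# 30 s of locate time and 60 s of cartridge switching.
bytes_moved = 2 * 1024**3                       # 2 GiB
streaming_s = bytes_moved / (40 * 1024**2)      # time while data actually flows
overhead_s  = 30 + 60                           # locate + cartridge switch

sustained = bytes_moved / streaming_s           # 40 MiB/s by construction
effective = bytes_moved / (streaming_s + overhead_s)

print(round(sustained / 1024**2, 1))   # 40.0 MiB/s
print(round(effective / 1024**2, 1))   # ~14.5 MiB/s: overhead drags it down
```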


Speed

Access latency: the amount of time needed to locate data

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically < 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds

Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives


Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed


Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

Cost

Price per megabyte of DRAM, from 1981 to 2000

Price per megabyte of magnetic hard disk, from 1981 to 2000

Price per megabyte of a tape drive, from 1984 to 2000

References

Silberschatz, A., et al. Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 13: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Structure

Addressing One-dimensional array of blocks Logical Block

Smallest unit of transfer 512 bytes

Blocks maps to sectors sequentially Sector 0 first sector first track outmost cylinder Mapping order

Track Rest of the tracks in the same cylinder Rest of the cylinders from outermost to innermost

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 14: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Structure (Cont)

Logical block number Cylinder track sector

In practice it is difficult to perform Defective sectors Sectorstrack is not constant

CLV (Constant Linear Velocity) Constant density of bitstrack Variable rotational speed

CAV (Constant Angular Velocity) Constant rotational speed Variable density of bits per track

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive.

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk).

The read position() operation returns the logical block number where the tape head is located.

The space() operation enables relative motion.

Tape drives are "append-only" devices; updating a block in the middle of the tape effectively erases everything after that block.

An EOT mark is placed after a block that is written.

76

Tape Drives
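
These abstract operations map onto real interfaces; as one illustrative example (Linux-specific, not taken from the slides), the st tape driver exposes roughly the same three operations through ioctl():

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/nst0", O_RDWR);    /* assumed device path */
        if (fd < 0) { perror("open tape"); return 1; }

        /* locate(): position the tape at logical block 1000. */
        struct mtop op = { .mt_op = MTSEEK, .mt_count = 1000 };
        if (ioctl(fd, MTIOCTOP, &op) < 0) perror("MTSEEK");

        /* read position(): ask which logical block the head is at now. */
        struct mtpos pos;
        if (ioctl(fd, MTIOCPOS, &pos) == 0)
            printf("head at logical block %ld\n", (long) pos.mt_blkno);

        /* space(): relative motion -- skip forward over 5 blocks. */
        op.mt_op    = MTFSR;
        op.mt_count = 5;
        if (ioctl(fd, MTIOCTOP, &op) < 0) perror("MTFSR");

        close(fd);
        return 0;
    }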

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the bulk of the data on slower devices and copy data to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)
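
As an illustration of the policy only (not of any particular HSM product), a migration pass might scan a directory and flag large files that have not been modified recently; the 30-day and 100 MB thresholds below are arbitrary assumptions:

    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    #define AGE_LIMIT  (30L * 24 * 3600)       /* assumed: 30 days of inactivity */
    #define SIZE_LIMIT (100L * 1024 * 1024)    /* assumed: files over 100 MB     */

    int main(void)
    {
        DIR *dir = opendir(".");
        if (!dir) { perror("opendir"); return 1; }

        time_t now = time(NULL);
        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL) {
            struct stat st;
            if (stat(ent->d_name, &st) != 0 || !S_ISREG(st.st_mode))
                continue;
            /* Large, old, inactive files are candidates for the jukebox;
             * small or recently used files stay on disk. */
            if (st.st_size > SIZE_LIMIT && now - st.st_mtime > AGE_LIMIT)
                printf("candidate for tertiary storage: %s\n", ent->d_name);
        }
        closedir(dir);
        return 0;
    }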

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second.

Sustained bandwidth -- the average data rate during a large transfer, i.e., number of bytes / transfer time. This is the data rate when the data stream is actually flowing.

Effective bandwidth -- the average over the entire I/O time, including seek or locate time and cartridge switching. This is the drive's overall data rate.

79

Speed
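
A quick worked example (with made-up numbers) shows why the two figures differ: moving 2 GB that streams for 20 seconds but also needs 40 seconds of locate and cartridge-switch time has a sustained bandwidth of 100 MB/s but an effective bandwidth of only about 33 MB/s. The same arithmetic as a small C program:

    #include <stdio.h>

    int main(void)
    {
        /* Made-up example values, not taken from the slides. */
        double bytes          = 2000.0e6;   /* 2 GB transferred                 */
        double streaming_time = 20.0;       /* seconds the data actually flows  */
        double overhead_time  = 40.0;       /* locate time + cartridge switches */

        double sustained = bytes / streaming_time;                    /* 100 MB/s */
        double effective = bytes / (streaming_time + overhead_time);  /* ~33 MB/s */

        printf("sustained bandwidth: %.1f MB/s\n", sustained / 1e6);
        printf("effective bandwidth: %.1f MB/s\n", effective / 1e6);
        return 0;
    }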

Access latency -- the amount of time needed to locate data.

Access time for a disk -- move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head, which can take tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on a disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., Galvin, P. B., and Gagne, G. Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 15: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Attachment Host-Attached Storage (DAS)

Accessed through local IO ports IDE ATA SATA SCSI FC

Wide variety of storage devices HDD RAID Arrays CDDVD Drives and Tape

Network-Attached Storage (NAS) NAS ISCSI

Storage-Area Network (SAN) SAN infiniBand

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 16: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Host-Attached StorageSCSI

SCSI (Small Computer System Interface) Large variety of devices 16 devices per cable Controller card (SCSI Initiator) SCSI target 8 logical units per target

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 17: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Host-Attached Storage FC

FC (Fiber Channel) High-speed serial architecture Optical cable four-conductor copper cable Switched fabric (FC - SW)

All devices are connected to fiber channel switches 24-bit address space multiple hosts and storage

devices Dominate in future Basic of SANs

Arbitrated Loop (FC ndash AL) 126 devices All devices are in a loop or ring Historically lower cost but rarely used now

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used? That depends on the memory-management algorithm: systems that swap hold entire process images in swap space, while systems that page store only the pages that have been pushed out of main memory.

The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to many GB. It is better to overestimate, so that no process has to be aborted for lack of swap space.

Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory.
Linux (traditional guideline): swap space = double the amount of physical memory; multiple swap spaces may be used. (Both rules are sketched below.)
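A tiny sketch of the two sizing rules quoted above (simplified; real installers apply more nuanced guidelines, and the Linux 2x figure is only the traditional rule of thumb):

```python
# Simplified versions of the two swap-sizing heuristics mentioned above.
def solaris_swap_gb(virtual_memory_gb, pageable_physical_gb):
    # Only the part of virtual memory that cannot stay in pageable RAM
    # needs backing store.
    return max(0, virtual_memory_gb - pageable_physical_gb)

def traditional_linux_swap_gb(physical_gb):
    # Old rule of thumb: twice the physical memory.
    return 2 * physical_gb

print(solaris_swap_gb(virtual_memory_gb=24, pageable_physical_gb=16))  # 8
print(traditional_linux_swap_gb(physical_gb=16))                       # 32
```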

Swap-Space Management (Cont)

Where is it located on disk?

In the normal file system: swap space is a large file within the file system, so ordinary file-system routines can be used. This is easy to implement but inefficient, because it takes time to traverse the directory structure.

In a separate disk partition: a raw partition managed by a swap-space storage manager whose algorithms are optimized for speed rather than storage efficiency; the trade-off of speed against fragmentation is acceptable because swap data lives only a short time. A fixed amount of space is set aside during partitioning, and adding more swap space requires re-partitioning.

Linux supports both approaches; the system administrator decides which to use.

Swap-Space Management (Cont)

How is it managed? UNIX:

Traditional UNIX copied entire processes between disk and main memory; newer versions use a combination of swapping and paging.

Solaris 1: text-segment pages (code) are backed by the file system, while pages of anonymous memory, such as the stack or heap, use swap space. Modern versions allocate swap space only when a page is forced out of main memory.

Swapping on a Linux system (figure).

RAID Structure

A large number of disks in a system can improve the data-transfer rate if the disks are operated in parallel, and can improve the reliability of data storage when redundant information is stored on multiple disks.

A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), are used to improve performance and reliability.

Improvement of reliability via redundancy. Redundancy means storing extra information that is not normally needed but that can be used, in the event of a disk failure, to rebuild the lost information.

Mirroring: duplicating every disk.
• A logical disk consists of two physical disks.
• Every write is carried out on both disks.

RAID Structure

Improvement in performance via parallelism: striping.

Bit-level striping: split the bits of each byte across multiple disks, writing bit i of each byte to disk i.
Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (sketched below).

Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses.
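The block-to-disk mapping is easy to see in code; a minimal sketch using the formula above, with disks numbered from 1:

```python
# Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1.
def striped_disk(block_i, n_disks):
    return (block_i % n_disks) + 1

n = 4
print([striped_disk(i, n) for i in range(8)])  # [1, 2, 3, 4, 1, 2, 3, 4]
```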

RAID Structure (figure)

RAID Levels

RAID Level 0 (striping): disk arrays with striping at the level of blocks but without any redundancy.

RAID Level 1 (mirroring): disk mirroring.

RAID Level 2 (memory-style error-correcting code): a parity bit records whether the number of 1 bits in a byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus no longer matches the stored parity. Error-correction bits are stored on disks labeled P and are used to reconstruct the damaged data. RAID 2 is not used in practice.

RAID Levels

RAID Level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives, and an additional drive stores parity information.
Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.
Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1.
Problem: the expense of computing and writing parity, usually addressed with a hardware controller that has dedicated parity hardware.

RAID Level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel; each requires a read-modify-write cycle (see the parity sketch below).
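A short sketch of the parity idea used by levels 3 and 4 (and, with distribution, level 5), assuming byte-wise XOR parity over same-sized blocks; real arrays operate on whole sectors. It shows both the reconstruction of a failed disk and the read-modify-write that a small write requires:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-sized blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC"]     # one block from each data disk
parity = xor_blocks(data)              # kept on the parity disk

# Recover the contents of a failed data disk from the survivors plus parity.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
print(xor_blocks(survivors + [parity]) == data[lost])        # True

# Small write: read-modify-write of the data block and the parity block.
old, new = data[lost], b"DDDD"
parity = xor_blocks([parity, old, new])   # new parity = old parity XOR old XOR new
data[lost] = new
print(parity == xor_blocks(data))                            # True
```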

RAID Levels

RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids potential overuse of a dedicated parity disk.

RAID Level 6 (P+Q redundancy scheme): stores extra redundant information so that the array can survive multiple disk failures. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.

RAID Levels

RAID 0+1: stripe first, then mirror the stripe.
RAID 1+0: mirror first, then stripe the mirrors.

If a single disk fails in RAID 0+1, an entire stripe becomes inaccessible, leaving only the other stripe available.
With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.

RAID Levels (figure)

The Implementation of RAID

RAID can be implemented at several layers:
- by volume-management software, within the kernel or at the system-software layer
- in the host bus-adapter (HBA) hardware
- in the hardware of the storage array
- in the SAN interconnect layer, by disk-virtualization devices

Rebuild performance: easiest for RAID Level 1; can take long hours for RAID 5 with large disks.

Selecting a RAID Level

RAID Level 0 is used in high-performance applications where data loss is not critical.
RAID Level 1 is used for applications that need high reliability with fast recovery.
RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases.
RAID Level 5 is often preferred over RAID 1 for storing large volumes of data.

With more disks in an array, data-transfer rates are higher, but the system is more expensive.
With more bits protected by each parity bit, the space overhead due to parity is lower, but the chance of a second disk failure is greater.

Extensions

The concepts of RAID have been generalized to other storage devices, such as arrays of tapes, and even to the broadcast of data over wireless systems.

Problems with RAID

1. RAID protects against physical errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data. Each checksum is kept with the pointer to the object, so the file system can detect whether the object read back is the right one and whether it has changed. Checksumming provides error detection and correction.
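A toy model of the checksum-with-pointer idea (an assumption-level illustration, not the actual ZFS on-disk format; SHA-256 is used here only as an example code). Detection is shown; correction then comes from a redundant copy:

```python
# "Checksum kept with the pointer": the parent holds the checksum of the
# block it points to, so a wrong or damaged block is caught on read.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

block = b"user data"
pointer = {"addr": 0x2000, "cksum": checksum(block)}   # stored in the parent

def read_block(stored: bytes, ptr: dict) -> bytes:
    if checksum(stored) != ptr["cksum"]:
        raise IOError("checksum mismatch: wrong or damaged block")
    return stored

print(read_block(block, pointer))                      # b'user data'
```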

Problems with RAID (cont)

2. RAID implementations lack flexibility. What if we have a five-disk RAID Level 5 set and a file system is too large to fit on it?
Instead, partitions of disks can be gathered via RAID sets into pools of storage (as ZFS does), so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.

Stable-Storage Implementation

Information residing in stable storage is never lost.

To implement stable storage:
- Replicate information on multiple storage devices with independent failure modes.
- Coordinate the writing of updates so that a failure during an update cannot leave all the copies in a damaged state, ensuring that the stable data can be recovered after any failure during data transfer or recovery.

Disk Write Result

A disk write has one of three outcomes:
- Successful completion: all the data were written correctly.
- Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted.
- Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.

Recoverable Write

The system maintains (at least) two physical blocks for each logical block so that failures can be detected and recovered from.
- Write the information onto the first physical block.
- When the first write completes successfully, write the same information onto the second physical block.
- Declare the operation complete only after the second write completes successfully.

Failure Detection and Recovery

During recovery, each pair of physical blocks is examined:
- If both are the same and no detectable error exists, nothing further is needed.
- If one contains a detectable error, its contents are replaced with the value of the other block.
- If neither contains a detectable error but they differ in content, the content of the first block is replaced with the value of the second.

This procedure ensures that a write to stable storage either succeeds completely or results in no change (sketched below).
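A minimal in-memory sketch of the recoverable write and the recovery rule above; the two attributes stand in for the two physical blocks behind one logical block (purely illustrative):

```python
class StableBlock:
    def __init__(self, value=b""):
        self.copy1 = self.copy2 = value

    def recoverable_write(self, value: bytes):
        self.copy1 = value   # step 1: first physical block
        # a crash here leaves copy1 new and copy2 old
        self.copy2 = value   # step 2: second physical block
        # only now is the write declared complete

    def recover(self, damaged=None):
        # 'damaged' names a copy whose ECC check failed (1, 2, or None).
        if damaged == 1:
            self.copy1 = self.copy2           # repair from the good copy
        elif damaged == 2:
            self.copy2 = self.copy1
        elif self.copy1 != self.copy2:
            # both readable but different: per the rule above, the first
            # block takes the value of the second
            self.copy1 = self.copy2
        return self.copy1

b = StableBlock(b"old")
b.copy1 = b"new"        # simulate a crash between the two physical writes
print(b.recover())      # b'old' -- the interrupted write has no visible effect
```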

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.

Tertiary-Storage Devices

Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB. Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.

Magneto-optic disk: records data on a rigid platter coated with magnetic material. The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, so it is resistant to head crashes. To write a bit, the head flashes a laser beam aimed at the tiny spot on the disk surface where the bit is to be written. Laser light is also used to read data (the Kerr effect).

Removable Disks

Optical disks: employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW.

WORM ("write once, read many times") disks can be written only once. A thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered.

Tapes

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data or holding huge volumes of data.

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use.

Operating-System Support

Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications.

Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on it. Tapes, by contrast, are presented as a raw storage medium: an application does not open a file on the tape, it opens the whole tape drive as a raw device. Usually the tape drive is then reserved for the exclusive use of that application until it closes the tape device. Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks, and since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.

Tape Drives

The basic operations for a tape drive differ from those of a disk drive:
- locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek()).
- read_position() returns the logical block number where the tape head is located.
- space() enables relative motion.

Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block, and an EOT mark is placed after a block that is written. A toy model of these operations follows.
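A toy, in-memory model (hypothetical; it imitates no real tape-driver API), mainly to make the append-only behavior concrete:

```python
class TapeDrive:
    def __init__(self):
        self.blocks = []      # logical blocks on the cartridge
        self.pos = 0          # index of the block under the head

    def locate(self, block_number):          # ~ seek() on a disk
        self.pos = block_number

    def read_position(self):
        return self.pos

    def space(self, count):                  # relative motion
        self.pos += count

    def write_block(self, data):
        # Writing in the middle effectively erases everything after it.
        del self.blocks[self.pos:]
        self.blocks.append(data)
        self.pos += 1                        # the EOT mark now follows this block

t = TapeDrive()
for b in (b"a", b"b", b"c"):
    t.write_block(b)
t.locate(1)
t.write_block(b"B")
print(t.blocks)          # [b'a', b'B'] - block c is gone, EOT follows b'B'
```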

File Naming

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge on another computer.

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.

Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems keep the bulk of the data on slower devices and copy data to faster disk drives when needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to the jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
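A sketch of an HSM-style placement policy with made-up thresholds (the 100 MB size cutoff and 90-day idle period are assumptions, not figures from the slides):

```python
import time

DAY = 86400

def hsm_tier(size_bytes, last_access_ts, now=None,
             big=100 * 2**20, idle_days=90):
    # Small, recently used files stay on disk; large, old, inactive files
    # are archived to the jukebox and staged back on demand.
    now = now if now is not None else time.time()
    idle = (now - last_access_ts) / DAY
    if size_bytes >= big and idle >= idle_days:
        return "jukebox"      # archive to removable media
    return "disk"             # keep on fast storage

print(hsm_tier(2**20, time.time()))                     # disk
print(hsm_tier(500 * 2**20, time.time() - 200 * DAY))   # jukebox
```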

Speed

Two aspects of speed in tertiary storage are bandwidth and latency. Bandwidth is measured in bytes per second.

Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate while the data stream is actually flowing.

Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate. (A worked example follows.)
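A worked example of the two figures, using assumed numbers (a drive streaming 40 MB/s that spends 60 s on locate and cartridge switching before an 8 GB transfer):

```python
# Sustained vs. effective bandwidth with assumed numbers.
bytes_moved   = 8 * 10**9                     # an 8 GB transfer
transfer_time = bytes_moved / (40 * 10**6)    # 200 s with data actually flowing
overhead      = 60                            # locate + cartridge switch, seconds

sustained_bw = bytes_moved / transfer_time               # 40 MB/s
effective_bw = bytes_moved / (transfer_time + overhead)  # ~30.8 MB/s
print(sustained_bw / 1e6, round(effective_bw / 1e6, 1))  # 40.0 30.8
```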

Speed (cont)

Access latency: the amount of time needed to locate data.

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage comes from having many cheap cartridges share a few expensive drives.

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

Cost

Main memory is much more expensive than disk storage.

The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

Price per megabyte of DRAM, from 1981 to 2000 (figure)
Price per megabyte of magnetic hard disk, from 1981 to 2000 (figure)
Price per megabyte of a tape drive, from 1984 to 2000 (figure)

References

Silberschatz, A., et al. Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions?

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 18: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

FC ndash Topologies

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 19: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Network-Attached Storage NAS

Storage system Accessed remotely over a data network Clients access via remote-procedure-call interface

UNIX NFS Windows CIFS

RPCs carried via TCPUDP Convenient way for all clients to share a pool of storage NAS VS local-attached

Same ease of naming and access Less efficient and lower performance

Network-Attached Storage (Cont)

ISCSI ndash Internet Small Computing System Interface

Latest NAT protocol IP-based storage networking protocol Uses IP network to carry SCSI Protocol Clients are able to send SCSI commands to

remote targets TCP ports 860 and 3260

Storage-Area Networks

SAN Private network connecting servers and

storage devices Uses storage protocols instead of networking

protocols Multiple hosts and storage can attach to the

same SAN flexibility SAN Switch allowsprohibits client access

(exp) FC is the most common SAN interconnect

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management (HSM)
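As a rough sketch of the kind of decision an HSM system makes, the Python snippet below classifies files into a disk tier or a jukebox tier. The two-tier model, the thresholds, and all names (FileInfo, choose_tier) are invented for illustration; real HSM products use far richer policies.

```python
# Rough sketch of an HSM migration decision, assuming a two-tier hierarchy
# (fast disk vs. tape/optical jukebox) and made-up thresholds.

import time
from dataclasses import dataclass
from typing import Optional

SIZE_THRESHOLD = 100 * 1024 * 1024        # 100 MB: "large" files
IDLE_THRESHOLD = 90 * 24 * 3600           # 90 days without access: "inactive"

@dataclass
class FileInfo:
    name: str
    size_bytes: int
    last_access: float                    # POSIX timestamp

def choose_tier(f: FileInfo, now: Optional[float] = None) -> str:
    """Small or recently used files stay on disk; large, old, inactive
    files become candidates for archival to the jukebox."""
    now = time.time() if now is None else now
    idle = now - f.last_access
    if f.size_bytes > SIZE_THRESHOLD and idle > IDLE_THRESHOLD:
        return "jukebox"                  # archive; stage back in on demand
    return "disk"

if __name__ == "__main__":
    now = time.time()
    old = now - 200 * 24 * 3600
    print(choose_tier(FileInfo("results.tar", 5_000_000_000, old), now))  # jukebox
    print(choose_tier(FileInfo("notes.txt", 4_096, old), now))            # disk
```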

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time. This is the data rate when the data stream is actually flowing.

Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching. This is the drive's overall data rate.

79

Speed
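As a worked illustration of the difference between sustained and effective bandwidth, the short calculation below uses made-up numbers (a 40 MiB/s streaming rate and 60 seconds of locate plus cartridge-switch overhead); only the formulas come from the definitions above.

```python
# Sustained vs. effective bandwidth for one tape transfer, with made-up numbers.

transfer_bytes   = 8 * 1024**3       # 8 GiB read from the tape
streaming_rate   = 40 * 1024**2      # 40 MiB/s while data is actually flowing
overhead_seconds = 60.0              # cartridge switch + locate before data flows

streaming_time   = transfer_bytes / streaming_rate          # ~204.8 s
sustained_bw     = transfer_bytes / streaming_time          # equals the streaming rate
effective_bw     = transfer_bytes / (streaming_time + overhead_seconds)

print(f"sustained: {sustained_bw / 1024**2:.1f} MiB/s")     # 40.0 MiB/s
print(f"effective: {effective_bw / 1024**2:.1f} MiB/s")     # ~30.9 MiB/s
```

The larger the transfer, the smaller the gap between the two figures, because the fixed locate and switching overhead is amortized over more bytes.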

Access latency: the amount of time needed to locate data.

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

81

Reliability

Main memory is much more expensive than disk storage.

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

82

Cost
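The last point (savings only when cartridges greatly outnumber drives) can be made concrete with a back-of-the-envelope comparison. All prices and capacities below are invented placeholders for illustration, not market data.

```python
# Back-of-the-envelope cost per terabyte for disk vs. a tape library,
# using invented placeholder prices and capacities.

def tape_cost_per_tb(num_cartridges, cartridge_tb=1.5, cartridge_price=30.0,
                     num_drives=2, drive_price=2500.0):
    """Drive cost is amortized over all cartridges it serves, so the more
    cartridges share a drive, the cheaper each stored terabyte becomes."""
    total_cost = num_cartridges * cartridge_price + num_drives * drive_price
    total_tb = num_cartridges * cartridge_tb
    return total_cost / total_tb

disk_cost_per_tb = 80.0   # invented price of a 1 TB disk drive

for n in (4, 20, 200):
    print(f"{n:>3} cartridges: {tape_cost_per_tb(n):7.2f} $/TB "
          f"(disk: {disk_cost_per_tb:.2f} $/TB)")
# With only a few cartridges the drives dominate the cost and tape loses;
# with hundreds of cartridges per drive the tape library wins easily.
```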

83: Price per megabyte of DRAM from 1981 to 2000 (chart)

84: Price per megabyte of magnetic hard disk from 1981 to 2000 (chart)

85: Price per megabyte of a tape drive from 1984 to 2000 (chart)

References

Silberschatz, A. Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

Storage-Area Networks (Cont)

InfiniBand Special-purpose bus architecture Supports high-speed interconnection network

Up to 25 gbps 64000 addressable devices

Supports QoS and Failover

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred to RAID 1 for storing large volumes of data.

If more disks are in an array, data-transfer rates are higher, but the system is more expensive.

If more bits are protected by a parity bit, the space overhead due to parity is lower, but the chance that a second disk fails before the first failed disk is repaired is greater.

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices: arrays of tapes, and the broadcast of data over wireless systems.

Problems with RAID

1. RAID protects against physical media errors, but not against other hardware and software errors.
The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data.
Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it has changed.
Checksumming provides error detection and correction (a sketch follows below).
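
A hedged sketch of the checksum-with-pointer idea follows; the trivial additive checksum stands in for the stronger checksums (such as Fletcher or SHA-256) that a real file system like ZFS would use, and the structure and function names are illustrative only.

#include <stdint.h>
#include <stddef.h>

/* Illustrative block pointer that carries a checksum of the block it
 * points to, in the spirit of the description above. */
struct blkptr {
    const uint8_t *block;
    size_t         len;
    uint64_t       checksum;
};

static uint64_t checksum_of(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;                 /* toy checksum: a 64-bit sum */
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* On read, recompute and compare: a mismatch means we followed a stale
 * pointer or the block was silently corrupted. */
int blkptr_verify(const struct blkptr *bp)
{
    return checksum_of(bp->block, bp->len) == bp->checksum;
}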

Problems with RAID (cont)

2. RAID implementations lack flexibility.
What if we have a five-disk RAID level 5 set and a file system that is too large to fit on it?
In ZFS, partitions of disks are gathered together via RAID sets into pools of storage, so there are no artificial limits on storage use and no need to relocate file systems between volumes or to resize volumes.

65

Stable-Storage Implementation

Information residing in stable storage is never lost.

To implement stable storage:
Replicate information on multiple storage devices with independent failure modes.
Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, to ensure that we can recover the stable data after any failure during data transfer or recovery.

66

Overview

A disk write can have one of three outcomes:
Successful completion – all the data were written correctly.
Partial failure – a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted.
Total failure – the failure occurred before the disk write started, so the previous data values on the disk remain intact.

67

Disk Write Result

The system must maintain (at least) two physical blocks for each logical block, for detecting and recovering from failure.

Recoverable write:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second physical block.
3. Declare the operation complete only after the second write completes successfully.

68

Recoverable write

During recovery, each pair of physical blocks is examined:
If both are the same and no detectable error exists, nothing more needs to be done.
If one contains a detectable error, then we replace its contents with the value of the other block.
If both contain no detectable error but they differ in content, then we replace the content of the first block with the value of the second.
This procedure ensures that a write to stable storage either succeeds completely or results in no change (a sketch follows below).
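
The write and recovery rules above can be pulled together in a short C sketch; the two in-memory buffers stand in for the two physical blocks, and the ok flags stand in for the controller's error detection, so this is a model of the procedure rather than real disk code.

#include <stdbool.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 16   /* tiny block for illustration */

/* Two "physical blocks" for one logical block; ok0/ok1 model whether
 * each copy passes its error-detection check. */
static unsigned char copy0[BLOCK_SIZE], copy1[BLOCK_SIZE];
static bool ok0 = true, ok1 = true;

/* Recoverable write: first block, then second; the operation counts
 * as complete only after the second write succeeds. */
void recoverable_write(const unsigned char *data)
{
    memcpy(copy0, data, BLOCK_SIZE);  ok0 = true;   /* step 1 */
    memcpy(copy1, data, BLOCK_SIZE);  ok1 = true;   /* step 2 */
}

/* Recovery pass, following the rules listed above. */
void recover(void)
{
    if (ok0 && ok1) {
        if (memcmp(copy0, copy1, BLOCK_SIZE) != 0)
            memcpy(copy0, copy1, BLOCK_SIZE);   /* first takes the value of the second */
    } else if (!ok0 && ok1) {
        memcpy(copy0, copy1, BLOCK_SIZE); ok0 = true;   /* repair first copy  */
    } else if (ok0 && !ok1) {
        memcpy(copy1, copy0, BLOCK_SIZE); ok1 = true;   /* repair second copy */
    }
}

int main(void)
{
    unsigned char oldv[BLOCK_SIZE], newv[BLOCK_SIZE];
    memset(oldv, 'A', BLOCK_SIZE);
    memset(newv, 'B', BLOCK_SIZE);

    recoverable_write(oldv);          /* both copies hold the old data */
    memcpy(copy0, newv, BLOCK_SIZE);  /* simulate a crash after step 1 */
    recover();                        /* rolls the first copy back     */
    printf("copies agree after recovery: %d\n",
           memcmp(copy0, copy1, BLOCK_SIZE) == 0);
    return 0;
}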

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.

71

Tertiary-Storage Devices

Floppy disk – a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case.

Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.

Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.

Magneto-optic disk – records data on a rigid platter coated with magnetic material.

The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes.

The head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written.

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks – employ special materials that are altered by laser light.
An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state.
It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk.
Common examples: CD-RW and DVD-RW.

WORM ("Write Once, Read Many Times") disks can be written only once.
A thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum.
Information can be destroyed but not altered.

Removable Disks

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use.

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications.

Application interface:
For a disk, a new cartridge is formatted and an empty file system is generated on the disk.
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device.
Usually the tape drive is reserved for the exclusive use of that application until the application closes the tape device.
Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive:
locate() positions the tape to a specific logical block, not an entire track (it corresponds to seek()).
The read_position() operation returns the logical block number where the tape head is located.
The space() operation enables relative motion.
Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block.
An EOT (end-of-tape) mark is placed after a block that is written.
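
A toy in-memory model can make the append-only behavior concrete; every name below is illustrative rather than an actual OS or driver API, and the "tape" is just an array.

#include <stdio.h>

#define TAPE_BLOCKS 16
static int  tape[TAPE_BLOCKS];
static long head = 0;   /* logical block currently under the head */
static long eot  = 0;   /* position of the EOT mark               */

void locate(long block)   { head = block;  }   /* like seek()     */
long read_position(void)  { return head;   }
void space(long delta)    { head += delta; }   /* relative motion */

void write_block(int value)
{
    tape[head++] = value;
    eot = head;   /* the EOT mark follows the block just written,
                     so anything recorded after it is lost */
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        write_block(100 + i);
    locate(2);
    write_block(999);    /* updating a block in the middle...          */
    printf("EOT is now at block %ld\n", eot);   /* ...erases blocks 3-4 */
    return 0;
}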

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media.
HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives).
It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy data to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
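
A minimal sketch of the kind of migration decision an HSM system makes; the tier names, the 100 MB size threshold, and the 90-day idle threshold are assumptions chosen only for illustration.

#include <stdio.h>
#include <time.h>

enum tier { TIER_DISK, TIER_TAPE_JUKEBOX };

/* Keep small or recently used files on disk; archive large files that
 * have been idle for a long time to the tape jukebox. */
enum tier choose_tier(long size_bytes, time_t last_access, time_t now)
{
    const long   large   = 100L * 1024 * 1024;   /* 100 MB (assumed)  */
    const double too_old = 90.0 * 24 * 3600;     /* 90 days (assumed) */
    const double idle    = difftime(now, last_access);

    if (size_bytes >= large && idle >= too_old)
        return TIER_TAPE_JUKEBOX;   /* candidate for archiving */
    return TIER_DISK;
}

int main(void)
{
    time_t now = time(NULL);
    enum tier t = choose_tier(500L * 1024 * 1024,       /* 500 MB file   */
                              now - 200L * 24 * 3600,   /* idle 200 days */
                              now);
    printf("%s\n", t == TIER_TAPE_JUKEBOX ? "archive to jukebox"
                                          : "keep on disk");
    return 0;
}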

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.
Sustained bandwidth – average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate when the data stream is actually flowing.
Effective bandwidth – average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate.
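
A brief worked example with assumed numbers (not from the slides): if a tape drive streams 600 MB in 100 seconds of actual transfer, its sustained bandwidth is 600 MB / 100 s = 6 MB/s; if locating the data and switching cartridges add another 30 seconds, the effective bandwidth over the whole operation drops to 600 MB / 130 s ≈ 4.6 MB/s.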

79

Speed

Access latency – the amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 23: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling

Disk Drive Efficiency Access time

Seek time Rotational latency

Bandwidth Bytes transferred Δt

Δt Completion time of the last transfer ndash first request for service time

Improve Scheduling the servicing of IO requests in a good order

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 24: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

IO request procedure System Call Sent by the process to the OS System Call information

Inputoutput Disk address Memory address Number of sectors to be transferred

If disk available access

else Queue

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 25: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

Algorithms FCFS SSTF SCAN C-SCAN LOOKCLOOK

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used? That depends on the memory-management algorithm

Swapping: swap space holds entire swapped-out process images. Paging: swap space stores pages pushed out of main memory

The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to several GB. Better to overestimate, why? So that no process is aborted for lack of swap space

Solaris: swap space = the amount by which virtual memory exceeds pageable physical memory. Linux: swap space = double the amount of physical memory (a traditional rule of thumb). Multiple swap spaces are possible
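The two sizing rules mentioned above can be written down directly; this is only a sketch of the arithmetic, not a system tool, and the sample values are invented.

```python
def solaris_style_swap(virtual_memory_mb: int, pageable_physical_mb: int) -> int:
    """Swap = amount by which virtual memory exceeds pageable physical memory."""
    return max(0, virtual_memory_mb - pageable_physical_mb)

def linux_rule_of_thumb_swap(physical_memory_mb: int) -> int:
    """Traditional rule of thumb: swap = twice the physical memory."""
    return 2 * physical_memory_mb

print(solaris_style_swap(virtual_memory_mb=6144, pageable_physical_mb=4096))  # 2048
print(linux_rule_of_thumb_swap(physical_memory_mb=4096))                      # 8192
```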

Swap-Space Management (Cont)

Where is it located on disk? Either in the normal file system or in a separate disk partition

In the normal file system: swap space is simply a large file within the file system, so ordinary file-system routines can be used; easy to implement but inefficient

It takes time to traverse the directory structure. In a separate disk partition:

A raw partition is managed by a swap-space storage manager, which uses algorithms optimized for speed rather than storage efficiency, why? Because the trade-off between speed and fragmentation is acceptable (data in swap space lives only a short time). A fixed amount of space is set aside during partitioning; adding more swap space requires re-partitioning

Linux supports both approaches. Who decides? The system administrator, when configuring the system

Swap-Space Management (Cont)

How is it managed? UNIX:

Traditional UNIX: copies entire processes between disk and main memory. Newer versions: a combination of swapping and paging

Solaris 1: text-segment pages (containing code) come from the file system, while swap space is used for pages of anonymous memory, such as the stack or heap. Modern Solaris versions allocate swap space only when a page is forced out of main memory

Swapping on Linux System

52

RAID Structure

A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy. Redundancy: storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information

Mirroring: duplicating every disk

• A logical disk consists of two physical disks

• Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism: Striping

Bit-level striping: splitting the bits of each byte across multiple disks, writing bit i of each byte to disk i. Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1

Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses by load balancing, and reduce the response time of large accesses
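A small sketch of the block-to-disk mapping for block-level striping, using the formula on the slide; the disk numbering (1..n) follows the slide's convention.

```python
def disk_for_block(block_index: int, n_disks: int) -> int:
    """Block i of a file goes to disk (i mod n) + 1, with disks numbered 1..n."""
    return (block_index % n_disks) + 1

# With 4 disks, blocks 0..7 land on disks 1, 2, 3, 4, 1, 2, 3, 4:
print([disk_for_block(i, 4) for i in range(8)])
```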

54

RAID Structure

55

RAID Levels

56

RAID Level 0 (Striping): refers to disk arrays with striping at the level of blocks, but without any redundancy

RAID Level 1 (Mirroring): refers to disk mirroring

RAID Level 2 (Memory-style error-correcting code): a parity bit records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus does not match the stored parity

Error-correction bits are stored on disks labeled P; these bits are used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks

Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1

Problem: the expense of computing and writing parity; usually handled by a hardware controller with dedicated parity hardware

RAID Level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel and require a read-modify-write cycle
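To make the parity discussion concrete, here is a sketch of block-interleaved parity using XOR, showing both reconstruction after a disk loss and the read-modify-write cycle for a small write; it illustrates the principle, not any particular controller.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# A stripe of three data blocks plus one parity block (RAID 4/5 style).
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Reconstruction: if the disk holding d1 fails, XOR of the survivors recovers it.
assert xor_blocks(d0, d2, parity) == d1

# Read-modify-write for a small write to d1: new parity = parity ^ old_d1 ^ new_d1.
new_d1 = b"BXBB"
parity = xor_blocks(parity, d1, new_d1)
d1 = new_d1
assert xor_blocks(d0, d1, d2) == parity
```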

RAID Levels

RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks

Hence parity blocks no longer all reside on the same disk, which avoids the potential overuse of a single parity disk

RAID Level 6 (P+Q redundancy scheme): stores extra redundant information to guard against multiple disk failures

2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures

Reed-Solomon codes are used as the error-correcting codes

58

RAID Levels

RAID 0+1 – stripe first, then mirror the stripe

RAID 1+0 – mirror first, then stripe the mirrors

If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Can take long hours for RAID 5 with large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example in small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a single parity bit, the space overhead due to parity is lower, but the chance of a second disk failure (and thus data loss) is greater

61

Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices: arrays of tapes, and even the broadcast of data over wireless systems

Problems with RAID

1. RAID protects against physical media errors, but not against other hardware and software errors

The Solaris ZFS (Zettabyte File System) uses checksums to verify the integrity of data

Checksums are kept with the pointer to each object, to detect whether the object is the right one and whether it has changed

Checksumming provides error detection and correction
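A toy sketch of the "checksum stored with the pointer" idea described above: each parent pointer carries the checksum of the child block, so a read can detect whether it got the right, unmodified object. The structure is illustrative only and is not the actual ZFS on-disk format.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class BlockPointer:
    address: int      # where the child block lives (hypothetical addressing)
    checksum: bytes   # checksum of the child, stored with the pointer

disk = {}  # address -> raw block data (stand-in for the storage pool)

def write_block(address: int, data: bytes) -> BlockPointer:
    disk[address] = data
    return BlockPointer(address, hashlib.sha256(data).digest())

def read_block(ptr: BlockPointer) -> bytes:
    data = disk[ptr.address]
    if hashlib.sha256(data).digest() != ptr.checksum:
        # With redundancy, the file system could now fetch a good copy and
        # repair this one; here we only report the mismatch.
        raise IOError("checksum mismatch: wrong or corrupted block")
    return data
```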

Problems with RAID (cont)

2. RAID implementations lack flexibility

What if we have a five-disk RAID level 5 set and a file system is too large to fit on it?

In ZFS, partitions of disks are gathered together via RAID sets into pools of storage

So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage: replicate information on multiple storage devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful completion: all of the data were written correctly

Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact

67

Disk Write Result

To detect and recover from failures, the system must maintain (at least) two physical blocks for each logical block

Recoverable write:

1. Write the information onto the first physical block

2. When the first write completes successfully, write the same information onto the second physical block

3. Declare the operation complete only after the second write completes successfully

68

Recoverable write

During recovery, each pair of physical blocks is examined:

If both are the same and neither contains a detectable error, nothing more needs to be done

If one contains a detectable error, we replace its contents with the value of the other block

If neither contains a detectable error but they differ in content, we replace the content of the first block with the value of the second

This ensures that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery
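A compact sketch of the two-block recoverable-write protocol and its recovery pass, following the steps listed above; a "detectable error" is modeled with a stored checksum, and the class and names are hypothetical.

```python
import zlib

class StableBlock:
    """One logical block backed by two physical copies, as described above."""

    def __init__(self):
        self.copies = [None, None]  # each entry: (data, checksum) or None

    def recoverable_write(self, data: bytes) -> None:
        record = (data, zlib.crc32(data))
        self.copies[0] = record   # step 1: write the first physical block
        self.copies[1] = record   # step 2: then write the second
        # step 3: only now is the operation declared complete

    @staticmethod
    def _ok(copy) -> bool:
        return copy is not None and zlib.crc32(copy[0]) == copy[1]

    def recover(self) -> None:
        """Run after a crash so the pair again holds one consistent value."""
        first, second = self.copies
        if self._ok(first) and self._ok(second):
            if first[0] != second[0]:
                self.copies[0] = second   # both good but different: take the second
        elif self._ok(first):
            self.copies[1] = first        # repair the damaged copy from the good one
        elif self._ok(second):
            self.copies[0] = second
```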

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs

71

Tertiary-Storage Devices

Floppy disk — a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case

Most floppies hold only about 1 MB; similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure

Magneto-optic disk — records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic-disk head, and the magnetic material is covered with a protective layer of plastic or glass, which makes it resistant to head crashes

The head flashes a laser beam at the disk surface, aimed at the tiny spot where the bit is to be written

Laser light is also used to read data (the Kerr effect)

72

Removable Disks

Optical disks – employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state

It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed, but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed, the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface: for a disk, a new cartridge is formatted and an empty file system is generated on the disk

Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device

Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device

Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks

Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

locate( ) positions the tape at a specific logical block, not an entire track (it corresponds to seek( ) on a disk)

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion

Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block

An EOT (end-of-tape) mark is placed after a block that is written

76

Tape Drives
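A sketch of the append-only behavior described above, with locate(), read_position(), and space() modeled on a list of blocks; the class is an illustrative abstraction, not a real tape-driver API.

```python
class TapeDrive:
    """Toy model of the append-only tape semantics described above."""

    def __init__(self):
        self.blocks = []   # logical blocks written so far
        self.position = 0  # index of the block under the head

    def locate(self, block_number: int) -> None:
        """Position the tape at a specific logical block (like seek on a disk)."""
        self.position = block_number

    def read_position(self) -> int:
        return self.position

    def space(self, count: int) -> None:
        """Relative motion: skip forward or backward by `count` blocks."""
        self.position += count

    def write_block(self, data: bytes) -> None:
        # Writing in the middle effectively erases everything after that block,
        # and the EOT mark now follows the block just written.
        del self.blocks[self.position:]
        self.blocks.append(data)
        self.position += 1

    def read_block(self) -> bytes:
        data = self.blocks[self.position]  # reading past EOT would be an error
        self.position += 1
        return data
```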

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when it is needed

Small and frequently used files remain on disk

Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)
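The placement policy sketched above can be caricatured in a few lines: small or recently used files stay on disk, while large and long-inactive files are migrated to the jukebox. The thresholds and names below are invented for illustration.

```python
import time

LARGE_FILE_BYTES = 100 * 1024 * 1024   # assumed threshold: 100 MB
INACTIVE_SECONDS = 180 * 24 * 3600     # assumed threshold: about six months

def choose_tier(size_bytes: int, last_access_epoch: float) -> str:
    """Return 'disk' or 'jukebox' following the HSM idea described above."""
    idle = time.time() - last_access_epoch
    if size_bytes >= LARGE_FILE_BYTES and idle >= INACTIVE_SECONDS:
        return "jukebox"   # large, old, inactive: archive to removable media
    return "disk"          # small or frequently used: keep on fast storage
```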

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second

Sustained bandwidth – the average data rate during a large transfer, i.e., number of bytes / transfer time; this is the data rate when the data stream is actually flowing

Effective bandwidth – the average over the entire I/O time, including seek or locate and cartridge switching; this is the drive's overall data rate

79

Speed
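A small worked example of the two bandwidth definitions above; the numbers are invented, and simply show why effective bandwidth can fall well below sustained bandwidth once locate and cartridge-switch time are counted.

```python
def sustained_bandwidth(bytes_transferred: int, streaming_seconds: float) -> float:
    """Average data rate while the data stream is actually flowing (bytes/s)."""
    return bytes_transferred / streaming_seconds

def effective_bandwidth(bytes_transferred: int, streaming_seconds: float,
                        overhead_seconds: float) -> float:
    """Average over the entire I/O time, including locate and cartridge switching."""
    return bytes_transferred / (streaming_seconds + overhead_seconds)

data = 10 * 1024**3  # a 10 GiB transfer (assumed)
print(sustained_bandwidth(data, streaming_seconds=100))                        # ~1.07e8 B/s
print(effective_bandwidth(data, streaming_seconds=100, overhead_seconds=60))  # ~6.7e7 B/s
```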

Access latency – the amount of time needed to locate the data

Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency: < 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head: tens or hundreds of seconds

Generally, random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 26: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

FCFS (First come First Served)

640 cylinder

moves

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 27: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

SSTF (Shortest Seek Time First) Service requests close to the current head position Starvation

236 cylinder moves

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

Page 28: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

SCAN (Elevator Alg) Head starts at one end and goes to the other end Services each request on the current track

236 cylinder moves

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 29: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

CSCAN Variant of SCAN More uniform wait time When head reaches the end immediately moves to the

beginning without servicing any request

360 cylinder moves

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 30: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Scheduling (Cont)

LOOKCLOOK Head goes as far as the last request in each direction

322 cylinder moves

Disk Scheduling (Cont)

Selection of an algorithm Factors SSTF is common and better performance than FCFS SCAN CSCAN perform better for systems that place a

heavy load on the disk No starvation Scheduling alg Performance (example1)

Number of requests Types of requests

Requests for disk service can be influenced by The file-allocation method (example2) Location of directories and indexed blocks

Caching directories and indexed blocks in the main memory reduces arm movement (example3)

Disk Scheduling (Cont) Selection of an algorithm

Separate module of the OS can be replaced if necessary

Default SSTFLOOK Rotational Delay Perspective

Modern disks do not disclose the physical location of logical blocks

Disk controller takes over OS to choose the alg Problem

If only IO OK But there are other constraints

Example request for paging (example)

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be great to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and then copy data to faster disk drives when needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to the jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
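A toy illustration of the policy just described: the sketch below scans one directory and flags files that are both large and long-unused as candidates for migration to slower storage. The scanned path, size threshold, and idle threshold are invented for the example; a real HSM system would also handle recall, multiple tiers, and transparent stubs in the namespace.

/* Toy HSM policy sketch: list files that are large and have not been
 * accessed recently, i.e., candidates to archive to the tape/optical tier.
 * Thresholds and the scanned path are illustrative assumptions. */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define SCAN_DIR "/data"                      /* assumed directory        */
#define MIN_SIZE (100L * 1024 * 1024)         /* "large": > 100 MB        */
#define MAX_IDLE (180L * 24 * 60 * 60)        /* "old": idle > 180 days   */

int main(void)
{
    DIR *dir = opendir(SCAN_DIR);
    if (!dir) { perror("opendir"); return 1; }

    time_t now = time(NULL);
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", SCAN_DIR, entry->d_name);

        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                          /* skip non-regular files   */

        /* Small or recently used files stay on disk; large, inactive
         * files become candidates for the jukebox tier. */
        if (st.st_size > MIN_SIZE && now - st.st_atime > MAX_IDLE)
            printf("archive candidate: %s (%lld bytes)\n",
                   path, (long long)st.st_size);
    }
    closedir(dir);
    return 0;
}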

Speed

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time. This is the data rate when the data stream is actually flowing.

Effective bandwidth: the average over the entire I/O time, including seek or locate and cartridge switching. This is the drive's overall data rate.
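To make the distinction concrete, the short sketch below computes both figures for one hypothetical tape transfer; the numbers (data size, streaming time, positioning and cartridge-switch overhead) are made up for illustration.

/* Sustained vs. effective bandwidth for one hypothetical transfer.
 * All numbers are illustrative assumptions, not measured values. */
#include <stdio.h>

int main(void)
{
    double bytes          = 10e9;   /* 10 GB transferred                  */
    double streaming_time = 125.0;  /* seconds the data is actually flowing */
    double overhead_time  = 95.0;   /* locate + cartridge switching, etc.  */

    double sustained = bytes / streaming_time;
    double effective = bytes / (streaming_time + overhead_time);

    printf("sustained bandwidth: %.1f MB/s\n", sustained / 1e6);
    printf("effective bandwidth: %.1f MB/s\n", effective / 1e6);
    return 0;
}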

Speed (cont.)

Access latency: the amount of time needed to locate data.

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

Cost

Main memory is much more expensive than disk storage.

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

[Figure: Price per megabyte of DRAM, 1981 to 2000]

[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]

[Figure: Price per megabyte of a tape drive, 1984 to 2000]

References

Silberschatz, A., Galvin, P. B., and Gagne, G. Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

Disk Management

Disk Formatting Low-level Logical

Boot Block Bootstrap

Bad-Block Recovery Manually Sector Sparing (Forwarding) Sector Slipping

Disk Formatting

Low-Level Formatting (Physical Formatting) Header Trailer

Sector Number ECC

Error detection Soft error recovery

Data-Area 512 bytes

Logical Formatting Partition

One or more group of cylinders Each partition is treated as a separate disk (example)

Logical Formatting Storing of initial file system

Map of allocated and free space An initial empty Directory

Cluster Blocks are put together to increase efficiency Disk IO done via blocksFile IO done via clusters

Raw disk Some programs use the disk partition as a large sequential array of logic blocks bypassing the file system services

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.

Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.

77

File Naming

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and copy it to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.

78

Hierarchical Storage Management (HSM)
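A minimal sketch of the kind of migration policy such a system might apply. The thresholds and the idea of a "jukebox tier" are assumptions for illustration; they do not correspond to any specific HSM product.

import os
import time

AGE_LIMIT = 90 * 24 * 3600      # assumed: files idle for 90 days count as inactive
SIZE_LIMIT = 100 * 1024 * 1024  # assumed: files over 100 MB count as large

def select_for_archive(directory):
    """Return paths that this simple HSM policy would migrate to the jukebox tier."""
    now = time.time()
    candidates = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        large = st.st_size > SIZE_LIMIT
        inactive = (now - st.st_atime) > AGE_LIMIT
        if large and inactive:
            candidates.append(path)   # small or recently used files stay on disk
    return candidates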

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time. This is the data rate when the data stream is actually flowing.

Effective bandwidth: the average over the entire I/O time, including seek or locate time and cartridge switching. This is the drive's overall data rate.

79

Speed
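For example, with assumed illustrative numbers (a 1 GiB transfer streaming at 40 MiB/s, preceded by 30 seconds of locate and cartridge-switch overhead), the two figures work out as follows.

transfer_bytes = 1 * 1024**3      # assumed: 1 GiB transferred
sustained_rate = 40 * 1024**2     # assumed: 40 MiB/s while the data stream is flowing
overhead_s = 30.0                 # assumed: locate + cartridge-switch time

streaming_time = transfer_bytes / sustained_rate
effective_bw = transfer_bytes / (streaming_time + overhead_s)

print(f"sustained bandwidth: {sustained_rate / 1024**2:.1f} MiB/s")   # 40.0
print(f"effective bandwidth: {effective_bw / 1024**2:.1f} MiB/s")     # ~18.4, since overhead counts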

Access latency: the amount of time needed to locate data.

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds.

Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

81

Reliability

Main memory is much more expensive than disk storage.

The cost per megabyte of hard-disk storage is competitive with that of magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

82

Cost
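To see why the number of cartridges matters, here is a toy cost model. All prices and capacities are assumed, illustrative figures, not current market data.

def cost_per_gb(drive_cost, cartridge_cost, cartridge_gb, num_cartridges, num_drives=1):
    """Total cost of the library divided by its total capacity."""
    total_cost = num_drives * drive_cost + num_cartridges * cartridge_cost
    return total_cost / (num_cartridges * cartridge_gb)

# One cartridge per drive: the expensive drive dominates the cost per gigabyte.
print(round(cost_per_gb(drive_cost=3000, cartridge_cost=30, cartridge_gb=800, num_cartridges=1), 3))
# Many cartridges sharing one drive: cheap media dominate and the cost per gigabyte drops.
print(round(cost_per_gb(drive_cost=3000, cartridge_cost=30, cartridge_gb=800, num_cartridges=100), 3))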

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

Page 34: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Disk Formatting

Low-level formatting (physical formatting): divides the disk into sectors, each with a header, a data area (usually 512 bytes), and a trailer. The header and trailer hold the sector number and an error-correcting code (ECC), which is used for error detection and for recovering from soft errors.

Partitioning: the disk is divided into one or more groups of cylinders, and each partition is treated as a separate disk.

Logical formatting: stores an initial file system on the partition, including a map of allocated and free space and an initial empty directory.

Clusters: blocks are grouped together to increase efficiency; disk I/O is done in blocks, while file I/O is done in clusters.

Raw disk: some programs use a disk partition as a large sequential array of logical blocks, bypassing the file-system services.
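A rough sketch of the per-sector record that low-level formatting lays down, using a CRC as a stand-in where a real controller would use a proper error-correcting code; the field sizes are assumptions for illustration.

import zlib

SECTOR_DATA_BYTES = 512

def make_sector(sector_number, data):
    """Build a header + data + trailer record, as low-level formatting does."""
    assert len(data) == SECTOR_DATA_BYTES
    header = sector_number.to_bytes(4, "little")
    trailer = zlib.crc32(header + data).to_bytes(4, "little")   # stand-in for the ECC
    return header + data + trailer

def check_sector(raw):
    """Recompute the code; a mismatch signals a damaged sector (a soft error)."""
    header, data, trailer = raw[:4], raw[4:-4], raw[-4:]
    return zlib.crc32(header + data).to_bytes(4, "little") == trailer

sector = make_sector(7, bytes(SECTOR_DATA_BYTES))
print(check_sector(sector))   # True for an undamaged sector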

Boot Block: Bootstrap Program

The bootstrap is the initial program that starts a computer system. It initializes all aspects of the system: CPU registers, device controllers, and the contents of main memory.

It then starts the OS: it finds the OS kernel on disk, loads it into memory, and jumps to an initial address to begin the OS execution.

The bootstrap is stored in ROM, which needs no initialization and cannot be infected by a virus. The problem is that ROM is hard to update; the solution is to keep a tiny bootstrap loader in ROM and store the full bootstrap program in the boot blocks, at a fixed location on the hard disk.

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete disk failure: replace the disk.

Bad sectors can be handled manually, as on IDE disks: the format or chkdsk utilities find bad blocks and make a special entry for them in the FAT.

Sector sparing (forwarding), as on SCSI disks: the controller maintains a bad-sector list, initialized during low-level formatting, and sets aside spare sectors to replace bad sectors logically. (Example.) Problem: the remapping invalidates the optimization done by the disk-scheduling algorithm. Solution: keep spare sectors on each cylinder.

Sector slipping: move each sector down by one so as to free the sector next to the bad one. (Example.)

A soft error is repairable by the disk controller through the ECC; a hard error means lost data, which must be restored from backup.
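A toy sketch of the remapping idea behind sector sparing. The class and the spare-sector numbers are invented for illustration; real controllers keep this table in firmware.

class SparingController:
    """Redirect requests for bad sectors to spare sectors."""
    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)   # e.g., spares reserved on each cylinder
        self.remap = {}                     # bad logical sector -> spare sector

    def mark_bad(self, sector):
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)

    def translate(self, sector):
        # Every read or write request is translated before it reaches the platter.
        return self.remap.get(sector, sector)

ctl = SparingController(spare_sectors=[1000, 1001, 1002])
ctl.mark_bad(17)
print(ctl.translate(17))   # 1000: requests for the bad sector are forwarded to a spare
print(ctl.translate(18))   # 18: healthy sectors map to themselves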

Swap-Space Management

In modern operating systems, "paging" and "swapping" are used interchangeably.

Virtual memory uses disk space as an extension of main memory. Performance decreases, because disk access is much slower than memory access.

The goal of swap-space management is to get the best throughput for the virtual memory system.

Swap-Space Management (Cont)

How is it used? That depends on the memory-management algorithm: systems that swap hold entire process images in swap space, while systems that page simply store pages that have been pushed out of main memory.

The amount of swap space needed depends on the amount of physical memory, the amount of virtual memory, and the way virtual memory is used; it ranges from a few MB to several GB. It is better to overestimate, because then no process is aborted for lack of swap space. Solaris sizes swap space as the amount by which virtual memory exceeds pageable physical memory; a traditional Linux guideline is to make swap space double the amount of physical memory. Multiple swap spaces can be used.
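A tiny worked example of the two sizing rules just mentioned; the memory figures are assumptions for illustration.

def solaris_style_swap(virtual_memory_gb, pageable_physical_gb):
    # Swap only needs to back the virtual memory that physical memory cannot hold.
    return max(0, virtual_memory_gb - pageable_physical_gb)

def classic_linux_swap(physical_gb):
    # Old rule of thumb: twice the physical memory.
    return 2 * physical_gb

print(solaris_style_swap(virtual_memory_gb=24, pageable_physical_gb=16))   # 8 GB
print(classic_linux_swap(physical_gb=16))                                  # 32 GB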

Swap-Space Management (Cont)

Where is it located on disk?

In the normal file system: swap space is a large file within the file system, so the normal file-system routines can be used. This is easy to implement but inefficient, because it takes time to traverse the directory structure.

In a separate disk partition: a raw partition is managed by a swap-space storage manager, which uses algorithms optimized for speed rather than storage efficiency; the trade-off between speed and fragmentation is acceptable because the data's life is short. A fixed amount of space is set aside during partitioning, and adding more space requires re-partitioning.

Linux supports both approaches; the system administrator decides which to use.

Swap-Space Management (Cont)

How is it managed? In UNIX, traditional implementations copied entire processes to swap; newer versions use a combination of swapping and paging.

In Solaris, text-segment pages (those containing code) are brought back from the file system rather than stored in swap space, which is used only for pages of anonymous memory such as the stack or heap. Modern versions allocate swap space only when a page is actually forced out of main memory.

Swapping on Linux System

52

RAID Structure

A large number of disks in a system improves the data-transfer rate if the disks are operated in parallel.

It also improves the reliability of data storage when redundant information is stored on multiple disks.

A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability.

Improvement of reliability via redundancy. Redundancy means storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information.

Mirroring: duplicating every disk. A logical disk then consists of two physical disks, and every write is carried out on both disks.

53

RAID Structure

Improvement in performance via parallelism: striping.

Bit-level striping splits the bits of each byte across multiple disks: bit i of each byte goes to disk i. Block-level striping with n disks sends block i of a file to disk (i mod n) + 1.

Parallelism in a disk system, as achieved through striping, has two main goals: increase the throughput of multiple small accesses through load balancing, and reduce the response time of large accesses.
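The block-level mapping above is one line of code; a quick sketch (disk numbering follows the 1-based convention in the formula):

def block_to_disk(block_index, num_disks):
    """Block-level striping: block i of a file goes to disk (i mod n) + 1."""
    return (block_index % num_disks) + 1

# With 4 disks, consecutive blocks rotate across disks 1..4.
print([block_to_disk(i, 4) for i in range(8)])   # [1, 2, 3, 4, 1, 2, 3, 4]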

54

RAID Structure

55

RAID Levels

56

RAID level 0 (striping): disk arrays with striping at the level of blocks, but without any redundancy.

RAID level 1 (mirroring): disk mirroring.

RAID level 2 (memory-style error-correcting codes): a parity bit records whether the number of bits set to 1 in a byte is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and no longer matches the stored parity.

Error-correction bits are stored on disks labeled P; these bits are used to reconstruct the damaged data.

RAID 2 is not used in practice.
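The parity rule described above, as a short worked check (a generic even-parity calculation, not any particular controller's format):

def parity_bit(byte):
    """Even parity: 0 if the number of 1-bits is even, 1 if it is odd."""
    return bin(byte).count("1") % 2

b = 0b10110100                              # four 1-bits -> stored parity 0
damaged = b ^ 0b00000100                    # one bit flipped
print(parity_bit(b), parity_bit(damaged))   # 0 1: the mismatch reveals the damage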

RAID Levels

57

RAID level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information.

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.

Advantage: with N-way striping of data, data transfer is N times faster than RAID level 1.

Problem: the expense of computing and writing parity, which calls for a hardware controller with dedicated parity hardware.

RAID level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel, since each one requires a read-modify-write cycle.
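A minimal sketch of why one parity block lets a single lost disk be rebuilt: parity is the bytewise XOR of the data blocks, so XOR-ing the survivors with the parity restores the missing block. (Pure arithmetic; a real array adds all the bookkeeping around it.)

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"
parity = xor_blocks(d0, d1, d2)           # written to the parity disk
rebuilt_d1 = xor_blocks(d0, d2, parity)   # if the disk holding d1 fails, XOR the rest
print(rebuilt_d1 == d1)                   # True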

RAID Levels

RAID level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks. Since the parity blocks no longer all reside on one disk, this avoids potential overuse of a single parity disk.

RAID level 6 (P+Q redundancy scheme): stores extra redundant information to guard against multiple disk failures. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures.

Reed-Solomon codes are used as the error-correcting codes.

58

RAID Levels

RAID 0+1: stripe first, then mirror the stripe.

RAID 1+0: mirror first, then stripe the mirrors.

If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available.

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks.

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system-software layer.

RAID can also be implemented in the host bus-adapter (HBA) hardware, in the hardware of the storage array, or in the SAN interconnect layer by disk-virtualization devices.

Rebuild performance is easiest (fastest) for RAID level 1; rebuilding a RAID 5 set of large disks can take long hours.

RAID level 0 is used in high-performance applications where data loss is not critical.

RAID level 1 is used for applications that need high reliability with fast recovery.

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important, for example for small databases.

RAID level 5 is often preferred over RAID 1 for storing large volumes of data.

If more disks are in an array, data-transfer rates are higher, but the system is more expensive.

If more bits are protected by a single parity bit, the space overhead due to parity is lower, but the chance of a second disk failure within the same protected group is greater.

61

Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices, including arrays of tapes, and even to the broadcast of data over wireless systems.

Problems with RAID: 1. RAID protects against physical errors, but not against other hardware and software errors.

The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data.

Checksums are kept with the pointer to an object, so the system can detect whether the object is the right one and whether it has changed.

Checksumming provides error detection and correction.
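A toy sketch of the "checksum stored with the pointer" idea. The hash choice and the dictionary layout are assumptions for illustration, not ZFS's actual on-disk format.

import hashlib

def make_pointer(block_address, block_data):
    """The parent keeps the child's address together with the child's checksum."""
    return {"addr": block_address, "sum": hashlib.sha256(block_data).hexdigest()}

def verify(pointer, block_data):
    """Detects both a corrupted block and a pointer that leads to the wrong block."""
    return hashlib.sha256(block_data).hexdigest() == pointer["sum"]

ptr = make_pointer(42, b"file contents")
print(verify(ptr, b"file contents"))     # True
print(verify(ptr, b"garbled contents"))  # False: the mismatch is detected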

Problems with RAID (cont)

2. RAID implementations lack flexibility. What if we have a five-disk RAID level 5 set and a file system is too large to fit on it?

In ZFS, partitions of disks are instead gathered, via RAID sets, into pools of storage.

As a result, there are no artificial limits on storage use, and there is no need to relocate file systems between volumes or to resize volumes.

65

Stable-Storage Implementation

Information residing in stable storage is never lost.

To implement stable storage: replicate information on multiple storage devices with independent failure modes.

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that we can recover the stable data after any failure during data transfer or recovery.

66

Overview

A disk write can have one of three results.

Successful completion: all the data were written correctly.

Partial failure: the failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted.

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.

67

Disk Write Result

To detect and recover from failures, the system must maintain (at least) two physical blocks for each logical block.

Recoverable write: first, write the information onto the first physical block.

When the first write completes successfully, write the same information onto the second physical block.

Declare the operation complete only after the second write completes successfully.

68

Recoverable write

During recovery, each pair of physical blocks is examined. If both are the same and no detectable error exists, nothing more needs to be done.

If one block contains a detectable error, we replace its contents with the value of the other block.

If neither block contains a detectable error but they differ in content, we replace the content of the first block with the value of the second.

This procedure ensures that a write to stable storage either succeeds completely or results in no change.

69

Failure detection and Recovery
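Putting the two halves together, here is a small Python sketch of the recoverable-write discipline over a pair of block copies. It is an in-memory model with an invented Block type, meant only to illustrate the ordering and recovery rules above.

class Block:
    """One physical copy: data plus a simple validity check."""
    def __init__(self):
        self.data, self.checksum = None, None
    def write(self, data):
        self.data, self.checksum = data, hash(data)
    def is_valid(self):
        return self.data is not None and self.checksum == hash(self.data)

def recoverable_write(first, second, data):
    first.write(data)    # step 1: write the first physical block
    second.write(data)   # step 2: only after the first write completes successfully
    return True          # step 3: only now is the operation declared complete

def recover(first, second):
    if first.is_valid() and second.is_valid():
        if first.data != second.data:
            first.write(second.data)     # both readable but different: take the second
    elif second.is_valid():
        first.write(second.data)         # repair the damaged first copy
    elif first.is_valid():
        second.write(first.data)         # repair the damaged second copy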


Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 35: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Boot Block Bootstrap Program

Initial program to start a computer system Initializes aspects of the system

CPU registers Device controllers Contents of the main memory

Starts the OS Finds the OS Kernel on disk and loads it into the memory Jumps to an initial address to begin the OS exec

Stored in ROM No need for initialization No virus Problem Hard to update solution save the bootstrap loader in the ROM full bootstrap on boot blocks (fixed location on HDD)

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 36: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Booting from a Disk in Windows 2000

Bad-Block Recovery

Complete Disk Failure Replace the disk

Bad Sector Handling Manually

IDE format chkdsk Special entry into FAT

Sector Sparing (Forwarding) SCSI Controller maintains a bad sector list List is initialized during the low-level formatting Controller sets aside spare sectors to replace bad sectors logically (Example) Problem Invalidate optimization done by disk scheduling alg Solution Spare sectors on each cylinder

Sector Slipping Move down every sector tom empty the next sector to the bad sector Example

Soft-error repairable by disk controller through ECC Hard-error lost data back up

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.


Reliability

Main memory is much more expensive than disk storage.

The cost per megabyte of hard disk storage is competitive with that of magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
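A back-of-the-envelope sketch of that last point follows; every price and capacity in it is an assumed figure, not data from the slides.

/* Sketch: cost per gigabyte of a small tape library, with assumed prices.
 * Tertiary storage pays off only when many cartridges share each drive. */
#include <stdio.h>

int main(void)
{
    double drive_cost     = 2500.0;   /* per tape drive, assumed */
    double cartridge_cost = 30.0;     /* per cartridge, assumed  */
    double cartridge_gb   = 400.0;    /* capacity per cartridge  */
    double disk_cost_gb   = 0.10;     /* disk $/GB, assumed      */
    int    drives         = 4;

    for (int cartridges = 10; cartridges <= 1000; cartridges *= 10) {
        double total   = drives * drive_cost + cartridges * cartridge_cost;
        double cost_gb = total / (cartridges * cartridge_gb);
        printf("%4d cartridges, %d drives: %.3f $/GB (disk: %.2f $/GB)\n",
               cartridges, drives, cost_gb, disk_cost_gb);
    }
    return 0;
}

With these assumed numbers the library only approaches the disk price once on the order of a thousand cartridges share the four drives, which is the slide's point about cartridge-to-drive ratios.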


Cost

[Figure: price per megabyte of DRAM from 1981 to 2000]

[Figure: price per megabyte of magnetic hard disk from 1981 to 2000]

[Figure: price per megabyte of a tape drive from 1984 to 2000]

References

Silberschatz, A. Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

Swap-Space Management

In modern Operating Systems ldquoPagingrdquo and ldquoSwappingrdquo are used interchangeably

Virtual memory uses disk space as an extension of the main memory Performance decreases why

Swap-space management goal To get the best throughput for the

virtual memory system

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 39: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Swap-Space Management (Cont)

How is it used Depends on memory management alg

Swapping Load entire process into disk Paging Stores pages

Amount of swap space needed depends on Amount of physical memory Amount of virtual memory Way virtual memory is used Ranges from few MB to GB Better to overestimate why No process is aborted Solaris swap space = amount by which VM exceeds

pageable physical memory Linux swap space = double the amount of physical memory Multiple swap spaces

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 40: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Swap-Space Management (Cont)

Where is it located on disk In the normal file system

Large file within the file system File-system routines can be used Easy to implement but inefficient

Takes time to traverse the directory structure Separate disk partition

Raw partition Swap space storage manager Uses alg Optimized for speed rather than storage efficiency why Trade-off between speed and fragmentation acceptable (data life is short) Fixed amount of space is set aside during partitioning Adding more space requires re-partitioning

Linux Supports both Who decides

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives, and an additional drive stores parity information.

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks.

Advantage: with N-way striping of data, data transfer is N times faster than RAID level 1.

Problem: the expense of computing and writing parity; this is usually handled by a hardware controller with dedicated parity hardware.

RAID level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel, because each one requires a read-modify-write cycle on the data block and the parity block (see the sketch below).
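The sketch below (illustrative Python, not from the slides) shows why a small write is a read-modify-write cycle in a block-interleaved parity scheme: the new parity can be computed from the old data, the new data, and the old parity, so the old data block and old parity block must be read before the two writes.

# Minimal sketch: updating one data block in a parity-protected stripe.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Return the new parity block after overwriting one data block.

    new_parity = old_parity XOR old_data XOR new_data
    (two reads - old data and old parity - plus two writes).
    """
    return xor_blocks(old_parity, xor_blocks(old_data, new_data))

if __name__ == "__main__":
    d0, d1, d2 = b"\x0f" * 4, b"\xf0" * 4, b"\x55" * 4   # hypothetical stripe
    parity = xor_blocks(xor_blocks(d0, d1), d2)
    new_d1 = b"\xaa" * 4
    new_parity = small_write(d1, new_d1, parity)
    # The incrementally updated parity matches parity recomputed from scratch.
    assert new_parity == xor_blocks(xor_blocks(d0, new_d1), d2)
    print(new_parity.hex())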

RAID level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks, so parity no longer resides on a single disk; this avoids the potential overuse of a single parity disk. One common placement is sketched below.

RAID level 6 (P + Q redundancy scheme): stores extra redundant information so the array can survive multiple disk failures. Two bits of redundant data are stored for every four bits of data, and the system can tolerate two disk failures. Reed-Solomon codes are used as the error-correcting codes.
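As an illustration of distributed parity (not from the slides, and only one of several layouts used in practice), the following Python sketch rotates the parity block across the disks from stripe to stripe, in the style of a "left-symmetric" RAID 5 layout:

# Minimal sketch: which disk holds the parity block of each stripe (RAID 5).
def parity_disk(stripe: int, n_disks: int) -> int:
    """Rotate parity across the disks, stripe by stripe (0-based disk numbers)."""
    return (n_disks - 1) - (stripe % n_disks)

if __name__ == "__main__":
    n = 5  # hypothetical five-disk RAID 5 set
    for stripe in range(6):
        print(f"stripe {stripe}: parity on disk {parity_disk(stripe, n)}")
    # stripes 0..4 put parity on disks 4,3,2,1,0 - no single disk becomes
    # the bottleneck that the dedicated parity disk of RAID 4 is.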


RAID 0+1 – stripe first, then mirror the stripe.

RAID 1+0 – mirror first, then stripe the mirrors.

If a single disk fails in RAID 0+1, the entire stripe containing it becomes inaccessible, leaving only the other stripe available.

With a single-disk failure in RAID 1+0, only that disk is unavailable; the disk that mirrors it is still available, as are all the rest of the disks.


The Implementation of RAID

RAID can be implemented at several layers:

• Volume-management software can implement RAID within the kernel or at the system-software layer.

• In the host bus-adapter (HBA) hardware.

• In the hardware of the storage array.

• In the SAN interconnect layer, by disk-virtualization devices.

Rebuild performance: rebuilding is easiest for RAID level 1, since data can simply be copied from the mirror; for RAID 5 with large disks it can take long hours, because every surviving disk must be read to reconstruct each missing block.

Selecting a RAID Level

RAID level 0 is used in high-performance applications where data loss is not critical.

RAID level 1 is used for applications that need high reliability and fast recovery.

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important – for example, small databases.

RAID level 5 is often preferred over RAID 1 for storing large volumes of data.

If more disks are in an array, data-transfer rates are higher, but the system is more expensive.

If more bits are protected by each parity bit, the space overhead due to parity is lower, but the chance that a second disk fails before the first is repaired is greater.

Extensions

The concepts of RAID have been generalized to other storage media, including arrays of tapes and even the broadcast of data over wireless systems.

Problems with RAID

1. RAID protects against physical media errors, but not against other hardware and software errors.

The Solaris ZFS (Zettabyte File System) file system addresses this by using checksums to verify the integrity of data. Each checksum is kept with the pointer to the object it describes, so the file system can detect whether the object read is the right one and whether it has changed. In this way, checksumming provides error detection and correction. A sketch of the idea follows.
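The Python sketch below illustrates the checksum-with-the-pointer idea in a generic way (it is not the ZFS on-disk format or API): a block pointer stores the checksum of the block it points to, so corruption in the block is caught when the pointer is followed.

# Minimal sketch: a "pointer" that carries the checksum of its target block.
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class BlockPointer:
    def __init__(self, address: int, block: bytes):
        self.address = address           # hypothetical disk address
        self.expected = checksum(block)  # checksum stored with the pointer

    def read(self, disk: dict) -> bytes:
        block = disk[self.address]
        if checksum(block) != self.expected:
            raise IOError(f"checksum mismatch at block {self.address}")
        return block

if __name__ == "__main__":
    disk = {7: b"important data"}
    ptr = BlockPointer(7, disk[7])
    print(ptr.read(disk))                # ok
    disk[7] = b"important dat\x00"       # silent corruption on the medium
    try:
        ptr.read(disk)
    except IOError as e:
        print("detected:", e)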

Problems with RAID (cont)

2. RAID implementations often lack flexibility. What if we have a five-disk RAID level 5 set and the file system is too large to fit on it?

In ZFS, partitions of disks are gathered via RAID sets into pools of storage, and file systems draw space from their pool as needed. As a result, there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes.


Stable-Storage Implementation

Information residing in stable storage is never lost.

To implement stable storage:

Replicate information on multiple storage devices with independent failure modes.

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that the stable data can be recovered after any failure during data transfer or recovery.

Disk Write Result

A disk write results in one of three outcomes:

Successful completion: the data were written correctly on disk.

Partial failure: a failure occurred in the midst of the transfer, so only some of the sectors were written with the new data, and the sector being written at the time of the failure may have been corrupted.

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact.

Recoverable Write

The system must maintain (at least) two physical blocks for each logical block of stable storage in order to detect and recover from failures. A recoverable write then proceeds as follows:

1. Write the information onto the first physical block.

2. When the first write completes successfully, write the same information onto the second physical block.

3. Declare the operation complete only after the second write completes successfully.
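A minimal Python sketch of this ordering discipline (illustrative only; real implementations work with raw disk blocks, not files) writes the two copies strictly in sequence and reports success only after both writes complete:

# Minimal sketch: recoverable write to two block copies, in strict order.
import os

def recoverable_write(path_a: str, path_b: str, data: bytes) -> None:
    """Write data to copy A, then (only if that succeeded) to copy B."""
    with open(path_a, "wb") as f:      # step 1: first physical block
        f.write(data)
        f.flush()
        os.fsync(f.fileno())           # make sure it reached the device
    with open(path_b, "wb") as f:      # step 2: second physical block
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # step 3: only now may the operation be declared complete

if __name__ == "__main__":
    recoverable_write("block_a.bin", "block_b.bin", b"new contents")
    print("write declared complete")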

68

Recoverable write

Failure Detection and Recovery

During recovery, each pair of physical blocks is examined:

If both copies are the same and no detectable error exists, nothing needs to be done.

If one copy contains a detectable error, its contents are replaced with the value of the other block.

If neither copy contains a detectable error but they differ in content, the content of the first block is replaced with the value of the second.

This procedure ensures that a write to stable storage either succeeds completely or results in no change (see the sketch below).
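The recovery rules above translate almost directly into code. This Python sketch (illustrative; it assumes a detect_error() routine such as a per-block checksum check, which the slides do not specify) restores the invariant for one pair of copies:

# Minimal sketch: recover one pair of physical blocks after a crash.
def detect_error(block: bytes) -> bool:
    """Placeholder for a real per-block error check (e.g. a stored checksum)."""
    return False

def recover_pair(copy_a: bytes, copy_b: bytes) -> tuple[bytes, bytes]:
    if copy_a == copy_b and not detect_error(copy_a):
        return copy_a, copy_b          # nothing to do
    if detect_error(copy_a):
        return copy_b, copy_b          # repair A from B
    if detect_error(copy_b):
        return copy_a, copy_a          # repair B from A
    # No detectable error, but the copies differ: the crash hit between
    # the two writes, so make the first copy agree with the second.
    return copy_b, copy_b

if __name__ == "__main__":
    a, b = b"new value", b"old value"  # crash happened between the two writes
    print(recover_pair(a, b))          # both copies now hold b"old value"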


Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage.

Generally, tertiary storage is built using removable media.

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs.

Tertiary-Storage Devices

Floppy disk — a thin, flexible disk coated with magnetic material and enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB. Removable magnetic disks can be nearly as fast as hard disks, but they are at greater risk of damage from exposure.

Magneto-optic disk — records data on a rigid platter coated with magnetic material. The magneto-optic head flies much farther from the disk surface than a magnetic-disk head does, and the magnetic material is covered with a protective layer of plastic or glass, making it resistant to head crashes. To write a bit, the head flashes a laser beam aimed at the tiny spot on the disk surface where the bit is to be written. Laser light is also used to read data (the Kerr effect).

Removable Disks

Optical disks – employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. The drive uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW.

WORM ("Write Once, Read Many Times") disks can be written only once. A thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum. The information can be destroyed, but not altered.

Tapes

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use.

Operating-System Support

The major OS jobs are to manage the physical devices and to present a virtual-machine abstraction to applications.

Application interface: for a removable disk, a new cartridge is formatted and an empty file system is generated on it. Tapes, in contrast, are presented as a raw storage medium; an application does not open a file on the tape, it opens the whole tape drive as a raw device. Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device. Since the OS does not provide file-system services on tape, the application must decide how to use the array of tape blocks. And since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.

Tape Drives

The basic operations for a tape drive differ from those of a disk drive.

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk).

The read_position() operation returns the logical block number where the tape head is located.

The space() operation enables relative motion over a given number of blocks.

Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block. An EOT (end-of-tape) mark is placed after a block that is written. A small model of these semantics is sketched below.
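The Python sketch below (an illustration, not a real device driver) models the append-only behavior described above: writing at a position in the middle of the tape truncates everything after it, and the EOT mark conceptually follows the last block written.

# Minimal sketch: append-only semantics of a tape drive.
class TapeDrive:
    def __init__(self):
        self.blocks = []      # logical blocks currently on the tape
        self.position = 0     # block number under the head

    def locate(self, block_number: int) -> None:
        self.position = block_number            # like seek(), by logical block

    def read_position(self) -> int:
        return self.position

    def space(self, count: int) -> None:
        self.position += count                  # relative motion

    def write_block(self, data: bytes) -> None:
        del self.blocks[self.position:]         # everything after is erased
        self.blocks.append(data)                # EOT mark now follows this block
        self.position += 1

if __name__ == "__main__":
    t = TapeDrive()
    for i in range(5):
        t.write_block(f"block {i}".encode())
    t.locate(2)
    t.write_block(b"updated")                   # blocks 3 and 4 are gone
    print(len(t.blocks), t.read_position())     # 3 3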


File Naming

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data.

Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard-disk-drive arrays) are more expensive per byte stored than slower devices (such as optical discs and magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and copy it to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.

Speed

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth – the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time while the data stream is actually flowing.

Effective bandwidth – the average over the entire I/O time, including seek or locate time and cartridge switching; this is the drive's overall data rate. A small worked example follows.
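The Python sketch below (with made-up numbers, purely illustrative) shows how effective bandwidth can be far below sustained bandwidth once locate time and cartridge switching are counted:

# Minimal sketch: sustained vs. effective bandwidth for one tape transfer.
def sustained_bandwidth(bytes_moved: float, streaming_seconds: float) -> float:
    return bytes_moved / streaming_seconds

def effective_bandwidth(bytes_moved: float, streaming_seconds: float,
                        overhead_seconds: float) -> float:
    # overhead = locate/seek time + cartridge-switch time, etc.
    return bytes_moved / (streaming_seconds + overhead_seconds)

if __name__ == "__main__":
    size = 10 * 1024**3            # 10 GiB transferred (hypothetical)
    streaming = 100.0              # seconds with the data actually flowing
    overhead = 60.0                # seconds spent locating and switching
    print(f"sustained: {sustained_bandwidth(size, streaming) / 1e6:.0f} MB/s")
    print(f"effective: {effective_bandwidth(size, streaming, overhead) / 1e6:.0f} MB/s")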


Speed (cont)

Access latency – the amount of time needed to locate data.

Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head, which can take tens or hundreds of seconds. Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

Reliability

A fixed disk drive is likely to be more reliable than a removable-disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical-disk drive often leaves the data cartridge unharmed.

Cost

Main memory is much more expensive than disk storage.

The cost per megabyte of hard-disk storage is competitive with magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives; the sketch below makes this concrete.
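The sketch below (pure arithmetic with hypothetical prices, not data from the slides) illustrates the last point: with a fixed number of expensive drives, the cost per gigabyte of a tape library falls as more cheap cartridges are added.

# Minimal sketch: cost per gigabyte of a tape library vs. number of cartridges.
def cost_per_gb(n_drives: int, drive_cost: float,
                n_cartridges: int, cartridge_cost: float,
                cartridge_capacity_gb: float) -> float:
    total_cost = n_drives * drive_cost + n_cartridges * cartridge_cost
    total_capacity = n_cartridges * cartridge_capacity_gb
    return total_cost / total_capacity

if __name__ == "__main__":
    # Hypothetical numbers: 4 drives at $3000 each, $30 cartridges of 800 GB.
    for n in (4, 40, 400):
        c = cost_per_gb(4, 3000.0, n, 30.0, 800.0)
        print(f"{n:4d} cartridges: ${c:.3f} per GB")
    # With few cartridges the drives dominate the cost; with many cartridges
    # the cost per gigabyte approaches that of the cartridges alone.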

Figures (in the original slides): price per megabyte of DRAM, 1981 to 2000; price per megabyte of magnetic hard disk, 1981 to 2000; price per megabyte of a tape drive, 1984 to 2000.

References

Silberschatz, A., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions?

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 41: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Swap-Space Management (Cont)

How is it managed Unix

Traditional copy the entire processes Newer combination of swapping amp paging

Solaris1 File-system text-segment pages containing code Swap-space pages of anonymous memory such as stack or heap Modern versions only allocate the swap space if

page is forced out of the main memory

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 42: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Swapping on Linux System

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 43: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

52

RAID Structure

Large number of disks in a system improves data transfer rate if the disks are operated in parallel

Improves reliability of data storage when redundant information is stored on multiple disks

A variety of disk organization techniques collectively called redundant array of independent disks (RAID) can be used to improve performance and reliability

Improvement of Reliability via Redundancy Redundancy Storing extra information that is not normally needed but

that can be used in the event of failure of a disk to rebuild the lost information

Mirroring duplicating every disk

bull A logical disk consists of two physical disks

bull Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data


A large number of disks in a system improves the data transfer rate if the disks are operated in parallel

Redundant information stored on multiple disks also improves the reliability of data storage

A variety of disk organization techniques, collectively called redundant arrays of independent disks (RAID), can be used to improve performance and reliability

Improvement of Reliability via Redundancy: redundancy means storing extra information that is not normally needed, but that can be used in the event of a disk failure to rebuild the lost information

Mirroring: duplicating every disk

• A logical disk consists of two physical disks

• Every write is carried out on both disks

53

RAID Structure

Improvement in Performance via Parallelism: Striping

Bit-level striping: splitting the bits of each byte across multiple disks; bit i of each byte is written to disk i

Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below)

Parallelism in a disk system, as achieved through striping, has two main goals:

Increase the throughput of multiple small accesses by load balancing

Reduce the response time of large accesses

54

RAID Structure
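As a rough illustration of the block-level striping rule above (not taken from the slides), the following Python sketch maps a logical block number to a disk and a stripe offset; the disk numbering and the stripe-offset calculation are assumptions made for the example.

```python
def place_block(block: int, n_disks: int) -> tuple[int, int]:
    """Map logical block i to (disk, stripe) under simple block-level striping.

    Disks are numbered 1..n_disks, matching the (i mod n) + 1 rule;
    the stripe index (how far down each disk the block lands) is i // n_disks.
    """
    disk = (block % n_disks) + 1
    stripe = block // n_disks
    return disk, stripe

if __name__ == "__main__":
    # With 4 disks, consecutive blocks rotate across disks 1, 2, 3, 4, 1, 2, ...
    for i in range(8):
        disk, stripe = place_block(i, 4)
        print(f"logical block {i} -> disk {disk}, stripe {stripe}")
```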

55

RAID Levels

56

RAID Level 0 (Striping): refers to disk arrays with striping at the level of blocks but without any redundancy

RAID Level 1 (Mirroring): refers to disk mirroring

RAID Level 2 (Memory-style error-correcting code): a parity bit records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1). If one of the bits in the byte is damaged, the parity of the byte changes and thus does not match the stored parity

Error-correction bits are stored in disks labeled P; these bits are used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels
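A minimal sketch of the parity idea described for RAID Level 2: compute an even-parity bit over a byte and detect a single flipped bit. The function name and the simulated damage are my own choices for illustration.

```python
def parity(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 0 if the count of 1 bits is even, 1 if odd."""
    return bin(byte & 0xFF).count("1") % 2

if __name__ == "__main__":
    data = 0b1011_0010            # stored byte
    stored_parity = parity(data)  # stored alongside the data (on a parity disk in RAID 2)

    damaged = data ^ 0b0000_1000  # flip one bit to simulate damage
    # The recomputed parity no longer matches the stored parity, so the error is detected.
    print("error detected:", parity(damaged) != stored_parity)
```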

57

RAID Level 3 (bit-interleaved parity organization): data are subdivided (striped) and written in parallel on two or more drives; an additional drive stores parity information

Advantage: storage overhead is reduced, because only one parity disk is needed for several regular disks

Advantage: with N-way striping of data, data transfer is N times faster than RAID Level 1

Problem: the expense of computing and writing parity; often handled by a hardware controller with dedicated parity hardware

RAID Level 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk. It gives high transfer rates for large reads and writes, but small independent writes cannot be performed in parallel (each requires a read-modify-write cycle)

RAID Levels
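To make block-interleaved parity concrete, here is a hedged sketch of the XOR parity used by the RAID 3/4/5 family: the parity block of a stripe is the XOR of its data blocks, and a lost block can be rebuilt by XOR-ing the survivors with the parity. The tiny block size and helper name are assumptions for the example.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte; this yields the parity block of a stripe."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

if __name__ == "__main__":
    # A stripe of three data blocks (4-byte blocks, purely for illustration).
    stripe = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
    parity = xor_blocks(stripe)

    # The disk holding block 1 fails; XOR of the survivors plus parity rebuilds it.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print("recovered block matches:", rebuilt == stripe[1])
```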

RAID Level 5 (block-interleaved distributed parity): parity blocks are interleaved and distributed across all disks. Hence parity blocks no longer reside on a single disk, which avoids potential overuse of a single parity disk

RAID Level 6 (P+Q redundancy scheme): stores extra redundant information to guard against multiple disk failures

2 bits of redundant data are stored for every 4 bits of data, and the system can tolerate two disk failures

Reed-Solomon codes are used as the error-correcting codes

58

RAID Levels
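A small sketch of what "distributed parity" means in practice: the parity block's position rotates from stripe to stripe, so no single disk absorbs all parity writes. The specific rotation used here (parity on disk n−1−(stripe mod n)) is just one simple layout chosen for illustration, not what any particular controller necessarily does.

```python
def raid5_layout(stripe: int, n_disks: int) -> dict[int, str]:
    """Return a disk -> role map ("P" or "Dk") for one stripe in a rotating-parity layout."""
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    layout = {}
    data_index = 0
    for disk in range(n_disks):
        if disk == parity_disk:
            layout[disk] = "P"
        else:
            layout[disk] = f"D{data_index}"
            data_index += 1
    return layout

if __name__ == "__main__":
    # The "P" column moves one disk to the left on each successive stripe.
    for s in range(4):
        print(f"stripe {s}: {raid5_layout(s, 4)}")
```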

RAID 0+1 – stripe first, then mirror the stripe

RAID 1+0 – mirror first, then stripe the mirrors

If a single disk fails in RAID 0+1, an entire stripe is inaccessible, leaving only the other stripe available

With a failure in RAID 1+0, a single disk is unavailable, but the disk that mirrors it is still available, as are all the rest of the disks

59

RAID Levels
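A toy comparison (my own construction, assuming four disks arranged as two two-disk stripes or two two-disk mirror pairs) that counts how often each layout survives a double disk failure, illustrating the availability difference just described.

```python
from itertools import combinations

DISKS = ["d0", "d1", "d2", "d3"]

def raid01_ok(failed: set[str]) -> bool:
    """RAID 0+1: stripes {d0,d1} and {d2,d3}, mirrored; need at least one fully intact stripe."""
    stripes = [{"d0", "d1"}, {"d2", "d3"}]
    return any(not (stripe & failed) for stripe in stripes)

def raid10_ok(failed: set[str]) -> bool:
    """RAID 1+0: mirror pairs {d0,d1} and {d2,d3}, striped; need one survivor per pair."""
    pairs = [{"d0", "d1"}, {"d2", "d3"}]
    return all(pair - failed for pair in pairs)

if __name__ == "__main__":
    double_faults = [set(c) for c in combinations(DISKS, 2)]
    print("RAID 0+1 survives", sum(raid01_ok(f) for f in double_faults), "of", len(double_faults))
    print("RAID 1+0 survives", sum(raid10_ok(f) for f in double_faults), "of", len(double_faults))
```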

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Can take long hours for RAID 5 with large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications that need high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important – for example, small databases

RAID Level 5 is often preferred over RAID 1 for storing large volumes of data

If more disks are in an array, data-transfer rates are higher, but the system is more expensive

If more bits are protected by a parity bit, the space overhead due to parity is lower, but the chance of a second disk failure is greater

61

Selecting a RAID Level

Extensions

The concepts of RAID have been generalized to other storage devices: arrays of tapes, and the broadcast of data over wireless systems

Problems with RAID

1. RAID protects against physical errors but not against other hardware and software errors

The Solaris ZFS (Zettabyte File System) file system uses checksums to verify the integrity of data

Checksums are kept with the pointer to an object, to detect whether the object is the right one and whether it changed (see the sketch below)

Checksumming provides error detection and correction
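A minimal sketch of the keep-the-checksum-with-the-pointer idea: the checksum travels with the reference, not with the block itself, so a wrong or silently corrupted block is caught on read. The Pointer class, the dictionary standing in for storage, and the use of SHA-256 are illustrative assumptions, not ZFS's actual on-disk format.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Pointer:
    """A reference to a block that also carries the expected checksum of its contents."""
    block_id: int
    checksum: str

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_verified(storage: dict[int, bytes], ptr: Pointer) -> bytes:
    """Follow the pointer and verify the block against the checksum stored with the pointer."""
    data = storage[ptr.block_id]
    if checksum(data) != ptr.checksum:
        raise IOError(f"block {ptr.block_id} failed checksum: wrong or corrupted block")
    return data

if __name__ == "__main__":
    storage = {7: b"file contents"}
    ptr = Pointer(block_id=7, checksum=checksum(storage[7]))
    print(read_verified(storage, ptr))        # passes verification
    storage[7] = b"silently corrupted data"   # e.g. a misdirected or partial write
    try:
        read_verified(storage, ptr)
    except IOError as e:
        print("detected:", e)
```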

Problems with RAID (cont)

2. RAID implementations lack flexibility

What if we have a five-disk RAID level 5 set and a file system that is too large to fit on it?

In ZFS, partitions of disks are gathered together via RAID sets into pools of storage

So there are no artificial limits on storage use, and no need to relocate file systems between volumes or to resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage:

Replicate information on multiple storage devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state, so that we can recover the stable data after any failure during data transfer or recovery

66

Overview

A disk write can have one of three outcomes:

Successful completion – all the data were written correctly

Partial failure – the failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted

Total failure – the failure occurred before the disk write started, so the previous data values on the disk remain intact

67

Disk Write Result

The system must maintain (at least) two physical blocks for each logical block, for detecting and recovering from failure

Recoverable write:

Write the information onto the first physical block

When the first write completes successfully, write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully (a sketch combining this with the recovery procedure follows the next slide)

68

Recoverable write

During recovery, each pair of physical blocks is examined:

If both are the same and no detectable error exists, nothing more needs to be done

If one contains a detectable error, then we replace its contents with the value of the other block

If both contain no detectable error but they differ in content, then we replace the content of the first block with the value of the second

This ensures that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery
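A minimal sketch (my own construction) of a recoverable write over two physical copies plus the recovery comparison described above. The in-memory "disk" with a per-copy CRC stands in for real devices; a bad CRC models a block corrupted by a crash mid-write.

```python
import zlib

def _crc(data: bytes) -> int:
    return zlib.crc32(data)

class StableBlock:
    """Two physical copies of one logical block, written in a fixed order."""

    def __init__(self) -> None:
        # Each physical copy holds (data, checksum); a bad checksum models a torn write.
        self.copies = [(b"", _crc(b"")), (b"", _crc(b""))]

    def recoverable_write(self, data: bytes) -> None:
        self.copies[0] = (data, _crc(data))   # 1) write the first physical block
        # a crash here leaves the two copies different; recover() undoes the half-done write
        self.copies[1] = (data, _crc(data))   # 2) then the second; only now is the write complete

    def recover(self) -> None:
        ok = [_crc(d) == c for d, c in self.copies]
        if ok[0] and not ok[1]:
            self.copies[1] = self.copies[0]   # repair the damaged second copy
        elif ok[1] and not ok[0]:
            self.copies[0] = self.copies[1]   # repair the damaged first copy
        elif ok[0] and ok[1] and self.copies[0] != self.copies[1]:
            # both readable but different: replace the first with the second, per the rule above
            self.copies[0] = self.copies[1]

if __name__ == "__main__":
    blk = StableBlock()
    blk.recoverable_write(b"old value")
    blk.copies[0] = (b"new value", _crc(b"new value"))  # simulate a crash after step 1
    blk.recover()
    print(blk.copies[0][0], blk.copies[1][0])  # both read "old value": all or nothing
```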

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks; tapes; and read-only, write-once, and rewritable CDs and DVDs

71

Tertiary-Storage Devices

Floppy disk – a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case

Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure

Magneto-optic disk – records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass that is resistant to head crashes

The head flashes a laser beam at the disk surface, aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks – employ special materials that are altered by laser light. An example is the phase-change disk, coated with a material that can freeze into either a crystalline or an amorphous state. It uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk. Common examples are CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once: a thin aluminum film is sandwiched between two glass or plastic platters, and to write a bit the drive uses laser light to burn a small hole through the aluminum. Information can be destroyed but not altered

Removable Disks

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage. If it is needed, the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application interface: for a disk, a new cartridge is formatted and an empty file system is generated on the disk. Tapes, by contrast, are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device

Usually the tape drive is reserved for the exclusive use of that application until the application closes the tape device

Since the OS does not provide file-system services, the application must decide how to use the array of tape blocks

Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive (see the sketch below):

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek())

The read position() operation returns the logical block number where the tape head is located

The space() operation enables relative motion

Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block

An EOT (end-of-tape) mark is placed after a block that is written

76

Tape Drives
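A toy in-memory model (entirely my own, not a real driver API) of the operations listed above and of the append-only behavior: writing a block in the middle discards everything after it and places the EOT mark right after the block just written. Method names follow the slide, with underscores for Python.

```python
class TapeDrive:
    """Toy append-only tape: a list of blocks plus a head position and an EOT index."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []
        self.head = 0          # logical block number under the head
        self.eot = 0           # everything at or beyond this index is unreadable

    def locate(self, block: int) -> None:        # corresponds to seek() on a disk
        self.head = block

    def read_position(self) -> int:              # logical block number under the head
        return self.head

    def space(self, count: int) -> None:         # relative motion, forward or backward
        self.head = max(0, self.head + count)

    def write_block(self, data: bytes) -> None:
        # Writing in the middle truncates the tape: everything after this block is lost,
        # and the EOT mark sits right after the block just written.
        del self.blocks[self.head:]
        self.blocks.append(data)
        self.head += 1
        self.eot = self.head

    def read_block(self) -> bytes:
        if self.head >= self.eot:
            raise EOFError("read past EOT")
        data = self.blocks[self.head]
        self.head += 1
        return data

if __name__ == "__main__":
    t = TapeDrive()
    for payload in (b"A", b"B", b"C"):
        t.write_block(payload)
    t.locate(1)
    t.write_block(b"B2")      # rewriting block 1 erases block 2 as well
    t.locate(2)
    try:
        t.read_block()
    except EOFError as e:
        print("block 2 is gone:", e)
```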

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be great to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and then copy data to faster disk drives when needed (a simple policy is sketched below)

Small and frequently used files remain on disk

Large, old, inactive files are archived to the jukebox (which enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)
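A minimal sketch of the migration policy just described: small or recently used files stay on disk, while large, old, inactive files move to the jukebox. The size and age thresholds, and the tier names, are placeholders of my own, not part of any particular HSM product.

```python
import time
from dataclasses import dataclass

# Illustrative thresholds only; a real HSM would make these configurable policies.
MIGRATE_MIN_SIZE = 100 * 1024 * 1024        # files larger than 100 MB...
MIGRATE_MIN_IDLE = 90 * 24 * 3600           # ...not accessed for 90 days move to tape

@dataclass
class FileInfo:
    name: str
    size: int            # bytes
    last_access: float   # seconds since the epoch

def choose_tier(f: FileInfo, now: float) -> str:
    """Small or recently used files stay on disk; large, old, inactive files go to the jukebox."""
    idle = now - f.last_access
    if f.size >= MIGRATE_MIN_SIZE and idle >= MIGRATE_MIN_IDLE:
        return "tape"
    return "disk"

if __name__ == "__main__":
    now = time.time()
    files = [
        FileInfo("notes.txt", 12_000, now - 3600),
        FileInfo("simulation.dat", 8_000_000_000, now - 200 * 24 * 3600),
    ]
    for f in files:
        print(f.name, "->", choose_tier(f, now))
```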

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second

Sustained bandwidth – the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; the data rate when the data stream is actually flowing

Effective bandwidth – the average over the entire I/O time, including seek or locate time and cartridge switching; the drive's overall data rate (see the calculation sketched below)

79

Speed
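A tiny worked calculation (the numbers are invented purely for illustration) contrasting sustained and effective bandwidth for one tape access, following the two definitions above.

```python
def sustained_bandwidth(bytes_moved: float, transfer_time: float) -> float:
    """Data rate while the stream is actually flowing: bytes / transfer time."""
    return bytes_moved / transfer_time

def effective_bandwidth(bytes_moved: float, transfer_time: float, overhead_time: float) -> float:
    """Average over the entire I/O, including locate and cartridge-switch overhead."""
    return bytes_moved / (transfer_time + overhead_time)

if __name__ == "__main__":
    # Hypothetical numbers: 10 GB moved in 100 s of streaming, after 60 s of locate/switching.
    moved = 10e9
    print("sustained:", sustained_bandwidth(moved, 100) / 1e6, "MB/s")      # 100.0 MB/s
    print("effective:", effective_bandwidth(moved, 100, 60) / 1e6, "MB/s")  # 62.5 MB/s
```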

Access latency – the amount of time needed to locate data

Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; typically less than 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head; this can take tens or hundreds of seconds

It is generally said that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost
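A back-of-the-envelope calculation (all prices hypothetical) showing why tertiary storage pays off only when the number of cartridges is considerably larger than the number of drives: the expensive drive is amortized over many cheap cartridges.

```python
def tape_cost_per_gb(drive_cost: float, cartridge_cost: float,
                     cartridge_capacity_gb: float, cartridges_per_drive: int) -> float:
    """Amortize one drive over its cartridges; more cartridges per drive lowers cost per GB."""
    total_cost = drive_cost + cartridges_per_drive * cartridge_cost
    total_capacity = cartridges_per_drive * cartridge_capacity_gb
    return total_cost / total_capacity

if __name__ == "__main__":
    # Hypothetical figures: a $2000 drive and $30 cartridges holding 800 GB each.
    for n in (1, 10, 100):
        c = tape_cost_per_gb(2000, 30, 800, n)
        print(f"{n:3d} cartridges per drive: ${c:.3f} per GB")
    # With 1 cartridge the drive cost dominates; with 100 the cost approaches the media cost alone.
```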

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981 to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz, A., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 45: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Improvement in Performance via Parallelism Striping

Splitting the bits of each byte across multiple disks Bit ndashlevel striping write bit i of each byte to disk i Block-level striping with n disks block i of a file goes to

disk(i mod n)+1

Parallelism in a disk system as achieved through striping has two main goals Increase the throughput of multiple small accesses by

load balancing Reduce the response time of large accesses

54

RAID Structure

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 46: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

55

RAID Levels

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 47: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

56

RAID Level 0 Striping Refers to disk arrays with striping at the level of

blocks but without any redundancy

RAID Level 1 Mirroring Refers to disk mirroring

RAID Level 2 Memory- style error correcting code Parity bit the number of bits in the byte set to1 is

even (parity=0) or odd (parity =1) If one of the bits in the byte is damaged the parity

of the byte changes and thus does not match the stored parity

Error correction bits are stored in disks labeled P These bits will be used to reconstruct the damaged data

RAID 2 is not used in practice

RAID Levels

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (e.g., hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive. Instead, HSM systems store the data on slower devices and then copy it to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox (a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance).

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
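As a toy sketch of such a policy (not an actual HSM implementation), the code below scans one directory and flags large files that have not been accessed for a long time as candidates for migration to tape or a jukebox; the directory path and both thresholds are made-up assumptions.

    /* Toy HSM-style scan: flag large, long-unused files as migration candidates.
     * Directory, size, and age thresholds are illustrative assumptions. */
    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    #define AGE_LIMIT  (90L * 24 * 3600)     /* ~90 days without access */
    #define SIZE_LIMIT (100L * 1024 * 1024)  /* larger than 100 MB */

    int main(void)
    {
        const char *dir = "/data";           /* hypothetical managed directory */
        DIR *d = opendir(dir);
        if (!d) { perror("opendir"); return 1; }

        time_t now = time(NULL);
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            char path[4096];
            struct stat st;
            snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
            if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
                continue;
            /* Large and not accessed for a long time: archive it; otherwise keep it on disk. */
            if (st.st_size > SIZE_LIMIT && now - st.st_atime > AGE_LIMIT)
                printf("migrate to tape/jukebox: %s\n", path);
        }
        closedir(d);
        return 0;
    }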

Speed

Two aspects of speed in tertiary storage are bandwidth and latency.

Bandwidth is measured in bytes per second.

Sustained bandwidth is the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; it is the data rate when the data stream is actually flowing.

Effective bandwidth is the average over the entire I/O time, including seek or locate time and cartridge switching; it is the drive's overall data rate.
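With made-up numbers, the difference between the two measures can be computed directly: sustained bandwidth divides the bytes moved by the time the stream is actually flowing, while effective bandwidth divides the same bytes by the whole I/O time, including locate and cartridge-switch overhead.

    /* Sustained vs. effective bandwidth with illustrative numbers. */
    #include <stdio.h>

    int main(void)
    {
        double bytes    = 2.0e9;   /* 2 GB transferred (assumed) */
        double stream_s = 25.0;    /* seconds the data stream is actually flowing */
        double locate_s = 40.0;    /* locate/seek time (assumed) */
        double switch_s = 60.0;    /* cartridge switching time (assumed) */

        double sustained = bytes / stream_s;
        double effective = bytes / (stream_s + locate_s + switch_s);

        printf("sustained bandwidth: %.1f MB/s\n", sustained / 1e6);
        printf("effective bandwidth: %.1f MB/s\n", effective / 1e6);
        return 0;
    }

Because the locate and switching terms dominate for tape, effective bandwidth can fall far below sustained bandwidth.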

Speed (Cont)

Access latency is the amount of time needed to locate data.

Access time for a disk (moving the arm to the selected cylinder and waiting for the rotational latency) is less than 5 milliseconds.

Access on tape requires winding the tape reels until the selected block reaches the tape head, which can take tens or hundreds of seconds.

Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive.

An optical cartridge is likely to be more reliable than a magnetic disk or tape.

A head crash in a fixed hard disk generally destroys the data on the disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

Cost

Main memory is much more expensive than disk storage.

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

[Figure: Price per megabyte of DRAM, 1981 to 2000]

[Figure: Price per megabyte of magnetic hard disk, 1981 to 2000]

[Figure: Price per megabyte of a tape drive, 1984 to 2000]

References

Silberschatz, A. Operating System Concepts, 8th edition.

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions


Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 48: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

57

RAID Level 3 bit- interleaved parity organization Data are subdivided (striped) and written in parallel on

two or more drives An additional drive stores parity information

Adv - Storage overhead is reduced because only one parity disk is needed for several regular disks

Adv ndash N way striping of data data transfer is N times faster than RAID Level 1

Problem - Expense of computing and writing parity Hardware controller with dedicated parity hardware

RAID Level 4 block- interleaved parity organization Uses block level striping and in addition keeps a parity

block on separate disk High transfer rates for large reads and writes Small independent writes cannot be performed in

parallel read-modify- write cycle

RAID Levels

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 49: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

RAID Level 5 Block interleaved distributed parity Parity blocks are interleaved and distributed on all disks

Hence parity blocks no longer reside on same disk avoids potential overuse of a single parity disk

RAID Level 6 P+Q redundancy scheme Stores extra redundant information to not have multiple

disk failures

2 bits of redundant data are stored for every 4 bits of data and the system can tolerate two disk failures

Reed-Solomon codes are used as error-correcting codes

58

RAID Levels

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 50: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

RAID 0+1 ndash stripe first then mirror the stripe

RAID 1+0 ndash mirror first then stripe the mirror

If a single disk fails in RAID 0+1 an entire stripe is inaccessible leaving only the other stripe available

With a failure in RAID 1+0 a single disk in unavailable but the disk that mirrors it is still available as are all the rest of the disks

59

RAID Levels

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 51: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

The Implementation of RAID

Volume-management software can implement RAID within the kernel or at the system software layer

In the host bus-adapter (HBA) hardware

In the hardware of the storage array

In the SAN interconnect layer by disk virtualization devices

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 52: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Rebuild Performance

Easiest for RAID Level 1

Long hours for RAID 5 for large disks

RAID Level 0 is used in high-performance applications where data loss is not critical

RAID Level 1 is used for applications with high reliability and fast recovery

RAID 0+1 and RAID 1+0 are used where both performance and reliability are important ndash for example small databases

RAID Level 5 is often preferred for storing large volumes of data over RAID 1

If more disks are in an array data-transfer rates are higher but the system is more expensive

If more bits are protected by a parity the space overhead due to parity is lower but the chance of second disk failure is greater

61

Selecting a RAID Level

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Disk Write Result

A disk write can have one of three outcomes:

Successful completion: the data were written correctly

Partial failure: a failure occurred in the midst of the transfer, so some of the sectors were written with the new data, and the sector being written when the failure occurred may have been corrupted

Total failure: the failure occurred before the disk write started, so the previous data values on the disk remain intact

Recoverable write

The system must maintain (at least) two physical blocks for each logical block in order to detect and recover from failures

A recoverable write proceeds as follows:

1. Write the information onto the first physical block

2. When the first write completes successfully, write the same information onto the second physical block

3. Declare the operation complete only after the second write completes successfully
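A minimal sketch of this protocol on a toy two-copy block (illustrative Python; the StableBlock class is an editorial model, not the actual implementation, which operates on physical disk blocks and their error-detecting codes):

    # Toy model: one logical block backed by two physical copies.
    # The write is declared complete only after both copies are updated, in order.
    class StableBlock:
        def __init__(self, initial=b""):
            self.copy1 = initial
            self.copy2 = initial

        def recoverable_write(self, data):
            self.copy1 = data   # 1. write the first physical block
            # (a crash here leaves copy2 holding the old, still-usable value)
            self.copy2 = data   # 2. then write the second physical block
            return True         # 3. only now is the operation declared complete

    block = StableBlock(b"old")
    block.recoverable_write(b"new")
    assert block.copy1 == block.copy2 == b"new"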

Failure detection and Recovery

During recovery, each pair of physical blocks is examined:

If both are the same and no detectable error exists, nothing further needs to be done

If one block contains a detectable error, we replace its contents with the value of the other block

If neither block contains a detectable error but they differ in content, we replace the content of the first block with the value of the second

This procedure ensures that a write to stable storage either succeeds completely or results in no change
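A sketch of the recovery pass over one pair of copies, applying the three rules above; a checksum stands in for the drive's error-detection code (illustrative Python with hypothetical helper names, not the book's code):

    import hashlib

    def checksum(data):
        return hashlib.sha256(data).hexdigest()

    def recover_pair(copy1, sum1, copy2, sum2):
        """Return a consistent (copy1, copy2) pair after a crash."""
        ok1 = checksum(copy1) == sum1   # no detectable error in copy 1?
        ok2 = checksum(copy2) == sum2   # no detectable error in copy 2?
        if ok1 and ok2 and copy1 == copy2:
            return copy1, copy2          # both good and identical: nothing to do
        if not ok1:
            return copy2, copy2          # copy 1 is bad: take copy 2's value
        if not ok2:
            return copy1, copy1          # copy 2 is bad: take copy 1's value
        return copy2, copy2              # both good but different: first := second

    old, new = b"old", b"new"
    # Crash after copy 1 was written but before copy 2 was updated:
    print(recover_pair(new, checksum(new), old, checksum(old)))   # (b'old', b'old')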


Tertiary-Storage Structure

Tertiary-Storage Devices

Low cost is the defining characteristic of tertiary storage

Generally, tertiary storage is built using removable media

Common examples of removable media are floppy disks, tapes, and read-only, write-once, and rewritable CDs and DVDs

Removable Disks

Floppy disk: a thin, flexible disk coated with magnetic material, enclosed in a protective plastic case

Most floppies hold only about 1 MB; similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure

Magneto-optic disk: records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head does, and the magnetic material is covered with a protective layer of plastic or glass, which makes it resistant to head crashes

To write a bit, the head flashes a laser beam aimed at the tiny spot on the disk surface where the bit is to be written

Laser light is also used to read data (the Kerr effect)

Removable Disks (cont)

Optical disks employ special materials that are altered by laser light

An example is the phase-change disk, which is coated with a material that can freeze into either a crystalline or an amorphous state; it uses laser light at three different powers: low to read data, medium to erase the disk, and high to write to the disk

Common examples: CD-RW and DVD-RW

WORM ("Write Once, Read Many Times") disks can be written only once

A thin aluminum film is sandwiched between two glass or plastic platters; to write a bit, the drive uses laser light to burn a small hole through the aluminum, so information can be destroyed but not altered

Tapes

Compared to a disk, a tape is less expensive and holds more data, but random access is much slower

Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape, which is low-cost storage; if it is needed again, the computer can stage it back into disk storage for active use

Operating-System Support

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application interface:

For a removable disk, a new cartridge is formatted and an empty file system is generated on the disk

Tapes are presented as a raw storage medium; i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device

Usually the tape drive is then reserved for the exclusive use of that application until the application closes the tape device

Since the OS does not provide file system services, the application must decide how to use the array of tape blocks

Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it

Tape Drives

The basic operations for a tape drive differ from those of a disk drive

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk)

The read position() operation returns the logical block number where the tape head is located

The space() operation enables relative motion

Tape drives are "append-only" devices: updating a block in the middle of the tape also effectively erases everything after that block

An EOT (end-of-tape) mark is placed after a block that is written
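A toy, in-memory model of these semantics (a hypothetical interface written for illustration, not a real OS tape-driver API):

    # Toy tape model: locate/read_position/space plus append-only writes.
    class TapeDrive:
        def __init__(self):
            self.blocks = []      # logical blocks written so far
            self.pos = 0          # current logical block number

        def locate(self, block_number):   # corresponds to seek() on a disk
            self.pos = block_number

        def read_position(self):
            return self.pos

        def space(self, count):           # relative motion
            self.pos += count

        def write_block(self, data):
            # Append-only: writing in the middle discards everything after it,
            # and the EOT mark now sits after this block.
            del self.blocks[self.pos:]
            self.blocks.append(data)
            self.pos += 1

    t = TapeDrive()
    for payload in (b"a", b"b", b"c"):
        t.write_block(payload)
    t.locate(1)
    t.write_block(b"B")              # effectively erases the old blocks 1 and 2
    assert t.blocks == [b"a", b"B"]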

File Naming

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them in the same way

Hierarchical Storage Management (HSM)

HSM is a data storage technique that automatically moves data between high-cost and low-cost storage media

HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives)

It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store the data on slower devices and then copy it to faster disk drives when it is needed

Small and frequently used files remain on disk

Large, old, inactive files are archived to a jukebox, a device that enables the computer to change the removable cartridge in a tape or disk drive without human assistance

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data
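A sketch of the kind of migration policy an HSM system applies (illustrative Python under assumed thresholds; send_to_jukebox is a hypothetical archival step, not a real API):

    import os, time

    # Editorial sketch of an HSM-style policy: archive files that are both
    # large and long untouched; small, recently used files stay on disk.
    AGE_LIMIT = 90 * 24 * 3600      # "old" means untouched for 90 days
    SIZE_LIMIT = 100 * 1024 * 1024  # "large" means over 100 MB

    def files_to_archive(root):
        now = time.time()
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                if st.st_size > SIZE_LIMIT and now - st.st_atime > AGE_LIMIT:
                    yield path   # candidate for staging out to the jukebox

    # for path in files_to_archive("/data"):
    #     send_to_jukebox(path)   # hypothetical archival step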

Speed

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second

Sustained bandwidth: the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; this is the data rate when the data stream is actually flowing

Effective bandwidth: the average over the entire I/O time, including the time to seek or locate and to switch cartridges; this is the drive's overall data rate
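A small worked example with made-up numbers (illustrative only) showing why effective bandwidth can be much lower than sustained bandwidth once locate and cartridge-switch time is counted:

    # Illustrative numbers only: one tape read that must first load and locate.
    bytes_transferred = 10 * 1024**3      # 10 GiB read
    streaming_time    = 200.0             # seconds while data is flowing
    locate_and_switch = 100.0             # seconds of load + locate overhead

    sustained = bytes_transferred / streaming_time
    effective = bytes_transferred / (streaming_time + locate_and_switch)

    print(f"sustained: {sustained / 1e6:.0f} MB/s")   # about 54 MB/s
    print(f"effective: {effective / 1e6:.0f} MB/s")   # about 36 MB/s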

Speed (cont)

Access latency: the amount of time needed to locate data

Access time for a disk (moving the arm to the selected cylinder and waiting for the rotational latency) is less than 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head, which takes tens or hundreds of seconds

Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data on the magnetic disk, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

Cost

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives
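A toy cost comparison with hypothetical prices (chosen only to illustrate the last point, not real market figures): the tape library pays a fixed price for a few drives and a low per-terabyte price for cartridges, so it wins only once the amount of data dwarfs the drive cost.

    # Hypothetical prices, for illustration only.
    DISK_COST_PER_TB      = 100.0    # $ per TB of disk
    TAPE_DRIVE_COST       = 2000.0   # $ per tape drive
    CARTRIDGE_COST_PER_TB = 10.0     # $ per TB of tape cartridge

    def tape_library_cost(total_tb, num_drives=2):
        return num_drives * TAPE_DRIVE_COST + total_tb * CARTRIDGE_COST_PER_TB

    for tb in (10, 100, 1000):
        disk = tb * DISK_COST_PER_TB
        tape = tape_library_cost(tb)
        print(f"{tb:5d} TB: disk ${disk:,.0f} vs tape library ${tape:,.0f}")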

Price per megabyte of DRAM, from 1981 to 2000 (chart)

Price per megabyte of magnetic hard disk, from 1981 to 2000 (chart)

Price per megabyte of a tape drive, from 1984 to 2000 (chart)

References

Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, 8th edition

Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 53: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Extensions

Concepts of RAID have been generalized to other storage devices Arrays of tapes The broadcast of data over wireless

systems

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 54: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Problems with RAID11048708RAID protects against physical errors but not other hardware

and software errors

1048708Solaris ZFS (Zettabyte File System) file system uses checksums which verifies integrity of data

1048708Checksums kept with pointer to object to detect if object is the right one and whether it changed

Checksumming provides error detection and correction

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 55: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Problems with RAID (cont)

21048708RAID implementations are lack of flexibility

What if we have five disk RAID level 5 set and file system is too large to fit on it

Partitions of disks are gathered together via RAID sets into pools of storage

So No artificial limits on storage

use and no need to relocate file systems

between volumes or resize volumes

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 56: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

65

Stable-Storage Implementation

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 57: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Information residing in stable storage is never lost

To implement stable storage Replicate information on multiple storage

devices with independent failure modes

Coordinate the writing of updates in a way that guarantees that a failure during an update will not leave all the copies in a damaged state to ensure that we can recover the stable data after any failure during data transfer or recovery

66

Overview

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

Tape Drives

The basic operations for a tape drive differ from those of a disk drive:

locate() positions the tape at a specific logical block, not an entire track (it corresponds to seek() on a disk).

read_position() returns the logical block number where the tape head is located.

space() enables relative motion.

Tape drives are "append-only" devices: updating a block in the middle of the tape effectively erases everything after that block. An EOT (end-of-tape) mark is placed after a block that is written.
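The sketch below models these semantics with a toy in-memory class; `TapeDrive` and its block list are hypothetical, meant only to show how a write truncates everything beyond it.

```python
class TapeDrive:
    """Toy in-memory model of an append-only tape device (hypothetical)."""

    def __init__(self):
        self.blocks = []    # logical blocks written so far, in order
        self.position = 0   # index of the block under the head

    def locate(self, block_number: int) -> None:
        """Position the tape at a specific logical block (like seek() on disk)."""
        if not 0 <= block_number <= len(self.blocks):
            raise ValueError("cannot position past the EOT mark")
        self.position = block_number

    def read_position(self) -> int:
        """Return the logical block number under the tape head."""
        return self.position

    def space(self, count: int) -> None:
        """Move forward or backward by a relative number of blocks."""
        self.locate(self.position + count)

    def write_block(self, data: bytes) -> None:
        """Writing in the middle effectively erases everything after that block."""
        del self.blocks[self.position:]   # blocks beyond here are lost
        self.blocks.append(data)          # the EOT mark now follows this block
        self.position += 1

    def read_block(self) -> bytes:
        data = self.blocks[self.position]
        self.position += 1
        return data
```

For example, writing 10 blocks, calling locate(5), and writing one more block leaves a six-block tape: the new data occupies block 5 and the old blocks 5 through 9 are gone.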

File Naming

Naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer. Contemporary operating systems generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data. Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.

Hierarchical Storage Management (HSM)

HSM is a data-storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices (such as hard disk drive arrays) are more expensive per byte stored than slower devices (optical discs, magnetic tape drives). It would be ideal to have all data available on high-speed devices all the time, but this is expensive; instead, HSM systems store most data on the slower devices and copy data to faster disk drives when it is needed.

Small and frequently used files remain on disk.

Large, old, inactive files are archived to a jukebox, a device that lets the computer change the removable cartridge in a tape or disk drive without human assistance.

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
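A minimal sketch of such a migration policy, under assumed thresholds; the directory layout, the 100 MB / 180-day cutoffs, and the helper names (`migrate`, `stage_back`) are all hypothetical, and a real HSM system would also leave a stub behind so a staged-out file can be recalled transparently.

```python
import os
import shutil
import time

# Assumed policy thresholds: files both larger and less recently used than
# these cutoffs are candidates for archival to the slow (jukebox) tier.
ARCHIVE_SIZE_BYTES = 100 * 1024 * 1024   # 100 MB
ARCHIVE_AGE_DAYS = 180

def migrate(fast_dir: str, slow_dir: str) -> None:
    """Move large, old, inactive files from the fast tier to the slow tier."""
    now = time.time()
    for name in os.listdir(fast_dir):
        path = os.path.join(fast_dir, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        idle_days = (now - st.st_atime) / 86400   # days since last access
        if st.st_size > ARCHIVE_SIZE_BYTES and idle_days > ARCHIVE_AGE_DAYS:
            shutil.move(path, os.path.join(slow_dir, name))

def stage_back(name: str, fast_dir: str, slow_dir: str) -> str:
    """Copy an archived file back to the fast tier when it is needed again."""
    dst = os.path.join(fast_dir, name)
    if not os.path.exists(dst):
        shutil.copy2(os.path.join(slow_dir, name), dst)
    return dst
```

Small, recently used files never match the predicate and simply stay on disk, which is the behavior the slide describes.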

Speed

Two aspects of speed in tertiary storage are bandwidth and latency. Bandwidth is measured in bytes per second.

Sustained bandwidth – the average data rate during a large transfer, i.e., the number of bytes divided by the transfer time; it is the data rate while the data stream is actually flowing.

Effective bandwidth – the average over the entire I/O time, including seek or locate time and cartridge switching; it is the drive's overall data rate.
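A small worked example of the difference, using made-up numbers for a single tape transfer (the 120 MB/s streaming rate and the overhead times are assumptions for illustration only).

```python
# Hypothetical transfer of 60 GB from one tape cartridge.
bytes_transferred = 60 * 10**9                         # 60 GB
streaming_time_s  = bytes_transferred / (120 * 10**6)  # data flowing at 120 MB/s
cartridge_swap_s  = 30                                 # assumed robot load/unload time
locate_s          = 90                                 # assumed time to wind to the block

sustained_bw = bytes_transferred / streaming_time_s
effective_bw = bytes_transferred / (streaming_time_s + cartridge_swap_s + locate_s)

print(f"sustained bandwidth: {sustained_bw / 10**6:.0f} MB/s")   # 120 MB/s
print(f"effective bandwidth: {effective_bw / 10**6:.0f} MB/s")   # about 97 MB/s
```

The larger the transfer, the closer the effective bandwidth gets to the sustained bandwidth, because the fixed locate and cartridge-switching costs are amortized.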

Speed (Cont)

Access latency – the amount of time needed to locate data.

Access time for a disk (moving the arm to the selected cylinder and waiting for the rotational latency) is less than 5 milliseconds. Access on tape requires winding the tape reels until the selected block reaches the tape head, which can take tens or hundreds of seconds. Generally speaking, random access within a tape cartridge is about a thousand times slower than random access on disk.

The low cost of tertiary storage comes from having many cheap cartridges share a few expensive drives.

Reliability

A fixed disk drive is likely to be more reliable than a removable disk or tape drive, and an optical cartridge is likely to be more reliable than a magnetic disk or tape. However, a head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

Cost

Main memory is much more expensive than disk storage. The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive, and the cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years. Tertiary storage therefore gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
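A back-of-the-envelope comparison makes the break-even point visible; every price and capacity below is an assumption chosen for illustration, not a quoted figure.

```python
# Assumed prices and capacities, for illustration only.
disk_cost_per_tb   = 50.0     # $ per TB of hard disk storage
tape_drive_cost    = 2500.0   # $ per tape drive
cartridge_cost     = 60.0     # $ per tape cartridge
cartridge_capacity = 1.5      # TB per cartridge

def tape_cost_per_tb(num_cartridges: int, num_drives: int = 1) -> float:
    """Total cost of the tape installation divided by its total capacity."""
    total_cost = num_drives * tape_drive_cost + num_cartridges * cartridge_cost
    return total_cost / (num_cartridges * cartridge_capacity)

for n in (1, 10, 100, 1000):
    print(f"{n:4d} cartridges: ${tape_cost_per_tb(n):8.2f} per TB "
          f"(disk: ${disk_cost_per_tb:.2f} per TB)")
# With one cartridge per drive, tape costs far more per TB than disk;
# only when many cartridges share one drive does it drop below disk.
```

Under these assumed numbers the tape library beats disk only once well over a hundred cartridges share a single drive, which is the point the slide is making.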

[Chart] Price per megabyte of DRAM, from 1981 to 2000.

[Chart] Price per megabyte of magnetic hard disk, from 1981 to 2000.

[Chart] Price per megabyte of a tape drive, from 1984 to 2000.

References

Silberschatz, A., Operating System Concepts, 8th edition.
Wikipedia.com, PCTechGuide.com, USRobotics.com, allSAN.com, Xenon.com.au

Questions

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 58: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Successful Completion Partial Failure failure occurred in the

midst of transfer so some of the sectors were written with the new data and the sector being written during the failure may have been corrupted

Total Failure failure occurred before the disk write started so the previous data value on the disk remain intact

67

Disk Write Result

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 59: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

System must maintain (at least) 2 physical blocks for each logical block for detecting and recovering from failure

Recoverable write Write the information onto the first physical

block

When the first write completes successfully write the same information onto the second physical block

Declare the operation complete only after the second write completes successfully

68

Recoverable write

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 60: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Each pair of physical blocks is examined If both are the same and no detectable error

exists OK If one contains a detectable error then we

replace its contents with the value of the other block

If both contain no detectable error but they differ in content then we replace the content of the first block with the value of the second

Ensure that a write to stable storage either succeeds completely or results in no change

69

Failure detection and Recovery

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 61: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

70

Tertiary-Storage Structure

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 62: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Low cost is the defining characteristic of tertiary storage

Generally tertiary storage is built using removable media

Common examples of removable media are floppy disks tapes read-only write-once and rewritable CDs and DVDs etc

71

Tertiary-Storage Devices

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you

  • Mass-Storage Structure
  • Agenda
  • Agenda (Cont)
  • Slide 4
  • Review
  • Review (Cont)
  • Overview
  • Magnetic Disks
  • Magnetic Disks (Cont)
  • Slide 10
  • Slide 11
  • Slide 12
  • Magnetic Tape
  • Slide 14
  • Disk Structure
  • Disk Structure (Cont)
  • Disk Attachment
  • Host-Attached Storage SCSI
  • Slide 19
  • Host-Attached Storage FC
  • Slide 21
  • Network-Attached Storage
  • Slide 23
  • Slide 24
  • Network-Attached Storage (Cont)
  • Storage-Area Networks
  • Slide 27
  • Slide 28
  • Storage-Area Networks (Cont)
  • Slide 30
  • Slide 31
  • Disk Scheduling
  • Disk Scheduling (Cont)
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Disk Management
  • Disk Formatting
  • Boot Block
  • Slide 45
  • Bad-Block Recovery
  • Swap-Space Management
  • Swap-Space Management (Cont)
  • Slide 49
  • Slide 50
  • Slide 51
  • RAID Structure
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • The Implementation of RAID
  • Slide 61
  • Extensions
  • Problems with RAID
  • Problems with RAID (cont)
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • References
  • Slide 87
  • Slide 88
Page 63: CS 540 – Chapter 12 Kate Dehbashi Anna Deghdzunyan Fall 2010 Dr. Parviz

Floppy disk mdash thin flexible disk coated with magnetic material enclosed in a protective plastic case

Most floppies hold about 1 MB similar technology is used for removable disks that hold more than 1 GB

Removable magnetic disks can be nearly as fast as hard disks but they are at a greater risk of damage from exposure

Magneto-optic disk -- records data on a rigid platter coated with magnetic material

The magneto-optic head flies much farther from the disk surface than a magnetic disk head and the magnetic material is covered with a protective layer of plastic or glass resistant to head crashes

The head flashes a laser beam at the disk surface and is aimed at a tiny spot where the bit is to be written

Laser light is also used to read data (Kerr effect)

72

Removable Disks

Optical disks ndash employ special materials that are altered by laser light Example is the phase-change disk coated with material that can freeze into either a crystalline or an

amorphous state Uses laser light at three different powers low- read data medium-

erase the disk high-write to the disk Common examples CD-RW and DVD-RW

WORM (ldquoWrite Once Read Many Timesrdquo)- disks can be written only once Thin aluminum film sandwiched between two glass or plastic platters To write a bit the drive uses laser light to burn a small hole through

the aluminum information can be destroyed but not altered

Removable Disks

Compared to a disk a tape is less expensive and holds more data but random access is much slower

Tape is an economical medium for purposes that do not require fast random access eg backup copies of disk data holding huge volumes of data

Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library

A disk-resident file can be transferred to tape which is low cost storage If it needed the computer can stage it back into disk storage for active use

74

Tapes

Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications

Application Interface A new cartridge is formatted and an empty file system is

generated on the disk Tapes are presented as a raw storage medium ie and

application does not open a file on the tape it opens the whole tape drive as a raw device Usually the tape drive is reserved for the exclusive use of that

application until the application closes the tape device Since the OS does not provide file system services the application

must decide how to use the array of tape blocks Since every application makes up its own rules for how to organize

a tape a tape full of data can generally only be used by the program that created it

75

Operating-System Support

The basic operations for a tape drive differ from those of a disk drive

Locate( ) positions the tape to a specific logical block not an entire track (corresponds to seek( ))

The read position( ) operation returns the logical block number where the tape head is located

The space( ) operation enables relative motion Tape drives are ldquoappend-onlyrdquo devices updating a

block in the middle of the tape also effectively erases everything after that block

An EOT mark is placed after a block that is written

76

Tape Drives

The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer

Contemporary OSs generally leave the name-space problem unsolved for removable media and depend on applications and users to figure out how to access and interpret the data

Some kinds of removable media (eg CDs) are so well standardized that all computers use them the same way

77

File Naming

HSM is a data storage technique which automatically moves data between high-cost and low-cost storage media HSM systems exist because high-speed storage devices( hard disk drive arrays) are more expensive (per byte stored) than slower devices (optical discs magnetic tape drives It would be great to have all data available on high-speed devices all the time but this is expensive Instead HSM systems store the data on slower devices and then copy data to faster disk drives when needed

Small and frequently used files remain on disk

Large old inactive files are archived to the jukebox( enables the computer to change the removable cartridge in a tape or disk drive without human assistance)

HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data

78

Hierarchical Storage Management

(HSM)

Two aspects of speed in tertiary storage are bandwidth and latency

Bandwidth is measured in bytes per second Sustained bandwidth ndash average data rate

during a large transfer number of bytestransfer timeData rate when the data stream is actually flowing

Effective bandwidth ndash average over the entire IO time including seek or locate and cartridge switching

Driversquos overall data rate

79

Speed

Access latency ndash amount of time needed to locate data Access time for a disk ndash move the arm to the

selected cylinder and wait for the rotational latency lt 5 milliseconds

Access on tape requires winding the tape reels until the selected block reaches the tape head tens or hundreds of seconds

Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk

The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives

80

Speed (cont)

A fixed disk drive is likely to be more reliable than a removable disk or tape drive

An optical cartridge is likely to be more reliable than a magnetic disk or tape

A head crash in a fixed hard disk generally destroys the data in magnetic disk whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed

81

Reliability

Main memory is much more expensive than disk storage

The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive

The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years

Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives

82

Cost

83

Price per megabyte of DRAM from 1981 to 2000

84

Price per megabyte of magnetic hard disk from 1981

to 2000

85

Price per megabyte of a tape drive from 1984 to 2000

References

Silberschatz A Operating Systems Concepts 8th edition

Wikipediacom PCTechGuidecom USRoboticscom allSANcom Xenoncomau

Questions

88

Thank you
