chapter sixteen data recovery and fault tolerance

Chapter Sixteen

Data Recovery and Fault Tolerance

Objectives

• Just what is it about that hard disk that lets you recover erased data?
  – How is data stored on the hard drive?
  – What happens when you delete it?

• What can different utilities do to help you?

• What kind of disaster prevention/recovery schemes are available?

• What is a good backup strategy?

• What options are there for keeping things going when equipment fails?

Disk Structure and Data Recovery

• Data geometry is based on cylinders, heads, and the numbers of sectors per track (CHS).

• A BIOS routine called Int13h controls disk access.

• Different methods for extending Int13h (the extensions) allow for larger drives.

Disk Structure

Cylinders

• The minimum data unit on a drive is a sector.
  – Sectors are 512 bytes.
  – Hard disks read data in groups of multiple sectors called file allocation units (FAU) or sometimes clusters.

• Hard disks have multiple platters.

• Sectors are laid out in concentric rings on the hard disk platter called tracks.

• All the tracks that form a stack from platter to platter are called a cylinder.
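To make the sector/cluster relationship concrete, here is a minimal sketch of how many clusters a file occupies and how much space is left unused in its last cluster. The 512-byte sector comes from the slide; the figure of 8 sectors per cluster is an assumption chosen only for illustration, since the real value depends on the file system and volume size.

```python
import math

SECTOR_BYTES = 512        # sector size from the slide
SECTORS_PER_CLUSTER = 8   # assumed for illustration; real values vary


def clusters_needed(file_bytes, sectors_per_cluster=SECTORS_PER_CLUSTER):
    """Return (clusters, slack_bytes) for a file of the given size."""
    cluster_bytes = SECTOR_BYTES * sectors_per_cluster
    clusters = math.ceil(file_bytes / cluster_bytes)
    slack = clusters * cluster_bytes - file_bytes  # unused tail of the last cluster
    return clusters, slack


print(clusters_needed(10_000))  # a 10,000-byte file with 4,096-byte clusters
```

Because space is handed out in whole clusters, a file rarely fills its last cluster exactly.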

Heads

• Platters generally store data on the top and bottom surfaces.

• Each surface requires an independent read/write head.

• The number of heads tells you how many readable surfaces the disk contains.

Sectors Per Track

• Int13h was taught to believe that there would always be 63 sectors per track.
  – In reality, there are far more than that on hard disk drives.
  – Int13h extensions have methods of interpreting the actual numbers as groups of 63.

Int13h and CHS

• Cylinder address = 10 bits (1,024 cylinders)

• Head numbers = 8 bits (256 heads)

• Sectors per track = 6 bits (64 values)
  – Sector numbers start at 1, so the value 0 is never used, leaving 63 addressable sectors per track. The master boot record sits at cylinder 0, head 0, sector 1.

• 1,024 x 256 x 63 x 512 (bytes per sector) equals 8,455,716,864 bytes, or roughly 8GB.
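The arithmetic behind the 8GB limit can be checked with a short sketch. The bit widths and the 512-byte sector are taken from the slide; the constant and function names are made up for this example.

```python
CYLINDER_BITS = 10
HEAD_BITS = 8
SECTOR_BITS = 6
BYTES_PER_SECTOR = 512

cylinders = 2 ** CYLINDER_BITS              # 1,024
heads = 2 ** HEAD_BITS                      # 256
sectors_per_track = 2 ** SECTOR_BITS - 1    # 64 values, 63 usable


def int13h_limit_bytes():
    """Largest capacity addressable through plain Int13h CHS."""
    return cylinders * heads * sectors_per_track * BYTES_PER_SECTOR


limit = int13h_limit_bytes()
print(f"{limit:,} bytes")
print(f"{limit / 10**9:.2f} GB (decimal)")
print(f"{limit / 2**30:.3f} GiB (binary)")
```

The same byte count reads as about 8.46 when divided by 10^9 and as 7.875 when divided by 2^30, which is why the limit is quoted as both "8.4GB" and "7.8GB" in different places.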

Int13h Extensions

• To make drives larger than 8GB, new algorithms were written to intercept data calls and report back to Int13h.

• Extended CHS (ECHS or Large)
  – The number of heads is cut and the address bits given to cylinders.
  – The number of available cylinders is greatly exaggerated.

• Logical Block Addressing (LBA)
  – It uses a 28-bit address space.
  – Each sector is given a sequential number, starting at 0.
  – LBA translates physical sector addresses into logical addresses Int13h can understand.
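The mapping between a CHS triple and a sequential LBA number can be sketched with the conventional conversion formula, LBA = (C x heads + H) x sectors-per-track + (S - 1). The geometry of 256 heads and 63 sectors per track is an assumption for illustration, and the function names are hypothetical, not part of any BIOS interface.

```python
HEADS_PER_CYLINDER = 256   # assumed translated geometry, for illustration
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512


def chs_to_lba(c, h, s):
    """Convert a CHS address (sector numbers start at 1) to an LBA number."""
    if not 0 <= h < HEADS_PER_CYLINDER:
        raise ValueError("head out of range")
    if not 1 <= s <= SECTORS_PER_TRACK:
        raise ValueError("sector out of range")
    return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1)


def lba_to_chs(lba):
    """Inverse of chs_to_lba for the same assumed geometry."""
    c, rest = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    h, s0 = divmod(rest, SECTORS_PER_TRACK)
    return c, h, s0 + 1


# A 28-bit address space gives 2**28 numbered sectors.
LBA28_MAX_BYTES = 2 ** 28 * BYTES_PER_SECTOR

print(chs_to_lba(0, 0, 1))       # first sector on the disk
print(lba_to_chs(16128))         # first sector of cylinder 1
print(f"{LBA28_MAX_BYTES:,} bytes addressable with 28 bits")
```

With 512-byte sectors, 28 bits of sector address work out to 137,438,953,472 bytes.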

Disk Structure

• The master boot record

• Partitions

• The file allocation tables

Contents of the Master Boot Record

• The MBR contains executable code that introduces the file system.

• Next are the partition tables that map the sectors used by each partition.

• An OS pointer directs the system to the first line of code used to launch the OS.

• Immediately following the MBR are the file allocation tables.
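As a rough illustration of the partition tables inside the MBR, the sketch below reads the four partition entries from a 512-byte sector. It assumes the classic PC MBR layout, which the slide does not spell out: boot code in bytes 0 to 445, four 16-byte partition entries starting at byte 446, and the 0x55 0xAA signature in the last two bytes. The sample sector is synthetic and its values are invented for the demo.

```python
import struct

# One 16-byte entry: status, CHS start, type, CHS end, LBA start, sector count
ENTRY_FORMAT = "<B3sB3sII"


def parse_mbr(sector):
    """Return the non-empty partition entries found in an MBR sector."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, _chs_start, ptype, _chs_end, lba_start, count = struct.unpack(
            ENTRY_FORMAT, entry)
        if ptype == 0:          # type 0 marks an unused slot
            continue
        partitions.append({
            "bootable": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "sectors": count,
        })
    return partitions


# Synthetic MBR with one bootable partition (values invented for the demo).
mbr = bytearray(512)
mbr[446:462] = struct.pack(ENTRY_FORMAT, 0x80, b"\x00" * 3, 0x0C,
                           b"\x00" * 3, 63, 1_000_000)
mbr[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(mbr)))
```

To inspect a real disk you would pass in the first 512 bytes of the device instead of the synthetic buffer; that step needs administrator rights and is left out here.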

Partitions

• The number and type of partitions required are dictated by the OS.
  – Windows can be installed on a single partition.
  – Linux and Unix generally create multiple partitions.

• The maximum size of partitions and the size of clusters used are a function of OS.

Partition Generation

• Partitions are created by a utility specific to the OS.
  – FDISK for Windows 9x and earlier
  – Disk Management for Windows 2000 and later
  – Various utilities exist for Linux and Unix

• Third-party utilities exist that allow you to create partitions after the OS is installed without damaging the installation.

File Mapping Systems

• File Allocation Tables (FAT) used by all FAT file systems

• Master File Table (MFT) used by NTFS

• File Descriptor Table used by Unix

– Whatever it is called, the file mapping tables keep track of what files are stored, what files are open, and where everything belongs.

– Mapping tables directly follow the MBR and precede the first formatted partition.

Disk Utilities

• “Disk Doctors” and disk editors

• Unformatting utilities

• File recovery utilities

File Recovery

• When a file is deleted, the data doesn’t go anywhere.
  – The file tables are simply rewritten.

• A good file recovery utility rewrites the file table entry and the data is restored.

• To permanently delete a file, use a disk wiping utility that writes 0s over the file sectors.
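The difference between deleting, recovering, and wiping can be shown with a toy model. This is a sketch only: `ToyDisk` and its methods are hypothetical names, and the "file table" is a plain dictionary rather than a real FAT or MFT. The point is that delete and undelete touch only the table, while wipe overwrites the data itself.

```python
class ToyDisk:
    """Toy model: a file table plus a list of clusters. Not a real file system."""

    def __init__(self, cluster_count):
        self.clusters = [b""] * cluster_count
        self.table = {}  # file name -> {"clusters": [indexes], "deleted": bool}

    def write(self, name, chunks):
        in_use = {i for e in self.table.values() if not e["deleted"]
                  for i in e["clusters"]}
        free = [i for i in range(len(self.clusters)) if i not in in_use]
        if len(chunks) > len(free):
            raise OSError("disk full")
        used = free[:len(chunks)]
        for index, chunk in zip(used, chunks):
            self.clusters[index] = chunk
        self.table[name] = {"clusters": used, "deleted": False}

    def read(self, name):
        entry = self.table[name]
        if entry["deleted"]:
            raise FileNotFoundError(name)
        return b"".join(self.clusters[i] for i in entry["clusters"])

    def delete(self, name):
        # Only the table entry changes; the clusters keep their contents.
        self.table[name]["deleted"] = True

    def undelete(self, name):
        # Works only while the old clusters have not been reused by a new file.
        self.table[name]["deleted"] = False

    def wipe(self, name):
        # Overwrite every cluster with zeros, then drop the table entry.
        entry = self.table.pop(name)
        for i in entry["clusters"]:
            self.clusters[i] = b"\x00" * len(self.clusters[i])


disk = ToyDisk(8)
disk.write("memo.txt", [b"hello ", b"world"])
disk.delete("memo.txt")
print(disk.clusters[:2])      # data is still on the "disk"
disk.undelete("memo.txt")
print(disk.read("memo.txt"))  # file is back
disk.wipe("memo.txt")
print(disk.clusters[:2])      # now only zeros remain
```

In this model a deleted file's clusters count as free space, which is why recovery gets less reliable the longer the disk stays in use after the deletion.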

Backup and Recovery

• Backup hardware

• Backup software

• Backup/recovery strategy

• Backup type

• Backup rotation

Backup Hardware

• Tape drives
  – DAT
  – DLT
  – AIT
  – QIC

• CD-RW or DVD-RW

Backup Software

• Nearly every OS comes with a rudimentary backup utility.
  – For many people, this is sufficient.
  – For more sophisticated requirements, look into third-party solutions.

The Backup/Recovery Strategy

• What kind of hardware?

• What kind of software?

• Who is responsible for backups?

• What backup type do you use?

• What is your backup rotation?

Backup Types

• Full

• Incremental

• Differential

• Daily

• File copy
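The five types differ mainly in which files they select and whether they reset the archive attribute afterward. The sketch below assumes the archive-bit conventions commonly used by Windows backup tools; the slide lists only the names, so treat the rules table as an assumption. `FileEntry` and `run_backup` are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileEntry:
    name: str
    archive: bool      # set by the OS whenever the file changes
    modified: date


# kind -> (selection rule, whether the archive bit is cleared afterward)
RULES = {
    "full":         (lambda f, today: True, True),
    "incremental":  (lambda f, today: f.archive, True),
    "differential": (lambda f, today: f.archive, False),
    "daily":        (lambda f, today: f.modified == today, False),
    "copy":         (lambda f, today: True, False),
}


def run_backup(files, kind, today):
    """Return the names backed up and update archive bits as the type requires."""
    selects, clears = RULES[kind]
    chosen = [f for f in files if selects(f, today)]
    if clears:
        for f in chosen:
            f.archive = False
    return [f.name for f in chosen]


today = date(2024, 1, 10)   # arbitrary date for the demo
files = [
    FileEntry("a.doc", archive=False, modified=date(2024, 1, 8)),
    FileEntry("b.xls", archive=True, modified=today),
]
print(run_backup(files, "differential", today))  # changed files, bits left alone
print(run_backup(files, "incremental", today))   # same files, bits cleared
print(run_backup(files, "incremental", today))   # nothing left to back up
```

Under these rules, a restore needs the last full backup plus either the latest differential or every incremental made since that full backup.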

Backup Rotations

• Nightly versus weekly (versus whenever you happen to think about it)

• When to use full and when to use differential or incremental

• The Grandfather Method
  – A different set of tapes for each week

Off-site Backup Options

• Off-site storage of tapes made locally

• Backups made directly to an off-site backup/recovery agent
  – Internet options
  – Internal WAN options

Fault Tolerance

• Systems can be designed so that defective parts can be replaced on the fly (hot swappable).
  – Hard disks
  – PCI
  – Memory (requires OS that supports the function)

Hot Spares and Cold Spares

• A hot spare is running and connected all the time.
  – If a part fails, the hot spare automatically takes over.

• A cold spare is installed in the system, but not connected.
  – If a part fails, the administrator can have it hooked up and running in short order.

Hot Sites, Warm Sites, and Cold Sites

• Hot sites consist of a duplicate server farm with all data constantly replicated from the network to the hot site.

• Warm sites have a completely configured server farm to which data can be copied with short notice.

• Cold sites have all the equipment on site, but in the event of failure it must be configured and the data restored.

RAID

• Redundant Array of Independent Disks
  – Two or more drives are seen by the system as one drive.
  – Different levels of RAID have different levels of fault tolerance.

RAID 0

• Disk striping without parity

• Data is written in small pieces to two or more drives.

• Performance is greatly enhanced, but there is no fault tolerance at all.
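A minimal sketch of striping, assuming a tiny 4-byte stripe so the result is easy to read (real arrays use stripes of many kilobytes). The function names are made up for this example. Consecutive pieces of the data go to the drives in round-robin order, and reading them back requires every drive.

```python
import math


def stripe(data, drive_count, stripe_size=4):
    """Split data into stripe_size pieces and deal them out round-robin."""
    drives = [bytearray() for _ in range(drive_count)]
    for n, start in enumerate(range(0, len(data), stripe_size)):
        drives[n % drive_count] += data[start:start + stripe_size]
    return [bytes(d) for d in drives]


def unstripe(drives, total_len, stripe_size=4):
    """Reassemble the original data; needs every drive to be intact."""
    out = bytearray()
    offsets = [0] * len(drives)
    for n in range(math.ceil(total_len / stripe_size)):
        d = n % len(drives)
        out += drives[d][offsets[d]:offsets[d] + stripe_size]
        offsets[d] += stripe_size
    return bytes(out)


drives = stripe(b"ABCDEFGHIJKL", drive_count=2)
print(drives)                 # each drive holds every other piece
print(unstripe(drives, 12))   # original data restored from both drives
```

Each drive holds only every other piece of the file, so losing either drive leaves gaps that nothing else in the array can fill.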

RAID 1

• Disk mirroring or disk duplexing
  – Identical copies of the data are stored on two disks.
  – Mirroring uses a single controller; duplexing puts the second drive on an independent controller.
  – If one drive fails, the other takes over.

RAID 5

• Disk striping with parity
  – Data is stored on three or more drives.
  – Recovery data called parity is equally divided over all drives.
  – If one drive fails, the parity from the other drives fills in the blank.
  – Storage space equal to one drive (or the partition size used) must be reserved for parity.
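How parity "fills in the blank" can be sketched with XOR, the operation conventionally used for RAID 5 parity. This example covers a single stripe on a three-drive array and does not model how parity rotates across drives; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_capacity(drive_count, drive_bytes):
    """One drive's worth of space goes to parity, whatever the array size."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


# One stripe: two data blocks plus the parity block computed from them.
data_blocks = [b"ABCD", b"EFGH"]
parity = xor_blocks(data_blocks)

# Suppose the drive holding b"EFGH" fails. XOR the survivors to rebuild it.
rebuilt = xor_blocks([data_blocks[0], parity])
print(rebuilt)

print(usable_capacity(4, 500))  # four 500-unit drives leave 1,500 usable
```

XOR-ing the surviving blocks returns the missing one because XOR is its own inverse. With two blocks missing from the same stripe there is not enough information left, which is why the array tolerates only one failed drive.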

RAID 1+0

• A RAID 0 array is configured to enhance system performance.

• An identical RAID 0 array is duplexed on a separate controller.

• If a drive fails on the primary RAID array, the second array automatically takes over.

• It requires third-party software to work.

RAID 50

• A RAID 5 array is configured on one controller.

• A duplicate RAID 5 array is configured on a second controller.

• Four drives must fail before the system goes down completely.

• It requires third-party software and special controllers to function.
