2p13

2P13 Week 11

A+ Guide to Managing and Maintaining your PC, 6e
RAID Controllers
Redundant Array of Independent (or Inexpensive) Disks
Level 0 -- Striped Disk Array without Fault Tolerance: Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails, then all data in the array is lost.
Level 1 -- Mirroring and Duplexing: Provides disk mirroring. Level 1 provides twice the read transaction rate of single disks and the same write transaction rate as single disks.
Level 2 -- Error-Correcting Coding: Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.

Level 4 -- Dedicated Parity Drive: A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.
Level 5 -- Block Interleaved Distributed Parity: Provides data striping at the block level and also stripes the error correction (parity) information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
Level 6 -- Independent Data Disks with Double Parity: Provides block-level striping with two sets of parity data distributed across all disks.
Level 0+1 -- A Mirror of Stripes: Not one of the original RAID levels. Two RAID 0 stripes are created, and a RAID 1 mirror is created over them. Used for both replicating and sharing data among disks.
Level 10 -- A Stripe of Mirrors: Not one of the original RAID levels. Multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these.
Level 7: A trademark of Storage Computer Corporation that adds caching to Levels 3 or 4.
RAID S: EMC Corporation's proprietary striped parity RAID system used in its Symmetrix storage systems.

RAID 0 & 1
http://www.adtron.com/expertise/activeraid.html
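As a rough illustration of the difference between striping and mirroring, the sketch below maps a logical block number to physical locations. It assumes a stripe unit of exactly one block and numbers disks from 0; the function names are hypothetical and do not come from any controller API.

```python
def stripe_location(block: int, num_disks: int) -> tuple[int, int]:
    """RAID 0: return (disk index, block offset on that disk) for a logical block.

    Assumes a stripe unit of one block, laid out round-robin across the disks.
    """
    return block % num_disks, block // num_disks


def mirror_locations(block: int, num_copies: int = 2) -> list[tuple[int, int]]:
    """RAID 1: every copy stores the block at the same offset."""
    return [(disk, block) for disk in range(num_copies)]


# Six consecutive logical blocks spread over a three-disk stripe set.
layout = [stripe_location(b, 3) for b in range(6)]
```

With striping, consecutive blocks land on different disks, so they can be transferred in parallel; with mirroring, a read can be served by either copy but every write goes to both.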

RAID 5
RAID 5 ensures that if one of the disks in the striped set fails, its contents can be reconstructed from the information on the remaining functioning disks.
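The recovery property can be sketched with XOR parity, which is the usual parity function in RAID 4 and 5. The example below is a simplified model of a single stripe (three data blocks plus one parity block); the block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks and the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of the surviving data blocks
# and the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

The same calculation explains the write penalty: every update to a data block also requires the parity block of that stripe to be recomputed and rewritten.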

RAID 10
Striping plus mirroring improves performance and gives redundancy.

Table 11.4 RAID Levels
N = number of data disks; m proportional to log N

| Category | Level | Description | Disks required | Data availability | Large I/O data transfer capacity | Small I/O request rate |
|---|---|---|---|---|---|---|
| Striping | 0 | Nonredundant | N | Lower than single disk | Very high | Very high for both read and write |
| Mirroring | 1 | Mirrored | 2N | Higher than RAID 2, 3, 4, or 5; lower than RAID 6 | Higher than single disk for read; similar to single disk for write | Up to twice that of a single disk for read; similar to single disk for write |
| Parallel access | 2 | Redundant via Hamming code | N + m | Much higher than single disk; comparable to RAID 3, 4, or 5 | Highest of all listed alternatives | Approximately twice that of a single disk |
| Parallel access | 3 | Bit-interleaved parity | N + 1 | Much higher than single disk; comparable to RAID 2, 4, or 5 | Highest of all listed alternatives | Approximately twice that of a single disk |
| Independent access | 4 | Block-interleaved parity | N + 1 | Much higher than single disk; comparable to RAID 2, 3, or 5 | Similar to RAID 0 for read; significantly lower than single disk for write | Similar to RAID 0 for read; significantly lower than single disk for write |
| Independent access | 5 | Block-interleaved distributed parity | N + 1 | Much higher than single disk; comparable to RAID 2, 3, or 4 | Similar to RAID 0 for read; lower than single disk for write | Similar to RAID 0 for read; generally lower than single disk for write |
| Independent access | 6 | Block-interleaved dual distributed parity | N + 2 | Highest of all listed alternatives | Similar to RAID 0 for read; lower than RAID 5 for write | Similar to RAID 0 for read; significantly lower than RAID 5 for write |

Files
Data collections created by users
The file system is one of the most important parts of the OS to a user
Desirable properties of files:
Long-term existence
Sharable between processes
Structure

File Structure
Provide a means to store data organized as files as well as a collection of functions that can be performed on files
Maintain a set of attributes associated with the file
Typical operations include:
Create
Delete
Open
Close
Read
Write

Structure Terms

Field
basic element of data
contains a single value
fixed or variable length

Record
collection of related fields that can be treated as a unit by some application program
fixed or variable length

File
collection of similar records
treated as a single entity
may be referenced by name
access control restrictions usually apply at the file level

Database
collection of related data
relationships among elements of data are explicit
designed for use by a number of different applications
consists of one or more types of files

File Management System Objectives
Meet the data management needs of the user
Guarantee that the data in the file are valid
Optimize performance
Provide I/O support for a variety of storage device types
Minimize the potential for lost or destroyed data
Provide a standardized set of I/O interface routines to user processes
Provide I/O support for multiple users in the case of multiple-user systems


The Pile
Least complicated form of file organization
Data are collected in the order they arrive
Each record consists of one burst of data
Purpose is simply to accumulate the mass of data and save it
Record access is by exhaustive search


The Sequential File
Most common form of file structure
A fixed format is used for records
Key field uniquely identifies the record
Typically used in batch applications
Only organization that is easily stored on tape as well as disk


Indexed Sequential File
Adds an index to the file to support random access
Adds an overflow file
Greatly reduces the time required to access a single record
Multiple levels of indexing can be used to provide greater efficiency in access


Indexed File
Records are accessed only through their indexes
Variable-length records can be employed
Exhaustive index contains one entry for every record in the main file
Partial index contains entries to records where the field of interest exists
Used mostly in applications where timeliness of information is critical
Examples would be airline reservation systems and inventory control systems

Direct or Hashed File
Access directly any block of a known address
Makes use of hashing on the key value
Often used where:
very rapid access is required
fixed-length records are used
records are always accessed one at a time

B-Trees
A balanced tree structure with all branches of equal length
Standard method of organizing indexes for databases
Commonly used in OS file systems
Provides for efficient searching, adding, and deleting of items


B-Tree Characteristics
A B-tree is characterized by its minimum degree d and satisfies the following properties:
every node has at most 2d - 1 keys and 2d children or, equivalently, 2d pointers
every node, except for the root, has at least d - 1 keys and d pointers; as a result, each internal node, except the root, is at least half full and has at least d children
the root has at least 1 key and 2 children
all leaves appear on the same level and contain no information. This is a logical construct to terminate the tree; the actual implementation may differ.
a nonleaf node with k pointers contains k - 1 keys
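The occupancy limits above can be written down as a small check. This is only a sketch of the stated bounds for a nonleaf node; the function names are hypothetical.

```python
def key_bounds(d: int, is_root: bool = False) -> tuple[int, int]:
    """Return (minimum keys, maximum keys) for a node of a B-tree of minimum degree d."""
    min_keys = 1 if is_root else d - 1
    return min_keys, 2 * d - 1


def nonleaf_node_is_valid(num_keys: int, num_pointers: int, d: int,
                          is_root: bool = False) -> bool:
    """Check the key-count limits and the rule that k pointers go with k - 1 keys."""
    low, high = key_bounds(d, is_root)
    return low <= num_keys <= high and num_pointers == num_keys + 1
```

For example, with d = 3 a non-root node holds between 2 and 5 keys, and therefore between 3 and 6 pointers.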


Inserting Nodes Into a B-Tree

The End

From the user's point of view, one of the most important parts of an operating system is the file system. The file system provides the resource abstractions typically associated with secondary storage. The file system permits users to create data collections, called files, with desirable properties, such as:

Long-term existence: Files are stored on disk or other secondary storage and do not disappear when a user logs off.

Sharable between processes: Files have names and can have associated access permissions that permit controlled sharing.

Structure: Depending on the file system, a file can have an internal structure that is convenient for particular applications. In addition, files can be organized into hierarchical or more complex structure to reflect the relationships among files.

Four terms are in common use when discussing files:

Field
Record
File
Database

A field is the basic element of data. An individual field contains a single value, such as an employee's last name, a date, or the value of a sensor reading. It is characterized by its length and data type (e.g., ASCII string, decimal). Depending on the file design, fields may be fixed length or variable length. In the latter case, the field often consists of two or three subfields: the actual value to be stored, the name of the field, and, in some cases, the length of the field. In other cases of variable-length fields, the length of the field is indicated by the use of special demarcation symbols between fields.

A record is a collection of related fields that can be treated as a unit by some application program. For example, an employee record would contain such fields as name, social security number, job classification, date of hire, and so on. Again, depending on design, records may be of fixed length or variable length. A record will be of variable length if some of its fields are of variable length or if the number of fields may vary. In the latter case, each field is usually accompanied by a field name. In either case, the entire record usually includes a length field.

A file is a collection of similar records. The file is treated as a single entity by users and applications and may be referenced by name. Files have file names and may be created and deleted. Access control restrictions usually apply at the file level. That is, in a shared system, users and programs are granted or denied access to entire files. In some more sophisticated systems, such controls are enforced at the record or even the field level.

Some file systems are structured only in terms of fields, not records. In that case, a file is a collection of fields.

A database is a collection of related data. The essential aspects of a database are that the relationships that exist among elements of data are explicit and that the database is designed for use by a number of different applications. A database may contain all of the information related to an organization or project, such as a business or a scientific study. The database itself consists of one or more types of files. Usually, there is a separate database management system that is independent of the operating system, although that system may make use of some file management programs.

A file management system is that set of system software that provides services to users and applications in the use of files. Typically, the only way that a user or application may access files is through the file management system. This relieves the user or programmer of the necessity of developing special-purpose software for each application and provides the system with a consistent, well-defined means of controlling its most important asset. [GROS86] suggests the following objectives for a file management system:

To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the aforementioned operations

To guarantee, to the extent possible, that the data in the file are valid

To optimize performance, both from the system point of view in terms of overall throughput and from the user's point of view in terms of response time

To provide I/O support for a variety of storage device types

To minimize or eliminate the potential for lost or destroyed data

To provide a standardized set of I/O interface routines to user processes

To provide I/O support for multiple users, in the case of multiple-user systems

The least-complicated form of file organization may be termed the pile. Data are collected in the order in which they arrive. Each record consists of one burst of data. The purpose of the pile is simply to accumulate the mass of data and save it. Records may have different fields, or similar fields in different orders. Thus, each field should be self-describing, including a field name as well as a value. The length of each field must be implicitly indicated by delimiters, explicitly included as a subfield, or known as default for that field type.

Because there is no structure to the pile file, record access is by exhaustive search. That is, if we wish to find a record that contains a particular field with a particular value, it is necessary to examine each record in the pile until the desired record is found or the entire file has been searched. If we wish to find all records that contain a particular field or contain that field with a particular value, then the entire file must be searched.
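A pile can be modeled loosely as a list of self-describing records kept in arrival order. The sketch below uses Python dictionaries as a stand-in for name/value fields; the sample records and the name `search_pile` are invented for illustration.

```python
# Records in arrival order; fields differ from record to record.
pile = [
    {"name": "Ada", "dept": "R&D"},
    {"sensor": "t1", "value": 21.5},
    {"dept": "R&D", "name": "Lin"},
]


def search_pile(records, field, value=None):
    """Exhaustive search: examine every record.

    Returns the records that contain `field`, or, if `value` is given,
    the records in which `field` has exactly that value.
    """
    return [
        record for record in records
        if field in record and (value is None or record[field] == value)
    ]
```

Because nothing is known about where a matching record might be, finding all matches always costs one pass over the whole pile.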

Pile files are encountered when data are collected and stored prior to processing or when data are not easy to organize. This type of file uses space well when the stored data vary in size and structure, is perfectly adequate for exhaustive searches, and is easy to update. However, beyond these limited uses, this type of file is unsuitable for most applications.

The most common form of file structure is the sequential file. In this type of file, a fixed format is used for records. All records are of the same length, consisting of the same number of fixed-length fields in a particular order. Because the length and position of each field are known, only the values of fields need to be stored; the field name and length for each field are attributes of the file structure.

One particular field, usually the first field in each record, is referred to as the key field. The key field uniquely identifies the record; thus key values for different records are always different. Further, the records are stored in key sequence: alphabetical order for a text key, and numerical order for a numerical key.

Sequential files are typically used in batch applications and are generally optimum for such applications if they involve the processing of all the records (e.g., a billing or payroll application). The sequential file organization is the only one that is easily stored on tape as well as disk.

For interactive applications that involve queries and/or updates of individual records, the sequential file provides poor performance. Access requires the sequential search of the file for a key match. If the entire file, or a large portion of the file, can be brought into main memory at one time, more efficient search techniques are possible. Nevertheless, considerable processing and delay are encountered to access a record in a large sequential file. Additions to the file also present problems. Typically, a sequential file is stored in simple sequential ordering of the records within blocks. That is, the physical organization of the file on tape or disk directly matches the logical organization of the file. In this case, the usual procedure is to place new records in a separate pile file, called a log file or transaction file. Periodically, a batch update is performed that merges the log file with the master file to produce a new file in correct key sequence.
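The periodic batch update can be sketched as a sort of the log followed by a merge with the master file. The example assumes records are `(key, payload)` tuples held in memory and that keys in the log do not collide with keys already in the master file; the name `batch_update` is hypothetical.

```python
import heapq


def batch_update(master, log):
    """Merge a log (transaction) file into a master file.

    `master` is already in key sequence; `log` is in arrival order.
    Returns a new list of records in key sequence.
    """
    sorted_log = sorted(log, key=lambda record: record[0])
    return list(heapq.merge(master, sorted_log, key=lambda record: record[0]))


master = [(1, "a"), (4, "d"), (9, "i")]
log = [(7, "g"), (2, "b")]
new_master = batch_update(master, log)
```

The merge reads each input once from front to back, which is why this style of update also works on tape.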

An alternative is to organize the sequential file physically as a linked list. One or more records are stored in each physical block. Each block on disk contains a pointer to the next block. The insertion of new records involves pointer manipulation but does not require that the new records occupy a particular physical block position. Thus, some added convenience is obtained at the cost of additional processing and overhead.

A popular approach to overcoming the disadvantages of the sequential file is the indexed sequential file. The indexed sequential file maintains the key characteristic of the sequential file: Records are organized in sequence based on a key field. Two features are added: an index to the file to support random access, and an overflow file. The index provides a lookup capability to reach quickly the vicinity of a desired record. The overflow file is similar to the log file used with a sequential file but is integrated so that a record in the overflow file is located by following a pointer from its predecessor record.

In the simplest indexed sequential structure, a single level of indexing is used. The index in this case is a simple sequential file. Each record in the index file consists of two fields: a key field, which is the same as the key field in the main file, and a pointer into the main file. To find a specific field, the index is searched to find the highest key value that is equal to or precedes the desired key value. The search continues in the main file at the location indicated by the pointer.

To see the effectiveness of this approach, consider a sequential file with 1 million records. To search for a particular key value will require on average one-half million record accesses. Now suppose that an index containing 1,000 entries is constructed, with the keys in the index more or less evenly distributed over the main file. Now it will take on average 500 accesses to the index file followed by 500 accesses to the main file to find the record. The average search length is reduced from 500,000 to 1,000.
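A single-level index lookup might be sketched as follows. The sketch scans the index sequentially, as the description does, then continues in the main file from the indicated position, counting record accesses along the way. The data (1,000 records with even keys, 10 index entries) and the names `build_index` and `lookup` are assumptions chosen to keep the example small.

```python
def build_index(main_file, num_entries):
    """Index evenly spaced records: each entry is (key, position in main file)."""
    step = len(main_file) // num_entries
    return [(main_file[pos][0], pos) for pos in range(0, len(main_file), step)]


def lookup(main_file, index, key):
    """Return (record or None, number of index and main-file accesses)."""
    accesses = 0
    start = None
    for index_key, pos in index:          # sequential search of the index
        accesses += 1
        if index_key > key:
            break
        start = pos                       # highest index key <= desired key so far
    if start is None:
        return None, accesses
    for pos in range(start, len(main_file)):   # continue in the main file
        accesses += 1
        if main_file[pos][0] == key:
            return main_file[pos], accesses
        if main_file[pos][0] > key:
            break
    return None, accesses


main_file = [(k, f"rec{k}") for k in range(0, 2000, 2)]   # 1,000 records
index = build_index(main_file, 10)
record, accesses = lookup(main_file, index, 450)
```

In this small case the record is found after 30 accesses (4 in the index, 26 in the main file), whereas a plain sequential search from the start of the file would have needed 226.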

Additions to the file are handled in the following manner: Each record in the main file contains an additional field not visible to the application, which is a pointer to the overflow file. When a new record is to be inserted into the file, it is added to the overflow file. The record in the main file that immediately precedes the new record in logical sequence is updated to contain a pointer to the new record in the overflow file. If the immediately preceding record is itself in the overflow file, then the pointer in that record is updated. As with the sequential file, the indexed sequential file is occasionally merged with the overflow file in batch mode.

The indexed sequential file greatly reduces the time required to access a single record, without sacrificing the sequential nature of the file. To process the entire file sequentially, the records of the main file are processed in sequence until a pointer to the overflow file is found, then accessing continues in the overflow file until a null pointer is encountered, at which time accessing of the main file is resumed where it left off.
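Sequential processing with an overflow file can be sketched as a walk that detours through the overflow chain whenever a pointer is present. The record layout below, `(key, pointer)` with `None` as the null pointer and list positions as addresses, is an assumption made for the example.

```python
# Main file in key sequence; the second element points into the overflow file.
main_file = [(10, None), (20, 0), (30, None)]

# Overflow file; each record points to the next overflow record in key order.
overflow_file = [(22, 1), (25, None)]


def keys_in_sequence(main_file, overflow_file):
    """Visit all records in key order, following overflow pointers."""
    keys = []
    for key, pointer in main_file:
        keys.append(key)
        while pointer is not None:            # detour into the overflow file
            overflow_key, pointer = overflow_file[pointer]
            keys.append(overflow_key)
        # null pointer reached: resume the main file where it left off
    return keys
```

Here records 22 and 25 were added after the file was built, yet the scan still yields 10, 20, 22, 25, 30 without moving any record of the main file.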

To provide even greater efficiency in access, multiple levels of indexing can be used. Thus the lowest level of index file is treated as a sequential file and a higher-level index file is created for that file. Consider again a file with 1 million records. A lower-level index with 10,000 entries is constructed. A higher-level index into the lower-level index of 100 entries can then be constructed. The search begins at the higher-level index (average length = 50 accesses) to find an entry point into the lower-level index. This index is then searched (average length = 50) to find an entry point into the main file, which is then searched (average length = 50). Thus the average length of search has been reduced from 500,000 to 1,000 to 150.

The indexed sequential file retains one limitation of the sequential file: Effective processing is limited to that which is based on a single field of the file. For example, when it is necessary to search for a record on the basis of some other attribute than the key field, both forms of sequential file are inadequate. In some applications, the flexibility of efficiently searching by various attributes is desirable.

To achieve this flexibility, a structure is needed that employs multiple indexes, one for each type of field that may be the subject of a search. In the general indexed file, the concepts of sequentiality and a single key are abandoned. Records are accessed only through their indexes. The result is that there is now no restriction on the placement of records as long as a pointer in at least one index refers to that record. Furthermore, variable-length records can be employed.

Two types of indexes are used. An exhaustive index contains one entry for every record in the main file. The index itself is organized as a sequential file for ease of searching. A partial index contains entries to records where the field of interest exists. With variable-length records, some records will not contain all fields. When a new record is added to the main file, all of the index files must be updated.
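The two kinds of index can be illustrated with a few variable-length records. For brevity the sketch keeps each index in a Python dictionary rather than in a sequential file, which is a simplification of the organization described above; the records and field names are invented.

```python
records = [
    {"id": 1, "name": "Ada", "phone": "555-0100"},
    {"id": 2, "name": "Lin"},                       # no phone field
    {"id": 3, "name": "Ada", "phone": "555-0199"},
]


def build_index(records, field):
    """Map each value of `field` to the positions of the records containing it.

    Records that lack the field get no entry, so the result is exhaustive
    only if every record has the field.
    """
    index = {}
    for position, record in enumerate(records):
        if field in record:
            index.setdefault(record[field], []).append(position)
    return index


name_index = build_index(records, "name")     # exhaustive: every record has a name
phone_index = build_index(records, "phone")   # partial: only two records have a phone
```

Adding a record means appending it to `records` and then updating every index whose field the record contains, which is the maintenance cost noted above.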

Indexed files are used mostly in applications where timeliness of information is critical and where data are rarely processed exhaustively. Examples are airline reservation systems and inventory control systems.

The direct, or hashed, file exploits the capability found on disks to access directly any block of a known address. As with sequential and indexed sequential files, a key field is required in each record. However, there is no concept of sequential ordering here.

The direct file makes use of hashing on the key value. This function is explained in Appendix F. Figure F.1b shows the type of hashing organization with an overflow file that is typically used in a hash file.
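As a minimal sketch of the idea, the example below hashes an integer key to a slot in a primary area and places colliding records in a separate overflow area. The modulo hash function, the table size, and the dictionary used for the overflow area are all assumptions for illustration; they are not necessarily the organization shown in Figure F.1b.

```python
NUM_BUCKETS = 8                      # assumed size of the primary area

primary = [None] * NUM_BUCKETS       # one (key, record) slot per bucket
overflow = {}                        # stand-in for the overflow file


def bucket_of(key: int) -> int:
    """Assumed hash function: remainder of the key divided by the table size."""
    return key % NUM_BUCKETS


def store(key, record):
    slot = bucket_of(key)
    if primary[slot] is None or primary[slot][0] == key:
        primary[slot] = (key, record)
    else:
        overflow[key] = record       # collision: record goes to the overflow area


def fetch(key):
    slot = bucket_of(key)
    entry = primary[slot]
    if entry is not None and entry[0] == key:
        return entry[1]              # found with a single direct access
    return overflow.get(key)


store(3, "price list row 3")
store(11, "price list row 11")       # 11 % 8 == 3, so this collides with key 3
```

A lookup touches one computed location, plus the overflow area only on a collision, which is why this organization suits one-record-at-a-time access.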

Direct files are often used where very rapid access is required, where fixed-length records are used, and where records are always accessed one at a time. Examples are directories, pricing tables, schedules, and name lists.

The preceding section referred to the use of an index file to access individual records in a file or database. For a large file or database, a single sequential file of indexes on the primary key does not provide for rapid access. To provide more efficient access, a structured index file is typically used. The simplest such structure is a two-level organization in which the original file is broken into sections and the upper level consists of a sequenced set of pointers to the lower-level sections. This structure can then be extended to more than two levels, resulting in a tree structure. Unless some discipline is imposed on the construction of the tree index, it is likely to end up with an uneven structure, with some short branches and some long branches, so that the time to search the index is uneven. Therefore, a balanced tree structure, with all branches of equal length, would appear to give the best average performance. Such a structure is the B-tree, which has become the standard method of organizing indexes for databases and is commonly used in OS file systems, including those supported by Mac OS X, Windows, and several Linux file systems. The B-tree structure provides for efficient searching, adding, and deleting of items.

Before illustrating the concept of B-tree, let us define a B-tree and its characteristics more precisely. A B-tree is a tree structure (no closed loops) with the following characteristics (Figure 12.4):

1. The tree consists of a number of nodes and leaves.

2. Each node contains at least one key which uniquely identifies a file record, and more than one pointer to child nodes or leaves. The number of keys and pointers contained in a node may vary, within limits explained below.

3. Each node is limited to the same number of maximum keys.

4. The keys in a node are stored in nondecreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root for a subtree containing all keys greater than any keys in the node. Thus, each node has one more pointer than keys.

Figure 12.5 illustrates two levels of a B-tree. The upper level has (k - 1) keys and k pointers and satisfies the following relationship:

$$\mathrm{Key}_1 < \mathrm{Key}_2 < \cdots < \mathrm{Key}_{k-1}$$

Each pointer points to a node that is the top level of a subtree of this upper-level node. Each of these subtree nodes contains some number of keys and pointers, unless it is a leaf node.

To search for a key, you start at the root node. If the key you want is in the node, you're done. If not, you go down one level. There are three cases:

1. The key you want is less than the smallest key in this node. Take the leftmost pointer down to the next level.

2. The key you want is greater than the largest key in this node. Take the rightmost pointer down to the next level.

3. The value of the key is between the values of two adjacent keys in this node. Take the pointer between these keys down to the next level.

For example, consider the tree in Figure 12.5d and the desired key is 84. At the root level, 84 > 51, so you take the rightmost branch down to the next level. Here, 84 lies between two adjacent keys of the node, so you take the pointer between those keys down to the next level, where the key 84 is found. Associated with this key is a pointer to the desired record. An advantage of this tree structure over other tree structures is that it is broad and shallow, so that the search terminates quickly. Furthermore, because it is balanced (all branches from root to leaf are of equal length), there are no searches that are long compared with other searches.
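The three search cases collapse into one step if the position of the desired key among the node's sorted keys is computed first. The sketch below does that with `bisect_left`; the `Node` class and the small example tree are hypothetical and are not the tree of Figure 12.5.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys held in this node
    children: list = field(default_factory=list)  # empty at the lowest level


def search(node, key) -> bool:
    """Return True if `key` is stored somewhere in the tree rooted at `node`."""
    while node is not None:
        i = bisect_left(node.keys, key)           # number of keys smaller than `key`
        if i < len(node.keys) and node.keys[i] == key:
            return True                           # the key is in this node
        # i == 0: leftmost pointer; i == len(keys): rightmost pointer;
        # otherwise: the pointer between two adjacent keys.
        node = node.children[i] if node.children else None
    return False


# A small hand-built tree, used only to exercise the function.
root = Node([50], [
    Node([20], [Node([10]), Node([30])]),
    Node([70], [Node([60]), Node([84])]),
])
```

Each iteration descends one level, so the number of nodes examined is bounded by the height of the tree.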

The rules for inserting a new key into the B-tree must maintain a balanced tree. This is done as follows:

1. Search the tree for the key. If the key is not in the tree, then you have reached a node at the lowest level.

2. If this node has fewer than 2d - 1 keys, then insert the key into this node in the proper sequence.

3. If the node is full (having 2d - 1 keys), then split this node around its median key into two new nodes with d - 1 keys each and promote the median key to the next higher level, as described in step 4. If the new key has a value less than the median key, insert it into the left-hand new node; otherwise insert it into the right-hand new node. The result is that the original node has been split into two nodes, one with d - 1 keys and one with d keys.

4. The promoted key is inserted into the parent node following the rules of step 3. Therefore, if the parent node is already full, it must be split and its median key promoted to the next highest layer.

5. If the process of promotion reaches the root node and the root node is already full, then insertion again follows the rules of step 3. However, in this case the median key becomes a new root node and the height of the tree increases by 1.

Figure 12.5 illustrates the insertion process on a B-tree of degree d = 3. In each part of the figure, the nodes affected by the insertion process are unshaded.
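The five steps can be sketched as a recursive insertion that reports a promoted median back to the caller. This is one possible in-memory rendering under stated assumptions: nodes hold only keys (no record pointers), leaves are the lowest-level nodes with an empty child list, and the names `Node`, `_split`, `_add`, and `insert` are hypothetical.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list at the lowest level


def _split(node, d):
    """Split a full node (2d - 1 keys) around its median; return (median, right half)."""
    median = node.keys[d - 1]
    right = Node(node.keys[d:], node.children[d:])
    node.keys = node.keys[:d - 1]
    node.children = node.children[:d]
    return median, right


def _add(node, key, right_child, d):
    """Place `key` (and the child to its right, if any) in `node`.

    Returns None, or (median, new right node) if `node` had to be split.
    """
    promoted = None
    target = node
    if len(node.keys) == 2 * d - 1:          # step 3: node is full, split first
        median, right = _split(node, d)
        promoted = (median, right)
        target = node if key < median else right
    i = bisect_left(target.keys, key)        # step 2: insert in proper sequence
    target.keys.insert(i, key)
    if right_child is not None:
        target.children.insert(i + 1, right_child)
    return promoted


def _insert(node, key, d):
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                          # key already present, nothing to do
    if not node.children:                    # step 1: lowest level reached
        return _add(node, key, None, d)
    promoted = _insert(node.children[i], key, d)
    if promoted is None:
        return None
    return _add(node, promoted[0], promoted[1], d)   # step 4: promote into parent


def insert(root, key, d):
    """Insert `key` and return the (possibly new) root."""
    promoted = _insert(root, key, d)
    if promoted is None:
        return root
    median, right = promoted                 # step 5: the root itself was split
    return Node([median], [root, right])


root = Node()
for k in range(1, 7):                        # the sixth key forces the first split
    root = insert(root, k, d=3)
```

After the six insertions with d = 3, the root holds the single key 3, with the nodes [1, 2] and [4, 5, 6] below it: one node with d - 1 keys and one with d keys, as step 3 predicts.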
