
Sept. 2012 Yangjun Chen ACS-3902/3 1

File Organizations

Outline: File Organization

• Hardware Description of Disk Devices

• Buffering of Blocks

• File Records on Disk

• Operations on Files

• Files of Unordered Records (Heap Files)

• Files of Ordered Records (Sorted Files)


•Hardware Description of Disk Devices

- Magnetic disks: used for storing large amounts of data

data: bits and bytes (characters)

- Two kinds of disks: floppy disks and hard disks

floppy disk: 400 Kbytes - 1.5 Mbytes

hard disk: hundreds of Mbytes to Gbytes

- single-sided or double-sided

- disk pack: disks assembled together

- track, cylinder and sector


[Figure: disk with actuator, showing a track, a cylinder, and a sector (an arc of a track)]


- Sector: Each track is divided into sectors by the disk management system. This division is hard-coded on the disk surface and cannot be changed.

Not all disks have their tracks divided into sectors.

- Block: Each track is divided into blocks by the operating system during disk formatting (or initialization). Block size is fixed during formatting and cannot be changed dynamically.

Block size: 512 - 4096 bytes.

A disk with hard-coded sectors has the sectors subdivided into blocks.

Blocks are separated by fixed-size interblock gaps.


- Data Transfer (Read/Write)

data transfer unit: a block or a cluster (several contiguous blocks)

- Hardware address: <surface number, track number, block number>

- Buffer: a contiguous area reserved in main memory that holds a block.


•Buffering of Blocks

- Several buffers can be reserved in main memory.

While one buffer is being read or written, the CPU can process data in the other buffers.

- concurrency: interleaved or in parallel

[Figure: timeline from t1 to t4 showing operations A, B, C, and D, illustrating interleaved versus parallel execution]


- Double Buffering

The CPU can start processing a block once its transfer to main memory is completed; at the same time the disk I/O processor can be reading and transferring the next block into a different buffer.

Disk block:    i         i+1       i+2       i+3       i+4
I/O:           fill A    fill B    fill A    fill B    fill A

Disk block:              i         i+1       i+2       i+3       i+4
Processing:              process A process B process A process B process A

(time runs from left to right)
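The alternation of buffers A and B can be turned into a rough timing estimate. The sketch below is an illustration only and is not part of the original slides. It assumes that every block has the same transfer time `t_io` and the same processing time `t_cpu` (both hypothetical parameters), and that with two buffers the transfer of one block fully overlaps the processing of the previous one.

```c
#include <assert.h>

/* One buffer: each block is transferred and then processed before
   the next transfer can begin, so the two costs simply add up. */
long single_buffer_time(long n, long t_io, long t_cpu)
{
    assert(n >= 0 && t_io >= 0 && t_cpu >= 0);
    return n * (t_io + t_cpu);
}

/* Two buffers (double buffering): the first block is transferred
   alone, every later transfer overlaps the processing of the
   previous block, and the last block is processed at the end.
   Each overlapped step costs the slower of the two activities. */
long double_buffer_time(long n, long t_io, long t_cpu)
{
    assert(n >= 0 && t_io >= 0 && t_cpu >= 0);
    if (n == 0)
        return 0;
    long overlapped_step = (t_io > t_cpu) ? t_io : t_cpu;
    return t_io + (n - 1) * overlapped_step + t_cpu;
}
```

Under this model, the saving is largest when transfer and processing take about the same time; when one of them dominates, the total approaches n times the slower activity.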


•File Records on Disk

- Record: a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record.

- Record type: a collection of field names and their corresponding data types. A data type, associated with each field, specifies the type of values a field can take.

Example:

struct employee {
    char name[30];
    char SSN[9];
    int salary;
    int jobcode;
    char department[20];
};


- Fixed-length records and variable-length records

Fixed-length records:

Fields:          Name | SSN | Salary | JobCode | Department | Hire-Date
Byte positions:  1, 30, 40, 44, 48, 68, 71

Variable-length records:

Fields:          name | SSN | salary | JobCode | department
Values:          Smith, John | 123456789 | xxxx | xxxx | Computer
Byte positions:  1, 12, 21, 25, 29, 37

NAME=Smith, John SSN=123456789 DEPARTMENT=Computer


- Record blocking: spanned versus unspanned records

The records of a file must be allocated to disk blocks because a block is the unit of data transfer between disk and memory.

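Unspanned records:

block i:     | record 1 | record 2 | record 3 | unused |
block i+1:   | record 4 | record 5 | record 6 |

Spanned records:

block i:     | record 1 | record 2 | record 3 | record 4 | P |
block i+1:   | record 4 (rest) | record 5 | record 6 | record 7 | P |

(P: pointer to the block that holds the rest of the record)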
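The difference between the two layouts can be quantified for fixed-length records. The following is a minimal sketch, not taken from the slides: it assumes a block size and record size in bytes, and for the spanned case it ignores the space taken by the pointer P. All function names are hypothetical.

```c
#include <assert.h>

/* Blocking factor: number of whole records that fit in one block
   when records may not cross block boundaries (unspanned). */
long blocking_factor(long block_size, long record_size)
{
    assert(record_size > 0 && block_size >= record_size);
    return block_size / record_size;
}

/* Bytes left unused at the end of every full unspanned block. */
long unused_bytes_per_block(long block_size, long record_size)
{
    return block_size - blocking_factor(block_size, record_size) * record_size;
}

/* Blocks needed for num_records unspanned records (ceiling division). */
long blocks_needed_unspanned(long num_records, long block_size, long record_size)
{
    long bfr = blocking_factor(block_size, record_size);
    return (num_records + bfr - 1) / bfr;
}

/* Blocks needed when records may span blocks; the pointer P at the
   end of each block is ignored here to keep the sketch simple. */
long blocks_needed_spanned(long num_records, long block_size, long record_size)
{
    assert(block_size > 0 && record_size > 0);
    long total_bytes = num_records * record_size;
    return (total_bytes + block_size - 1) / block_size;
}
```

For example, with 512-byte blocks and 71-byte records (illustrative values), 7 records fit per block and 15 bytes per block stay unused in the unspanned layout.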


- Allocating file blocks on disk

Contiguous allocation:

disk track:  | file block 1 | file block 2 | … |

Linked allocation:

file block 1 -> file block 2 -> file block 3 -> …

- File header: disk addresses of blocks, record format description (field length, order of fields, field type code, separator characters, record type code, …)


•Operations on Files

[Figure: classification of file operations into general, retrieval, update, and combined operations, and into record-at-a-time and set-at-a-time operations]


- general operations

Open: Prepares the file for reading or writing. Allocates appropriate buffers (typically at least two) to hold file blocks from disk, and retrieves the file header. Sets the file pointer to the beginning of the file.

[Figure: the file on disk and, in main memory, the file header, the buffer, and the file pointer]


Reset: Sets the file pointer of an open file to the beginning of the file.

Close: Completes the file access by releasing the buffers and performing any other needed cleanup operation (e.g., cleaning up the file header information that is maintained in main memory).


- retrieval operations

Find (or Locate): Searches for the first record that satisfies a search condition. Transfers the block containing that record into a main memory buffer (if it is not already there). The file pointer points to the record in the buffer, and it becomes the current record.

search conditions:

SSN = ‘123456789’

DEPARTMENT = ‘Research’

SALARY > 30000


- retrieval operations (continued)

Read (or get): Copies the current record from the buffer to a program variable in the user program. This command may also advance the current record pointer to the next record in the file, which may necessitate reading the next file block from disk.

Find Next: Searches for the next record in the file that satisfies the search condition. Transfers the block containing that record into a main memory buffer (if it is not already there). The record is located in the buffer and becomes the current record.


- update operations

Delete: Deletes the current record and (eventually) updates the file on disk to reflect the deletion.

Modify: Modifies some field values for the current record and (eventually) updates the file on disk to reflect the modification.

Insert: Inserts a new record in the file by locating the block where the record is to be inserted, transferring that block into a main memory buffer (if it is not already there), writing the record into the buffer, and (eventually) writing the buffer to disk to reflect the insertion.


- combined operations

Scan (a combination of Find, FindNext, and Read): If the file has just been opened or reset, Scan returns the first record that satisfies the search condition; otherwise, it returns the next record.

[Figure: the file on disk and, in main memory, the file header, the buffer, and the file pointer, with the current record and the next record marked]

If the file has just been opened or reset, Scan executes Find(); on every later call it executes FindNext().
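The relationship between Find, FindNext, and Scan can be sketched in C. This is a simplified illustration under stated assumptions, not the course's implementation: the file is modelled as an in-memory array (no disk blocks or buffers), the record holds only a salary field, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified record: only the field used by the search condition. */
struct record {
    int salary;
};

/* An open file, modelled as an in-memory array of records. */
struct file {
    const struct record *records;
    size_t count;
    size_t pos;       /* file pointer: index of the next record to examine */
    int just_reset;   /* nonzero right after the file is opened or reset */
};

typedef int (*condition)(const struct record *);

/* Reset: move the file pointer back to the beginning of the file. */
void reset_file(struct file *f)
{
    f->pos = 0;
    f->just_reset = 1;
}

/* FindNext: continue from the file pointer to the next matching record. */
const struct record *find_next(struct file *f, condition cond)
{
    assert(f != NULL && cond != NULL);
    while (f->pos < f->count) {
        const struct record *r = &f->records[f->pos++];
        if (cond(r))
            return r;   /* r becomes the current record */
    }
    return NULL;        /* no further record satisfies the condition */
}

/* Find: search from the beginning for the first matching record. */
const struct record *find(struct file *f, condition cond)
{
    f->pos = 0;
    f->just_reset = 0;
    return find_next(f, cond);
}

/* Scan: Find on the first call after open/reset, FindNext afterwards. */
const struct record *scan(struct file *f, condition cond)
{
    return f->just_reset ? find(f, cond) : find_next(f, cond);
}

/* Example search condition: SALARY > 30000. */
int salary_above_30000(const struct record *r)
{
    return r->salary > 30000;
}
```

Calling `scan` repeatedly with the same condition walks through all matching records in file order and returns `NULL` once none remain.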


set-at-a-time:

FindAll: Locates all the records in the file that satisfy a search condition.

FindOrdered: Retrieves all the records in the file in some specified order (e.g., descending key values).

Reorganization: Starts the reorganization process.


•Files of Unordered Records (Heap Files)

- Records are placed in the file in the order in which they are inserted, so new records are inserted at the end of the file.

- Inserting a record is efficient:
  copy the last disk block into a buffer;
  add the new record to it;
  rewrite the block back to disk.

- Searching for a record is not efficient: it requires scanning the file record by record.

- Deleting a record:
  find the block containing the record;
  copy the block into a buffer;
  delete the record from the buffer;
  rewrite the block back to disk.


•Files of Ordered Records (Sequential Files, Sorted Files)

- Records are ordered on the values of one of their fields, called the ordering field.

           NAME            SSN   BIRTHDATE   JOB   SALARY   SEX

block 1    Aaron, Ed
           Abbott, Diane
           Acosta, Marc
           …

block 2    Adams, John
           Adams, Robin
           Akers, Jan
           …

…


•Files of ordered Records (Continued)

- Searching for a record can be very efficient.

Binary search on the values of the ordering field:

Algorithm: binary search on an ordering key of a disk file (k is the value to be found)

l ← 1; u ← b; (* b is the number of file blocks *)
while (u ≥ l) do
begin
    i ← (l + u) div 2;
    read block i of the file into the buffer;
    if k < (ordering key field value of the first record in block i)
        then u ← i - 1
    else if k > (ordering key field value of the last record in block i)
        then l ← i + 1
    else if the record with ordering_key = k is in the buffer
        then goto found
    else goto notfound;
end;
goto notfound;
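A C rendering of the block-level binary search may make the control flow easier to follow. It is a sketch under assumptions that are not in the slides: the file is an in-memory array of blocks, keys are integers stored in ascending order, every block holds at least one record, and indices are 0-based rather than 1-based. The names are hypothetical.

```c
#include <assert.h>

#define RECORDS_PER_BLOCK 3   /* hypothetical blocking factor */

struct block {
    int keys[RECORDS_PER_BLOCK];  /* ordering key values, ascending */
    int used;                     /* number of records in the block (>= 1) */
};

/* Returns the 0-based index of the block containing key k, or -1 if no
   record has that key. *blocks_read counts the blocks examined. */
int binary_search_blocks(const struct block *file, int b, int k,
                         int *blocks_read)
{
    int l = 0, u = b - 1;
    *blocks_read = 0;
    while (u >= l) {
        int i = l + (u - l) / 2;
        const struct block *buffer = &file[i];   /* "read block i" */
        (*blocks_read)++;
        assert(buffer->used >= 1);
        if (k < buffer->keys[0]) {
            u = i - 1;                           /* k lies in an earlier block */
        } else if (k > buffer->keys[buffer->used - 1]) {
            l = i + 1;                           /* k lies in a later block */
        } else {
            /* k can only be in this block: search inside the buffer. */
            for (int r = 0; r < buffer->used; r++)
                if (buffer->keys[r] == k)
                    return i;                    /* found */
            return -1;                           /* not found */
        }
    }
    return -1;                                   /* not found */
}
```

Each iteration halves the range of candidate blocks, so the number of blocks read grows with the logarithm of b rather than with b itself.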


•Files of ordered Records (Continued)

- Searching on a non-ordering field is no faster than in an unordered file: it requires a linear scan.

- Deletion can be done efficiently:
  find the record;
  mark its place as deleted;
  periodically reorganize the file.

- Insertion is expensive:
  find the position at which to insert the record with key k;
  make space for the record by shifting the records that follow it;
  put the record in place.


•Files of ordered Records (Continued)

- Two methods for making insertion more efficient:
  1. Keep some unused space in each block.
  2. Maintain a temporary unordered file called an overflow or transaction file. Periodically, the overflow file is merged with the main file.

[Figure: insertions go into the overflow file B, which grows to B+, while the main file A stays ordered; merging A and B+ produces the new ordered main file A’]
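The periodic merge of the overflow file into the main file can be sketched as a sort followed by a standard two-way merge. This is one plausible way to do it, not necessarily the method intended in the course: files are modelled as in-memory arrays of integer keys, the overflow array is sorted in place with the C library's `qsort`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison function for qsort: ascending integer keys. */
static int compare_keys(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge the ordered main file with the unordered overflow file.
   The overflow array is sorted in place first; out must have room
   for n_main + n_over keys. Returns the number of keys written. */
size_t merge_overflow(const int *main_file, size_t n_main,
                      int *overflow, size_t n_over, int *out)
{
    assert(out != NULL);
    if (n_over > 0)
        qsort(overflow, n_over, sizeof overflow[0], compare_keys);

    size_t i = 0, j = 0, k = 0;
    while (i < n_main && j < n_over)
        out[k++] = (main_file[i] <= overflow[j]) ? main_file[i++]
                                                 : overflow[j++];
    while (i < n_main)
        out[k++] = main_file[i++];   /* rest of the main file */
    while (j < n_over)
        out[k++] = overflow[j++];    /* rest of the overflow file */
    return k;
}
```

Between merges, an insertion only appends to the overflow file, so it is as cheap as a heap-file insert; the price is that a search must also scan the overflow file linearly.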