data structures and algorithms 9 - edx

20
Data Structures and Algorithms(9) Instructor: Ming Zhang Textbook Authors: Ming Zhang, Tengjiao Wang and Haiyan Zhao Higher Education Press, 2008.6 (the "Eleventh Five-Year" national planning textbook) https://courses.edx.org/courses/PekingX/04830050x/2T2014/ Ming Zhang "Data Structures and Algorithms"

Upload: others

Post on 30-Dec-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Structures and Algorithms 9 - edX

Data Structures and Algorithms(9)

Instructor: Ming ZhangTextbook Authors: Ming Zhang, Tengjiao Wang and Haiyan Zhao

Higher Education Press, 2008.6 (the "Eleventh Five-Year" national planning textbook)

https://courses.edx.org/courses/PekingX/04830050x/2T2014/

Ming Zhang "Data Structures and Algorithms"

Page 2: Data Structures and Algorithms 9 - edX

2

目录页

张铭《数据结构与算法》2 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Chapter 9 File management and External Sorting

• 9.1 Primary vs. Secondary Storage

• 9.2 File Organization and File management

– 9.2.1 File Organization

– 9.2.2 Stream File of C++

• 9.3 External Sorting

File management and External Sorting

Page 3: Data Structures and Algorithms 9 - edX

3

目录页

张铭《数据结构与算法》3 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Primary vs. Secondary Storage• Two main kinds of computer storage:

• Primary/main memory

• Random Access Memory (RAM)

• Cache

• Video memory

• Peripheral/secondary storage

• Hard disk (100 GB – 100 TB, 1012

B )

• Tape (100 PB, 1015

B )

9.1 Primary vs. Secondary Storage

Page 4: Data Structures and Algorithms 9 - edX

4

目录页

张铭《数据结构与算法》4 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Physical Storage Medium Overview

9.1 Primary vs. Secondary Storage

basic storage

Cache

Primary storage

Flash memory

Hard disk

CD

Tape

Nonvolatile storage

tertiary storage

Volatile storage

auxiliary storage

Page 5: Data Structures and Algorithms 9 - edX

5

目录页

张铭《数据结构与算法》5 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Disk Drive Architecture

9.1 Primary vs. Secondary Storage

Boom

(arm)

spindle platters

track

read/write head

cylinder

Page 6: Data Structures and Algorithms 9 - edX

6

目录页

张铭《数据结构与算法》6 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Platter Organization

9.1 Primary vs. Secondary Storage

Page 7: Data Structures and Algorithms 9 - edX

7

目录页

张铭《数据结构与算法》7 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Track Organization (Interleaving)

9.1 Primary vs. Secondary Storage

(a) sectors are sequential; (b) interleave factor is 3

•512 or 1024 bytes per page

Page 8: Data Structures and Algorithms 9 - edX

8

目录页

张铭《数据结构与算法》8 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Pros and Cons of Primary Storage

• Pros: the access speed is fast

• Cons: expensive, small size, lose

data when power is off

• CPU interacts with primary storage

directly, so the access time of

primary storage can be assumed

as a small constant

9.1 Primary vs. Secondary Storage

Page 9: Data Structures and Algorithms 9 - edX

9

目录页

张铭《数据结构与算法》9 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Pros and Cons of Secondary Storage

• Pros: cheap, nonvolatile, portable

• Cons: the access speed is low

• Generally, the unit of access time of primary storage

is in nanosecond, 1 ns = 10-9

s

• The unit of access time of secondary storage is in

millisecond, or even second, 1 ms = 10-3

s

• When a computer program involves secondary

storage, we should reduce the times of access,

to reduce the execution time

9.1 Primary vs. Secondary Storage

Page 10: Data Structures and Algorithms 9 - edX

10

目录页

张铭《数据结构与算法》10 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

• KB (kilo byte) 103B (page)

• MB (mega byte) 106B (cache)

• GB (giga) 109B (memory, disk)

• TB (tera) 1012

B (RAID)

• PB (peta) 1015

B (tape library)

• EB = 1018

B;ZB = 1021

B;YB = 1024

B

• Googol is 10100

9.1 Primary vs. Secondary Storage

Page 11: Data Structures and Algorithms 9 - edX

11

目录页

张铭《数据结构与算法》11 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Logical Structure of Files

• File is a set of records

• The records of the file are arranged in a

specific order, so that they form a linear

relation between themselves naturally.

• Thus, file is a linear structure.

9.2 File Organization and File management

Page 12: Data Structures and Algorithms 9 - edX

12

目录页

张铭《数据结构与算法》12 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

File Organization and File Management

• Logical file

• For programmer using high-level programming languages

• Records are made up of contiguous bytes, and logical files are made up of

records

• Physical file

• Stored in disk block by block

• File manager

• Part of the operating system or database system

- The records in an OS file do not have an explicit structure, while a

database file consists of structured records

• Map logical position to specific physical position in disk drive

9.2 File Organization and File Management

Page 13: Data Structures and Algorithms 9 - edX

13

目录页

张铭《数据结构与算法》13 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

File Organization• 3 kinds of logical file organization

– Fixed-length records of sequential structure

– Variable-length records of sequential structure

– Records accessed by key values

• General physical file organization– Sequential structure (sequential file)

– Address-calculation structure (hash file)

– Index structure (index file)• Inverted index is a special kind of index

9.2 File Organization and File Management

Page 14: Data Structures and Algorithms 9 - edX

14

目录页

张铭《数据结构与算法》14 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Operations of File• Retrieval. Find a record which meets certain

conditions in the file• Modify. Update the value of a record; if you

update its key value, it is equivalent to delete an old record and insert a new one

• Insert. Add a new record to the file• Delete. Remove a record from the file• Sort. Rearrange the records according to

specific data fields; usually rearrange by the key values

9.2 File Organization and File Management

Page 15: Data Structures and Algorithms 9 - edX

15

目录页

张铭《数据结构与算法》15 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Standard I/O Stream Class in C++ • Standard I/O stream class

– istream: a base class of input stream classes

– ostream: a base class of output stream classes

– iostream: a base class of input/output stream classes

• 3 file classes used to manipulate files

– ifstream: derived from istream, supporting the input of a file on the disk

– ofstream: derived from ostream, supporting the output of a file on the disk

– fstream: derived from iostream, supporting the input and output of a file on the

disk

9.2 File Organization and File Management

Page 16: Data Structures and Algorithms 9 - edX

16

目录页

张铭《数据结构与算法》16 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Major Methods of fstream Class

#include <fstream.h> // fstream = ifstream + ofstream

void fstream::open(char*name, openmode mode);

// open the file

fstream::read(char*ptr, int numbytes); // read bytes from the current position of the file

fstream::write(char*ptr, int numbtyes); // write bytes to the current position of the file

// seekg and seekp: move the current position of the file

// so that we can read/write bytes anywhere we want

fstream::seekg(int pos); // set the “get” position for reading

fstream::seekg(int pos, ios::curr);

fstream::seekp(int pos); // set the “put” position for writting

fstream::seekp(int pos, ios::end);

void fstream::close(); // close the file

9.2 File Organization and File Management

Seek the "get"/"put" positions;Read from the "get" position; Write to the "put" position

Page 17: Data Structures and Algorithms 9 - edX

17

目录页

张铭《数据结构与算法》17 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Buffer and Buffer Pool

• Purpose: reduce the disk access latency

• Method: buffering or caching

• Retain as many blocks as possible in the primary storage

• Increase the possibility that the blocks to access stay in

the primary storage

• The data stored in the same buffer form a

page, which is the unit of file I/O

• A collection of buffers forms a buffer pool

9.2 File Organization and File Management

Page 18: Data Structures and Algorithms 9 - edX

18

目录页

张铭《数据结构与算法》18 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Strategies for Buffer Replacement

• When a new page requests for a

buffer, release an old buffer that

may not be used again, and store

the new page in this buffer

– “First-In-First-Out” (FIFO)

– “Least Frequently Used” (LFU)

– “Least Recently Used” (LRU)

9.2 File Organization and File Management

Page 19: Data Structures and Algorithms 9 - edX

19

目录页

张铭《数据结构与算法》19 Ming Zhang “Data Structures and Algorithms”

Chapter 9

File management and External Sorting

Thinking

1. Find the price per byte of the storage

devices, such as main memory, disk

drives, tapes, cache, etc.

2. Find the performance indicators of

mainstream disks

‒ Capacity (GB)

‒ Revolutions per minute (rpm)

‒ Interleave factor

‒ Seek time

‒ Rotational delay time

9.2 File Organization and File Management

Page 20: Data Structures and Algorithms 9 - edX

Data Structures and Algorithms

Thanks

the National Elaborate Course (Only available for IPs in China)http://www.jpk.pku.edu.cn/pkujpk/course/sjjg/

Ming Zhang, Tengjiao Wang and Haiyan ZhaoHigher Education Press, 2008.6 (awarded as the "Eleventh Five-Year" national planning textbook)

Ming Zhang “Data Structures and Algorithms”