
Sample Copy. Not For Distribution.


Advanced Concepts of Information Technology



Publishing-in-support-of,

EDUCREATION PUBLISHING

RZ 94, Sector - 6, Dwarka, New Delhi - 110075 Shubham Vihar, Mangla, Bilaspur, Chhattisgarh - 495001

Website: www.educreation.in

© Copyright, 2018, Dr Kashif Qureshi

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual, photocopying, recording or otherwise, without the prior written consent of its writer.

ISBN: 978-93-88381-65-9

Price: ₹ 764.00

The opinions/ contents expressed in this book are solely of the author and do not represent the opinions/ standings/ thoughts of Educreation.

Printed in India



Advanced Concepts

of

Information Technology

By

Dr Kashif Qureshi

EDUCREATION PUBLISHING (Since 2011)

www.educreation.in



Content

Sr. Chapter Page

Part-I 1

1.1 File Structures Indexing and Hashing 1

1.1.1. Disk Storage, Basic File Structures, and Hashing 1

1.1.2. Secondary Storage Devices 4

1.1.3. Buffering of Blocks 9

1.1.4. Placing File Records on Disk 9

1.1.5. Operations on Files 13

1.1.6. Files of Unordered Records (Heap Files) 15

1.1.7. Files of Ordered Records (Sorted Files) 16

1.1.8 Hashing Techniques 18

1.1.9 Internal Hashing 18

1.1.10 External Hashing for Disk Files 20

1.1.11 Hashing Techniques That Allow Dynamic File Expansion 22

1.1.12 Other Primary File Organizations 26

1.1.13 Parallelizing Disk Access Using RAID Technology 27

1.1.14 New Storage Systems 30

1.1.15 Indexing Structures for Files 32

1.1.16 Types of Single-Level Ordered Indexes 33

1.1.17 Multilevel Indexes 42



1.1.18 Dynamic Multilevel Indexes Using B-Trees and B+-Trees 44

1.1.19 Search Trees and B-Trees 54

1.1.20 B+-Trees 59

1.1.21 Indexes on Multiple Keys 65

1.1.23 Some General Issues Concerning Indexing 71

Part-II 73

2.1 PROCESSOR AND CONTROL UNIT 73

2.1.1 Basic MIPS Implementation 73

2.1.2 Building Data Path and Control Implementation Scheme 74

2.1.3 Pipelining 75

2.1.4 Pipelined Data Path and Control 76

2.1.5 Handling Data Hazards & Control Hazards 77

2.1.6 Exceptions in Processor and Control Unit 78

2.2 MEMORY AND I/O SYSTEMS 79

2.2.1 Memory Hierarchy 79

2.2.2 Memory Technologies 80

2.2.3 Cache Basics 81

2.2.4 Measuring and Improving Cache Performance 82

2.2.5 Virtual Memory 86

2.2.6 TLBs, Input/Output System 88

2.2.7 Programmed I/O 89

2.2.8 DMA and Interrupts 90

2.2.9 I/O Processors 92

2.2.10 Glossary - Computer Architecture 93



Part – III 98

3.1 Design and Analysis of Algorithms: Introduction 98

3.1.1 Introduction to the Design and Analysis of Algorithms 98

3.1.2 What Is an Algorithm 99

3.1.3 Fundamentals of Algorithmic Problem Solving 104

3.1.4 Ascertaining the Capabilities of the Computational Device 111

3.1.5 Algorithm Design Techniques 111

3.1.6 Designing an Algorithm and Data Structures 112

3.1.7 Methods of Specifying an Algorithm 112

3.1.8 Proving an Algorithm’s Correctness 113

3.1.9 Analyzing an Algorithm 113

3.1.10 Coding an Algorithm 114

3.1.11 Important Problem Types in Algorithms Analysis 115

3.1.12 Fundamental Data Structures 119

Part – IV 130

4.1 Fundamentals of the Analysis of Algorithm Efficiency 130

4.1.1 The Analysis Framework 130

4.1.2 Asymptotic Notations and Basic Efficiency Classes 136

4.1.3 Mathematical Analysis of Nonrecursive Algorithms 143

4.1.4 Mathematical Analysis of Recursive Algorithms 149

4.1.5 Example: Computing the nth Fibonacci Number 156

4.1.6 Empirical Analysis of Algorithms 160

4.1.7 Algorithm Visualization 166



4.2 Chapter 3 Brute Force and Exhaustive Search 168

4.2.1 Brute Force and Exhaustive Search 168

4.2.2 Selection Sort and Bubble Sort 169

4.2.3 Sequential Search and Brute-Force String Matching 173

4.2.4 Closest-Pair and Convex-Hull Problems by Brute Force 176

4.2.5 Exhaustive Search 180

4.2.6 Depth-First Search and Breadth-First Search 185

4.3 Chapter 4 Decrease and Conquer 190

4.3.1 Decrease and Conquer 190

4.3.2 Insertion Sort 193

4.3.3 Topological Sorting 196

4.3.4 Algorithms for Generating Combinatorial Objects 200

4.3.5 Decrease by a Constant Factor Algorithms 205

4.3.6 Variable Size Decrease Algorithms 210

Part – V 218

5.1 OPERATING SYSTEMS: OVERVIEW 218

5.1.1 Computer System Review 218

5.1.2 Operating System Components 221

5.1.3 Interrupts in Operating Systems 223

5.1.4 Memory Hierarchy - Operating Systems 224

5.1.5 Cache Memory 225

5.1.6 Direct Memory Access 225

5.1.7 Multiprocessor and Multicore Organization 226

5.1.8 Operating System Overview 229



5.1.9 Evolution of Operating System 229

5.1.10 Computer System Organization 230

5.1.11 System Calls 231

5.1.12 System Programs 232

5.1.13 Important Short Questions and Answers: Operating Systems - Process and Threads 232

Part - VI 237

6.1 Software Engineering 237

6.1.1 Introduction to Software Engineering 237

6.1.2 Software Process, Perspective and Specialized Process Models 239

6.1.3 Software Project Management: Estimation 242

6.1.4 LOC and FP Based Estimation, COCOMO Model Requirements Analysis Design 243

6.1.5 Project Scheduling – Scheduling, Earned Value Analysis - Risk Management 250

6.1.6 Software Requirements: Functional and Non-Functional, User, System Requirements, Software Requirements Document 251

6.1.7 Requirement Engineering Process: Feasibility Studies, Requirements elicitation and analysis 253

6.1.8 Requirements Validation, Requirements Management 256

6.1.9 Classical Analysis 258

6.1.10 Structured Systems Analysis 259

6.1.11 Petri Nets-Data Dictionary 260

6.1.12 Design Process 260

6.1.13 Design Concepts-Design Model 261

6.1.14 Design Heuristic 262



6.1.15 Architectural Design 263

6.1.16 Architectural Styles, Architectural Design, Architectural Mapping using Data Flow 264

6.1.17 User Interface Design 266

6.1.18 Interface Analysis, Interface Design 267

6.1.19 Component Level Design 268

6.1.20 Designing Class based Components, Traditional Components 268

6.1.21 Software Testing Fundamentals 269

6.1.22 Internal and External Views of Testing 270

6.1.23 White Box Testing-basis Path Testing 271

6.1.24 Control Structure Testing 271

6.1.25 Black Box Testing 273

6.1.26 Regression Testing 274

6.1.27 Unit Testing 274

6.1.28 Integration Testing 275

6.1.29 Validation Testing 277

6.1.30 System Testing and Debugging 278

6.1.31 Software Implementation Techniques: Coding practices 279

6.1.32 Refactoring 279

6.1.33 Estimation – FP Based, LOC Based, Make/Buy Decision, COCOMO II 281

6.1.34 Project Planning Phase 285

6.1.35 Identification, Projection, RMMM 286

6.1.36 Scheduling and Tracking 290

6.1.37 Relationship between people and effort, Task Set & Network, Scheduling, EVA 292

6.1.38 Process and Project Metrics 294



Part – VII 296

7.1 COMPUTER NETWORKS: FUNDAMENTALS AND LINK LAYER 296

7.1.1 Building a Computer Network 296

7.1.2 Requirements of Computer Networks 296

7.1.3 Layering and Protocol 303

7.1.4 Internet Architecture 309

7.1.5 Network Software 310

7.1.6 Performance: Link Layer Services 311

7.1.7 Framing in Computer Networks 313

7.1.8 Error Detection and Correction: Its types 315

7.1.9 Flow Control 319

7.1.10 Important Short Questions and Answers: Computer Networks - Fundamentals & Link Layer 324

Part – VIII 328

8.1 DATABASE MANAGEMENT SYSTEMS 328

8.1.1 Trust Management in Virtualized Data Centers 328

8.1.2 Introduction to DBMS (Database Management Systems) 330

8.1.3 Purpose of Database Systems 332

8.1.4 File systems vs Database systems 334

8.1.5 Database System Terminologies 338

8.1.6 Data Models 339

8.1.7 Components of DBMS 341

8.1.8 Relational Algebra 342



8.1.9 ER Model 347

8.1.10 Functional Dependencies Definition 352

8.1.11 Database Normalization 354

8.1.12 Data Anomalies 355

8.1.13 SQL Overview 368

8.1.14 Data Types in SQL 370

8.1.15 Object-Oriented Database Management System 371

8.1.16 Data Definition Language or Data Description Language (DDL) 371

8.1.17 Data Manipulation Language (DML) 373

8.1.18 Data Control Language (DCL) 375

8.1.19 Transaction Control Language (TCL) 378

8.1.20 Embedded SQL 381

8.1.21 Query Processing and Optimization (QPO) 381

8.1.22 Transaction Processing 391

8.1.23 Introduction to Concurrency 396

8.1.24 Lock 396

8.1.25 Two-Phase Locking Techniques: The algorithm 400

8.1.26 Physical Storage Media 407

8.1.27 RAID: Redundant Arrays of Independent Disks 409

8.1.28 File Operations 410

8.1.29 Hashing 415

8.1.30 Indexing 414

8.1.31 B+-Tree Index Files 416

8.1.32 Data Warehouse 418

8.1.33 Data Mining 419

8.1.34 Mobile Databases 419



8.1.35 Spatial Database Types of Spatial Data 421

8.1.36 Multi-dimensional Indexes 422

8.1.37 Databases and Database Users 423

8.1.38 An Example - Databases and Database Users 425

8.1.39 Characteristics of the Database Approach 428

8.1.40 Actors on the Scene - Databases and Database Users 434

8.1.41 Workers behind the Scene - Databases and Database Users 434

8.1.42 Advantages of Using the DBMS Approach 434

8.1.43 A Brief History of Database Applications 438

8.1.44 When Not to Use a DBMS 441

8.1.45 Database System Concepts and Architecture 441

8.1.46 Data Models, Schemas, and Instances 442

8.1.47 Three-Schema Architecture and Data Independence 444

8.1.48 Database Languages and Interfaces 446

8.1.49 The Database System Environment 449

8.1.50 Centralized and Client/Server Architectures for DBMSs 452

8.1.51 Classification of Database Management Systems 456

8.1.52 The Relational Data Model and Relational Database Constraints 458

8.1.53 Relational Model Concepts 459

8.1.54 Relational Model Constraints and Relational Database Schemas 464

8.1.55 Update Operations, Transactions, and Dealing with Constraint Violations 470

8.1.56 Basic SQL 473

8.1.57 SQL Data Definition and Data Types 474

8.1.58 Specifying Constraints in SQL 478



8.1.59 Basic Retrieval Queries in SQL 480

8.1.60 INSERT, DELETE, and UPDATE Statements in SQL 488

8.1.61 Additional Features of SQL 490

8.1.62 More SQL: Complex Queries, Triggers, Views, and Schema Modification 491

8.1.63 More Complex SQL Retrieval Queries 491

8.1.64 Specifying Constraints as Assertions and Actions as Triggers 503

8.1.65 Views (Virtual Tables) in SQL 505




Part-I

1.1. File Structures Indexing and Hashing

1.1.1. Disk Storage, Basic File Structures, and Hashing

Databases are stored physically as files of records, which are typically stored on magnetic disks.

This chapter and the next deal with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes. These structures are often referred to as physical database file structures, and are at the physical level of the three-schema architecture described in Chapter 2. We start in Section 17.1 by introducing the concepts of computer storage hierarchies and how they are used in database systems. Section 17.2 is devoted to a description of magnetic disk storage devices and their characteristics, and we also briefly describe magnetic tape storage devices. After discussing different storage technologies, we turn our attention to the methods for physically organizing data on disks. Section 17.3 covers the technique of double buffering, which is used to speed retrieval of multiple disk blocks. We then discuss various ways of formatting and storing file records on disk, followed by the various types of operations that are typically applied to file records. We present three primary methods for organizing file records on disk: unordered records, in Section 17.6; ordered records, in Section 17.7; and hashed records, in Section 17.8.
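Double buffering can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions and is not how any particular DBMS implements the technique. A Python list stands in for the disk, and a single background thread plays the role of the I/O controller. `read_block` and `process_all` are hypothetical names. While the CPU processes the block in one buffer, the read of the next block into the other buffer is already under way.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

def read_block(disk: List[bytes], i: int) -> bytes:
    # Hypothetical stand-in for a disk read of block i; a real system would issue I/O here.
    return disk[i]

def process_all(disk: List[bytes], process: Callable[[bytes], T]) -> List[T]:
    """Process blocks in order, fetching block i+1 while block i is being processed."""
    results: List[T] = []
    if not disk:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, disk, 0)  # fill the first buffer
        for i in range(len(disk)):
            current = pending.result()  # wait until buffer i has been filled
            if i + 1 < len(disk):
                # start filling the other buffer before processing this one
                pending = io.submit(read_block, disk, i + 1)
            results.append(process(current))  # CPU work overlaps the pending read
    return results
```

For example, `process_all([b"a", b"bb", b"ccc"], len)` returns `[1, 2, 3]`. The results come out in block order, and the read of each next block overlaps with the processing of the current one.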

Section 17.9 briefly introduces files of mixed records and other primary methods for organizing records, such as B-trees. These are particularly relevant for storage of object-oriented databases, which we discussed in Chapter 11. Section 17.10 describes RAID (Redundant Arrays of Inexpensive (or Independent) Disks), a data storage system architecture that is commonly used in large organizations for better reliability and performance. Finally, in Section 17.11 we describe three developments in the storage systems area: storage area networks (SAN), network attached storage (NAS), and iSCSI (Internet SCSI, where SCSI stands for Small Computer System Interface). iSCSI, the latest of these technologies, makes storage area networks more affordable because it does not require a Fibre Channel infrastructure, and hence it is gaining wide acceptance in industry. Section 17.12 summarizes the chapter. In Chapter 18 we discuss techniques for creating auxiliary data structures, called indexes, which speed up the search for and retrieval of records. These techniques involve storage of auxiliary data, called index files, in addition to the file records themselves.

Chapters 17 and 18 may be browsed through or even omitted by readers who have already studied file organizations and indexing in a separate course. The material covered here, in particular Sections 17.1 through 17.8, is necessary for understanding Chapters 19 and 20, which deal with query processing and optimization, and with database tuning for improving the performance of queries.

Introduction

The collection of data that makes up a computerized database must be stored physically on some computer storage medium. The DBMS software can then retrieve, update, and process this data as needed. Computer storage media form a storage hierarchy that includes two main categories:

Primary storage. This category includes storage media that can be operated on directly by the computer’s central processing unit (CPU), such as the computer’s main memory and smaller but faster cache memories. Primary storage usually provides fast access to data but is of limited storage capacity. Although main memory capacities have been growing rapidly in recent years, main memory is still more expensive and offers less storage capacity than secondary and tertiary storage devices.

Secondary and tertiary storage. This category includes magnetic disks, optical disks (CD-ROMs, DVDs, and other similar storage media), and tapes. Hard-disk drives are classified as secondary storage, whereas removable media such as optical disks and tapes are considered tertiary storage. These devices usually have a larger capacity, cost less, and provide slower access to data than do primary storage devices. Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be copied into primary storage and then processed by the CPU.
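The classification just described fits in a few lines of code. This is only an illustrative sketch. The `Category` enum, the `MEDIA` table, and the helper functions are hypothetical names, and the table lists only the media the text mentions. The code encodes the rule that only primary storage is operated on directly by the CPU, so data held anywhere else must first be copied into main memory.

```python
from enum import Enum
from typing import List

class Category(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"

# Hypothetical lookup table built from the classification in the text.
MEDIA = {
    "cache memory": Category.PRIMARY,
    "main memory": Category.PRIMARY,
    "hard disk": Category.SECONDARY,
    "optical disk": Category.TERTIARY,
    "tape": Category.TERTIARY,
}

def cpu_can_operate_directly(medium: str) -> bool:
    # Only primary storage is operated on directly by the CPU.
    return MEDIA[medium] is Category.PRIMARY

def steps_to_process(medium: str) -> List[str]:
    # Secondary and tertiary data must first be copied into primary storage.
    if cpu_can_operate_directly(medium):
        return ["process"]
    return ["copy to main memory", "process"]
```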

We first give an overview of the various storage devices used for primary and secondary storage in Section 17.1.1 and then discuss how databases are typically handled in the storage hierarchy in Section 17.1.2.

1. Memory Hierarchies and Storage Devices

In a modern computer system, data resides and is transported throughout a hierarchy of storage media. The highest-speed memory is the most expensive and is therefore available with the least capacity. The lowest-speed memory is offline tape storage, which is available in essentially unlimited storage capacity.

At the primary storage level, the memory hierarchy includes, at the most expensive end, cache memory, which is static RAM (Random Access Memory). Cache memory is typically used by the CPU to speed up execution of program instructions using techniques such as prefetching and pipelining. The next level of primary storage is DRAM (Dynamic RAM), which provides the main work area for the CPU for keeping program instructions and data. It is popularly called main memory. The advantage of DRAM is its low cost, which continues to decrease; the drawbacks are its volatility and its lower speed compared with static RAM. At the secondary and tertiary storage level, the hierarchy includes magnetic disks, as well as mass storage in the form of CD-ROM (Compact Disk–Read-Only Memory) and DVD (Digital Video Disk or Digital Versatile Disk) devices, and finally tapes at the least expensive end of the hierarchy. Storage capacity is measured in kilobytes (KB, 1000 bytes), megabytes (MB, 1 million bytes), gigabytes (GB, 1 billion bytes), and even terabytes (TB, 1000 GB). The word petabyte (1000 terabytes, or 10^15 bytes) is now becoming relevant in the context of very large repositories of data in physics, astronomy, earth sciences, and other scientific applications.
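Each of the decimal units above is 1000 times the one before it, so converting a raw byte count is a matter of repeated division. A minimal sketch, assuming decimal (SI) units as the text defines them. `human_size` is a hypothetical helper.

```python
# Decimal (SI) units as defined in the text: each step up is a factor of 1000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

def human_size(n_bytes: int) -> str:
    """Express a byte count in the largest decimal unit whose value is still >= 1."""
    value = float(n_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # petabytes is the largest unit listed
```

For example, `human_size(10**15)` gives `"1 PB"` and `human_size(4_500_000_000)` gives `"4.5 GB"`. Some operating systems report sizes in binary multiples instead (1 KiB = 1024 bytes), which gives slightly different figures for the same byte count.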

Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary storage (magnetic disks), and portions of the database are read into and written from buffers in main memory as needed. Nowadays, personal computers and workstations have large main memories of hundreds of megabytes, so it is becoming possible to load a large part of the database into main memory. Eight to 16 GB of main memory on a single server is becoming commonplace. In some cases, entire databases can be kept in main memory (with a backup copy on magnetic disk), leading to main memory databases; these are particularly useful in real-time applications that require extremely fast response times. Telephone switching applications are one example: they keep databases containing routing and line information in main memory.
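A main memory database serves every read from RAM and relies on the disk copy only for recovery. The sketch below assumes a simple key-value store that is checkpointed to a JSON file. `MainMemoryDB`, `checkpoint`, and the file format are hypothetical choices made for illustration. Production systems typically also keep a log of changes made since the last checkpoint; this sketch omits that log.

```python
import json
import os
import tempfile
from typing import Dict, Optional

class MainMemoryDB:
    """Toy main memory key-value store with a backup copy on disk (illustrative only)."""

    def __init__(self, backup_path: str) -> None:
        self.backup_path = backup_path
        self.data: Dict[str, str] = {}
        if os.path.exists(backup_path):
            # Recovery: reload the last disk copy into main memory.
            with open(backup_path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)  # answered from memory, no disk access

    def put(self, key: str, value: str) -> None:
        self.data[key] = value  # held in memory only until the next checkpoint

    def checkpoint(self) -> None:
        """Write the full in-memory state to disk, replacing the old backup in one step."""
        directory = os.path.dirname(os.path.abspath(self.backup_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.backup_path)  # rename is atomic on POSIX within one file system
```

For instance, suppose you call `db.put("555-0100", "trunk-7")` and then `db.checkpoint()`. A new `MainMemoryDB` opened on the same path then returns `"trunk-7"` for that key. Without the checkpoint, the update would be lost if the process stopped.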

Between DRAM and magnetic disk storage, another form of memory, flash memory, is becoming common, particularly because it is nonvolatile. Flash memories are high-density, high-performance memories using EEPROM (Electrically Erasable Programmable Read-Only Memory) technology. The advantage of flash memory is the fast access speed; the disadvantage is that an entire block must be erased and written over simultaneously. Flash memory cards are appearing as the data storage medium in appliances, with capacities ranging from a few megabytes to a few gigabytes. They are used in cameras, MP3 players, cell phones, PDAs, and so on. USB (Universal Serial Bus) flash drives have become the most portable medium for carrying data between personal computers; they have a flash memory storage device integrated with a USB interface.
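The erase-before-write restriction is what makes updates on flash expensive. The toy model below is a simplification. It assumes that a page can be written only once after its block is erased, and that erasure always covers the whole block. `FlashBlock` and its methods are hypothetical. Under these rules, updating a single page forces the whole block to be copied out, erased, and rewritten.

```python
from typing import List, Optional

ERASED = None  # an erased page holds no data and may be programmed once

class FlashBlock:
    """Toy model of one flash erase block made of fixed-size pages (illustrative only)."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages: List[Optional[bytes]] = [ERASED] * pages_per_block
        self.erase_count = 0

    def erase(self) -> None:
        # Erasure always applies to the entire block, never to a single page.
        self.pages = [ERASED] * len(self.pages)
        self.erase_count += 1

    def program(self, page: int, data: bytes) -> None:
        # A page that already holds data cannot be overwritten in place.
        if self.pages[page] is not ERASED:
            raise ValueError("page already written; erase the whole block first")
        self.pages[page] = data

    def update_page(self, page: int, data: bytes) -> None:
        """Change one page: copy the block out, erase it, then rewrite every used page."""
        saved = list(self.pages)
        saved[page] = data
        self.erase()
        for i, contents in enumerate(saved):
            if contents is not ERASED:
                self.program(i, contents)
```

Real devices usually hide this cost behind a flash translation layer. Instead of erasing in place on every update, the layer redirects the new data to an already-erased page and remaps it.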

CD-ROM (Compact Disk – Read Only Memory) disks store data optically and are read by a laser. CD-ROMs contain prerecorded data that cannot be overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage used for archiving data; they allow data to be written once and read any number of times without the possibility of erasing. They hold about half a gigabyte of data per disk and last much longer than magnetic disks. Optical jukebox memories use an array of CD-ROM platters, which are loaded onto drives on demand. Although optical jukeboxes have capacities in the hundreds of gigabytes, their retrieval times are in the hundreds of milliseconds, quite a bit slower than magnetic disks. This type of storage is continuing to decline because of the rapid decrease in cost and increase in capacities of magnetic disks. The DVD is another standard for optical disks allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives now read CD-ROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) that can create CD-ROMs and audio CDs (Compact Disks), as well as record on DVDs.

Finally, magnetic tapes are used for archiving and backup storage of data. Tape jukeboxes, which contain a bank of tapes that are catalogued and can be automatically loaded onto tape drives, are becoming popular as tertiary storage to hold terabytes of data. For example, NASA’s EOS (Earth Observing System) stores archived databases in this fashion.

Many large organizations are already finding it normal to have terabyte-sized databases. The term very large database can no longer be precisely defined because disk storage capacities are on the rise and costs are declining. Very soon the term may be reserved for databases containing tens of terabytes.

2. Storage of Databases

Databases typically store large amounts of data that must persist over long periods of time; such data is often referred to as persistent data. Parts of this data are accessed and processed repeatedly during this period. This contrasts with the notion of transient data, which persist for only a limited time during program execution. Most databases are stored permanently (or persistently) on magnetic disk secondary storage, for the following reasons:

- Generally, databases are too large to fit entirely in main memory.
- The circumstances that cause permanent loss of stored data arise less frequently for disk secondary storage than for primary storage. Hence, we refer to disk (and other secondary storage devices) as nonvolatile storage, whereas main memory is often called volatile storage.
- The cost of storage per unit of data is an order of magnitude less for disk secondary storage than for primary storage.

Some of the newer technologies, such as optical disks, DVDs, and tape jukeboxes, are likely to provide viable alternatives to the use of magnetic disks. In the future, databases may therefore reside at different levels of the memory hierarchy from those described in Section 17.1.1. However, it is anticipated that magnetic disks will continue to be the primary medium of choice for large databases for years to come. Hence, it is important to study and understand the properties and characteristics of magnetic disks and the way data files can be organized on disk in order to design effective databases with acceptable performance.

Magnetic tapes are frequently used as a storage medium for backing up databases because storage on tape costs even less than storage on disk. However, access to data on tape is quite slow. Data stored on tapes is offline; that is, an operator or an automatic loading device must load a tape before the data becomes available. In contrast, disks are online devices that can be accessed directly at any time.

The techniques used to store large amounts of structured data on disk are important for database designers, the DBA, and implementers of a DBMS. Database designers and the DBA must know the advantages and disadvantages of each storage technique when they design, implement, and operate a database on a specific DBMS. Usually, the DBMS has several options available for organizing the data. The process of physical database design involves choosing the particular data organization techniques that best suit the given application requirements from among the options. DBMS system implementers must study data organization techniques so that they can implement them efficiently and thus provide the DBA and users of the DBMS with sufficient options.

Typical database applications need only a small portion of the database at a time for processing. Whenever a certain portion of the data is needed, it must be located on disk, copied to main memory for processing, and then rewritten to the disk if the data is changed. The data stored on disk is organized as files of records. Each record is a collection of data values that can be interpreted as facts about entities, their attributes, and their relationships. Records should be stored on disk in a manner that makes it possible to locate them efficiently when they are needed.
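The cycle just described (locate, copy into memory, rewrite if changed) can be sketched for a file of fixed-length records. In such a file, record i always starts at byte offset i × record size, so it can be located with a single seek. The record layout (a 4-byte integer id followed by a 20-byte name) is hypothetical, chosen only for illustration, as are the helpers `write_records`, `read_record`, and `update_record`.

```python
import struct
from typing import BinaryIO, List, Tuple

# Hypothetical fixed-length layout: little-endian 4-byte int id + 20-byte name field.
RECORD = struct.Struct("<i20s")

def write_records(path: str, records: List[Tuple[int, str]]) -> None:
    with open(path, "wb") as f:
        for rec_id, name in records:
            # struct pads short names with NUL bytes and truncates long ones to 20 bytes
            f.write(RECORD.pack(rec_id, name.encode("utf-8")))

def read_record(f: BinaryIO, i: int) -> Tuple[int, str]:
    """Locate record i directly: it starts at byte offset i * RECORD.size."""
    f.seek(i * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("utf-8")

def update_record(path: str, i: int, new_name: str) -> bool:
    """Copy record i into memory, modify it, and write it back only if it changed."""
    with open(path, "r+b") as f:
        rec_id, name = read_record(f, i)
        if name == new_name:
            return False  # unchanged: no disk write needed
        f.seek(i * RECORD.size)
        f.write(RECORD.pack(rec_id, new_name.encode("utf-8")))
        return True
```

Fixed-length records are the simplest case. Variable-length records need extra bookkeeping to find where each record begins.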

