Sample Copy. Not For Distribution.
Advanced Concepts of Information Technology
Publishing-in-support-of,
EDUCREATION PUBLISHING
RZ 94, Sector - 6, Dwarka, New Delhi - 110075
Shubham Vihar, Mangla, Bilaspur, Chhattisgarh - 495001
Website: www.educreation.in
© Copyright, 2018, Dr Kashif Qureshi
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form by any means, electronic, mechanical, magnetic, optical, chemical, manual, photocopying, recording or otherwise, without the prior written consent of its writer.
ISBN: 978-93-88381-65-9
Price: ₹ 764.00
The opinions/contents expressed in this book are solely those of the author and do not represent the opinions/standings/thoughts of Educreation.
Printed in India
Advanced Concepts
of
Information Technology
By
Dr Kashif Qureshi
EDUCREATION PUBLISHING (Since 2011)
www.educreation.in
Content
Sr. Chapter Page
Part-I
1
1.1 File Structures Indexing and Hashing 1
1.1.1. Disk Storage, Basic File Structures, and Hashing 1
1.1.2. Secondary Storage Devices 4
1.1.3. Buffering of Blocks 9
1.1.4. Placing File Records on Disk 9
1.1.5. Operations on Files 13
1.1.6. Files of Unordered Records (Heap Files) 15
1.1.7. Files of Ordered Records (Sorted Files) 16
1.1.8 Hashing Techniques 18
1.1.9 Internal Hashing 18
1.1.10 External Hashing for Disk Files 20
1.1.11 Hashing Techniques That Allow Dynamic File Expansion 22
1.1.12 Other Primary File Organizations 26
1.1.13 Parallelizing Disk Access Using RAID Technology 27
1.1.14 New Storage Systems 30
1.1.15 Indexing Structures for Files 32
1.1.16 Types of Single-Level Ordered Indexes 33
1.1.17 Multilevel Indexes 42
1.1.18 Dynamic Multilevel Indexes Using B-Trees and B+-Trees 44
1.1.19 Search Trees and B-Trees 54
1.1.20 B+-Trees 59
1.1.21 Indexes on Multiple Keys 65
1.1.23 Some General Issues Concerning Indexing 71
Part-II
73
2.1 PROCESSOR AND CONTROL UNIT 73
2.1.1 Basic MIPS Implementation 73
2.1.2 Building Data Path and Control Implementation Scheme 74
2.1.3 Pipelining 75
2.1.4 Pipelined Data Path and Control 76
2.1.5 Handling Data Hazards & Control Hazards 77
2.1.6 Exceptions in Processor and Control Unit 78
2.2 MEMORY AND I/O SYSTEMS 79
2.2.1 Memory Hierarchy 79
2.2.2 Memory Technologies 80
2.2.3 Cache Basics 81
2.2.4 Measuring and Improving Cache Performance 82
2.2.5 Virtual Memory 86
2.2.6 TLBs - Input/Output System 88
2.2.7 Programmed I/O 89
2.2.8 DMA and Interrupts 90
2.2.9 I/O Processors 92
2.2.10 Glossary - Computer Architecture 93
Part – III
98
3.1 Design and Analysis of Algorithms: Introduction 98
3.1.1 Introduction to the Design and Analysis of Algorithms 98
3.1.2 What Is an Algorithm 99
3.1.3 Fundamentals of Algorithmic Problem Solving 104
3.1.4 Ascertaining the Capabilities of the Computational Device 111
3.1.5 Algorithm Design Techniques 111
3.1.6 Designing an Algorithm and Data Structures 112
3.1.7 Methods of Specifying an Algorithm 112
3.1.8 Proving an Algorithm’s Correctness 113
3.1.9 Analyzing an Algorithm 113
3.1.10 Coding an Algorithm 114
3.1.11 Important Problem Types in Algorithms Analysis 115
3.1.12 Fundamental Data Structures 119
Part – IV
130
4.1 Fundamentals of the Analysis of Algorithm Efficiency 130
4.1.1 The Analysis Framework 130
4.1.2 Asymptotic Notations and Basic Efficiency Classes 136
4.1.3 Mathematical Analysis of Nonrecursive Algorithms 143
4.1.4 Mathematical Analysis of Recursive Algorithms 149
4.1.5 Example: Computing the nth Fibonacci Number 156
4.1.6 Empirical Analysis of Algorithms 160
4.1.7 Algorithm Visualization 166
4.2 Chapter 3 Brute Force and Exhaustive Search 168
4.2.1 Brute Force and Exhaustive Search 168
4.2.2 Selection Sort and Bubble Sort 169
4.2.3 Sequential Search and Brute-Force String Matching 173
4.2.4 Closest-Pair and Convex-Hull Problems by Brute Force 176
4.2.5 Exhaustive Search 180
4.2.6 Depth-First Search and Breadth-First Search 185
4.3 Chapter 4 Decrease and Conquer 190
4.3.1 Decrease and Conquer 190
4.3.2 Insertion Sort 193
4.3.3 Topological Sorting 196
4.3.4 Algorithms for Generating Combinatorial Objects 200
4.3.5 Decrease by a Constant Factor Algorithms 205
4.3.6 Variable Size Decrease Algorithms 210
Part – V
218
5.1 OPERATING SYSTEMS: OVERVIEW 218
5.1.1 Computer System Review 218
5.1.2 Operating System Components 221
5.1.3 Interrupts in Operating Systems 223
5.1.4 Memory Hierarchy - Operating Systems 224
5.1.5 Cache Memory 225
5.1.6 Direct Memory Access 225
5.1.7 Multiprocessor and Multicore Organization 226
5.1.8 Operating System Overview 229
5.1.9 Evolution of Operating System 229
5.1.10 Computer System Organization 230
5.1.11 System Calls 231
5.1.12 System Programs 232
5.1.13 Important Short Questions and Answers: Operating Systems - Process and Threads 232
Part - VI
237
6.1 Software Engineering 237
6.1.1 Introduction to Software Engineering 237
6.1.2 Software Process, Perspective and Specialized Process Models 239
6.1.3 Software Project Management: Estimation 242
6.1.4 LOC and FP Based Estimation, COCOMO Model Requirements Analysis Design 243
6.1.5 Project Scheduling – Scheduling, Earned Value Analysis - Risk Management 250
6.1.6 Software Requirements: Functional and Non-Functional, User, System Requirements, Software Requirements Document 251
6.1.7 Requirement Engineering Process: Feasibility Studies, Requirements elicitation and analysis 253
6.1.8 Requirements Validation, Requirements Management 256
6.1.9 Classical Analysis 258
6.1.10 Structured Systems Analysis 259
6.1.11 Petri Nets-Data Dictionary 260
6.1.12 Design Process 260
6.1.13 Design Concepts-Design Model 261
6.1.14 Design Heuristic 262
6.1.15 Architectural Design 263
6.1.16 Architectural Styles, Architectural Design, Architectural Mapping using Data Flow 264
6.1.17 User Interface Design 266
6.1.18 Interface Analysis, Interface Design 267
6.1.19 Component Level Design 268
6.1.20 Designing Class based Components, Traditional Components 268
6.1.21 Software Testing Fundamentals 269
6.1.22 Internal and External Views of Testing 270
6.1.23 White Box Testing-basis Path Testing 271
6.1.24 Control Structure Testing 271
6.1.25 Black Box Testing 273
6.1.26 Regression Testing 274
6.1.27 Unit Testing 274
6.1.28 Integration Testing 275
6.1.29 Validation Testing 277
6.1.30 System Testing and Debugging 278
6.1.31 Software Implementation Techniques: Coding practices 279
6.1.32 Refactoring 279
6.1.33 Estimation – FP Based, LOC Based, Make/Buy Decision, COCOMO II 281
6.1.34 Project Planning Phase 285
6.1.35 Identification, Projection, RMMM 286
6.1.36 Scheduling and Tracking 290
6.1.37 Relationship between people and effort, Task Set & Network, Scheduling, EVA 292
6.1.38 Process and Project Metrics 294
Part – VII
296
7.1 COMPUTER NETWORKS: FUNDAMENTALS AND LINK LAYER 296
7.1.1 Building a Computer Network 296
7.1.2 Requirements of Computer Networks 296
7.1.3 Layering and Protocol 303
7.1.4 Internet Architecture 309
7.1.5 Network Software 310
7.1.6 Performance: Link Layer Services 311
7.1.7 Framing in Computer Networks 313
7.1.8 Error Detection and Correction: Its types 315
7.1.9 Flow Control 319
7.1.10 Important Short Questions and Answers: Computer Networks - Fundamentals & Link Layer 324
Part – VIII
328
8.1 DATABASE MANAGEMENT SYSTEMS 328
8.1.1 Trust Management in Virtualized Data Centers 328
8.1.2 Introduction to DBMS (Database Management Systems) 330
8.1.3 Purpose of Database Systems 332
8.1.4 File systems vs Database systems 334
8.1.5 Database System Terminologies 338
8.1.6 Data Models 339
8.1.7 Components of DBMS 341
8.1.8 Relational Algebra 342
8.1.9 ER Model 347
8.1.10 Functional Dependencies Definition 352
8.1.11 Database Normalization 354
8.1.12 Data Anomalies 355
8.1.13 SQL Overview 368
8.1.14 Data Types in SQL 370
8.1.15 Object-Oriented Database Management System 371
8.1.16 Data Definition Language or Data Description Language (DDL) 371
8.1.17 Data Manipulation Language (DML) 373
8.1.18 Data Control Language (DCL) 375
8.1.19 Transaction Control Language (TCL) 378
8.1.20 Embedded SQL 381
8.1.21 Query Processing and Optimization (QPO) 381
8.1.22 Transaction Processing 391
8.1.23 Introduction to Concurrency 396
8.1.24 Lock 396
8.1.25 Two-Phase Locking Techniques: The algorithm 400
8.1.26 Physical Storage Media 407
8.1.27 RAID: Redundant Arrays of Independent Disks 409
8.1.28 File Operations 410
8.1.29 Hashing 415
8.1.30 Indexing 414
8.1.31 B+-Tree Index Files 416
8.1.32 Data Warehouse 418
8.1.33 Data Mining 419
8.1.34 Mobile Databases 419
8.1.35 Spatial Database Types of Spatial Data 421
8.1.36 Multi-dimensional Indexes 422
8.1.37 Databases and Database Users 423
8.1.38 An Example - Databases and Database Users 425
8.1.39 Characteristics of the Database Approach 428
8.1.40 Actors on the Scene - Databases and Database Users 434
8.1.41 Workers behind the Scene - Databases and Database Users 434
8.1.42 Advantages of Using the DBMS Approach 434
8.1.43 A Brief History of Database Applications 438
8.1.44 When Not to Use a DBMS 441
8.1.45 Database System Concepts and Architecture 441
8.1.46 Data Models, Schemas, and Instances 442
8.1.47 Three-Schema Architecture and Data Independence 444
8.1.48 Database Languages and Interfaces 446
8.1.49 The Database System Environment 449
8.1.50 Centralized and Client/Server Architectures for DBMSs 452
8.1.51 Classification of Database Management Systems 456
8.1.52 The Relational Data Model and Relational Database Constraints 458
8.1.53 Relational Model Concepts 459
8.1.54 Relational Model Constraints and Relational Database Schemas 464
8.1.55 Update Operations, Transactions, and Dealing with Constraint Violations 470
8.1.56 Basic SQL 473
8.1.57 SQL Data Definition and Data Types 474
8.1.58 Specifying Constraints in SQL 478
8.1.59 Basic Retrieval Queries in SQL 480
8.1.60 INSERT, DELETE, and UPDATE Statements in SQL 488
8.1.61 Additional Features of SQL 490
8.1.62 More SQL: Complex Queries, Triggers, Views, and Schema Modification 491
8.1.63 More Complex SQL Retrieval Queries 491
8.1.64 Specifying Constraints as Assertions and Actions as Triggers 503
8.1.65 Views (Virtual Tables) in SQL 505
Advanced Concepts of Information Technology
Page 1 of 520
Part-I
1.1. File Structures Indexing and Hashing
1.1.1. Disk Storage, Basic File Structures, and Hashing
Databases are stored physically as files of records, which are typically stored on magnetic disks.
This chapter and the next deal with the organization of databases in storage and the techniques for
accessing them efficiently using various algorithms, some of which require auxiliary data structures
called indexes. These structures are often referred to as physical database file structures, and are
at the physical level of the three-schema architecture described in Chapter 2. We start in Section
17.1 by introducing the concepts of computer storage hierarchies and how they are used in database
systems. Section 17.2 is devoted to a description of magnetic disk storage devices and their
characteristics, and we also briefly describe magnetic tape storage devices. After discussing
different storage technologies, we turn our attention to the methods for physically organizing data
on disks. Section 17.3 covers the technique of double buffering, which is used to speed retrieval of
multiple disk blocks. We then discuss various ways of formatting and storing file records on
disk, followed by the various types of operations that are typically applied to file records. We
present three primary methods for organizing file records on disk: unordered records, in Section
17.6; ordered records, in Section 17.7; and hashed records, in Section 17.8.
Section 17.9 briefly introduces files of mixed records and other primary methods for organizing
records, such as B-trees. These are particularly relevant for storage of object-oriented databases,
which we discussed in Chapter 11. Section 17.10 describes RAID (Redundant Arrays of
Inexpensive (or Independent) Disks)—a data storage system architecture that is commonly used in
large organizations for better reliability and performance. Finally, in Section 17.11 we describe
three developments in the storage systems area: storage area networks (SAN), network attached
storage (NAS), and iSCSI (Internet SCSI; SCSI stands for Small Computer System Interface), a more recent
technology that makes storage area networks more affordable without the use of the Fibre
Channel infrastructure and hence is gaining wide acceptance in industry. Section 17.12
summarizes the chapter. In Chapter 18 we discuss techniques for creating auxiliary data structures,
called indexes, which speed up the search for and retrieval of records. These techniques involve
storage of auxiliary data, called index files, in addition to the file records themselves.
Chapters 17 and 18 may be browsed through or even omitted by readers who have already studied
file organizations and indexing in a separate course. The material covered here, in particular
Sections 17.1 through 17.8, is necessary for understanding Chapters 19 and 20, which deal with
query processing and optimization, and database tuning for improving performance of queries.
Introduction
The collection of data that makes up a computerized database must be stored physically on some
computer storage medium. The DBMS software can then retrieve, update, and process this data
as needed. Computer storage media form a storage hierarchy that includes two main categories:
Primary storage. This category includes storage media that can be operated on directly by the
computer’s central processing unit (CPU), such as the computer’s main memory and smaller but
faster cache memories. Primary storage usually provides fast access to data but is of limited storage
capacity. Although main memory capacities have been growing rapidly in recent years, main memory is
still more expensive and has less storage capacity than secondary and tertiary storage devices.
Secondary and tertiary storage. This category includes magnetic disks, optical disks (CD-
ROMs, DVDs, and other similar storage media), and tapes. Hard-disk drives are classified as
secondary storage, whereas removable media such as optical disks and tapes are considered tertiary
storage. These devices usually have a larger capacity, cost less, and provide slower access to data
than do primary storage devices. Data in secondary or tertiary storage cannot be processed directly
by the CPU; first it must be copied into primary storage and then processed by the CPU.
We first give an overview of the various storage devices used for primary and secondary storage in
Section 17.1.1 and then discuss how databases are typically handled in the storage hierarchy in
Section 17.1.2.
1. Memory Hierarchies and Storage Devices
In a modern computer system, data resides and is transported throughout a hierarchy of storage
media. The highest-speed memory is the most expensive and is therefore available with the least
capacity. The lowest-speed memory is offline tape storage, which is essentially available in
indefinite storage capacity.
At the primary storage level, the memory hierarchy includes, at the most expensive end, cache
memory, which is a static RAM (Random Access Memory). Cache memory is typically used by
the CPU to speed up execution of program instructions using techniques such as prefetching and
pipelining. The next level of primary storage is DRAM (Dynamic RAM), which provides the main
work area for the CPU for keeping program instructions and data. It is popularly called main
memory. The advantage of DRAM is its low cost, which continues to decrease; the drawback is
its volatility and lower speed compared with static RAM. At the secondary and tertiary storage
level, the hierarchy includes magnetic disks, as well as mass storage in the form of CD-ROM
(Compact Disk–Read-Only Memory) and DVD (Digital Video Disk or Digital Versatile Disk)
devices, and finally tapes at the least expensive end of the hierarchy. The storage capacity is
measured in kilobytes (KB or 1000 bytes), megabytes (MB or 1 million bytes), gigabytes (GB
or 1 billion bytes), and even terabytes (TB or 1000 GB). The word petabyte (PB, 1000 terabytes or 10^15
bytes) is now becoming relevant in the context of very large repositories of data in physics,
astronomy, earth sciences, and other scientific applications.
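The capacity units quoted above follow a simple power-of-1000 ladder. The short Python sketch below shows one way to convert between them; the names UNITS, format_capacity, and to_bytes are illustrative choices made for this example and do not come from any library.

```python
# Decimal storage units as used in the text (1 KB = 1000 bytes, and so on).
# The names UNITS, format_capacity and to_bytes are hypothetical helpers.
UNITS = [("PB", 10**15), ("TB", 10**12), ("GB", 10**9), ("MB", 10**6), ("KB", 10**3)]


def format_capacity(num_bytes: int) -> str:
    """Return a readable size using the largest decimal unit that fits."""
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:g} {name}"
    return f"{num_bytes} bytes"


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in KB, MB, GB, TB or PB to a number of bytes."""
    return int(value * dict(UNITS)[unit])


if __name__ == "__main__":
    print(format_capacity(to_bytes(16, "GB")))    # main memory of a server
    print(format_capacity(to_bytes(1000, "TB")))  # one petabyte
```

Note that these are the decimal units used in the text; operating systems sometimes report sizes in powers of 1024 instead, which this sketch does not model.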
Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary
storage (magnetic disks), and portions of the database are read into and written from buffers in
main memory as needed. Nowadays, personal computers and workstations have large main
memories of hundreds of megabytes of RAM and DRAM, so it is becoming possible to load a large
part of the database into main memory. Eight to 16 GB of main memory on a single server is
becoming commonplace. In some cases, entire databases can be kept in main memory (with a
backup copy on magnetic disk), leading to main memory databases; these are particularly useful
in real-time applications that require extremely fast response times. An example is telephone
switching applications, which store databases that contain routing and line information in main
memory.
Between DRAM and magnetic disk storage, another form of memory, flash memory, is becoming
common, particularly because it is nonvolatile. Flash memories are high-density, high-performance
memories using EEPROM (Electrically Erasable Programmable Read-Only Memory) technology.
The advantage of flash memory is the fast access speed; the disadvantage is that an entire block
must be erased and written over simultaneously. Flash memory cards are appearing as the data
storage medium in appliances with capacities ranging from a few megabytes to a few gigabytes.
These are appearing in cameras, MP3 players, cell phones, PDAs, and so on. USB (Universal Serial
Bus) flash drives have become the most portable medium for carrying data between personal
computers; they have a flash memory storage device integrated with a USB interface.
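The erase-before-write constraint of flash memory mentioned above can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: the class FlashBlock, its tiny block size, and the 0xFF erased state are illustrative choices for this example, not a description of any particular device or driver.

```python
class FlashBlock:
    """Toy model of one flash block that must be erased as a whole before rewriting."""

    def __init__(self, size: int = 8):
        self.size = size
        self.cells = bytearray([0xFF] * size)  # assumed erased state
        self.erase_count = 0

    def erase(self) -> None:
        """Reset every byte of the block; this is the only way to clear data."""
        self.cells = bytearray([0xFF] * self.size)
        self.erase_count += 1

    def update_byte(self, offset: int, value: int) -> None:
        """Change a single byte by staging, erasing and rewriting the whole block."""
        staged = bytearray(self.cells)  # copy the current block contents to RAM
        staged[offset] = value          # apply the change in RAM
        self.erase()                    # erase the entire block
        self.cells[:] = staged          # write the whole block back


if __name__ == "__main__":
    block = FlashBlock(size=4)
    block.update_byte(1, 0x2A)
    print(list(block.cells), block.erase_count)
```

Even a one-byte change costs a full block erase and rewrite in this model, which is the disadvantage the text refers to.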
CD-ROM (Compact Disk – Read Only Memory) disks store data optically and are read by a laser.
CD-ROMs contain prerecorded data that cannot be overwritten. WORM (Write-Once-Read-Many)
disks are a form of optical storage used for archiving data; they allow data to be written once and
read any number of times without the possibility of erasing. They hold about half a gigabyte of data
per disk and last much longer than magnetic disks. Optical jukebox memories use an array of CD-
ROM platters, which are loaded onto drives on demand. Although optical jukeboxes have capacities
in the hundreds of gigabytes, their retrieval times are in the hundreds of milliseconds, quite a bit
slower than magnetic disks. This type of storage is continuing to decline because of the rapid
decrease in cost and increase in capacities of magnetic disks. The DVD is another standard for
optical disks allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives now
read CD-ROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) that can
create CD-ROMs and audio CDs (Compact Disks), as well as record on DVDs.
Finally, magnetic tapes are used for archiving and backup storage of data. Tape jukeboxes,
which contain a bank of tapes that are catalogued and can be automatically loaded onto tape
drives, are becoming popular as tertiary storage to hold terabytes of data. For example, NASA’s
EOS (Earth Observing System) stores archived databases in this fashion.
Many large organizations are already finding it normal to have terabyte-sized databases. The
term very large database can no longer be precisely defined because disk storage capacities are on
the rise and costs are declining. Very soon the term may be reserved for databases containing tens
of terabytes.
2. Storage of Databases
Databases typically store large amounts of data that must persist over long periods of time; such
data is often referred to as persistent data. Parts of this data are accessed and processed
repeatedly during this period. This contrasts with the notion of transient data, which persists for only
a limited time during program execution. Most databases are stored permanently (or persistently)
on magnetic disk secondary storage, for the following reasons:
Generally, databases are too large to fit entirely in main memory.
The circumstances that cause permanent loss of stored data arise less frequently for disk
secondary storage than for primary storage. Hence, we refer to disk (and other secondary storage
devices) as nonvolatile storage, whereas main memory is often called volatile storage.
The cost of storage per unit of data is an order of magnitude less for disk secondary storage
than for primary storage.
Some of the newer technologies, such as optical disks, DVDs, and tape jukeboxes, are likely to
provide viable alternatives to the use of magnetic disks. In the future, databases may therefore
reside at different levels of the memory hierarchy from those described in Section 17.1.1. However,
it is anticipated that magnetic disks will continue to be the primary medium of choice for large
databases for years to come. Hence, it is important to study and understand the properties and
characteristics of magnetic disks and the way data files can be organized on disk in order to design
effective databases with acceptable performance.
Magnetic tapes are frequently used as a storage medium for backing up databases because storage
on tape costs even less than storage on disk. However, access to data on tape is quite slow. Data
stored on tapes is offline; that is, some intervention by an operator—or an automatic loading
device—to load a tape is needed before the data becomes available. In contrast, disks
are online devices that can be accessed directly at any time.
The techniques used to store large amounts of structured data on disk are important for database
designers, the DBA, and implementers of a DBMS. Database designers and the DBA must know
the advantages and disadvantages of each storage technique when they design, implement, and
operate a database on a specific DBMS. Usually, the DBMS has several options available for
organizing the data. The process of physical database design involves choosing the particular data
organization techniques that best suit the given application requirements from among the options.
DBMS system implementers must study data organization techniques so that they can implement
them efficiently and thus provide the DBA and users of the DBMS with sufficient options.
Typical database applications need only a small portion of the database at a time for processing.
Whenever a certain portion of the data is needed, it must be located on disk, copied to main memory
for processing, and then rewritten to the disk if the data is changed. The data stored on disk is
organized as files of records. Each record is a collection of data values that can be interpreted as
facts about entities, their attributes, and their relationships. Records should be stored on disk in a
manner that makes it possible to locate them efficiently when they are needed.
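The cycle described above (locate a record on disk, copy it to main memory, process it, and rewrite it if it changed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the way any particular DBMS stores data: the fixed-length record layout (a 4-byte id, a 20-byte name, and an 8-byte salary) and the function names are hypothetical, and a temporary file stands in for the database file.

```python
import struct
import tempfile

# Hypothetical fixed-length record: 4-byte id, 20-byte name, 8-byte salary.
RECORD = struct.Struct("<i20sd")


def write_record(f, position, rec_id, name, salary):
    """Write one record at the given record position in the file."""
    f.seek(position * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii"), salary))


def read_record(f, position):
    """Locate a record by position and copy it from the file into memory."""
    f.seek(position * RECORD.size)
    rec_id, raw_name, salary = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii"), salary


def raise_salary(f, position, amount):
    """Read a record, change it in main memory, and rewrite it to the file."""
    rec_id, name, salary = read_record(f, position)
    salary += amount
    write_record(f, position, rec_id, name, salary)
    return salary


def demo():
    """Store two records, update the second one, and read both back."""
    with tempfile.TemporaryFile() as f:
        write_record(f, 0, 1, "Smith", 30000.0)
        write_record(f, 1, 2, "Wong", 40000.0)
        raise_salary(f, 1, 2500.0)
        return read_record(f, 0), read_record(f, 1)


if __name__ == "__main__":
    print(demo())
```

Because every record has the same length, the byte offset of a record is simply its position multiplied by the record size, so a record can be located without scanning the file.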