Database Management Assignment

Upload: guru-vashist

Post on 02-Mar-2016


ASSIGNMENT

DRIVE: FALL 2013

PROGRAM: MBA IT

SUBJECT CODE & NAME: MI0034 - DATABASE MANAGEMENT SYSTEMS

SEMESTER: 3

BK ID: B1217

CREDITS: 4

MARKS: 60

Q.1 How is a DBMS classified based on several criteria? Explain each of them, with a few examples wherever required.

Ans: Several criteria are normally used to classify DBMSs. These are discussed below:

1. Based on data model

2. Based on the number of users

3. Based on the way the database is distributed

1. Based on data model:

It specifies a particular mechanism for data storage and retrieval. The primary difference between the different database models lies in the methods of expressing relationships and constraints among the data elements. Five database models are discussed here:

1. Hierarchical Model: It is one of the oldest database models [1950s], and represents data as hierarchical tree structures.

2. Network Model: It represents data as record types, and has an ability to handle many-to-many relationships.

3. Relational Model: The relational model stores data in the form of tables. Data is interrelated; relationships link rows of two tables. End users need not know about physical data storage details, so the model is conceptually simple.

4. Object Oriented model: It is based on a collection of objects. Object oriented database manages objects, and is suited for multimedia applications as well as data with complex relationships that are difficult to model and process in a relational DBMS. Object Oriented Data Base Management System [OODBMS] holds data, text, pictures, voice and video.

5. Object-relational model: It is a combination of object-oriented and relational concepts. It combines the advantages of modern object-oriented programming languages, which provide facilities for users to define new data types and functions of their own. An object-relational database is not only useful for storing data, but also provides business rules that are applied to the data. Associating rules with data makes the data more active, enabling the database system to perform automatic validity checks and automate many business procedures. It supports specialized applications such as image retrieval, searching, multimedia, etc.

E.g. IBM's DB2 Universal Server, Oracle 8, SQL Server 7, and so on.
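The table-and-key idea behind the relational and object-relational models above can be sketched with an in-memory SQLite database. This is a minimal illustration: the table and column names (dept, emp, dno) are invented for the example, not taken from the assignment.

```python
# A minimal sketch of the relational model described above: data lives
# in tables, and rows of one table are linked to rows of another by a
# key, not by physical storage details. Table and column names here
# (dept, emp, dno) are illustrative only.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT)")
con.execute("CREATE TABLE emp (ename TEXT, dno INTEGER REFERENCES dept(dno))")
con.execute("INSERT INTO dept VALUES (1, 'R&D')")
con.execute("INSERT INTO emp VALUES ('Asha', 1)")

# The relationship between the two tables is expressed by the dno key.
row = con.execute(
    "SELECT e.ename, d.dname FROM emp e JOIN dept d ON e.dno = d.dno"
).fetchone()
print(row)  # ('Asha', 'R&D')
```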

2. Based on the number of users: This classification is based on the number of users supported by the system. A single-user system supports only one user at a time, while a multi-user system supports multiple users concurrently.

3. Based on the way the database is distributed: A DBMS is centralized if the data is stored at a single computer site. A DBMS is distributed if the data and the DBMS software are distributed over many sites connected by a computer network.

Q.2 Differentiate between a B+ tree and a B-tree. Explain them with diagrams.

Ans: The main disadvantage of the index-sequential file organization is that performance degrades as the file grows. A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.

In a B-tree, every value of the search field appears exactly once at some level in the tree, along with a data pointer (which may be in internal nodes as well). In a B+-tree, data pointers (addresses of particular search values) are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record.

A B+ tree is a multilevel index, but it has a different structure. A typical node of the B+ tree contains up to n-1 search-key values K1, K2, ..., Kn-1 and n pointers P1, P2, ..., Pn. The search-key values within a node are kept in sorted order: Ki < Kj whenever i < j.

The number of pointers in a node is called the fan out of the node. The structure of a non-leaf node is the same as leaf nodes, except that all pointers are pointers to tree nodes.

Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>.

The root node has at least 2 tree pointers.

Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>,

where each Pri is a data pointer and Pnext points to the next leaf node of the B+ tree.

All leaf nodes are at the same level.

The main difference is that a B-tree eliminates the duplicate storage of search-key values. In a B+ tree every search-key value appears in some leaf node, and several are repeated in non-leaf nodes. A B-tree allows each search-key value to appear only once, hence a B-tree requires fewer nodes. However, since each search key appears only once in a B-tree, every node contains the search value along with an address for that value (the pointer points either to file records or to buckets that contain the search value).

Consider the top node, which consists of two value entries [5 and 8] and three pointers. Values less than 5 or equal to 5 are placed in the left lower node, similarly values greater than 5 and less than 8 are placed in the middle node, and greater than 8 are placed in the right lower node.
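The pointer-following rule just described can be sketched as a small search routine. The Node class and the example keys below are hypothetical, chosen only to mirror the [5, 8] root node in the description; a real B+-tree implementation would also handle insertion and node splitting.

```python
# A minimal sketch of searching a B+-tree-like structure, assuming the
# rule from the example above: values <= 5 follow the left pointer,
# values between 5 and 8 the middle pointer, values > 8 the right one.

class Node:
    def __init__(self, keys, children=None, data=None):
        self.keys = keys          # sorted search-key values
        self.children = children  # child pointers (internal nodes only)
        self.data = data          # data pointers (leaf nodes only)

def search(node, value):
    """Descend from the root to the leaf that may contain `value`."""
    while node.children is not None:
        i = 0
        while i < len(node.keys) and value > node.keys[i]:
            i += 1                # skip keys smaller than the value
        node = node.children[i]   # follow the chosen pointer
    return value in node.keys     # scan the leaf node

# Hypothetical tree mirroring the description: root keys [5, 8]
leaves = [Node([3, 5], data=["r3", "r5"]),
          Node([6, 8], data=["r6", "r8"]),
          Node([9, 12], data=["r9", "r12"])]
root = Node([5, 8], children=leaves)

print(search(root, 6))    # True
print(search(root, 7))    # False
```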

Consider the search algorithm of B-trees, for a search value K:

1. Set N to the top (root) node.

2. Let X, Y be the data values in node N (X < Y).

3. If K <= X, follow the left pointer; if X < K <= Y, follow the middle pointer; if K > Y, follow the right pointer.

4. Repeat from step 2 until K is found or a leaf node is reached.

Q.3 Explain the use of the GROUP BY and HAVING clauses in SQL with suitable examples.

Ans: The GROUP BY clause groups together tuples that share the same values of the grouping attributes, so that aggregate functions are applied per group; the HAVING clause then filters whole groups. Examples:

1. Select job, count(*) from emp group by job having count(*) > 20;

2. Select Deptno, max(basic), min(basic) from emp group by Deptno having max(basic) > 30000;

Example: find the average salary of department 1 only:

SELECT Dno, AVG(salary)

FROM Employee

GROUP BY Dno

HAVING Dno = 1;

Here the WHERE clause limits the tuples to which the aggregate functions are applied, whereas the HAVING clause is used to select or reject whole groups of tuples.

Example: for each department, retrieve the department number and the average salary of its employees born in January, considering only departments whose maximum salary exceeds 10,000:

SELECT Dno, AVG(salary)

FROM Employee

WHERE Bdate LIKE '%jan%'

GROUP BY Dno

HAVING MAX(salary) > 10000;

Output:

DNO  AVG(SALARY)

1    22000

2    18000

3    20000
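The query above can be run against an in-memory SQLite database. The Employee rows below are hypothetical, chosen so the averages match the output table shown; the WHERE Bdate filter is omitted for brevity.

```python
# Runnable sketch of the GROUP BY / HAVING example above, using an
# in-memory SQLite database with hypothetical Employee rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Dno INTEGER, salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, 20000), (1, 24000), (2, 18000), (3, 20000)])

# Average salary per department, keeping only groups whose maximum
# salary exceeds 10,000 (HAVING filters entire groups, not tuples).
rows = con.execute("""
    SELECT Dno, AVG(salary)
    FROM Employee
    GROUP BY Dno
    HAVING MAX(salary) > 10000
    ORDER BY Dno
""").fetchall()
print(rows)  # [(1, 22000.0), (2, 18000.0), (3, 20000.0)]
```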

Q.4 What are the problems and failures that may encounter with respect to the transactions in a database management system? Give examples.

Ans: A transaction is a logical unit of work, which involves many database operations; it is a collection of operations that forms a single logical unit of work. A transaction is a unit of program execution that accesses and possibly updates various data items. Examples: a banking system or a student database performs transactions.

Problems with concurrent execution of transactions:

1. The lost update problem: Suppose transactions T1 and T2 are submitted at the same time, when these two transactions are executed concurrently as shown in fig. a, then the final value of x is incorrect. Because T2 reads the value of x before T1 changes it in the database, and hence the updated value resulting from T1 is lost.

For e.g.: x=80 at the start (80 reservation at the beginning), n=5 (T1 transfers 5 seat reservation from the flight x to y), and m=4 (T2 reserves 4 seats on x), the final result should be x=79 but due to interleaving of operations x=84, because updating T1 that removed the 5 seats from x was lost.
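The interleaving above can be simulated deterministically in a few lines. This is only a sketch: the function names are illustrative, and the "transactions" are ordinary assignments rather than real database operations.

```python
# Deterministic simulation of the lost-update problem described above.
# x = 80 seats; T1 transfers n = 5 seats away from x, T2 reserves m = 4.

def serial_schedule(x=80, n=5, m=4):
    """T1 runs to completion, then T2: the correct result."""
    x = x - n        # T1 reads and writes x
    x = x + m        # T2 reads and writes x afterwards
    return x

def interleaved_schedule(x=80, n=5, m=4):
    """Both transactions read x before either writes: T1's update is lost."""
    t1_read = x      # T1 reads 80
    t2_read = x      # T2 reads 80, before T1 writes
    x = t1_read - n  # T1 writes 75
    x = t2_read + m  # T2 writes 84, overwriting T1's update
    return x

print(serial_schedule())       # 79 (correct)
print(interleaved_schedule())  # 84 (T1's update lost)
```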

2. Dirty read problem: This problem occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

For e.g.: T1 updates item x and then fails before completion, so the system must change x back to original value. Before it can do so, however, transaction T2 reads the temporary value of x, which will not be recorded permanently in the database, because of the failure of T1. The value of item x that is read by T2 is called Dirty Data, because it has been created by a transaction that has not been completed and committed yet. Hence this problem is also known as the temporary update problem.

3. Incorrect Summary Problem: If one transaction is calculating an aggregate summary function on a number of records, while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

For ex: Transaction T3 is calculating the total no. of reservations on all the flights, meanwhile transaction T1 is executing. The T3 reads the values of x after n seats have been subtracted from it, but reads the value of y before those n seats have been added to it.

Types of failures: 1. A computer failure (System Crash): Hardware, software, network error occurs in the computer system during transaction

2. Transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by 'Zero' etc.

3. Local errors or exception conditions detected by the transaction: During transaction execution, certain conditions may occur that require cancellation of the transaction. For example, data for the transaction may not be found.

4. Concurrency control enforcement: The concurrency control method may decide to abort the transactions, to be restarted later, because several transactions are in a state of deadlock.

5. Disk failure: Some disk blocks may lose their data because of read or write malfunctions

6. Physical problems and catastrophes: This refers to a list of problems that includes power or air conditioning failure, fire, theft, overwriting disks etc.

Q.5 Consider any database of your choice (may be simple banking database/forecasting database/project management database). Show the deduction of the tables in your database to the different types of normal forms

Ans: Normal forms with respect to the database:

Normal Forms Based on Primary Keys: A relation schema R is in first normal form (1NF) if every attribute of R takes only single, atomic values. To transform an un-normalized table (a table that contains one or more repeating groups) into first normal form, we identify and remove the repeating groups within the table.

E.g. the following Dept relation is not in first normal form, because D.Location is not an atomic attribute; its domain contains multiple values:

D.Name  D.No  D.Location

R&D     5     {England, London, Delhi}

HRD     4     {Bangalore}
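The conversion to 1NF can be sketched by expanding the multi-valued location attribute into one atomic row per location, using the rows from the table above:

```python
# Sketch of the 1NF conversion above: the multi-valued D.Location
# attribute is expanded so each row holds a single atomic location.
dept = [("R&D", 5, ["England", "London", "Delhi"]),
        ("HRD", 4, ["Bangalore"])]

first_nf = [(name, no, loc)
            for name, no, locations in dept
            for loc in locations]
print(first_nf)
# [('R&D', 5, 'England'), ('R&D', 5, 'London'),
#  ('R&D', 5, 'Delhi'), ('HRD', 4, 'Bangalore')]
```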

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (X -> Y) if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y the dependent attribute.

Second Normal Form (2NF): The second normal form is based on the concept of full functional dependency. A relation is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

A Partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. It creates a redundancy in that relation, which results in anomalies when the table is updated.

Third Normal Form (3NF): This is based on the concept of transitive dependency. We should design relation schemas so that there are no transitive dependencies, because they lead to update anomalies. A functional dependency (FD) X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z, which is neither a key nor a subset of a key of R, such that X -> Z and Z -> Y hold. For example, the dependency SSN -> Dmgr is transitive through Dnum in the Emp_dept relation, because SSN -> Dnum and Dnum -> Dmgr hold and Dnum is neither a key nor a subset (part) of the key.

Boyce-Codd Normal Form (BCNF): Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies. A functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, then B is functionally dependent on A (A -> B) if each value of A is associated with exactly one value of B. The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent, respectively.

A relation is in BCNF if and only if every determinant is a Candidate key.

The difference between the third normal form and BCNF is that for a functional dependency A -> B, the third normal form allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key,

whereas in BCNF, A must be a candidate key. Therefore BCNF is a stronger form of the third normal form.

PRODUCT (prd#,prdname,price)

Prd#->prodname,price

CUSTOMER (cust#,custname,custaddr)

Cust#->custname,custaddr

ORDER (ord#, cust#, prd#, qty, amt)

Ord#->qty,amt

The PRODUCT schema is in BCNF, since prd# is a candidate key; similarly, the CUSTOMER schema is also in BCNF.

The ORDER schema, however, is not in BCNF, because ord# is not a super key for ORDER, i.e. we could have a pair of tuples representing a single ord#.

Fourth Normal Form (4NF): Multi-valued dependencies are based on the concept of first normal form, which prohibits attributes having a set of values. If we have two or more multi-valued independent attributes in the same relation, we get into a situation where we have to repeat every value of one of the attributes with every value of the other attributes to keep the relation state consistent and to maintain independence among the attributes involved. This constraint is specified by a multi-valued dependency (MVD).

Employee

NAME  PROJECT    HOBBY

A     Microsoft  Cricket

A     Oracle     Music

A     Microsoft  Music

A     Oracle     Cricket

B     INTEL      Movies

B     Sybase     Reading

B     INTEL      Reading

B     Sybase     Movies

Decomposed relations to reduce redundancy:

NAME  HOBBY

A     Cricket

A     Music

B     Movies

B     Reading

NAME  PROJECT

A     Microsoft

A     Oracle

B     INTEL

B     Sybase

The definition of 4NF is violated when a relation has undesirable multi-valued dependencies; hence we identify such relations and decompose them into 4NF relations.

Alternate definition: a relation R is said to be in 4NF if, for every MVD A ->> B that holds over R, one of the following is true:

B is a subset of A (a trivial MVD), or

A union B = R, or

A is a super key.

The Employee relation is not in 4NF because of the non-trivial MVDs NAME ->> PROJECT and NAME ->> HOBBY (the project and hobby attributes of the Employee relation are independent of each other) and because NAME is not a super key of EMPLOYEE. To bring this relation into 4NF you have to decompose EMPLOYEE into (NAME, PROJECT) and (NAME, HOBBY).
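As a sanity check on the decomposition above, natural-joining the two decomposed relations on NAME should reproduce the original EMPLOYEE relation exactly (the lossless-join property). A minimal sketch using the rows from the tables above:

```python
# Sanity check of the 4NF decomposition above: natural-joining the
# (NAME, PROJECT) and (NAME, HOBBY) relations on NAME must reproduce
# the original EMPLOYEE relation exactly (lossless-join property).
employee = {
    ("A", "Microsoft", "Cricket"), ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"),   ("A", "Oracle", "Cricket"),
    ("B", "INTEL", "Movies"),      ("B", "Sybase", "Reading"),
    ("B", "INTEL", "Reading"),     ("B", "Sybase", "Movies"),
}
name_project = {(n, p) for n, p, _ in employee}
name_hobby = {(n, h) for n, _, h in employee}

# Natural join on NAME
joined = {(n, p, h)
          for n, p in name_project
          for m, h in name_hobby if n == m}
print(joined == employee)  # True: no spurious tuples, nothing lost
```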

Q.6 Read the following case study thoroughly and answer the following questions:

Laxmi bank is one of the largest private sector banks of India. It has an extensive network of more than 200 branches. It offers banking services to retail as well as corporate clients. The bank faced a challenge in integrating its multi-pronged database management system into a centralized system. The IT department of the bank also realized that the computing capabilities of its PCs and servers were not proportionately distributed among all its branches. Each branch had its database management system stored in a traditional way on disk. The total cost of operating and maintaining the current IT infrastructure was very high, and the fundamental shortcomings added to the costs. Moreover, there were also recurrent problems due to the malfunctioning of the currently operational database management system. Therefore, the bank's top management decided to fix the problem and operationalise a robust database management system. The bank hired an external database technology consulting firm called AKPY Info systems Limited. AKPY divided the entire IT infrastructure of the bank around two verticals: the retail banking vertical and the corporate banking vertical. All the individual database servers from the individual branches were removed. The entire database system was made virtual, such that the managers and the staff could access only the required information (related to retail banking or corporate banking) from the respective centralised data centers. There were only two such centralised data centers (one for retail banking and another for corporate banking), which were managed centrally. Staff and managers could access the information through their PCs and laptops. The centralised database management system complemented the security system by bringing in authentication through a unified ID management server. Managers and officers of the bank were able to process half a million transactions per month in real time after the new implementation.
There were significant savings in cost and also in the consumption of power. There were no longer problems with imbalances in the load across the various network servers. Due to centralised data management, top management could keep an eye on the functioning of the various branches; hence the cases of fraud and cheating reduced considerably. The bank managers could also process loan applications in reduced time, since a customer's previous records could be accessed at the click of a button and approval from the higher authorities could be obtained in real time. Moreover, the new system also brought in many applications that helped local managers in the decision-making process.

a. List the uses of centralized data management

b. What steps Laxmi bank need to take if it were to change its centralised database system to a distributed database system in future?

Ans: a. Uses of centralized data management:

From the case study it can be concluded that centralized data management has the following uses, which made it more effective than the older system:

1. A centralised database management system complements the security system by bringing in authentication through a unified ID management server.

2. If data is stored and managed at various locations, then as the volume of data increases, the time, effort and devices needed to manage it must increase accordingly. If data is gathered at one location and centrally managed, considerable space and time are saved.

3. It balances the load across the various network servers, so computing capacity is distributed proportionately among the branches.

4. It reduces the total cost of operating and maintaining the IT infrastructure, as well as power consumption.

5. It allows top management to keep an eye on the functioning of the various branches, which considerably reduces cases of fraud and cheating.

6. It speeds up processes such as loan approval, since a customer's previous records can be accessed at the click of a button and approvals can be obtained in real time.

7. It is easier to manage.

b. To change its centralised database system to a distributed database system, Laxmi bank would need to take the following steps:

Data fragmentation

In a distributed database, data fragmentation is the process of dividing each relation into smaller fragments that can be stored at different sites. Horizontal fragmentation splits a table by rows (for example, the accounts of each branch, or the retail and corporate verticals), while vertical fragmentation splits it by columns. The bank must decide how each table is to be fragmented before distributing the data across its branches.
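The bank's retail/corporate split can be sketched as a horizontal fragmentation. The account rows below are hypothetical, invented for illustration.

```python
# Sketch of horizontal fragmentation for the bank's two verticals,
# using hypothetical account rows: (account no, vertical, balance).
accounts = [("A1", "retail", 1000),
            ("A2", "corporate", 50000),
            ("A3", "retail", 2500)]

# Each fragment holds the rows destined for one data centre.
retail_fragment = [row for row in accounts if row[1] == "retail"]
corporate_fragment = [row for row in accounts if row[1] == "corporate"]

# The union of the fragments reconstructs the original relation,
# so no data is lost by distributing it.
print(sorted(retail_fragment + corporate_fragment) == sorted(accounts))  # True
```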

Data replication

In a distributed database, replication means maintaining copies of fragments at more than one site, so that data remains available even if one site fails and queries can be served from a nearby copy. The purpose of replication is to prevent damage from failures or disasters that may occur at one location, or, if such events do occur, to improve the ability to recover; the master-slave replication model is commonly applied. Latency is the key factor, because it determines how far apart the sites can be and the type of replication (synchronous or asynchronous) that can be employed. The bank would have to decide which fragments to replicate, at which branches, and how updates are propagated between the copies.

Data allocation

Data allocation is the process of deciding at which site each fragment or replica is to be stored. Allocation may be decided by programs that automatically and dynamically distribute the data among sites based on usage patterns. For the bank, each fragment should be placed at the branches that access it most frequently, so that most queries are answered locally and network traffic is minimised.

c. Yes, it is possible to replicate the centralized database management model of the bank in a manufacturing concern, since the same design can be reused with minor changes or modifications. One of the benefits of centralized data management is that a design or source code can be used again and again, which is known as software reuse.