mc0077 smu 2013 fall session


Upload: narinder-kumar

Post on 26-Jun-2015


Page 1: MC0077 SMU 2013 Fall Session

(Cover pg)

ASSIGNMENT- FALL 2013

Name: __Narinder Kumar_________________________

Registration No: __511225739_____________________________

Learning Center: __Artex Informatic_________________________

Learning Center Code: __1688__________________________________

Course: __MCA__________________________________

Subject: __ADVANCED DATABASE SYSTEMS________

Semester: __IV____________________________________

Subject Code: __MC0077_______________________________

Date of submission: __30 November 2013_______________________

Marks awarded: ________________________________________

Directorate of Distance Education

Sikkim Manipal University

II Floor, Syndicate House

Manipal – 576 104

Signature of Coordinator Signature of Center Signature of Evaluator

Q1:- Write short notes on:


a) First Normal Form

b) Second Normal Form

c) Third Normal Form

d) Boyce-Codd Normal Form

e) Fourth Normal Form

Ans:- a) First Normal Form: A table is in first normal form (1NF) if it qualifies as a relation. To be considered relational, each cell of the table must contain only a single value; repeating groups and arrays are not allowed as values. All entries in a column must be of the same kind, each column must have a unique name, and each row in the table must be unique. First normal form is the weakest normal form, and a relation that is only in 1NF may still suffer from modification anomalies.

b) Second Normal Form: A relation is in second normal form (2NF) if it is in 1NF and all of its non-key attributes are dependent on the entire key. This normal form solves the problem of partial dependencies, so it only matters for relations with composite keys.

c) Third Normal Form: A relation is in third normal form (3NF) if it meets the criteria for second normal form and has no transitive dependencies, that is, no non-key attribute depends on the key only through another non-key attribute.

d) Boyce-Codd Normal Form: A relation is in Boyce-Codd Normal Form (BCNF) if it meets the third normal form criteria and every determinant is a candidate key. This normal form removes the anomalies that remain when a functional dependency has a determinant that is not a candidate key.

e) Fourth Normal Form: Fourth normal form (4NF) extends BCNF to cover multi-valued dependencies as well as functional dependencies. A schema is in 4NF if the left-hand side of every non-trivial functional or multi-valued dependency is a super-key.
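The BCNF condition can be checked mechanically. The following is a minimal Python sketch, not part of the original answer: it computes attribute closures and reports every non-trivial functional dependency whose left-hand side is not a super-key. The relation (student, course, teacher) and its dependencies are assumed purely for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know lhs, we also know everything in rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """attrs is a super-key if its closure covers the whole relation."""
    return closure(attrs, fds) == set(all_attrs)


def bcnf_violations(all_attrs, fds):
    """List non-trivial dependencies whose determinant is not a super-key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, all_attrs, fds)]


# Hypothetical relation: a teacher teaches exactly one course,
# and (student, course) determines the teacher.
attributes = {"student", "course", "teacher"}
dependencies = [
    (frozenset({"student", "course"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"course"})),
]

violations = bcnf_violations(attributes, dependencies)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

In this assumed example the relation has no transitive dependency on a non-key attribute, yet the dependency teacher -> course is reported because teacher alone is not a super-key.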

Q2:- Discuss various Join Strategies for Parallel Processors including Parallel-Join and Pipelined Multi-way Join

Ans:- Join Strategies for Parallel Processors: In many cases, multiple processors may be available for parallel computation of the join. There are many possible architectures, including database machines. We consider only a simple architecture:

• All processors have access to all disks, and

• All processors share main memory.

Parallel-Join:

1. Split the pairs to be tested over several processors. Each processor computes part of the join, and then the results are assembled (merged).

2. Ideally, the overall work of computing the join is partitioned evenly over all processors. If such a split is achieved without any overhead, a parallel join using N processors will take 1/N times as long as the same join would take on a single processor.

3. In practice, the speedup is less dramatic because:

a. Overhead is incurred in partitioning the work among the processors.

b. Overhead is incurred in collecting the results computed by each processor.

c. If the split is not even, the final result cannot be obtained until the last processor has finished.

d. The processors may compete for shared system resources. For example, for A ⋈ B (e.g., deposit ⋈ customer), if each processor uses its own partition of A and main memory cannot hold the entire B, the processors need to synchronize their access to B so as to reduce the number of times that each block of B must be read in from disk.

4. A parallel hash algorithm can be used to reduce memory contention.
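The parallel join described above can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the algorithm from the course text: both inputs are hash-partitioned on the join attribute, each worker thread joins one pair of partitions with a local hash table, and the partial results are merged. The deposit and customer rows, the column names, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Split rows into n buckets by hashing the join attribute."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_hash_join(a_part, b_part, key):
    """Join one pair of partitions: build a hash table on B, probe with A."""
    table = {}
    for b in b_part:
        table.setdefault(b[key], []).append(b)
    return [{**a, **b} for a in a_part for b in table.get(a[key], [])]


def parallel_join(a_rows, b_rows, key, n_workers=4):
    """Partition both inputs, join the partitions in parallel, merge results."""
    a_parts = partition(a_rows, key, n_workers)
    b_parts = partition(b_rows, key, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(local_hash_join, a_parts, b_parts,
                                [key] * n_workers))
    # Collecting the partial results is the merge overhead noted in 3b.
    return [row for part in partial for row in part]


# Hypothetical sample data for deposit ⋈ customer.
deposit = [
    {"customer_name": "Jones", "balance": 500},
    {"customer_name": "Smith", "balance": 700},
    {"customer_name": "Jones", "balance": 900},
]
customer = [
    {"customer_name": "Jones", "city": "Harrison"},
    {"customer_name": "Smith", "city": "Rye"},
    {"customer_name": "Hayes", "city": "Pittsfield"},
]

joined = parallel_join(deposit, customer, "customer_name")
print(len(joined), "joined rows")
```

Because matching tuples always hash to the same partition number, each worker only needs its own slice of B, which is one way of avoiding the contention described in item 3d. In CPython, threads illustrate the structure rather than deliver a real speedup for CPU-bound work.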

Pipelined Multi-way Join:

1. Several joins can be computed in parallel.

Example: r1 ⋈ r2 ⋈ r3 ⋈ r4 can be computed by first computing "r1 ⋈ r2" and "r3 ⋈ r4", and then "(r1 ⋈ r2) ⋈ (r3 ⋈ r4)".

2. Moreover, it can be computed in a pipelined way: processor P1 is assigned to process r1 ⋈ r2, P2 to process r3 ⋈ r4, and P3 to process the join of the tuples being generated by P1 and P2.
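A pipelined multi-way join over four relations can be sketched as below. This is a hedged illustration rather than the method from the course text: two producer threads each compute one two-way join and push tuples onto a shared queue, while the consumer joins tuples as they arrive using a symmetric hash join (a technique chosen here for the sketch). The relation names r1 to r4, their columns, and the join keys are assumptions.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a producer's output


def hash_join(left, right, key):
    """Yield joined tuples one at a time: build on right, probe with left."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    for l in left:
        for r in table.get(l[key], []):
            yield {**l, **r}


def producer(side, left, right, key, out):
    """Compute one two-way join and stream its tuples to the consumer."""
    for row in hash_join(left, right, key):
        out.put((side, row))
    out.put((side, DONE))


def pipelined_join(r1, r2, r3, r4, key12, key34, key_final):
    out = queue.Queue()
    workers = [
        threading.Thread(target=producer, args=("L", r1, r2, key12, out)),
        threading.Thread(target=producer, args=("R", r3, r4, key34, out)),
    ]
    for w in workers:
        w.start()

    # Consumer: symmetric hash join over tuples as they arrive.
    seen = {"L": {}, "R": {}}
    finished = 0
    result = []
    while finished < 2:
        side, row = out.get()
        if row is DONE:
            finished += 1
            continue
        other = "R" if side == "L" else "L"
        seen[side].setdefault(row[key_final], []).append(row)
        for match in seen[other].get(row[key_final], []):
            result.append({**row, **match})

    for w in workers:
        w.join()
    return result


# Hypothetical relations: r1(a, b), r2(b, c), r3(c, d), r4(d, e).
r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
r2 = [{"b": 10, "c": 100}, {"b": 20, "c": 200}]
r3 = [{"c": 100, "d": 7}, {"c": 300, "d": 8}]
r4 = [{"d": 7, "e": "x"}, {"d": 8, "e": "y"}]

pipelined = pipelined_join(r1, r2, r3, r4, "b", "d", "c")
print(pipelined)
```

The consumer never waits for either producer to finish before it starts joining, which is the point of pipelining: intermediate results are consumed as they are generated instead of being fully materialized first.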

Q3:- Explain the following concepts:

a) Entity Types

b) Relationship Types

c) Attribute Types

d) Domain Specification

e) Cardinality Specifications

Ans:- a) Entity Types: Entity Types represent the "objects" of interest in the information system about which data is recorded. Entity type names are most commonly nouns: person, teacher, student, report, exam, product, etc.

b) Relationship Types: connect entity types in one of two structures:

1. Associative relationships connect two or more 'independent' entity types, while

2. Classification hierarchies specify a specialization/generalization structure of entity types.

The Teaches/Taught By relationship is a binary associative relationship connecting the 2 entity types Teacher and Course, while Writes/Written By is a ternary associative relationship that connects Student, Course, and Report. Note that each occurrence of the Writes relationship records a connection between occurrences of each of the 3 participating entity types.

c) Attribute Types: are used to describe important characteristics of an entity or relationship. The attribute types supported in SSM are shown in Figure 4.3b and include:


1. Atomic/single-valued and not decomposable characteristics, examples: Id and birthdate,

2. Composite attributes, examples: name and address,

3. Multi-valued attributes, examples: telephone and address,

4. Derived, i.e. attributes having a calculated value at storage or retrieval time, example: age,

5. Media attributes, examples: geo-loc, CV and picture.

d) Domain specification: for each attribute should be included in an SSM model, as shown in Figure 4.3b. Domain types may be:

1. Traditional R-DB data-types such as integer, character, decimal, date, money, ...,

2. A user-defined type, such as PersonNr and Pcode,

3. Specialized binary large objects, BLOBs, such as image, text, video, audio,

4. Geographic points, lines, or polygons, or

5. Results from user defined functions.

e) Cardinality Specifications: should be assigned for each classification and associative relationship, as well as for multi-valued attributes. In SSM, a (min, max) cardinality specification is used. The participation specification indicates total (t) or partial (p) membership, i.e. whether each parent instance must participate in one or more of the subclass entity-types or can exist without participation in any of the subclasses.
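A (min, max) cardinality specification can be checked against recorded relationship occurrences. The short Python sketch below is only an illustration under assumptions: the Teaches occurrences, the instance identifiers, and the function name are hypothetical, and `None` is used to stand for an unbounded maximum.

```python
from collections import Counter


def check_cardinality(instances, occurrences, role, min_card, max_card=None):
    """Return the instances whose participation count breaks (min, max).

    max_card=None means there is no upper bound.
    """
    counts = Counter(occ[role] for occ in occurrences)
    return [i for i in instances
            if counts[i] < min_card
            or (max_card is not None and counts[i] > max_card)]


# Hypothetical occurrences of the binary Teaches relationship.
teachers = ["T1", "T2"]
teaches = [
    {"teacher": "T1", "course": "C1"},
    {"teacher": "T1", "course": "C2"},
]

# Total participation: every teacher must teach at least one course.
missing = check_cardinality(teachers, teaches, "teacher", 1)
# At most one course per teacher.
overloaded = check_cardinality(teachers, teaches, "teacher", 0, 1)
print(missing, overloaded)
```

A minimum of 1 corresponds to total participation and a minimum of 0 to partial participation, so the same check covers both cases.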

Q4:- Explain about Attribute-Based Image Retrieval and Text-Based Image Retrieval.

Ans:- Attribute-Based Image Retrieval: Attribute-based image queries can be formulated as standard SQL3 queries. However, presenting the images in the result set can be a problem, since presentation assumes that the query processor can distinguish between image formats and display each image correctly. For example, a query to retrieve 4th of July pictures taken by Joan Nordbotten could be expressed as:

SELECT * FROM Image_table WHERE Date_taken = 'July 4' AND Name = 'Joan Nordbotten';

This query assumes that the SQL3/Image query processor can:

a) Transform the verbal date form to an internal representation,

b) Concatenate structured attributes: Name.First, Name.Last,

c) Traverse the link from Image_table.creator to the creator.name attribute, and

d) Output image (BLOB) data.

If not, then relatively simple UDFs (user-defined functions) can be defined for the DB image table to perform these functions.
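The role of such UDFs can be illustrated with Python's built-in sqlite3 module, which is used here only as a stand-in for an SQL3/Image query processor. The schema, the `norm_date` function, the stored date format, and the sample rows are all assumptions made for this sketch.

```python
import sqlite3

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def normalize_date(text):
    """UDF: turn a verbal date such as 'July 4' into the stored 'MM-DD' form."""
    month, day = text.split()
    return f"{MONTHS[month.lower()]:02d}-{int(day):02d}"


conn = sqlite3.connect(":memory:")
conn.create_function("norm_date", 1, normalize_date)

conn.execute("CREATE TABLE creator (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.execute("""CREATE TABLE image_table (
                    id INTEGER PRIMARY KEY,
                    date_taken TEXT,
                    creator INTEGER REFERENCES creator(id),
                    picture BLOB)""")
conn.execute("INSERT INTO creator VALUES (1, 'Joan', 'Nordbotten')")
conn.executemany(
    "INSERT INTO image_table VALUES (?, ?, ?, ?)",
    [(1, "07-04", 1, b"fake-image-bytes-1"),
     (2, "12-24", 1, b"fake-image-bytes-2")])

# (a) the UDF converts the verbal date, (b) the name parts are concatenated,
# (c) the join follows image_table.creator, (d) the BLOB column is returned.
rows = conn.execute(
    """SELECT i.id, i.picture
       FROM image_table AS i JOIN creator AS c ON i.creator = c.id
       WHERE i.date_taken = norm_date(?)
         AND c.first || ' ' || c.last = ?""",
    ("July 4", "Joan Nordbotten")).fetchall()
print(rows)
```

The query returns the raw image bytes; deciding how to render them according to the image format is left to the application, which is the presentation problem noted above.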

Text-Based Image Retrieval: Often, the image requester is able to give a verbal description or specification of the content of the required images. These text-based queries can be formulated as free text or as a term list that can be compared to text descriptors such as the description, subjects, title, and/or the text surrounding an embedded image, using text retrieval techniques.
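A very simple form of this comparison is term matching against the text descriptors. The Python sketch below is an assumed, minimal ranking scheme (count of shared terms), not a specific retrieval technique from the course text; the field names and image records are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(images, query, fields=("title", "subjects", "description")):
    """Rank images by how many query terms appear in their text descriptors."""
    terms = tokenize(query)
    scored = []
    for image in images:
        words = set()
        for field in fields:
            words |= tokenize(image.get(field, ""))
        score = len(terms & words)
        if score:
            scored.append((score, image["id"]))
    # Highest score first; ties are broken by image id.
    return [image_id for score, image_id
            in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical image descriptors.
images = [
    {"id": 1, "title": "Fireworks over harbour",
     "description": "4th of July celebration"},
    {"id": 2, "title": "Mountain lake", "description": "sunrise"},
    {"id": 3, "title": "July parade", "subjects": "fireworks"},
]

hits = search(images, "fireworks July")
print(hits)
```

A real system would typically add stemming, stop-word removal, and term weighting; this sketch only shows the basic idea of comparing a term list to the stored descriptors.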

Q5:- What are various Data Mining Techniques? Explain any five

Ans:- Data Mining Techniques:

A) Cluster Analysis: In an unsupervised learning environment the system has to discover its own classes, and one way in which it does this is to cluster the data in the database, as shown in the following diagram. The first step is to discover subsets of related objects and then find descriptions, e.g. D1, D2, D3, etc., which describe each of these subsets.

B) Induction: A database is a store of information but more important is the information which can be inferred from it. There are two main inference techniques available i.e. deduction and induction.

C) Decision Trees: Decision trees are a simple knowledge representation that classifies examples into a finite number of classes. The nodes are labeled with attribute names, the edges are labeled with the possible values of that attribute, and the leaves are labeled with the different classes. Objects are classified by following a path down the tree, taking the edges that correspond to the values of the attributes in the object.

D) Rule Induction: A data mining system has to infer a model from the database. That is, it may define classes such that the database contains one or more attributes that denote the class of a tuple, i.e. the predicted attributes, while the remaining attributes are the predicting attributes. A class can then be defined by a condition on the attributes. Once the classes are defined, the system should be able to infer the rules that govern classification; in other words, the system should find the description of each class.

E) Neural Networks: Neural Networks are an approach to computing that involves developing mathematical structures with the ability to learn. The methods are the result of academic investigations to model nervous system learning. Neural Networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained Neural Network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.
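The path-following classification described under C) Decision Trees can be sketched in a few lines of Python. The tree below is a hand-written, hypothetical example (it is not learned from data): an internal node is a pair of an attribute name and a dictionary of branches, and a leaf is a class label.

```python
# Hypothetical tree: internal node = (attribute name, {value: subtree}),
# leaf = class label (a string).
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("wind", {"strong": "no", "weak": "yes"}),
})


def classify(node, obj):
    """Follow the edges matching the object's attribute values down to a leaf."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[obj[attribute]]
    return node


label = classify(tree, {"outlook": "sunny", "humidity": "normal"})
print(label)
```

Only the attributes that lie on the chosen path are inspected, so an object needs values for those attributes only.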

Q6:- What is the significance of Data Replication? When should replication be used?

Ans:- Data Replication: Replication is the process of copying and maintaining database objects, such as tables, in multiple databases that make up a distributed database system. Changes applied at one site are captured and stored locally before being forwarded and applied at each of the remote locations. Advanced Replication is a fully integrated feature of the Oracle server; it is not a separate server.

Replication uses distributed database technology to share data between multiple sites, but a replicated database and a distributed database are not the same. In a distributed database, data is available at many locations, but a particular table resides at only one location. With replication, the same data is available at multiple locations.

Replication should be used for:

Availability: Replication improves the availability of applications because it gives them alternative data access options. If one site becomes unavailable, users can continue to query, or even update, the remaining locations.

Performance: Replication provides fast, local access to shared data because it balances activity over multiple sites. Some users can access one server while other users access different servers, thereby reducing the load at all servers. Also, users can access data from the replication site that has the lowest access cost, which is typically the site that is geographically closest to them.

Disconnected Computing: A materialized view is a complete or partial copy (replica) of a target table from a single point in time. Materialized views enable users to work on a subset of a database while disconnected from the central database server. Later, when a connection is established, users can synchronize (refresh) materialized views on demand. When users refresh materialized views, they update the central database with all of their changes, and they receive any changes that may have happened while they were disconnected.

Network Load Reduction: Replication can be used to distribute data over multiple regional locations. Then, applications can access various regional servers instead of accessing one central server. This configuration can reduce network load dramatically.

Mass Deployment: Replication can also be used to roll out data to a large number of sites. In Oracle, deployment templates allow multiple materialized view environments to be created quickly.