csit seventh semester dbms old question answer

Upload: damodar-lohani

Post on 13-Jan-2016

16 views

Category:

Documents


0 download

DESCRIPTION

Old question solutions of CSIT seventh semester ADBMS

TRANSCRIPT

  • Q.WhataredifferencesinCentralizedandDistributedDatabaseSystems?Listtherelativeadvantagesofdatadistribution?Adistributeddatabaseisadatabasethatisunderthecontrolofacentraldatabasemanagementsystem(DBMS)inwhichstoragedevicesarenotallattachedtoacommonCPU.Itmaybestoredinmultiplecomputerslocatedinthesamephysicallocation,ormaybedispersedoveranetworkofinterconnectedcomputers.Collectionsofdata(e.g.inadatabase)canbedistributedacrossmultiplephysicallocations.AdistributeddatabasecanresideonnetworkserversontheInternet,oncorporateintranetsorextranets,oronothercompanynetworks.Thereplicationanddistributionofdatabasesimprovesdatabaseperformanceatenduserworksites.Toensurethatthedistributivedatabasesareuptodateandcurrent,therearetwoprocesses:

    Replication System maintains multiple copies of data, stored in different sites, for

    fasterretrievalandfaulttolerance. A relation or fragment of a relation is replicated if it is stored

    redundantlyintwoormoresites. Full replication of a relation is the case where the relation is stored at all

    sites. Fully redundant databases are those in which every site contains a copy

    oftheentiredatabase.

    Fragmentation Relationispartitionedintoseveralfragmentsstoredindistinctsites Division of relation r into fragments r1, r2, , rn which contain

    sufficientinformationtoreconstructrelationrMajorfeaturesofaDDBare:

    Datastoredatanumberofsites,eachsitelogicallysingleprocessor Sitesareinterconnectedbyanetworkratherthanamultiprocessor

    configuration DDBislogicallyasingledatabase(althougheachsiteisadatabasesite) DDBMShasfullfunctionalityofaDBMS

    [email protected]

  • Totheuser,thedistributeddatabasesystemshouldappearexactlylikeanondistributeddatabasesystem.

    Advantagesofdistributeddatabasesystemsare:

    DatasharingandDistributedControl:The primary advantage to accomplishing data sharing by means of data distribution is that each site is able to retain a degree of control over data stored locally. In a centralized system, the database administrator of the central site controls the database. In a distributed system, there is a global database administrator responsible for the entire system. A part of these responsibilities is delegated to the local database administrator for each site. Depending upon the design of the distributed database system, each local administrator may have a different degree of autonomy which is often a majoradvantageofdistributeddatabases.

    ReliabilityandAvailability:If one site fails in distributed system, the remaining sites may be able to continue operating. In particular, if data are replicated in several sites, transaction needing a particular data item may find it in several sites. Thus, thefailureofasitedoesnotnecessarilyimplytheshutdownofthesystem.Although recovery from failure is more complex in distributed systems than in a centralized system, the ability of most of the systems to continue to operate despite failure of one site, results in increased availability. Availability is crucial for database systems used for realtime applications. Loss of access to data, for example, in an airline may result in the loss of potentialticketbuyerstocompetitors.

    SpeedupQueryProcessing:If a query involves data at several sites, it may be possible to split the query into subqueries that can be executed in parallel by several sites. Such parallel computation allows for faster processing of a users query. In those cases in which data is replicated, queries may be directed by the system to the least heavilyloadedsites.

    Disadvantagesofdistributeddatabasesystemsare:*Complexity(greaterpotentialforbugsinsoftware)*Costly.*Distributionofcontrol(nosingledatabaseadministratorcontrolstheDDB)*Security*Difficulttochange*Lackofexperience

    [email protected]

  • WhatisEER? The Extended EntityRelationship (EER) model is a conceptual (or

    semantic) data model, capable of describing the data requirements for a new informationsysteminadirectandeasytounderstandgraphicalnotation.

    Data requirements for a database are described in terms of a conceptual schema,usingtheEERmodel.

    EERschemataarecomparabletoUMLclassdiagrams.The EER model introduces the additional concepts of subclasses, superclasses, specialization generalization, and attributes inheritance. The resulting model is called the enhancedER or Extended ER model. It is used to model applications more completely and accurately if needed. It includes some objectoriented concepts,suchasinheritance.

    WhyExtendtheERModel?

    ERissuitablefortraditionalbusinessapplications

    ERisnotsemanticallyrichenoughforadvancedapplications

    ApplicationswhereERisinadequate

    Geographicalinformationsystems Searchengines Datamining Multimedia CAD/CAM Softwaredevelopment Engineeringdesign...andothers

    ERtoRelationalMappingAlgorithm(steps)*WeusetheCOMPANYdatabaseexampletoillustratethemappingprocedure.*TheCOMPANYERschemaisshowninFigure9.1,andthecorrespondingCOMPANYrelationaldatabaseschemaisshowninFigure9.2toillustratethemappingsteps.*Weassumethatthemappingwillcreatetableswithsimplesinglevaluedattributes.Therelationalmodelconstraints,whichincludeprimarykeys,unique

    [email protected]

  • keys(ifany),andreferentialintegrityconstraintsontherelations,willalsobespecifiedinthemappingresults.Step1:MappingofRegularEntityTypes.For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as the primary key for R. If the chosen key of E is a composite, then the set of simple attributesthatformitwilltogetherformtheprimarykeyofR.If multiple keys were identified forEduring the conceptual design, the information describing the attributes that form each additional key is kept in order to specify secondary (unique) keys of relation R. Knowledge about keys is also kept for indexing purposes and other types of analyses. In our example, we create the relationsEMPLOYEE,DEPARTMENT,andPROJECTinFigure9.2 to correspond to the regular entity types EMPLOYEE, DEPARTMENT, and PROJECT in Figure 9.1. The foreign key and relationship attributes, if any, are not includedyettheywillbeaddedduringsubsequentsteps.

    [email protected]

  • These include the attributes Super_ssn and Dno of EMPLOYEE, Mgr_ssn and Mgr_start_date of DEPARTMENT, and Dnum of PROJECT. In our example, we choose Ssn, Dnumber, and Pnumber as primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT, respectively. Knowledge that Dname of DEPARTMENT and Pname of PROJECT are secondary keys is kept for possible use later in the design. The relations that are created from the mapping of entity types are sometimes called entity relationsbecause each tuple represents an entityinstance.TheresultafterthismappingstepisshowninFigure9.3(a).Step2:MappingofWeakEntityTypes.For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R. In addition, include as foreign key attributes of

    [email protected]

  • R, the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s) this takes care of mapping the identifying relationship type of W. The primary key of R is the combination of the primary key(s) of the owner(s) and the partialkeyoftheweakentitytypeW,ifany.If there is a weak entity type E2 whose owner is also a weak entity type E1, then E1 should be mapped before E2 to determine its primary key first. In our example, we create the relation DEPENDENT in this step to correspond to the weak entity type DEPENDENT (see Figure 9.3(b)).We include the primary key Ssn of the EMPLOYEE relationwhich corresponds to the owner entity typeas a foreign key attribute of DEPENDENT we rename it Essn, although this is not necessary. The primary key of the DEPENDENT relation is the combination {Essn, Dependent_name}, because Dependent_name (also renamed from Name in Figure 9.1) is the partial key of DEPENDENT. It is common to choose the propagate (CASCADE) option for the referential triggered action (see Section 4.2) on the foreign key in the relation corresponding to the weak entity type, since a weak entity has an existence dependency on its owner entity. This can be used for both ONUPDATEandONDELETE.

    [email protected]

  • Step3:MappingofBinary1:1RelationshipTypes.For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. There are three possibleapproaches:(1)theforeignkey approach, (2) the merged relationship approach, and (3) the crossreference or relationship relation approach. The first approach is the most useful and should be followedunlessspecialconditionsexist,aswediscussbelow.1. Foreign key approach: Choose one of the relationsS, sayand include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S. In our example, we map the 1:1 relationship type MANAGES from Figure 9.1 by choosing the participating entity type DEPARTMENT to serve in the role of S because its participation in the MANAGES relationship type is total (every department has a manager). We include the primary key of the EMPLOYEE relation as foreign key in the DEPARTMENT relation and rename it Mgr_ssn.We also include the simple attribute Start_date of the MANAGES relationship type in the DEPARTMENT relation and rename it Mgr_start_date (see Figure 9.2). Note

    [email protected]

  • that it is possible to include the primary key of S as a foreign key in T instead.

    In our example, this amounts to having a foreign key attribute, say Department_managed in the EMPLOYEE relation, but it will have a NULL value for employee tuples who do not manage a department. If only 2 percent of employees manage a department, then 98 percent of the foreign keys would be NULL in this case. Another possibility is to have foreign keys in both relations S and T redundantly, but this creates redundancy and incurs a penalty for consistency maintenance.2. Merged relation approach: An alternative mapping of a 1:1 relationship type is to merge the two entity types and the relationship into a single relation. This is possible when both participations are total, as this would indicate that the two tableswillhavetheexactsamenumberoftuplesatalltimes.3. Crossreference or relationship relation approach: The third option is to set up a third relation R for the purpose of crossreferencing the primary keys of the two relations S and T representing the entity types.As we will see, this approach is requiredforbinaryM:Nrelationships.TherelationRis

    [email protected]

  • called a relationship relation (or sometimes a lookup table), because each tuple in R represents a relationship instance that relates one tuple from S with one tuple from T. The relation R will include the primary key attributes of S and T as foreign keys to S and T. The primary key of R will be one of the two foreign keys, and the other foreign key will be a unique key of R. The drawback is having an extra relation, and requiring an extra join operation when combining related tuples from thetables.Step4:MappingofBinary1:NRelationshipTypes.For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type at the Nside of the relationship type. Include as foreign key in S the primary key of the relation T that represents the other entity type participating in R we do this because each entity instance on the Nside is related to at most one entity instance on the 1side of the relationship type. Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S. In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION from Figure 9.1. For WORKS_FOR we include the primary key Dnumber of the DEPARTMENT relation as foreign key in the EMPLOYEE relation and call it Dno. For SUPERVISION we include the primary key of the EMPLOYEE relation as foreign key in the EMPLOYEE relation itselfbecause the relationship is recursive and call it Super_ssn. The CONTROLS relationship is mapped to the foreign key attribute Dnum of PROJECT, which references the primary key Dnumber of the DEPARTMENT relation. These foreign keys are shown in Figure 9.2. An alternative approach is to use the relationship relation (crossreference) option as in the third option for binary 1:1 relationships. We create a separate relation R whose attributes are the primary keys of S and T, which will also be foreign keys to S and T. The primary key of R is the same as the primary key of S. This option can be used if few tuples in S participate in the relationship to avoid excessiveNULLvaluesintheforeignkey.Step5:MappingofBinaryM:NRelationshipTypes.For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types their combination will form the primary key of S.Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S. Notice that we cannot

    [email protected]

  • represent an M:N relationship type by a single foreign key attribute in one of the participating relations (as we did for 1:1 or 1:N relationship types) because of the M:N cardinality ratio we must create a separate relationship relation S. In our example, we map the M:N relationship type WORKS_ON from Figure 9.1 by creating the relation WORKS_ON in Figure 9.2.We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them Pno and Essn, respectively.We also include an attribute Hours in WORKS_ON to represent the Hours attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {Essn, Pno}. This relationship relation is shown in Figure 9.3(c). The propagate (CASCADE) option for the referential triggered action (see Section 4.2) should be specifiedontheforeignkeysintherelationcorrespondingtotherelationship R, since each relationship instance has an existence dependency on each of the entities it relates. This can be used for both ON UPDATE and ON DELETE. Notice that we can always map 1:1 or 1:N relationships in a manner similar to M:N relationships by using the crossreference (relationship relation) approach, as we discussed earlier. This alternative is particularly useful when few relationship instances exist, in order to avoid NULL values in foreign keys. In this case, the primary key of the relationship relation will be only one of the foreign keys that reference the participating entity relations. For a 1:N relationship, the primary key of the elationship relation will be the foreign key that references the entity relation on the Nside. For a 1:1 relationship, either foreign key can be used astheprimarykeyoftherelationshiprelation.Step6:MappingofMultivaluedAttributes.For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute Kas a foreign key in Rof the relation that represents the entity type or relationship type that has A as a multivalued attribute. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components. In our example, we create a relation DEPT_LOCATIONS (see Figure 9.3(d)). The attribute Dlocation represents the multivalued attribute LOCATIONS of DEPARTMENT, while Dnumberas foreign keyrepresents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination of {Dnumber, Dlocation}. A separate tuple will exist in DEPT_LOCATIONS for each location that a department has. The propagate (CASCADE) option for the referential triggered action should be specified on the foreign key in the relation R corresponding to the multivalued attribute for both

    [email protected]

  • ON UPDATE and ON DELETE. We should also note that the key of R when mapping a composite, multivalued attribute requires some analysis of the meaning of the component attributes. In some cases, when a multivalued attribute is composite, only some of the component attributes are required to be part of the key of R these attributes are similar to a partial key of a weak entity type that corresponds to the multivalued attribute. Figure 9.2 shows the COMPANY relational database schema obtained with steps 1 through 6, and Figure 3.6 shows a sample database state. Notice that we did not yet discuss the mapping of nary relationship types (n > 2) because none exist in Figure 9.1 these are mapped in a similar way to M:N relationship types by including the following additional step in themappingalgorithm.Step7:MappingofNaryRelationshipTypes.For each nary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the nary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types. However, if the cardinality constraints on any of the entity types E participating in R is 1, then the primary key of S should not include the foreign key attribute that references the relation E_ corresponding to E (see the discussion in Section 7.9.2 concerningconstraintsonnaryrelationships).For example, consider the relationship type SUPPLY in Figure 7.17. This can be mapped to the relation SUPPLY shown in Figure 9.4, whose primary key is the combinationofthethreeforeignkeys{Sname,Part_no,Proj_name}.0691.)ExplainthefollowingtermsDataMiningECAmodelSpatialdatabaseSpatial databases provide concepts for databases that keep track of objects in a multidimensionalspace.

    [email protected]

  • For example, cartographic databases that store maps include twodimensional spatial descriptions of their objectsfrom countries and states to rivers, cities, roads, seas, and so on. These databases are used in many applications, suchasenvironmental,emergency,andbattlemanagement. Other databases, such as meteorological databases for weather information, are threedimensional, since temperatures and other meteorological information are relatedtothreedimensionalspatialpoints. In general, a spatial database stores objects that have spatial characteristics that describe them. The spatial relationships among the objects are important, and they are often needed when querying the database. Although a spatial database can in general refer to an ndimensional space for any n, we will limit our discussion to twodimensionsasanillustration. The main extensions that are needed for spatial databases are models that can interpret spatial characteristics. In addition, special indexing and storage structures areoftenneededtoimproveperformance.SpecializationandgeneralizationinanERRmodelThe ER Model has the power of expressing database entities in a conceptual

    hierarchical manner. As the hierarchy goes up, it generalizes the view of entities,

    andaswegodeepinthehierarchy,itgivesusthedetailofeveryentityincluded.

    Generalization

    The process of generalizing entities, where the

    generalized entities contain the properties of

    all the generalized entities, is called

    generalization. In generalization, a number of

    [email protected]

  • entities are brought together into one generalized entity based on their similar

    characteristics. For example, pigeon, house sparrow, crow and dove can all be

    generalizedasBirds.

    Specialization

    Specialization is the opposite of generalization. In

    specialization, a group of entities is divided into

    subgroups based on their characteristics. Take a

    group Person for example. A person has name, date

    of birth, gender, etc. These properties are common in

    all persons, human beings. But in a company, persons

    can be identified as employee, employer, customer, or vendor, based on what role

    they play in the company.Similarly, in a school database, persons can be

    specialized as teacher, student, or a staff, based on what role they play in school as

    entities.

    XMLandHTMLKey Difference: HTML is a markup language that is used to design web pages. It is written in predefined tag elements. Its primary purpose is to display data with focus on how the data looks. XML is a markup language whose primary purpose is to transport and store data. It is a language that can be used to develop new languages and define other languages. It does not have a predefined set of tags, and allows the developertocustomizetags.

    [email protected]

  • HyperText Markup Language (HTML) is a well known mark up language used to develop web pages. It has been around for a long time and is commonly used in webpage design. XML or Extensible Markup Language defines a set of rules for encodingdocumentsinaformatthatcanbereadbyboth,humanandcomputer.HTML is written using HTML elements, which consist of tags, primarily an opening tag and a closing tag. The data between these tags is usually the content. The main objective of HTML is to allow web browsers to interpret and display the content written between the tags. The tags are designed to describe the page content. HTML comes with predefined tags. These days, web pages are rarely only designedusingHTML.

    HTML XML

    Definition Markuplanguagefordisplayingwebpagesinawebbrowser.Designedtodisplaydatawithfocusonhowthedatalooks

    Markuplanguagedefinesasetofrulesforencodingdocumentsthatcanbereadbybothhumansandmachines.Designedwithfocusonstoringandtransportingdata.

    Datewheninvented 1990 1996

    Extendedfrom SGML SGML

    Type Static Dynamic

    Usage Displayawebpage Transportdatabetweentheapplicationandthedatabase.Todevelopothermarkuplanguages.

    Processing/Rules Nostrictrules.Browserwillstillgeneratedatatothebestofitsability

    Strictrulesmustbefollowedorprocessorwillterminateprocessingthefile

    [email protected]

  • Languagetype Presentation Neitherpresentation,norprogramming

    Tags Predefined Customtagscanbecreatedbytheauthor

    WhiteSpace Cannotpreservewhitespace

    Preserveswhitespace

    Limitations Datadoesnotknowitselfverywell.Datacannotchangeinresponsetoenvironment.Datacannotbeeasilymaintained.Cannotstoreorcallonvariables.Lacksthecapabilitytodefinenewstructuresbydefiningrelationshipsbetweenclasses.Tagsarenotusefulforexchangingthedocumentbetweenapplications.

    Cannotbeusedasasubtypeofasql_variantinstance.Doesnotsupportcastingorconvertingtoeithertextornontext.Doesnotsupportthefollowingcolumnandtableconstraints.XMLprovidesitsownencoding.Collationsapplytostringtypesonly.Cannotbecomparedorsorted.CannotbeusedinDistributedPartitionedViews.Notwellsupportedbybrowsers.

    GISDataMining:Data Mining is the analytic Process. It is designed to explore the data usually big data in search of consistent patterns and/or systematic relationships between

    [email protected]

  • variables and then to validate the findings by applying the detected patterns to new subsets of data. Data mining refers to the mining or discovery of new information in terms of patterns or rules from vast amounts of data. To be practically useful, data mining must be carried out efficiently on large files and databases. To date, it isnotwellintegratedwithdatabasemanagementsystems.ECAModelThe model that has been used for specifying active database rules is referred to as the EventConditionAction, or ECA model. A rule in the ECA model has three components:

    1. 1. The event (or events) that trigger the rule: These events are usually database update operations that are explicitly applied to the database. However, in the general model, they could also be temporal events or other kindsofexternalevents.

    2. The condition that determines whether the rule action should be executed: Once the triggering event has occurred, an optional condition may be evaluated. If no condition is specified, the action will be executed once the event occurs. If a condition is specified, it is first evaluated, and only if it evaluatestotruewilltheruleactionbeexecuted.

    3. The action to be taken: The action is usually a sequence of SQL statements, but it could also be a database transaction or an external program that will be automaticallyexecuted.

    SpatialDatabaseSpatial databases provide concepts for databases that keep track of objects in a multidimensionalspace.

    For example, cartographic databases that store maps include twodimensional spatial descriptions of their objectsfrom countries and states to rivers, cities, roads, seas, and so on. These databases are used in many applications, such as environmental, emergency, and battle management.

    Other databases, such as meteorological databases for weather information, are threedimensional, since temperatures and other meteorological informationarerelatedtothreedimensionalspatialpoints.

    In general, a spatial database stores objects that have spatial characteristics that describe them. The spatial relationships among the objects are important, and they are often needed when querying the database. Although

    [email protected]

  • a spatial database can in general refer to an ndimensional space for any n, wewilllimitourdiscussiontotwodimensionsasanillustration.

    The main extensions that are needed for spatial databases are models that can interpret spatial characteristics. In addition, special indexing and storage structuresareoftenneededtoimproveperformance.

    Q071,6)whataretheadvantageanddisadvantageofOODBMS?Advantages1.Improvedsoftwaredevelopmentproductivity:Objectoriented programming is modular, as it provides separation of duties in objectbased program development. It is also extensible, as objects can be extended to include new attributes and behaviors. Objects can also be reused within an across applications. Because of these three factors modularity, extensibility, and reusability objectoriented programming provides improved softwaredevelopment productivity over traditional procedurebased programming techniques.2.Improvedsoftwaremaintainability:For the reasons mentioned above, object oriented software is also easier to maintain. Since the design is modular, part of the system can be updated in case of issueswithoutaneedtomakelargescalechanges.3.Fasterdevelopment:Reuse enables faster development. Objectoriented programming languages come with rich libraries of objects, and code developed during projects is also reusable in futureprojects.4.Lowercostofdevelopment:The reuse of software also lowers the cost of development. Typically, more effort is put into the objectoriented analysis and design, which lowers the overall cost of development.5.Higherqualitysoftware:Faster development of software and lower cost of development allows more time and resources to be used in the verification of the software. Although quality is

    [email protected]

  • dependent upon the experience of the teams, object oriented programming tends to resultinhigherqualitysoftware.6.MoreexpressivequerylanguageNavigational access from the object is the most common form of data access in an OODBMS. This is in contrast to the associative access of SQL (that is, declarative statements with selection based on one or more predicates). Navigational access is moresuitableforhandlingpartsexplosion,recursivequeries,andsoon.7.CapableofhandlingalargevarietyofdatatypesUnlike traditional databases (such as hierarchical, network or relational), the object oriented database are capable of storing different types of data, for example, pictures,voicevideo,includingtext,numbersandsoon.

    DisadvantagesofOODBMSs

    TherearefollowingdisadvantagesofOODBMSs:

    1. Lack of universal data model: There is no universally agreed data model for an OODBMS, and most models lack a theoretical foundation. This .disadvantage is seen as a significant drawback, and is comparable to prerelationalsystems.

    2. Lack of experience: In comparison to RDBMSs the use of OODBMS is still relatively limited. This means that we do not yet have the level of experience that we have with traditional systems. OODBMSs are still very much geared towards the programmer, rather than the nave enduser. Also there is a resistance to the acceptance of the technology. While the OODBMS is limited to a small niche market, this problem will continue to exist

    [email protected]

  • 3. Lack of standards: There is a general lack of standards of OODBMSs. We have already mentioned that there is not universally agreed data model. Similarly,thereisnostandardobjectorientedquerylanguage.

    4. Competition: Perhaps one of the most significant issues that face OODBMS vendors is the competition posed by the RDBMS and the emerging ORDBMS products. These products have an established user base with significant experience available. SQL is an approved standard and the relational data model has a solid theoretical formation and relational productshavemanysupportingtoolstohelp.bothendusersanddevelopers.

    5. Query optimization compromises encapsulations: Query optimization requires. An understanding of the underlying implementation to access the database efficiently. However, this compromises the concept of encapsulation.

    6. Complexity: The increased functionality provided by the OODBMS (such as the illusion of a singlelevel storage model, pointer sizzling, longduratipntransactions, version management, and schema evolutionmakes the system more complex than that of traditional DBMSs. In complexity leads to products that are more expensive and more difficult touse.

    7. Larger program size: Objectoriented programs typically involve more linesofcodethanproceduralprograms.

    8. Lack of support for views: Currently, most OODBMSs do not provide a view mechanism, which, as we have seen previously, provides many advantages such as data independence, security, reduced complexity, and customization.

    9. Lack of support for security: Currently, OODBMSs do not provide adequate security mechanisms. The user cannot grant access rights on individualobjectsorclasses.

    10.Not suitable for all types of problems: There are problems that lend themselves well to functionalprogramming style, logicprogramming style, or procedurebased programming style, and applying objectoriented programminginthosesituationswillnotresultinefficientprograms.

    [email protected]

  • Q 069,8070,4) Differentiate between object oriented database and relational database.RelationalvsObjectOrientedRDs:

    Mature Extensivelytested Vastamountsofdatainthisformatalready Programmersknowhowtooptimizeforhighspeedretrieval

    OODs:

    New hereisageneralshortageofexperienced,qualityprogrammers Lackconsensusonstandards,definitions,etc. Performanceconcerns

    Q071,3)Distinguishbetweenpersistentandtransientobjects.

    PersistentObject TransientObject

    Apersistent object is an instance of a class in the domain model that represents some information extractedfromthedatabase.

    A transient object is an instance of a class in the domain model, which is createdinmemory

    Persistent objects have permanent memory they remain in memory untiltheyareexplicitly

    Transient objects are the objects which lie in application memory,

    [email protected]

  • removed. once application is ended this object alsogetvanished.

    Persistent objects are objects that are boundtoDBObject.

    Transient objects cannot be converted to persistent objects as they have their lifetime defined at the timeoftheirinstantiation.

    Persistence means, using serializable interface, storing objects permanentlyintheharddisk

    Transients are non serializable objects.

    Data that is not changed and where static binding is applied is called persistentbaseclassobject.

    Data is stored in the class so the object of that class having dynamic binding is called transient class object.

    Persistentisrelatedtostatic. Transientisrelatedtodynamic.

    Persistentobjectsareonheap. Transient objects are in the transient memory.

    Note: Persistent objects when call transient objects, first it is loaded into memory. Afterprocessatransientobjectscanbestoredaspersistentobjectsinharddisk.Q 2070 5) discuss some application of active database. How do spatial databasedifferfromregulardatabase?Anactivedatabasesystem(ADBS)isaDBSthatmonitorssituationsofinterestand,whentheyoccur,triggersanappropriateresponseinatimelymanner.

    [email protected]

  • The desired behavior is expressed in production rules (alsocalledeventconditionaction rules), which are defined and stored in theDBS.This has the benefits that the rules can be shared by many applicationprograms, and the DBS can optimizetheirimplementation.ApplicationsProductioncontrol,e.g.,powerplants,...Maintenancetasks,e.g.,inventorycontrol,...Financialapplications,e.g.,stock&bondtrading,...NetworkmanagementAirtrafficcontrolProgramtradingComputerIntegratedManufacturing(CIM)Q 071 ,4) Discuss how time is represented in temporal database and compare thedifferenttimedimensions.For temporal databases, time is considered to be an ordered sequence of pointsin some granularity that is determined by the application. For example, suppose that some temporal application never requires time units that are less than one second. Then, each time point represents one second in time using this granularity. In reality, each second is a (short) time duration, not a point, since it may be further dividedintomilliseconds,microseconds,andsoon.

    [email protected]

  • Because there is no known beginning or ending of time, one needs a reference point from which to measure specific time points. Various calendars are used by variouscultures.In SQL2, the temporal data types include DATE (specifying Year, Month, and Day as YYYYMMDD), TIME (specifying Hour, Minute, and Second as HH:MM:SS), TIMESTAMP (specifying a Date/Time combination, with options for including subsecond divisions if they are needed), INTERVAL (a relative time duration, such as 10 days or 250 minutes), and PERIOD (an anchored time duration with a fixed starting point, such as the 10day period from January 1, 1999 toJanuary10,1999,inclusive).

    TimeDimensions

    The most natural interpretation is that the associated time is the time that the event occurred, or the period during which the fact was considered to be true in the real world. If this interpretation is used, the associated time is often referred to as the valid time. A temporal database using this interpretation is called a valid time database.

    However, a different interpretation can be used, where the associated time refers to the time when the information was actually stored in the database that is, it is the value of the system time clock when the information is valid in the system. In this case, the associated time is called thetransaction time.A temporal database using thisinterpretationiscalledatransactiontimedatabase.

    Other interpretations can also be intended, but these two are considered to be the most common ones, and they are referred to as time dimensions. In some applications, only one of the dimensions is needed and in other cases both time dimensions are required, in which case the temporal database is called a bitemporal database. If other interpretations are intended for time, the user can define the semantics and program the applications appropriately, and it is called a userdefinedtime.

    [email protected]

  • 070 10) Enumerate the limitation of conventional database compared to multimediadatabase

    The incorporation of multimedia database systems will improve the quantity and quality of information manipulated by computer users in all fields, computer aided design, and information retrieval. The area of intelligent multimedia content analysis and retrieval techniques is an emerging discipline. Techniques for representing and extracting semantic information from media such as speech, images,andvideoarerequired.

    When a multimedia application lacks a database, the data structure is buried in the script, where all of its value is lost. This omission also makes the script more complicated and less flexible. Using a multimedia database makes the data structure logic available to other multimedia applications and simplifies the script so that many scripts can share the same multimedia metadata. In addition, when a multimedia or abstract data database is organized and annotated for one application, other applications can use those annotations without going through the same timeconsuming process. This capability adds great value to the data through reuseandcontrolledredundancy.

    When multimedia application content is controlled by the multimedia database, multimedia content can be added, deleted, or modified without modifying the application script. For example, interactive kiosks that display, describe, and demonstrate products can be updated automatically without reprogramming the application script. Furthermore, a multimedia application such as a multimedia textbook can actually control the operation of book topics that have the same look and feel. This control lets the script perform as a template: An entire series of math textbooks (algebra, calculus, trigonometry, and geometry), including text and video, can use the same multimedia application because all data is physically separate.

    Search and retrieval operations are critical in interactive multimedia applications they must be equally efficient and powerful. Search and retrieval of multimedia and abstract data is challenging, but multimedia databases make it feasible through internal storage format flexibility and efficient operation. The DBMS should have

    [email protected]

  • significant knowledge about the data and its structure to enable powerful semantic optimizations and intelligent searches. Search and retrieval operations also give the application access to media components so that they can be dynamically and seamlesslyprocessedwhennecessary.

    Q model 069,9070,8) What is data warehouse? list the characteristic of data warehouse?howitisdifferfromdatabases.

    A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the pointofview of the end user who may need access to specialized, sometimes local databases. The latter idea is known as the data mart. Typically, a data warehouse is housed on an enterprise mainframe server or increasingly, in the cloud. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted for use by analytical applicationsanduserqueries.

    characteristicofdatawarehouse

    Subject Oriented A data warehouse is subject oriented because it

    provides information around a subject rather than the organization's

    ongoing operations. These subjects can be product, customers, suppliers,

    sales, revenue, etc. A data warehouse does not focus on the ongoing

    operations, rather it focuses on modelling and analysis of data for decision

    making.

    [email protected]

  • Integrated A data warehouse is constructed by integrating data from

    heterogeneous sources such as relational databases, flat files, etc. This

    integrationenhancestheeffectiveanalysisofdata.

    Time Variant The data collected in a data warehouse is identified with a

    particular time period. The data in a data warehouse provides information

    fromthehistoricalpointofview.

    Nonvolatile Nonvolatile means the previous data is not erased when

    new data is added to it. A data warehouse is kept separate from the

    operational database and therefore frequent changes in operational database

    isnotreflectedinthedatawarehouse.

    DatawarehouseVSDatabase

    Sr.N

    o.

    DataWarehouse(OLAP) OperationalDatabase(OLTP)

    1 It involves historical processing

    ofinformation.

    Itinvolvesdaytodayprocessing.

    2 OLAP systems are used by

    knowledge workers such as

    executives, managers, and

    analysts.

    OLTP systems are used by clerks,

    DBAs,ordatabaseprofessionals.

    [email protected]

  • 3 Itisusedtoanalyzethebusiness. Itisusedtorunthebusiness.

    4 ItfocusesonInformationout. ItfocusesonDatain.

    5 It is based on Star Schema,

    Snowflake Schema, and Fact

    ConstellationSchema.

    It is based on Entity Relationship

    Model.

    6 ItfocusesonInformationout. Itisapplicationoriented.

    7 Itcontainshistoricaldata. Itcontainscurrentdata.

    8 It provides summarized and

    consolidateddata.

    It provides primitive and highly

    detaileddata.

    9 It provides summarized and

    multidimensionalviewofdata.

    It provides detailed and flat

    relationalviewofdata.

    10 The number of users is in

    hundreds.

    The number of users is in

    thousands.

    11 The number of records accessed

    isinmillions.

    The number of records accessed

    isintens.

    [email protected]

  • 12 The database size is from 100GB

    to100TB.

    The database size is from 100

    MBto100GB.

    13 Thesearehighlyflexible. Itprovideshighperformance.

    06910)Explainmobilecomputingarchitecturewithsuitablediagram

    The general architecture of a mobile platform is illustrated in Figure 29.1. It is a distributed architecture where a number of computers, generally referred to as Fixed Hosts and Base Stations, are interconnected through a highspeed wired network. Fixed hosts are general purpose computers that are not typically equipped to manage mobile units but can be configured to do so. Base stations function as gateways to the fixed network for the Mobile Units. They are equipped with wireless interfaces and offer network access services of which mobile units are clients.

    Wireless Communications. The wireless medium on which mobile units and base stations communicate have bandwidths significantly lower than those of a wired network. The current generation of wireless technology has data rates that range from the tens to hundreds of kilobits per second (2G cellular telephony) to tens of megabits per second (wireless Ethernet, popularly known as WiFi). Modem (wired) Ethernet, by comparison, provides data rates on the order of hundreds of megabitspersecond.

    Besides data rates, other characteristics also distinguish wireless connectivity options. Some of these characteristics include range, interference, locality of access, and support for packet switching. Some wireless access options allow seamless roaming throughout a geographical region (e.g., cellular networks), whereas WiFi networks are localized around a base station. Some wireless networks, such as WiFi and Bluetooth, use unlicensed areas of the frequency

    [email protected]

  • spectrum, which may cause interference with other appliances, such as cordless telephones. Finally, modem wireless networks can transfer data in units called packets, that are commonly used in wired networks in order to conserve bandwidth. Wireless applications must consider these characteristics when choosing a communication option. For example, physical objects block infrared frequencies. While inconvenient for some applications, such blockage allows for secure wireless communications within a closedroom.

    Client/NetworkRelationships. Mobile units can move freely in a geographic mobility domain, an area that is circumscribed by wireless network coverage. To manage the mobility of units, the entire geographic mobility domain is divided into one or more smaller domains, called cells, each of which is supported by at least one basestation.The

    mobile discipline requires that the movement of mobile units be unrestricted throughout the cells of a geographic mobility domain, while maintaining information access contiguity Le., movement, especially intercell movement, does not negatively affect the data retrieval process. The communication architecture just described is designed to give the mobile unit the impression that it is attached to a fixed network, emulating a traditional clientserver architecture.

    [email protected]

  • Wireless communications, however, make other architectures possible. One alternative is a mobile adhoc network (MANET), illustrated in Figure 29.2. In a MANET, colocated mobile units do not need to communicate via a fixed network, but instead, form their own using costeffective technologies such as Bluetooth. In a MANET, mobile units are responsible for routing their own data, effectively acting as base stations as well as clients. Moreoever, they must be robust enough to handle changes in the network topology, such as the arrival or departure of other mobile units. MANET applications can be considered as peertopeer, meaning that a mobile unit is simultaneously a client and a server. Transaction processing and data consistency control become more difficult since there is no central control in this architecture. Resource discovery and data routing by mobile units make computing in a MANET even more complicated. Sample MANET applications are multiuser games, shared whiteboards, distributed calendars, and battle information sharing. The expectation is that these networks and related applications will become dominant in a few years. Currently MANETs are an active research area in both academia and industry. This research is still in its infancy, so the following discussion will focus on the basic mobile computing architecture described previously.

    0719)Describethecharacteristicsofmobilecomputingenvironmentindetail.

    As we discussed in the previous section, the characteristics of mobile computing include high communication latency, intermittent wireless connectivity, limited battery life, and, of course, changing client location. Latency is caused by the processes unique to the wireless medium, such as coding data for wireless transfer, andtrackingandfilteringwireless

    signals at the receiver. Battery life is directly related to battery size, and indirectly related to the mobile device's capabilities. Intermittent connectivity can be intentional or unintentional. Unintentional disconnections happen in areas wireless signals cannot reach, e.g., elevator shafts or subway tunnels. Intentional disconnectionsoccurbyuserintent,

    [email protected]

  • e.g., during an airplane takeoff, or when the mobile device is powered down. Finally, clients are expected to move, which alters the network topology and may cause their data requirements to change. All of these characteristics impact data management, and robust mobile applications must consider them in their design. To compensate for high latencies and unreliable connectivity, clients cache replicas of important, frequently accessed data, and work offline, if necessary. Besides increasing data availability and response time, caching can also reduce client powerconsumptionby

    eliminating the need to make energyconsuming wireless data transmissions for each data access. On the other hand, the server may not be able to reach a client. A client may be unreachable because it is dozingin an energyconserving state in which many subsystems are shut downor because it is out of range of a base station. In either case, neither client nor server can reach the other, and modifications must be made to the architecture in order to compensate for this case. Proxies for unreachable components are added to the architecture. For a client (andsymmetricallyforaserver),theproxycan

    cache updates intended for the server. When a connection becomes available, the proxy automatically forwards these cached updates to their ultimate destination. As suggested above, mobile computing poses challenges for servers as well as clients. The latency involved in wireless communication makes scalability a problem. Because latency due to wireless communications increases the time to service each client request, the server can handle fewer clients. One way servers relieve this problem is by broadcasting data whenever possible. Broadcast takes advantage of a natural characteristic of radio communications, and is scalable because a single broadcast of a data item can satisfy all outstanding requests for it. For example, instead of sending weather information to all clients in a cell individually, a server can simply broadcast it periodically. Broadcast also reduces the load on the server, asclientsdonothavetomaintainactiveconnectionstoit.

    Client mobility also poses many data management challenges. First, servers must keep track of client locations in order to efficiently route messages to them. Second, client data should be stored in the network location that minimizes the

    [email protected]

  • traffic necessary to access it. Keeping data in a fixed location increases access latency if the client moves "far away" from it. Finally, as stated above, the act of moving between cells must be transparent to the client. The server must be able to gracefully divert the shipment of data from one base station to another, without the client noticing. Client mobility also allows new applications that are locationbased. For example, consider an electronic valet application that can tell a user the location of the nearest restaurant. Clearly, "nearest" is relative to the client'scurrentposition,andmovement

    can invalidate any previously cached responses. Upon movement, the client must efficientlyinvalidatepartsofitscacheandrequestupdateddatafromthedatabase.

    071 10) Differentiate between XML schema and XML DTD with suitable example.

    The critical difference between DTDs and XML Schema is that XML Schema utilize an XMLbased syntax, whereas DTDs have a unique syntax held over from SGML DTDs. Although DTDs are often criticized because of this need to learn a new syntax, the syntax itself is quite terse. The opposite is true for XML Schema, which are verbose, but also make use of tags and XML so that authors of XML shouldfindthesyntaxofXMLSchemalessintimidating.

    The goal of DTDs was to retain a level of compatibility with SGML for applications that might want to convert SGML DTDs into XML DTDs. However, in keeping with one of the goals of XML, "terseness in XMLmarkup is of minimal importance,"thereisnorealconcernwithkeepingthesyntaxbrief.

    LIST1 is an example using DTD and providing a schema definition for the content above, while LIST2 is an example using XML Schema to provide a schema definition(employee.xs).

    LIST1:EmployeeInformationDTD

    [email protected]

  • LIST2EmployeeInformationXMLSchemaemployee.xs

    [email protected]

  • As we see, the syntax is completely different between the two. For the DTD, a unique syntax is written, whereas the XML Schema is written in XML format conforming to XML 1.0 syntax. LIST3 is an example of a valid XML document fortheLIST2XMLSchema(employee.xml).

    For DTD, a DOCTYPE declaration is used to associate with the XML document but, in the case of XML Schema, the specification does not particularly determine anything with respect to the association of the XML document. Accordingly, the implementation method of the validation tool actually used is followed. However, under the XML Schema specification, there is a defined method for writing a hint to associate with the XML document. The following content is inserted into the rootelementoftheXMLdocument.

    0711) Explainthefollowingterms:

    a) DataWarehouseA data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. Data

    [email protected]

  • warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the pointofview of the end user who may need access to specialized, sometimes local databases. The latter idea is known as the data mart. Typically, a data warehouse is housed on an enterprise mainframe server or increasingly, in the cloud. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted for use by analytical applications and user queries.

    b) DistributionTransparencyDistribution transparency allows the user to perceive the database as a single, logical entity. If a DDBMS exhibits distribution transparency, then the user does not need to know the data is fragrances (fragmentation transparency) or the location of data items (Local transparency). Distribution transparency can be classified into:

    Fragmentation transparency Fragmentation is the highest level of distribution transparency. If fragmentation transparency is provided by the DDBMS, then the user does not need to know that the data is fragmented, As a result, database accesses are based on the global schema,. so the user does not need to specify fragment names or data locations.

    Location transparency Location is the middle level of distribution transparency. With location transparency, the user must know how the data has been fragmented but still does not have to know the location of the data.

    Replication transparency Closely related to location transparency is replication transparency, which means that the user is unaware of the replication of fragments. Replication transparency is implied' by location transparency.

    Local mapping transparency

    [email protected]

  • This is the lowest level of distribution transparency. With local mapping transparency, user needs to specify both fragment names and the location of data items, taking into consideration any replication that may exists. Clearly, this is a more complex and time-consuming query for the user to enter than the first. It is unlikely that a system that provides only this level of transparency would be acceptable to end-users.

    Naming transparency As a corollary to the above distribution transparencies, we have naming transparency. As in a centralized database, each item in a distributed database must. have a unique name. Therefore, the DDBMS must ensure that no two sites create a database object with the same name. One solution to this problem is to create a central name server, which has the responsibility to ensure uniqueness of all names in the system. However, this approach results in:

    Loss of some local autonomy; Performance problems, if the central site becomes a bottleneck; Low availability; .if the central site fails the remaining sites cannot create any .new database objects.

    An alternatively solution is to prefix an object , with the identifier of the site that created it For example, the relation branch created at site S1 might be named S1.Branch. Similarly, we need to be able to identify each fragment and each of its copies. Thus, copy 2 of fragment 3 of the Branch relation created at site S1 might be referred to as S1.Branch.F3.C2. However, this results in loss of distribution transparency. An approach that resolves the problems with both these solution uses aliases (sometimes called synonyms) for each database object. Thus, S 1.Branch.F3.C2 might be known as Local Branch by the user at site S1. The DDBMS has the task of mapping an alias to the appropriate database object.

    c) XQuery

    XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendorspecific extensions for

    [email protected]

  • other data formats (JSON, binary, etc.). The language is developed by theXMLQueryworkinggroupoftheW3C.

    d) DistributiontransactionA distributed transaction includes one or more statements that, individually or as a group, update data on two or more distinct nodes ofadistributeddatabaseA distributed transaction is composed of several subtransactions, eachrunningonadifferentsite.Eachdatabasemanager(DM)candecidetoabort(thevetoproperty).An Atomic Commitment Protocol (ACP) is run by each of the DMs to ensure that all the subtransactions are consistently committed or aborted.

    e) Knowledgebase

    In general, a knowledge base is a centralized repository for information: a public library, a database of related information about a particular subject, and whatis.In general, a knowledge base is a centralized repository for information: a public library, a database of related information about a particular subject, and whatis.com could all be considered to be examples of knowledge bases. In relation to information technology (IT), a knowledge base is a machinereadable resource for the dissemination of information, generally online or with the capacity to be put online. An integral component of knowledge management systems, a knowledge base is used to optimize information collection, organization, and retrieval for an organization, or for thegeneralpublic.

    f) ClassificationandclusteringCLASSIFICATIONWe have a Training set containing data that have been previously categorized

    [email protected]

  • Based on this training set, the algorithms finds the category that the newdatapointsbelongtoSince a Training set exists, we describe this technique as Supervised learningExample:We use training dataset which categorized customers that have churned. Now based on this training set, we can classify whether acustomerwillchurnornot.ClusteringWedonotknowthecharacteristicsofsimilarityofdatainadvanceUsing statistical concepts, we split the datasets into subdatasets such thattheSubdatasetshaveSimilardataSince Training set is not used, we describe this technique as UnsupervisedlearningExample:We use a dataset of customers and split them into subdatasets of customers with similar characteristics. Now this information can be used to market a product to a specific segment of customersthathasbeenidentifiedbyclusteringalgorithm

    2.)DistinguishmultipleinheritanceandselectiveinheritanceinOOconcepts.Multiple inheritances in a type hierarchy occurs when a certain subtype T is a subtypeoftwo(ormore) types and hence inherits the functions (attributes and methods) of both super types.For example, we may create a subtype ENGINEERING_MANAGER that is a subtypeofbothMANAGERand ENGINEER. This leads to the creation of a type lattice rather than atypehierarchy.0701.)Explainthefollowingterms:

    [email protected]

  • Extent

    Temporaldatabase

    A temporal database is a database that has certain features that support timesensitive status for entries. Where some databases are considered current databases and only support factual data considered valid at the time of use, a temporal database can establish at what times certain entries are accurate. For temporal databases, time is considered to be an ordered sequence of points in some granularity that is determined by the application. For example, suppose that some temporal application never requires time units that are less than one second. Then, each time point represents one second in time using this granularity. In reality, each second is a (short) time duration, not a point, since it may be further dividedintomilliseconds,microseconds,andsoon.Because there is no known beginning or ending of time, one needs a reference point from which to measure specific time points. Various calendars are used by variouscultures.

    DegreeofhomogeneityofDBMSXPath XPath is a syntax for defining parts of

    anXMLdocument

    [email protected]

  • XPathusespathexpressionstonavigateinXMLdocuments XPathcontainsalibraryofstandardfunctions XPathisamajorelementinXSLT XPathisaW3Crecommendation

    OLAPOLAP (online analytical processing) enables a user to easily and selectively

    extract and view data from different pointsofview.OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as twodimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into subattributes.

    2.) Draw an ER Diagram for a hospital with a set of patients and set of doctors associated with each patient a log of various tests and examinations conducted.

    [email protected]

  • (Addappropriateattributesfortheentitiesyourselforseebelow)

    6.) Write a schema that provides tags for a persons first name, last name, weight, and shoe size. Weight and shoe size tags should have attributes to designatemeasuringsystems.

    [email protected]

  • 9.) What are the advantages and disadvantages of extending the relational datamodelbymeansofORDBMS?5.) What is the difference between structured and unstructured complex object?Differentiateidenticalversusequalobjectswithexamples.7.) What are the differences and similarities between objects and literals in the ODMGobjectmodel?

    [email protected]