dw questions

Upload: souvik-banerjee

Post on 07-Jan-2016


DESCRIPTION

Collection of Data warehouse questions and information

TRANSCRIPT

1.Data scrubbing is which of the following?

A.A process to reject data from the data warehouse and to create the necessary indexes

B.A process to load the data in the data warehouse and to create the necessary indexes

C.A process to upgrade the quality of data after it is moved into a data warehouse

D.A process to upgrade the quality of data before it is moved into a data warehouse

Answer: Option D

2.The active data warehouse architecture includes which of the following?

A.At least one data mart

B.Data that can be extracted from numerous internal and external sources

C.Near real-time updates

D.All of the above.

Answer: Option D

3.A goal of data mining includes which of the following?

A.To explain some observed event or condition

B.To confirm that data exists

C.To analyze data for expected relationships

D.To create a new data warehouse

Answer: Option A

4.An operational system is which of the following?

A.A system that is used to run the business in real time and is based on historical data.

B.A system that is used to run the business in real time and is based on current data.

C.A system that is used to support decision making and is based on current data.

D.A system that is used to support decision making and is based on historical data.

Answer: Option B

5.A data warehouse is which of the following?

A.Can be updated by end users.

B.Contains numerous naming conventions and formats.

C.Organized around important subject areas.

D.Contains only current data.

Answer: Option C

6.A snowflake schema is which of the following types of tables?

A.Fact

B.Dimension

C.Helper

D.All of the above

Answer: Option D

7.The generic two-level data warehouse architecture includes which of the following?

A.At least one data mart

B.Data that can be extracted from numerous internal and external sources

C.Near real-time updates

D.All of the above.

Answer: Option B

8.Fact tables are which of the following?

A.Completely denormalized

B.Partially denormalized

C.Completely normalized

D.Partially normalized

Answer: Option C

9.Data transformation includes which of the following?

A.A process to change data from a detailed level to a summary level

B.A process to change data from a summary level to a detailed level

C.Joining data from one source into various sources of data

D.Separating data from one source into various sources of data

Answer: Option A

10.Reconciled data is which of the following?

A.Data stored in the various operational systems throughout the organization.

B.Current data intended to be the single source for all decision support systems.

C.Data stored in one operational system in the organization.

D.Data that has been selected and formatted for end-user support applications.

Answer: Option B

11.The load and index is which of the following?

A.A process to reject data from the data warehouse and to create the necessary indexes

B.A process to load the data in the data warehouse and to create the necessary indexes

C.A process to upgrade the quality of data after it is moved into a data warehouse

D.A process to upgrade the quality of data before it is moved into a data warehouse

Answer: Option B

12.The extract process is which of the following?

A.Capturing all of the data contained in various operational systems

B.Capturing a subset of the data contained in various operational systems

C.Capturing all of the data contained in various decision support systems

D.Capturing a subset of the data contained in various decision support systems

Answer: Option B

13.A star schema has what type of relationship between a dimension and fact table?

A.Many-to-many

B.One-to-one

C.One-to-many

D.All of the above.

Answer: Option C

14.Transient data is which of the following?

A.Data in which changes to existing records cause the previous version of the records to be eliminated

B.Data in which changes to existing records do not cause the previous version of the records to be eliminated

C.Data that are never altered or deleted once they have been added

D.Data that are never deleted once they have been added

Answer: Option A

15.A multifield transformation does which of the following?

A.Converts data from one field into multiple fields

B.Converts data from multiple fields into one field

C.Converts data from multiple fields into multiple fields

D.All of the above

Answer: Option D
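
Question 15 says a multifield transformation can map one field to many, many fields to one, or many to many. A minimal Python sketch of the three cases follows; the function names, field names, and sample values are hypothetical and chosen only for illustration.

```python
def split_full_name(full_name: str) -> dict:
    """One-to-many: split one source field into two target fields."""
    first, _, last = full_name.strip().partition(" ")
    return {"first_name": first, "last_name": last.strip()}


def build_address(street: str, city: str, postal_code: str) -> str:
    """Many-to-one: combine several source fields into one target field."""
    return f"{street}, {city} {postal_code}"


def to_metric(height_ft: int, height_in: int, weight_lb: float) -> dict:
    """Many-to-many: derive several target fields from several source fields."""
    total_inches = height_ft * 12 + height_in
    return {
        "height_cm": round(total_inches * 2.54, 1),
        "weight_kg": round(weight_lb * 0.45359237, 1),
    }
```

In a real ETL job these functions would run per record between the extract and load steps; here they are plain functions so each case can be tried in isolation.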

1.A data mart is designed to optimize the performance for well-defined and predictable uses.

A.True

B.False

Answer: Option A

2.Successful data warehousing requires that a formal program in total quality management (TQM) be implemented.

A.True

B.False

Answer: Option A

3.Data in operational systems are typically fragmented and inconsistent.

A.True

B.False

Answer: Option A

4.Most operational systems are based on the use of transient data.

A.True

B.False

Answer: Option A

5.Independent data marts are often created because an organization focuses on a series of short-term business objectives.

A.True

B.False

Answer: Option A

6.Joining is the process of partitioning data according to predefined criteria.

A.True

B.False

Answer: Option B

7.The role of the ETL process is to identify erroneous data and to fix them.

A.True

B.False

Answer: Option B

8.Data in the data warehouse are loaded and refreshed from operational systems.

A.True

B.False

Answer: Option A

9.Star schema is suited to online transaction processing, and therefore is generally used in operational systems, operational data stores, or an EDW.

A.True

B.False

Answer: Option B

10.Periodic data are data that are physically altered once added to the store.

A.True

B.False

Answer: Option B

11.Both status data and event data can be stored in a database.

A.True

B.False

Answer: Option A

12.Static extract is used for ongoing warehouse maintenance.

A.True

B.False

Answer: Option B

13.Data scrubbing can help upgrade data quality; it is not a long-term solution to the data quality problem.

A.True

B.False

Answer: Option A

14.Every key used to join the fact table with a dimensional table should be a surrogate key.

A.True

B.False

Answer: Option A

15.Derived data are detailed, current data intended to be the single, authoritative source for all decision support applications.

A.True

B.False

Answer: Option B
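
The questions above contrast transient data, where a change replaces the previous version of a record, with periodic data, which is never physically altered once added. The sketch below models both behaviours with two small in-memory stores; the class names, the account key, and the balances are hypothetical.

```python
class TransientStore:
    """Keeps only the latest version of each record: updates overwrite."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value  # the previous version is lost


class PeriodicStore:
    """Never alters stored rows: each change is appended with a date stamp."""

    def __init__(self):
        self.rows = []

    def append(self, key, value, as_of):
        self.rows.append((key, value, as_of))

    def history(self, key):
        return [(value, as_of) for k, value, as_of in self.rows if k == key]


transient = TransientStore()
periodic = PeriodicStore()
for balance, day in [(100, "2016-01-01"), (250, "2016-01-02")]:
    transient.upsert("acct-1", balance)
    periodic.append("acct-1", balance, day)
```

After the loop, the transient store holds only the latest balance, while the periodic store still holds both dated versions, which is the property a warehouse needs for historical analysis.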

What is a data warehouse, and why is it used?

A data warehouse is a repository of data. The pieces of information it stores are related to each other and support the decision making of a corporation or other entity. It can incorporate multiple data sources to store all the data connected to a subject. Typically it is composed of archived or historical data that can be analyzed. A data warehouse is built on a database system.

What are the basic stages of a data warehouse?

The first stage in building a data warehouse is the initial data load, which is typically achieved by copying an operational database. This is called an offline operational database. Next, new sets of data have to be fed into the newly created data warehouse, so the database is updated with large sets of data at regular intervals (weekly, monthly). With this step, we have built an offline data warehouse.

To achieve a real-time data warehouse, the operational data has to be inserted in real time. When this is integrated with the application that reports on the data, it is called an integrated data warehouse.

What are OLAP and OLTP, and what are their main differences?

OLAP performs analysis on the data and reports the information. The focus of this kind of system is reading data, so it relies mainly on the SELECT statement. OLTP manages the transaction system that collects the data. Actions such as INSERT, UPDATE, or DELETE are the focus here.

What is a fact table?

The fact table stores concrete measures, typically as numeric values, and holds the core business information.

In detail, the fact table contains two different kinds of information: the foreign keys to the related dimension tables, which provide the joining relationships, and the measure columns, which hold the measured values.

And a dimension table?

Dimension tables describe the quantified data in the fact tables, giving context to its fields. They contain descriptive attributes that provide more information related to the fact table.

Fact tables have foreign keys to the dimension tables, and the relationship is one-to-many.

Describe the star schema.

In the star schema there is a central fact table with multiple dimensions linked to it. These dimensions are related only to the fact table, so the only link they have is to that specific table.

The fact table relates to the dimensions by holding their primary keys as foreign keys, along with other attributes relevant to the data warehouse.

This kind of schema is therefore denormalized and better suited to simple queries, which are usually faster.

[Diagram: a simple star schema implementation.]
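
The star schema described above can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), columns, and sample rows are assumptions made for illustration; they do not come from the original diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);

-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Shirt', 'Clothing'), (2, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES (20160107, '2016-01-07', 2016), (20160108, '2016-01-08', 2016);
INSERT INTO fact_sales VALUES
    (1, 20160107, 2, 40.0), (1, 20160108, 1, 20.0), (2, 20160107, 3, 15.0);
""")

# A typical star-schema query: one join per dimension, then aggregate the measure.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Each dimension row can be referenced by many fact rows, which is the one-to-many relationship mentioned above.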

Describe the snowflake schema.

The snowflake schema has links and relationships between dimensions, resulting in a normalized organization of fact and dimension tables. This type of schema is usually more complex because each dimension can be composed of many other dimensions.

[Diagram: a snowflake schema.]
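
As a rough illustration of the snowflake layout, the sketch below normalizes a product dimension by moving its category into a separate table. All table names, columns, and rows are hypothetical; the point is only that reaching the category now takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute lives in its own table (a sub-dimension).
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);

-- The product dimension references the category instead of repeating its name.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);

INSERT INTO dim_category VALUES (10, 'Clothing'), (20, 'Kitchen');
INSERT INTO dim_product VALUES (1, 'Shirt', 10), (2, 'Scarf', 10), (3, 'Mug', 20);
INSERT INTO fact_sales VALUES (1, 40.0), (2, 25.0), (3, 15.0);
""")

# Fact -> product -> category: one more join than the star-schema equivalent.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The category name is stored once per category rather than once per product, which reduces redundancy at the cost of a more complex query.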

What is an OLAP cube?

An OLAP data cube is a representation of data in multiple dimensions, using facts and dimensions. It is characterized by the combination of information according to its relationships.

It can consist of a collection of zero to many dimensions, representing specific data. There are five basic operations that can be performed on this kind of data cube: slicing, dicing, roll-up, drill-up and drill-down, and pivoting.

Explain the slicing operation.

The slicing operation on an OLAP cube fixes a single value for one of the dimensions of the cube and selects all the data that corresponds to that value.

So, by executing a slice on the cube we get all the dimension and fact information for the specific value assigned.

Explain the dicing operation.

Dicing on OLAP cubes consists of choosing an interval of values for some of the dimensions represented in the cube and selecting the data that corresponds to those intervals.

This operation creates a subset of the cube that contains the data within the intervals.

Explain the roll-up operation.

The roll-up operation applies computing rules to the data of a specific dimension of an OLAP cube and returns the computed information to the end user.

The applied rules can be user-defined, and they summarize the information on that specific dimension.

Explain the drill-up/drill-down operation.

These operations allow the exploration of information between the levels of data present in the dimensions and facts of the data warehouse.

They can select summarized information or the details that make up that aggregation.

Explain the pivoting operation.

Pivoting allows the rotation of the cube on its dimensions, giving the user a different point of view of the explored data.

The cube can be rotated on every face.

Explain the concept of data mart.

A data mart is a specific group of data linked to a subject, which is part of a specific data warehouse. Therefore, a data warehouse can have multiple data marts.

Basically, a data mart is a small data warehouse with condensed information about a specific subject and its relationships. Usually each data mart is related to a department, business unit, or something else that can function individually within a data warehouse.

What are the reasons to create a data mart?

There are various reasons that lead to the creation of a data mart. The most important ones are:

It creates a data-specific environment that is easy to access.
It is easy to create.
Data is more relevant to users because it holds only the essential information.
It costs less than creating a whole data warehouse.

What does normalization mean?

Normalization is the process in which tables and fields are organized in a database in order to reduce the redundancy of stored data. As a result, many relationships between tables are defined, providing a better organized database system.

The key benefits of normalization are:

Low data redundancy in the database.
Faster searching and indexing.
Fewer null values, since data is well distributed.
Cleaner and easier to maintain.

What is an ETL process?

An ETL process consists of getting data from different sources and converting it so that it can enter a specific data warehouse.

These processes transform and normalize the data, providing a common base for all sources to integrate with a data warehouse.

What is aggregation?

Aggregation is the representation of a set of data combined by some aggregation function.

These functions may be simple or complex, depending on the purpose of the aggregated data. A simple function is the sum of every value.

Explain what partitioning is.

Partitioning is the process of dividing all data warehouse elements into smaller and distinct sets of data, keeping the relationships between the elements.

The benefits of partitioning are:

Easy management.
Better performance.
Availability.
Easier backup and recovery.

What types of dimensions do you know?

There are four common kinds of dimensions in a data warehouse: conformed dimension, degenerated dimension, role-playing dimension, and junk dimension.

Describe a conformed dimension.

A conformed dimension is shared between various subjects in the data warehouse. It is therefore widely used in different contexts, and it means the same thing in each one of them.

Explain what a degenerated dimension is.

The degenerated dimension is derived from a fact table and doesn't have its own dimension table.

What is a role-playing dimension?

A role-playing dimension has multiple applications within the same data warehouse and is reused for different purposes. A common example is a date dimension, which a single fact table may reference several times, for instance as order date and as ship date.

What are junk dimensions?

Junk dimensions are composed of attributes that don't fit in other tables, and they are usually used with rapidly changing dimensions.

What is the difference between metadata and a data dictionary?

A data dictionary has all the definitions of a database: the tables and fields, rows, number of rows, and that kind of information.

Metadata describes some kind of information with additional, complementary data.

1. The full form of OLAP is

A) Online Analytical Processing

2. ......................... is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.

B) Data Warehousing

3. The data is stored, retrieved and updated in ....................

B) OLTP

4. An .................. system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

A) OLAP

5. ........................ is a good alternative to the star schema.

C) Fact constellation

6. The ............................ exposes the information being captured, stored, and managed by operational systems.

C) data source view

7. The type of relationship in star schema is ...............

C) one to many

8. The .................. allows the selection of the relevant information necessary for the data warehouse.

A) top-down view

9. Which of the following is not a component of a data warehouse?

D) Component Key

10. Which of the following is not a kind of data warehouse application?

D) Transaction processing

1. Data warehouse architecture is based on .........................

B) RDBMS

2. .......................... supports basic OLAP operations, including slice and dice, drill-down, roll-up and pivoting.

B) Analytical processing

3. The core of the multidimensional model is the ....................... , which consists of a large set of facts and a number of dimensions.

C) Data cube

4. The data from the operational environment enter ........................ of data warehouse.

A) Current detail data

5. A data warehouse is ......................

C) organized around important subject areas

6. Business Intelligence and data warehousing is used for ..............

D) All of the above

7. Data warehouse contains ................ data that is never found in the operational environment.

C) summary

8. ................... are responsible for running queries and reports against data warehouse tables.

C) End users

9. The biggest drawback of the level indicator in the classic star schema is that it limits ............

A) flexibility

10. ............................. are designed to overcome any limitations placed on the warehouse by the nature of the relational data model.

C) Multidimensional database

2. Data that can be modeled as dimension attributes and measure attributes are called _______ data.
a) Multidimensional
b) Singledimensional
c) Measured
d) Dimensional

Answer: a

Explanation: Given a relation used for data analysis, we can identify some of its attributes as measure attributes, since they measure some value and can be aggregated upon. Dimension attributes define the dimensions on which measure attributes, and summaries of measure attributes, are viewed.

3. The generalization of cross-tab which is represented visually is ____________ which is also called a data cube.
a) Two dimensional cube
b) Multidimensional cube
c) N-dimensional cube
d) Cuboid

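Answer: c

Explanation: A cross-tab is two-dimensional; its generalization to n dimensions can be visualized as an n-dimensional cube. In a three-dimensional example, each cell in the cube is identified by the values of the three dimension attributes.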

4. The process of viewing the cross-tab (Single dimensional) with a fixed value of one attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both a and b

Answer: d

Explanation: For example, the item name and colour are viewed for a fixed size.

5. The operation of moving from finer-granularity data to a coarser granularity (by means of aggregation) is called a ________.
a) Rollup
b) Drill down
c) Dicing
d) Pivoting

Answer: a

Explanation: The opposite operation, that of moving from coarser-granularity data to finer-granularity data, is called a drill down.

6. In SQL the cross-tabs are created using
a) Slice
b) Dice
c) Pivot
d) All of the mentioned

Answer: c

Explanation: pivot (sum(quantity) for color in ('dark', 'pastel', 'white'))

7. { (item name, color, clothes size), (item name, color), (item name, clothes size), (color, clothes size), (item name), (color), (clothes size), () }
This can be achieved by using which of the following?
a) group by rollup
b) group by cubic
c) group by
d) None of the mentioned

Answer: d

Explanation: Group by cube is used.

8. What do data warehouses support?
a) OLAP
b) OLTP
c) OLAP and OLTP
d) Operational databases

Answer: a

Explanation: None.

9. Select item name, color, clothes size, sum(quantity)
from sales
group by rollup(item name, color, clothes size);
How many groupings are possible in this rollup?
a) 8
b) 4
c) 2
d) 1

Answer: b

Explanation: { (item name, color, clothes size), (item name, color), (item name), () }.

10. Which one of the following is the right syntax for DECODE?
a) DECODE (search, expression, result [, search, result] [, default])
b) DECODE (expression, result [, search, result] [, default], search)
c) DECODE (search, result [, search, result] [, default], expression)
d) DECODE (expression, search, result [, search, result] [, default])

Answer: d

Explanation: None
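
Questions 7 and 9 above turn on which groupings `group by cube` and `group by rollup` produce. The sketch below enumerates them in plain Python rather than SQL; the function names are hypothetical, and the column names use underscores purely for readability.

```python
from itertools import combinations


def rollup_groupings(columns):
    """Rollup: every prefix of the column list, down to the empty grouping."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_groupings(columns):
    """Cube: every subset of the columns, largest subsets first."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


cols = ["item_name", "color", "clothes_size"]
rollup = rollup_groupings(cols)  # n + 1 groupings for n columns
cube = cube_groupings(cols)      # 2 ** n groupings for n columns
```

For three columns this gives 4 rollup groupings and 8 cube groupings, matching the answers to questions 9 and 7 respectively.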

Three-Tier Data Warehouse Architecture

Generally a data warehouse adopts a three-tier architecture. Following are the three tiers of the data warehouse architecture.

Bottom Tier - The bottom tier of the architecture is the data warehouse database server. It is the relational database system. We use the back-end tools and utilities to feed data into the bottom tier. These back-end tools and utilities perform the extract, clean, load, and refresh functions.

Middle Tier - In the middle tier, we have the OLAP server, which can be implemented in either of the following ways:

By Relational OLAP (ROLAP), which is an extended relational database management system. ROLAP maps the operations on multidimensional data to standard relational operations.
By the Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and operations.

Top Tier - This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools, and data mining tools.

[Diagram: the three-tier architecture of a data warehouse.]

Data Warehouse Models

From the perspective of data warehouse architecture, we have the following data warehouse models: virtual warehouse, data mart, and enterprise warehouse.

Virtual Warehouse

The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a virtual warehouse. Building a virtual warehouse requires excess capacity on operational database servers.

Data Mart

A data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an organization.

In other words, we can claim that data marts contain data specific to a particular group. For example, the marketing data mart may contain data related to items, customers, and sales. Data marts are confined to subjects.

Points to remember about data marts:

Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.
The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
The life cycle of a data mart may be complex in the long run if its planning and design are not organization-wide.
Data marts are small in size.
Data marts are customized by department.
The source of a data mart is a departmentally structured data warehouse.
Data marts are flexible.

Enterprise Warehouse

An enterprise warehouse collects all the information and the subjects spanning an entire organization.
It provides us with enterprise-wide data integration.
The data is integrated from operational systems and external information providers.
This information can vary from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
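
One way to picture the virtual warehouse model is as a database view defined directly over operational tables. The sketch below uses Python's sqlite3 module; the orders table, the view name, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational table, updated by day-to-day transactions.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'East', 100.0), (2, 'West', 50.0);

-- The "virtual warehouse": a summary view, with no data copied.
CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region;
""")


def sales_by_region():
    """Query the view; the aggregation runs against the operational table."""
    return conn.execute(
        "SELECT region, total_amount FROM v_sales_by_region ORDER BY region"
    ).fetchall()


before = sales_by_region()
conn.execute("INSERT INTO orders VALUES (3, 'East', 25.0)")  # new transaction
after = sales_by_region()
```

Because the view is evaluated at query time, the new order shows up immediately, and every analytical query consumes capacity on the operational server, which is why the text notes that excess capacity is required.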

Cubes in a data warehouse are stored in three different modes. A relational storage model is called Relational Online Analytical Processing mode, or ROLAP, while a multidimensional storage model is called Multidimensional Online Analytical Processing mode, or MOLAP. When dimensions are stored in a combination of the two modes, it is known as Hybrid Online Analytical Processing mode, or HOLAP.

MOLAP

This is the traditional mode in OLAP analysis. In MOLAP, data is stored in the form of multidimensional cubes and not in relational databases. The advantages of this mode are that it provides excellent query performance and that the cubes are built for fast data retrieval. All calculations are pre-generated when the cube is created and can be easily applied while querying data. The disadvantage of this model is that it can handle only a limited amount of data. Since all calculations have been pre-built when the cube was created, the cube cannot be derived from a large volume of data. This deficiency can be bypassed by including only summary-level calculations while constructing the cube. This model also requires a large additional investment, as cube technology is proprietary and the knowledge base may not exist in the organization.

ROLAP

The underlying data in this model is stored in relational databases. Since the data is stored in relational databases, this model gives the appearance of traditional OLAP's slicing and dicing functionality. The advantages of this model are that it can handle a large amount of data and can leverage all the functionalities of the relational database. The disadvantages are that performance is slow and that each ROLAP report is an SQL query with all the limitations of the genre. It is also limited by SQL functionalities. ROLAP vendors have tried to mitigate this problem by building complex out-of-the-box functions into the tool and by giving users the ability to define their own functions.

HOLAP

HOLAP technology tries to combine the strengths of the above two models. For summary-type information HOLAP leverages cube technology, and for drilling down into details it uses the ROLAP model.

Comparing the use of MOLAP, HOLAP and ROLAP

The type of storage medium affects cube processing time, cube storage, and cube browsing speed. Some of the factors that affect MOLAP storage are:

Cube browsing is fastest when using MOLAP. This is so even in cases where no aggregations have been done. The data is stored in a compressed multidimensional format and can be accessed more quickly than in the relational database. Browsing is very slow in ROLAP, about the same in HOLAP. Processing time is slower in ROLAP, especially at higher levels of aggregation.
MOLAP storage takes up more space than HOLAP because data is copied, and at very low levels of aggregation it takes up more room than ROLAP. ROLAP takes almost no storage space because data is not duplicated. However, ROLAP aggregations take up more space than MOLAP or HOLAP aggregations.
All data is stored in the cube in MOLAP, and data can be viewed even when the original data source is not available. In ROLAP, data cannot be viewed unless connected to the data source.
MOLAP can handle only very limited data, as all data is stored in the cube.
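
Since each ROLAP report is ultimately an SQL query, the basic cube operations can be sketched as queries against a relational table. The example below uses Python's sqlite3 module with a hypothetical sales table (item_name, color, clothes_size, quantity) and invented rows. SQLite has no PIVOT clause, so the pivot is written as conditional aggregation; this is an illustration, not how any particular ROLAP product generates its SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (item_name TEXT, color TEXT, clothes_size TEXT, quantity INTEGER);
INSERT INTO sales VALUES
    ('skirt', 'dark', 'small', 2), ('skirt', 'pastel', 'medium', 5),
    ('dress', 'dark', 'medium', 4), ('dress', 'white', 'large', 1),
    ('shirt', 'white', 'small', 7), ('shirt', 'dark', 'large', 3);
""")


def query(sql, params=()):
    return conn.execute(sql, params).fetchall()


# Slice: fix one dimension (clothes_size) to a single value.
slice_rows = query(
    "SELECT item_name, color, quantity FROM sales "
    "WHERE clothes_size = ? ORDER BY item_name",
    ("small",),
)

# Dice: restrict several dimensions to sets of values.
dice_rows = query(
    "SELECT item_name, color, clothes_size, quantity FROM sales "
    "WHERE color IN ('dark', 'white') AND clothes_size IN ('small', 'medium') "
    "ORDER BY item_name"
)

# Roll-up: aggregate away color and size, keeping only item_name.
rollup_rows = query(
    "SELECT item_name, SUM(quantity) FROM sales GROUP BY item_name ORDER BY item_name"
)

# Pivot: turn color values into columns via conditional aggregation.
pivot_rows = query("""
    SELECT item_name,
           SUM(CASE WHEN color = 'dark' THEN quantity ELSE 0 END) AS dark,
           SUM(CASE WHEN color = 'pastel' THEN quantity ELSE 0 END) AS pastel,
           SUM(CASE WHEN color = 'white' THEN quantity ELSE 0 END) AS white
    FROM sales
    GROUP BY item_name
    ORDER BY item_name
""")
```

A MOLAP engine would instead answer the same requests from pre-computed cube cells, which is where its browsing speed advantage comes from.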

1) What does the term 'Ad-hoc Analysis' mean?
Choice 1: Business analysts use a subset of the data for analysis.
Choice 2: Business analysts access the Data Warehouse data infrequently.
Choice 3: Business analysts access the Data Warehouse data from different locations.
Choice 4: Business analysts do not know data requirements prior to beginning work.
Choice 5: Business analysts use sampling techniques.

2) What should be the business analyst's involvement in monitoring the performance of a Data Warehouse or Data Mart?
Choice 1: Be patient when load monitoring on the Data Warehouse or Data Mart is taking place.
Choice 2: Become experts in SQL queries.
Choice 3: No involvement in performance monitoring.
Choice 4: Contact IT if a query takes too long or does not complete.
Choice 5: Complete all required training on the query tools they will be using.

3) What factor heavily influences data warehouse size estimates?
Choice 1: The design of the warehouse schemas
Choice 2: The size of the source system schemas
Choice 3: The record size of the source tables
Choice 4: The number of expected data warehouse users
Choice 5: The number of customers an organization has

4) Data warehouses or data marts allow organizations to define 'alert' conditions -- an alert is raised when something noteworthy has taken place. For implementing a facility of 'alerts', what is the advantage of using a WEB interface over a client/server approach?
Choice 1: Access to the 'Alert' report is possible through a highly accessible means already available within the organization.
Choice 2: The selection criteria used in determining when an 'alert' needs to be issued is easier to implement using a WEB browser.
Choice 3: As long as the appropriate individual can access the 'alert', how it is implemented does not present an advantage.
Choice 4: 'Alerts' can be directed only to the requestor of the 'alert'.
Choice 5: Access to the 'alert' data can be tightly controlled.

5) Which of the following statements correctly describe a Dimension table in Dimensional Modeling?
Choice 1: Dimension tables contain fields that describe the facts.
Choice 2: Dimension tables do not contain numeric fields.
Choice 3: Dimension tables are typically larger than fact tables.
Choice 4: Dimension tables do not need system-generated keys.
Choice 5: Dimension tables usually have fewer fields than fact tables.

6) How are dimensions in a Multi-Dimensional Database related?
Choice 1: Hierarchically.
Choice 2: Through foreign keys.
Choice 3: Through a hierarchy and foreign keys.
Choice 4: Through a network.
Choice 5: Through an inverse list.

7) What is a primary risk of a 'phased' implementation?
Choice 1: Previous implementations may need to be reworked.
Choice 2: The project may lose momentum.
Choice 3: Business Analysts will find problems in the data sooner.
Choice 4: Executives will lose focus.
Choice 5: The project budget may be exceeded.

8) How do highly distributed source systems impact the Data Warehouse or Data Mart project?
Choice 1: The source data exists in multiple environments.
Choice 2: The location of the source systems has minimal impact on the Data Warehouse or Data Mart implementation.
Choice 3: The timing and coordination of software development, extraction, and data updates are more complex.
Choice 4: Large volumes of data must be moved between locations.
Choice 5: Additional network and data communication hardware will be needed.

9) Categories of OLAP Tools:
Level 1: Basic query and display of data
Level 2: Level 1 + advanced selection and arithmetic operations
Level 3: Level 2 + sophisticated data analysis techniques

10) Which of the following is an example of a process performed by a Level 3 OLAP tool (as described above)?
Choice 1: Drill down to another level of detail.
Choice 2: Display the top 10 items that meet a specific selection criteria.
Choice 3: Trend analysis.
Choice 4: Calculate a rolling average on a set of data.
Choice 5: Display a report based on specific selection criteria.

11) In a Data Mart Only architecture, what will the Data Mart Development Team(s) encounter?
Choice 1: There is little or no data redundancy across all of the Data Mart databases.
Choice 2: Issues such as inconsistent definitions and dirty data in extracting data from multiple source systems will be addressed several times.
Choice 3: Database design will be easier than expected because Data Mart databases support only a single user.
Choice 4: There is ease in consolidating the Data Marts to create a Data Warehouse.
Choice 5: It is easy to develop the data extraction system due to the use of the warehouse as a single data source.

12) What is the primary responsibility of the 'project sponsor' during a Data Warehouse project?
Choice 1: To manage the day-to-day project activity.
Choice 2: To review and approve all decisions concerning the project.
Choice 3: To approve and monitor the project budget.
Choice 4: To ensure cooperation and support from all 'involved' departments.
Choice 5: To communicate project status to higher management and the board of directors.

13) What are Metadata?
Choice 1: Data used only by the IS organization.
Choice 2: Information that describes and defines the organization's data.
Choice 3: Definitions of data elements.
Choice 4: Any business data occurring in large volumes.
Choice 5: Summarized data.

14) How can the managers of a department best understand the cost of their use of the data warehouse?
Choice 1: A percentage of the business department's budget should be directed to the maintenance and enhancement of the Data Warehouse.
Choice 2: Institute a charge-back system of computer costs for the access to the Data Warehouse.
Choice 3: Develop a training program for department management.
Choice 4: Provide executive management with computer utilization reports that show what percentage of utilization is due to the Data Warehouse.
Choice 5: Business managers should participate in the acquisition process for computer hardware and software.

15) Which of the following is NOT a consequence of the creation of independent Data Marts?
Choice 1: Potentially different answers to a single business question if the question is asked of more than one Data Mart.
Choice 2: Increase in data redundancy due to duplication of data between the Data Marts.
Choice 3: Consistent definitions of the data in the Data Marts.
Choice 4: Creation of multiple application systems that have duplicate processing due to the duplication of data between the Data Marts.
Choice 5: Increased costs of hardware as the databases in the Data Marts grow.

16) What is meant by artificial intelligence when it is applied to data cleansing and transformation tools?
Choice 1: The tool can perform highly complex mathematical and statistical calculations to create derived data elements.
Choice 2: The tool can accomplish highly complex code translations when data comes from multiple source systems.
Choice 3: The tool can determine through heuristics the changes needed for a set of dirty data and then make the changes.
Choice 4: The tool can perform highly complex summarizations across multiple databases.
Choice 5: The tool can identify data that appears to be inconsistent between multiple source systems and provide reporting to assist in the clean up of the source system data.

17) Which of the following classes of corporations can gain the most insights from their legacy data?
Choice 1: A corporation that wants to determine the attitude of its customers towards the corporation.
Choice 2: A corporation that offers new products and services.
Choice 3: A new corporation.
Choice 4: A corporation that has existed for a long time.
Choice 5: A corporation that is constantly introducing new and different products and services.

18) Which of the following is NOT found in an Entity Relationship Model?
Choice 1: A definition for each Entity and Data Element.
Choice 2: Entity Relationship Diagram
Choice 3: Entity and Data Element Names
Choice 4: Fact and Dimension Tables
Choice 5: Business Rules associated with the entities, entity relationships, and the data elements.

19) What is Data Mining?
Choice 1: The capability to drill down into an organization's data once a question has been raised.
Choice 2: The setting up of queries to alert management when certain criteria are met.
Choice 3: The process of performing trend analysis on the financial data of an organization.
Choice 4: The automated process of discovering patterns and relationships in an organization's data.
Choice 5: A class of tools that support the manual process of identifying patterns in large databases.

20) What does implementing a Data Warehouse or Data Mart help reduce?
Choice 1: The data gathering effort for data analysis.
Choice 2: Hardware costs.
Choice 3: User requests for custom reports.
Choice 4: Costs when management downsizes the organization.
Choice 5: All of the above.

21) Profitability Analysis is one of the most common applications of data warehousing. Why is Profitability Analysis in data warehousing more difficult than usually expected?
Choice 1: Almost every manager in an organization wants to get profitability reports.
Choice 2: Revenue data cannot be tracked accurately.
Choice 3: Expense data is often tracked at a higher level of detail than revenue data.
Choice 4: Revenue data is difficult to collect and organize.
Choice 5: Transaction grain data is required to properly compute profitability figures.

22) Which of the following would NOT be considered a recurring cost of either Data Warehouse User Support or Data Warehouse Administration?
Choice 1: Capacity Planning
Choice 2: Creation of New Data Marts
Choice 3: Security Administration
Choice 4: Data Archiving
Choice 5: Database Management System Software Selection

23) Why is it important to track all project issues and their resolution?
Choice 1: To show management what the project team has accomplished.
Choice 2: Issues will be brought back up even after they have been resolved.
Choice 3: Provides an audit trail for use in internal or external audits.
Choice 4: There is no need to track issues once they are resolved.
Choice 5: Tracking is needed for project status report.

24) When a physical database design contains summary data, what must the database designer always ensure?
Choice 1: Non-numeric (non-summary) data elements should not be placed in a summary table.
Choice 2: The detail data used to create the summary data is kept in case the Data Warehouse database needs to be reloaded.
Choice 3: The level of detail lost by summarization will not affect the business analysts' use of the data.
Choice 4: Each table with summary data has a 'from' and 'to' date.
Choice 5: The appropriate business rule(s) describing how the data will be summarized is in place.

25) Which of the following is a business benefit of a Data Warehouse?
Choice 1: Customers are happier.
Choice 2: Reduction in Government interference.
Choice 3: Decision makers will be able to make more decisions each day.
Choice 4: Ability to identify historical trends.
Choice 5: Improves morale of the business analysts.

26) How does Ad-hoc Access differ from Managed Query Access?
Choice 1: Ad-hoc access provides users more flexibility when retrieving data.
Choice 2: Ad-hoc query access requirements are easier to anticipate.
Choice 3: Managed query access is more frequently implemented.
Choice 4: Managed query access gives users more ways of getting the data they need.
Choice 5: Managed
query response times are easier to optimize.27) What is a 'snowflake' schema?Choice 1: The dimension tables are 'normalized'.Choice 2: The dimension tables can refer to more than one fact table.Choice 3: All recurring groups of attributes are completely removed from dimension tables.Choice 4: A schema that can be implemented only with an MDDB Database Management System.Choice 5: Any database implemented with a network Database Management System.28) Which of the following describes a successful decision support environment?Choice 1: Depends heavily on sets of 'canned' queries to provide good performance and reducedcostsChoice 2: Has data warehouse and data mart databases that are of terabyte sizeChoice 3: CostlyChoice 4: Totally independent of the operational systemsChoice 5: Iterative and evolutionary29) What is an Operational Data Store?Choice 1: A set of databases that serve as a 'staging' area to facilitate consolidating data fromseveral,distributed-source systems.Choice 2: A set of databases that support OLAP.Choice 3: A set of databases that support reporting from an application system.5Choice 4: A set of databases that provide integrated operations data to serve the organization'sday-to-dayactivities.Choice 5: A set of databases to provide operational data for a single department.30) When is it appropriate to 'denormalize' a relational database design for a DataWarehouse database?Choice 1: When disk space is low.Choice 2: When memory is low.Choice 3: When the analysis requirements are understood.Choice 4: Any time.Choice 5: When the database design is no longer expected to change.31) Where in the warehouse architecture is it appropriate to calculate 'derived' dataelements forstorage?Choice 1: As part of the business analysts' queries.Choice 2: In an application system developed solely to address 'derived' data elements.Choice 3: When the data are extracted from the source systems.Choice 4: After the business analysts have extracted their data from the Data 
Warehouse.Choice 5: Just prior to loading the data into the Data Warehouse databases32) In an architecture where 'atomic' data are maintained in the Data Warehouse and usedto create theData Marts, what is the best implementation for the Data Warehouse databases?Choice 1: Multi-Dimensional Database Management SystemChoice 2: Hierarchical Database Management SystemChoice 3: Relational Database Management SystemChoice 4: Object Database Management SystemChoice 5: Any Database Management System is acceptable.33) Why would an organization decide to implement a Data Warehouse on a mainframecomputer withits OLTP applications?Choice 1: For cost considerations only.Choice 2: For improved response time on queries to the Data Warehouse.Choice 3: The size of the Data Warehouse has outgrown the small computer's capability ofhandling it.Choice 4: To avoid large network requirements as a result of having to move large amounts ofdata betweenplatforms and database management systems.Choice 5: The number of Data Warehouse users has increased to a point where the smallerplatforms cannothandle them.34) What are the characteristics of a good candidate for a Web application?Choice 1: One that provides data in multiple formats to a small group of business analysts andmanagement.Choice 2: Any application intended to be used by executive management.Choice 3: One that provides data in multiple formats and that requires a low level of processing toa largenumber of users.Choice 4: Any application providing access to a Data Warehouse, Data Mart, or Operational DataStore.Choice 5: One that requires intensive processing and provides data in a few formats to a largenumber of users.35) What does the statement "A Data Warehouse database is non-volatile" mean?Choice 1: Data Warehouse databases contain only historical transaction data.Choice 2: Business requirements for a Data Warehouse are stable.Choice 3: Data Warehouse database structures change very infrequently.Choice 4: Data within the 
databases do not change from second to second.Choice 5: Data Warehouse databases support the creation of a set of reports.36) What is typically discovered when historical data are first extracted from legacysystems for initialloading into the Data Warehouse?Choice 1: Flaws in the warehouse database design.Choice 2: Flaws in the extraction program code.Choice 3: The need for additional data sources.Choice 4: Extraction run times are shorter than expected.Choice 5: Undocumented changes in the content, usage, and structure of the historical data.637) What is typically discovered when historical data are first extracted from legacy systemsfor initialloading into the Data Warehouse?Choice 1: Flaws in the warehouse database design.Choice 2: Flaws in the extraction program code.Choice 3:The need for additional data sources.Choice 4: Extraction run times are shorter than expected.Choice 5: Undocumented changes in the content, usage, and structure of the historical data.38) What is an operational system?Choice 1: An application system that tracks and manages the financial assets of the organization.Choice 2: An application system that supports the planning and forecasting within the organization.Choice 3: An application system that supports the creation of product(s) that the organizationmarkets.Choice 4: An application system that supports the organization's day-to-day activities.Choice 5:An application system that supports the organization's decision-making.39) If a Data Warehouse is to be implemented in a distributed architecture, what could bethe mostdifficult part of the implementation?Choice 1: Finding and selecting query and reporting tools that can span multiple databases.Choice 2: Finding and selecting the tools to monitor database performance.Choice 3: Convincing the business analysts that this approach will work.Choice 4: Developing an estimated Data Warehouse workload.Choice 5: Designing the Data Warehouse databases.40) What is a primary risk of a 'phased' 
implementation?Choice 1: Previous implementations may need to be reworked.Choice 2: The project may lose momentum.Choice 3: Business Analysts will find problems in the data sooner.Choice 4: Executives will lose focus.Choice 5: The project budget may be exceeded.41) Data warehouses or data marts allow organizations to define 'alert' conditions -- an alertis raisedwhen something noteworthy has taken place. For implementing a facility of 'alerts', what istheadvantage of using a WEB interface over a client/server approach?Choice 1: Access to the 'Alert' report is possible through a highly accessible means alreadyavailable within theorganization.Choice 2:The selection criteria used in determining when an 'alert' needs to be issued is easier toimplementusing a WEB browser.Choice 3: As long as the appropriate individual can access the 'alert', how it is implemented doesnot present anadvantage.:Choice 4: 'Alerts' can be directed only to the requestor of the 'alert'.Choice 5: Access to the 'alert' data can be tightly controlled42) What should be the business analyst's involvement in monitoring the performance of aDataWarehouse or Data Mart?Choice 1: Be patient when load monitoring on the Data Warehouse or Data Mart is taking place.Choice 2: Become experts in SQL queries.Choice 3: No involvement in performance monitoring.Choice 4: Contact IT if a query takes too long or does not complete.Choice 5: Complete all required training on the query tools they will be using.

1: Data dictionary is
A. Large collection of data mostly stored in a computer system
B. The removal of noise, errors, and incorrect input from a database
C. The systematic description of the syntactic structure of a specific database. It describes the structure of the attributes, the tables, and foreign key relationships.
D. None of these
Option: C
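
As an illustration of the data dictionary described in question 1, the sketch below reads table, attribute, and foreign key descriptions back out of a database. It assumes SQLite through Python's built-in sqlite3 module; the function name `describe_schema` and the `customer` and `sale` tables are hypothetical and exist only for this example.

```python
import sqlite3


def describe_schema(conn):
    """Build a small data dictionary: columns and foreign keys per table."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2])
                   for row in conn.execute(f"PRAGMA table_info({table})")]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Hypothetical two-table schema used only to demonstrate the lookup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount REAL
    );
""")
data_dictionary = describe_schema(conn)
```

Each foreign key entry is a tuple of (referencing column, referenced table, referenced column), which is the "foreign key relationships" part of the definition in option C.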

2: Data warehouse is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
D. None of these


Option: C


3: Data cleaning is
A. Large collection of data mostly stored in a computer system
B. The removal of noise, errors, and incorrect input from a database
C. The systematic description of the syntactic structure of a specific database. It describes the structure of the attributes, the tables, and foreign key relationships.
D. None of these


Option: B
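
As an illustration of the data cleaning described in question 3, the sketch below removes incorrect input from a small set of records. The cleaning rules (drop rows with a missing name, drop non-numeric or negative amounts, drop duplicates after normalising the name) and the field names `name` and `amount` are assumptions made for this example, not rules taken from the quiz.

```python
def clean_records(records):
    """Return records with missing, implausible, and duplicate rows removed."""
    cleaned, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip()
        amount = rec.get("amount")
        if not name:
            continue  # missing name: incorrect input
        if not isinstance(amount, (int, float)) or amount < 0:
            continue  # non-numeric or negative amount: treated as an error
        key = (name.lower(), amount)
        if key in seen:
            continue  # duplicate once the name is normalised
        seen.add(key)
        cleaned.append({"name": name, "amount": amount})
    return cleaned


# Hypothetical raw input containing the kinds of errors listed above.
raw = [
    {"name": " Alice ", "amount": 120.0},
    {"name": "alice", "amount": 120.0},
    {"name": "", "amount": 50.0},
    {"name": "Bob", "amount": -5.0},
    {"name": "Carol", "amount": 75.5},
]
cleaned = clean_records(raw)
```

In a warehouse setting, rules like these would normally run before loading, in line with the earlier answer that data scrubbing happens before data is moved into the data warehouse.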


4: Decision support systems (DSS) is
A. A family of relational database management systems marketed by IBM
B. Interactive systems that enable decision makers to use databases and models on a computer in order to solve ill-structured problems
C. It consists of nodes and branches starting from a single root node. Each node represents a test, or decision
D. None of these

Option: B