Master of Business Administration - MBA Semester III MI0036 – Business Intelligence Tools - 4 Credits Assignment - Set- 1 (60 Marks)

Q1. Define the term 'business intelligence tools'. Briefly explain how the data from one end gets transformed into information at the other end.

Ans: Business intelligence tools form a suite of applications that support decision-making. The various tools of this suite are:

Data Integration Tools: These tools extract, transform and load the data from the source databases to the target database. There are two categories: Data Integrator and Rapid Marts. Data Integrator is an ETL tool with a GUI. Rapid Marts is a packaged ETL with pre-built data models for reporting and query analysis, which makes initial prototype development easy and fast for ERP applications. A small sketch of such an ETL flow follows the component list below.

    The important components of Data Integrator include:

Graphical designer: This is a GUI used to build and test ETL jobs for data cleansing, validation and auditing.

Data integration server: This integrates data from different source databases.

Metadata repository: This repository keeps source and target metadata and the transformation rules.

Administrator: This is a web-based tool that can be used to start, stop, schedule and monitor ETL jobs.
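To make the extract-transform-load flow concrete, the following is a minimal sketch in Python. The table names, columns, and cleansing rules are hypothetical illustrations of what an ETL job does, not the behavior of any particular Data Integrator product.

```python
import sqlite3
from datetime import datetime

# In-memory source and target databases stand in for real systems (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, product TEXT, qty INTEGER, order_date TEXT)")
conn.execute("CREATE TABLE dw_orders  (order_id INTEGER, product TEXT, qty INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?, ?)",
                 [(1, " widget ", 5, "31/05/2002"), (2, "gadget", -2, "01/06/2002")])

# Extract: read raw rows from the source system.
rows = conn.execute("SELECT order_id, product, qty, order_date FROM src_orders").fetchall()

# Transform: cleanse names, validate quantities, standardize dates to ISO format.
cleaned = []
for order_id, product, qty, order_date in rows:
    if qty <= 0:                      # validation rule: reject non-positive quantities
        continue
    iso_date = datetime.strptime(order_date, "%d/%m/%Y").date().isoformat()
    cleaned.append((order_id, product.strip().lower(), qty, iso_date))

# Load: write the cleansed rows into the target warehouse table.
conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?, ?)", cleaned)
conn.commit()
print(conn.execute("SELECT * FROM dw_orders").fetchall())
```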

BI Platform: This platform provides a set of common services to deploy, use and manage the tools and applications. These include security, broadcasting, collaboration, metadata and developer services.

Reporting Tools and Query & Analysis Tools: These tools provide the facility for standard reports generation, ad hoc queries and data analysis.

Performance Management Tools: These tools help in managing the performance of a business by analyzing and tracking key metrics and goals.

Business intelligence tools are a type of application software designed to help in making better business decisions. These tools aid in the analysis and presentation of data in a more meaningful way and so play a key role in the strategic planning process of an organization. They support business intelligence in areas such as market research and segmentation, customer profiling, customer support, profitability, and inventory and distribution analysis, to name a few.

Various types of BI systems, viz. Decision Support Systems, Executive Information Systems (EIS), multidimensional analysis software or OLAP (On-Line Analytical Processing) tools, and data mining tools, are discussed further. Whatever the type, the business intelligence capability of the system is to let its users slice and dice the information from their organization's numerous databases without having to wait for their IT departments to develop complex queries and elicit answers.

Although it is possible to build BI systems without the benefit of a data warehouse, in practice most of them form an integral part of the user-facing end of the data warehouse. In fact, one can hardly think of building a data warehouse without BI systems. That is the reason the words 'data warehousing' and 'business intelligence' are sometimes used interchangeably.

Figure 1.1 depicts how the data from one end gets transformed into information at the other end for business intelligence.

Q2. What do you mean by a data warehouse? What are the major concepts and terminology used in the study of data warehouses?

Ans: In simple terms, a data warehouse is the repository of an organization's historical data (also termed the corporate memory). For example, an organization would use the information stored in its data warehouse to find out on what day of the week it sold the most gadgets in May 2002, or how many employees were on sick leave in a specific week.

A data warehouse is a database designed to support decision making in an organization. Here, the data from various production databases are copied to the data warehouse so that queries can be forwarded without disturbing the stability or performance of the production systems. So the main factor that leads to the use of a data warehouse is that complex queries and analysis can be obtained over the information without slowing down the operational systems. While operational systems are optimized for simplicity and speed of modification (online transaction processing, or OLTP), the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). (The concepts of OLTP and OLAP are discussed in later Units).

Apart from traditional query and reporting, a data warehouse provides the base for the powerful data analysis techniques such as data mining and multidimensional analysis (discussed in detail in later Units). Making use of these techniques will result in easier access to the information you need for informed decision making.

Characteristics of a Data Warehouse

According to Bill Inmon, who is considered the Father of Data Warehousing, the data in a data warehouse has the following characteristics:

Subject oriented

The first feature of a DW is its orientation toward the major subjects of the organization instead of toward applications. The subjects are categorized in such a way that the subject-wise collection of information helps in decision-making. For example, the data in the data warehouse of an insurance company can be organized by customer ID, customer name, premium, payment period, etc. rather than by auto insurance, life insurance, fire insurance, etc.

Integrated

The data contained within the boundaries of the warehouse are integrated. This means that all inconsistencies regarding naming conventions and value representations need to be removed in a data warehouse. For example, one application of an organization might code gender as 'm' and 'f' while another application might code the same field as '0' and '1'. Such conflicts must be resolved when the data is moved from the operational environment to the data warehouse environment.
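A rough sketch of how such coding inconsistencies might be reconciled during integration: each source system gets its own mapping to a single warehouse standard. The source names, and the assumption that '0' means male, are purely illustrative.

```python
# Hypothetical per-source code maps; each source system's values are
# normalized to the warehouse standard ('M'/'F') before loading.
GENDER_MAPS = {
    "order_entry": {"m": "M", "f": "F"},
    "marketing":   {"0": "M", "1": "F"},   # assumption: '0' = male in this source
}

def standardize_gender(source_system, raw_value):
    """Translate a source-specific gender code to the warehouse standard."""
    mapping = GENDER_MAPS.get(source_system, {})
    return mapping.get(str(raw_value).lower(), "UNKNOWN")

print(standardize_gender("order_entry", "f"))  # -> F
print(standardize_gender("marketing", "0"))    # -> M
```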

Time variant

The data stored in a data warehouse is not just the current data. It is time-series data, as the data warehouse is a place where data is accumulated periodically. This is in contrast to an operational system, where the data in the databases is accurate as of the moment of access.

Non-volatility of the data

The data in the data warehouse is non-volatile, which means the data is stored in a read-only format and does not change over a period of time. This is the reason the data in a data warehouse serves as a single source for all decision support processing.

Keeping the above characteristics in view, a 'data warehouse' can be defined as a subject-oriented, integrated, non-volatile, time-variant collection of data designed to support the decision-making requirements of an organization.

Q3. What are the data modeling techniques used in a data warehousing environment?

Ans: There are two data modeling techniques that are relevant in a data warehousing environment. They are Entity Relationship modeling (ER modeling) and dimensional modeling.

ER modeling produces a data model of the specific area of interest, using two basic concepts: Entities and the Relationships between them. A detailed ER model may also contain attributes, which can be properties of either the entities or the relationships. The ER model is an abstraction tool as it can be used to simplify, understand and analyze the ambiguous data relationships in the real business world.

Dimensional modeling uses three basic concepts: Facts, Dimensions and Measures. Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables and also in the area of data warehousing.

Both ER and dimensional modeling can be used to create an abstract model of a specific subject. However, each of them has its own limited set of modeling concepts and associated notation conventions. Consequently, the techniques seem different, and they are indeed different in terms of semantic representation. There is much debate as to which method is better and the conditions under which a specific technique should be selected. There is no definite answer; an understanding of the circumstances and the business requirements finally leads to the selection of an appropriate technique.

Entity-Relationship (E-R) Modeling

Basic Concepts

An ER model is represented by an ER diagram, which uses three basic graphic symbols to conceptualize the data: entity, relationship, and attribute.

Entity

An entity is defined to be a person, place, thing, or event of interest to the business or the organization. It represents a class of objects, which are things in the real business world that can be observed and classified by their properties and characteristics. In general, an entity has its own business definition and a clear boundary definition that is required to describe what is included and what is not.

In a practical modeling project, the team members share a definition template for integration and a consistent entity definition in the model. In the case of high-level business modeling, an entity can be very generic, but it must be quite specific in detailed logical modeling. There are four entities: PRODUCT, PRODUCT MODEL, PRODUCT COMPONENT, and COMPONENT in the ER diagram (refer to Figure 4.1), and they are represented as rectangles.

Fig. 4.1: A Simple ER Model

The four diagonal lines on the corners of the PRODUCT COMPONENT entity indicate that it is an 'associative entity', introduced to resolve the many-to-many relationship between two entities. PRODUCT MODEL and COMPONENT are independent of each other but have a business relationship between them. A PRODUCT MODEL consists of many components, and a component is related to many product models. With this business rule alone, you cannot tell which components make up a particular product model. To do this, you can define a resolving entity. For example, the PRODUCT COMPONENT entity can provide the information about which components are related to which product model.

In ER modeling, naming the entities is important for easy understanding and clear communication. An entity name is expressed grammatically in the form of a noun rather than a verb, and the criteria for selecting an entity name depend on how well the name represents the characteristics and scope of the entity. Also, defining a unique identifier for an entity is the most critical task. These unique identifiers are called candidate keys. Among them, you can select the key that is most commonly used to identify the entity, called the 'primary key'.

Relationship

Relationships represent the structural interaction and association among the entities in a model, and they are represented by lines drawn between the two specific entities. Generally, a relationship is named grammatically by a verb (such as owns, belongs, and has), and the relationship between entities can be defined in terms of cardinality. Cardinality represents the maximum number of instances of one entity that are related to a single instance of another entity, and vice versa. Thus the possible cardinalities include one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M). In a detailed normalized ER model, no M:M relationship is shown, because it is resolved into an associative entity.

Attributes

Attributes describe the characteristics or properties of the entities. Product ID, Description, and Picture are attributes of the PRODUCT entity in Figure 4.1. The name of an attribute has to be unique within an entity and should be self-explanatory to ensure clarity. For example, rather than naming two attributes date1 and date2, you may use the names order date and delivery date. When an instance has no value for an attribute, the minimum cardinality of the attribute is zero, which means it is either nullable or optional.

In Figure 4.1, you can see the characters P, m, o, and F, which stand for primary key, mandatory, optional, and foreign key. The Picture attribute of the PRODUCT entity is optional, which means it is nullable. A foreign key of an entity is defined to be the primary key of another entity. In Figure 4.1, the Product ID attribute of the PRODUCT MODEL entity is a foreign key, as it is the primary key of the PRODUCT entity. Foreign keys are useful in enforcing the relationships between entities, such as referential integrity.
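The following sketch shows primary keys, foreign keys, and referential integrity in action, using Python's built-in sqlite3 module. The tables loosely follow the PRODUCT and PRODUCT MODEL entities of Figure 4.1; the exact columns are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable referential-integrity checks

# PRODUCT holds the primary key; PRODUCT MODEL references it via a foreign key.
conn.execute("""CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,      -- 'P': primary key
    description TEXT NOT NULL,            -- 'm': mandatory
    picture     BLOB                      -- 'o': optional (nullable)
)""")
conn.execute("""CREATE TABLE product_model (
    model_id   INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id)  -- 'F': foreign key
)""")

conn.execute("INSERT INTO product (product_id, description) VALUES (1, 'Bicycle')")
conn.execute("INSERT INTO product_model VALUES (10, 1)")       # valid: product 1 exists

try:
    conn.execute("INSERT INTO product_model VALUES (11, 99)")  # product 99 does not exist
except sqlite3.IntegrityError as e:
    print("Referential integrity violation:", e)
```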

Other Concepts

Supertype and Subtype

An entity can have subtypes and supertypes, and the relationship between a supertype entity and its subtype entity is an IS A relationship. An IS A relationship is used where one entity is a generalization of several more specialized entities. The supertype-subtype relationship is represented by a triangle on the relationship line. Figure 4.2 shows an example of supertype and subtype entities wherein SALES OUTLET is the supertype of RETAIL STORE and CORPORATE SALES OFFICE, which are its subtypes. Here, each subtype entity inherits attributes from its supertype entity.

Also, each subtype entity can have its own distinctive attributes. In the example provided above, Region ID and Outlet ID are inherited attributes, and the subentities have their own attributes (such as the number of cash registers and the floor space of the RETAIL STORE subentity). The practical benefit of supertyping and subtyping is that they make a data model more directly expressive. Just by looking at the ER diagram, you can see that sales outlets are composed of 'retail stores' and 'corporate sales offices'.

Fig. 4.2: Supertype and Subtype

 

Other important concepts in the area of ER modeling are ‘domain’ and ‘normalization’.

A domain consists of all the possible acceptable values and categories that are allowed for an attribute. It is the set of all real possible occurrences. The format or data type, such as integer, date, and character, provides a clear definition of the domain. The practical benefit of a domain is that it is essential for building the data dictionary or repository, and consequently for implementing the database.

Normalization is a process of assigning the attributes to entities which in a way reduces data redundancy, avoids data anomalies, provides a solid architecture for updating data, and reinforces the long-term integrity of the data model (the third normal form is usually adequate).
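A small illustration of the idea: the sketch below, with invented data, removes the redundancy of repeating customer details on every order row by moving them into a separate structure keyed by customer ID.

```python
# A denormalized order list repeats customer details on every row ...
orders = [
    {"order_id": 1, "cust_id": 7, "cust_name": "Acme Ltd", "cust_city": "Pune", "amount": 250},
    {"order_id": 2, "cust_id": 7, "cust_name": "Acme Ltd", "cust_city": "Pune", "amount": 400},
]

# ... normalization moves each customer's attributes into a single row keyed
# by cust_id, so the data is stored once and updated in one place.
customers = {}
normalized_orders = []
for row in orders:
    customers[row["cust_id"]] = {"name": row["cust_name"], "city": row["cust_city"]}
    normalized_orders.append({"order_id": row["order_id"],
                              "cust_id": row["cust_id"],
                              "amount": row["amount"]})

print(customers)          # {7: {'name': 'Acme Ltd', 'city': 'Pune'}}
print(normalized_orders)  # orders now reference the customer only by key
```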

Dimensional Modeling

Dimensional modeling is a relatively new concept compared to ER modeling. This method is simpler, more expressive, and easier to understand. This technique is mainly aimed at conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is useful for summarizing and rearranging the data and presenting views of the data to support data analysis. Also, the technique focuses on numeric data, such as values, counts, and weights.
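As a rough illustration of facts, dimensions, and measures, the following Python sketch builds a toy star schema and answers one analytical question over it; all table contents and key values are invented.

```python
# Dimension tables describe the business context (contents are invented).
product_dim = {101: {"name": "Widget", "category": "Hardware"},
               102: {"name": "Gadget", "category": "Electronics"}}
time_dim = {20020531: {"year": 2002, "month": 5, "weekday": "Friday"}}

# The fact table holds the numeric measures, keyed by the dimension keys.
sales_facts = [
    {"product_key": 101, "time_key": 20020531, "units_sold": 5, "revenue": 250.0},
    {"product_key": 102, "time_key": 20020531, "units_sold": 2, "revenue": 900.0},
]

# A typical analysis: total revenue per product category for May.
totals = {}
for fact in sales_facts:
    if time_dim[fact["time_key"]]["month"] == 5:                   # slice on the time dimension
        category = product_dim[fact["product_key"]]["category"]   # look up the product dimension
        totals[category] = totals.get(category, 0.0) + fact["revenue"]
print(totals)  # {'Hardware': 250.0, 'Electronics': 900.0}
```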

Q4. Discuss the categories into which data is divided before structuring it into a data warehouse.

Ans: The Data Warehouses can be divided into two types:

Enterprise Data Warehouse

Data Mart

Enterprise Data Warehouse

The Enterprise data warehouse consists of the data drawn from multiple operational systems of an organization. This data warehouse supports time-series and trend analysis across different business areas of an organization and so can be used for strategic decision-making. Also, this data warehouse is used to populate various data marts.

Data Mart

As data warehouses contain large amounts of data, organizations often create 'data marts' that are precise and specific to a department or product line. Thus a data mart is a physical and logical subset of an Enterprise data warehouse and is also termed a department-specific data warehouse. Generally, data marts are organized around a single business process.

There are two types of data marts: independent and dependent. The data is fed directly from the legacy systems in the case of an independent data mart, and from the enterprise data warehouse in the case of a dependent data mart. In the long run, dependent data marts are much more stable architecturally than independent data marts.

Advantages and Limitations of a DW System

Use of a data warehouse brings in the following advantages for an organization:

End-users can access a wide variety of data.

Management can obtain various kinds of trends and patterns of data.

A warehouse provides competitive advantage to the company by providing data and timely information.

A warehouse acts as a significant enabler of commercial business applications, viz. Customer Relationship Management (CRM) applications.

However, following are the concerns that one has to keep in mind while using a data warehouse:

The scope of a Data warehousing project is to be managed carefully to attain the defined content and value.

The process of extracting, cleaning and loading the data and finally storing it into a data warehouse is a time-consuming process.

The problems of compatibility with the existing systems need to be resolved before building a data warehouse.

Security of the data may become a serious issue, especially if the warehouse is web accessible.

Building and maintenance of the data warehouse can be handled only through skilled resources and requires huge investment.

 

Data Warehouse Concepts and Terminology

Various concepts and the key terms used in the study of data warehouse are provided below.

Dashboard:

    This is a reporting tool that consolidates, aggregates, and arranges measurements and metrics (measurements compared to a goal) on a single screen so that information can be monitored at a glance.

Data Management:

    This is the process of controlling, protecting, and facilitating access to data in order to provide the end users with timely access to the data they need.

Data Mining (or Data Surfing):

    This is a technique geared toward the typical user who does not know exactly what he or she is searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. It can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The most valuable results from data mining include clustering, classifying, and estimating the things that occur together. Many kinds of tools play a role in data mining, including neural networks, decision trees, visualization, genetic algorithms, fuzzy logic, etc.
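As one illustration of 'estimating the things that occur together', the sketch below counts item pairs across hypothetical market-basket transactions; frequent pairs are the simplest form of co-occurrence analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "butter"},
]

# Count how often each pair of items occurs together across all baskets.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, count)   # e.g. ('bread', 'milk') 2
```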

Data Modeling:

    A method used to define and analyze data requirements needed to support the business functions of an organization.

Data Profiling:

    Data Profiling is a critical step in data migration that automates the identification of problematic data and metadata, and enables organizations to correct inconsistencies, redundancies and inaccuracies in their databases.

Data Visualization:

    Data visualization involves examining the data represented by dynamic images rather than pure numbers. These techniques turn data into information by exploiting the high capacity of the human brain to visually recognize patterns and trends.

Decentralized Warehouse:

    A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.

Drill-down:

    This is the capacity to browse information through a hierarchical structure, moving from summary levels to successively finer levels of detail (for example, from year to quarter to month).

External Data Source:

    This is the data that is not available in the OLTP systems but is required to enhance the information quality in the data warehouse. Examples of this data include data on competitors, information from regulatory and government bodies, and research data from professional bodies and universities.

Metadata:

    Metadata is data about data. The examples of metadata include data element descriptions, data type descriptions, attribute descriptions, and process descriptions.

 On-Line Analytical Processing (OLAP):

    This is a category of software technology that enables users to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the organization. It is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. This software is also called Multidimensional Analysis Software.

On-Line Transaction Processing (OLTP):

    This is the way data is processed by an end user or a computer system. Here, the data is detail oriented and highly repetitive, with large numbers of updates and changes. The major task of these systems is to perform on-line transaction and query processing. These systems cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, payroll, banking, accounting and registration.

Operational Databases:

    These are detail-oriented databases defined to meet the needs of the complex processes of an organization. Here, the data is highly normalized to avoid data redundancy and double-maintenance. A large number of transactions take place every hour on these databases, which are always "up to date" and represent a snapshot of the current situation. In contrast to these databases, there are informational databases that are stable over a period of time and represent the situation at a specific point in time in the past.

Architecture of a Data Warehouse

The architecture describes the overall system of a Data Warehouse from various perspectives such as data, process, and infrastructure to study the inter-relationships among various components.

The data perspective includes the source and target data structures and so it aids the user in understanding what data assets are available in a data warehouse and how they are related.

The process perspective is primarily concerned with communicating the process and flow of data from the originating source system through the process of loading the data warehouse and extracting data from the warehouse.

The infrastructure or technology perspective details the various hardware and software products used to implement the distinct components of the overall system.

Depending upon the specifics of an organizational situation, the following types of Data Warehouse architectures are used:

Basic architecture of a Data warehouse

Architecture of a Data warehouse with a Staging area

Architecture of a Data warehouse with a Staging area and Data marts

    Fig 2.1 shows a simple architecture of a data warehouse wherein the end users directly access the data derived from several source systems through the data warehouse.

Q5. Discuss the purpose of an executive information system in an organization.

Ans: An Executive Information System (EIS) is a set of management tools supporting the information and decision-making needs of management by combining information available within the organisation with external information in an analytical framework.

EIS are targeted at managers who need to quickly assess the status of a business or a section of the business. These packages are aimed firmly at the type of business user who needs an instant and up-to-date understanding of critical business information to aid decision making.

The idea behind an EIS is that information can be collated and displayed to the user without manipulation or further processing. The user can then quickly see the status of their chosen department or function, enabling them to concentrate on decision making. Generally an EIS is configured to display data such as order backlogs, open sales, purchase order backlogs, shipments, receipts and pending orders. This information can then be used to make executive decisions at a strategic level.

The emphasis of the system as a whole is the easy to use interface and the integration with a variety of data sources. It offers strong reporting and data mining capabilities which can provide all the data the executive is likely to need. Traditionally the interface was menu driven with either reports, or text presentation. Newer systems, and especially the newer Business Intelligence systems, which are replacing EIS, have a dashboard or scorecard type display.

Before these systems became available, decision makers had to rely on disparate spreadsheets and reports which slowed down the decision making process. Now massive amounts of relevant information can be accessed in seconds. The two main aspects of an EIS system are integration and visualisation. The newest method of visualisation is the Dashboard and Scorecard. The Dashboard is one screen that presents key data and organisational information on an almost real time and integrated basis. The Scorecard is another one screen display with measurement metrics which can give a percentile view of whatever criteria the executive chooses.
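A minimal sketch of the scorecard idea, assuming invented metrics and goals: each key measurement is displayed as a percentage of its target, giving the at-a-glance view described above.

```python
# Hypothetical key metrics paired with their goals; the scorecard shows
# each as a percentage of target, monitored on a single screen.
metrics = {
    "open_sales":        {"actual": 420, "goal": 500},
    "order_backlog":     {"actual": 35,  "goal": 50},
    "shipments_on_time": {"actual": 96,  "goal": 98},
}

for name, m in metrics.items():
    pct = 100.0 * m["actual"] / m["goal"]
    print(f"{name:18s} {pct:6.1f}% of goal")
```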

Behind these two front end screens can be an immense data processing infrastructure, or a couple of integrated databases, depending entirely on the organisation that is using the system. The backbone of the system is traditional server hardware and a fast network. The EIS software itself is run from here and presented to the executive over this network. The databases need to be fully integrated into the system and have real-time connections both in and out. This information then needs to be collated, verified, processed and presented to the end user, so a real-time connection into the EIS core is necessary.

Executive Information Systems come in two distinct types: ones that are data driven, and ones that are model driven. Data-driven systems interface with databases and data warehouses. They collate information from different sources and present it to the user in an integrated, dashboard-style screen. Model-driven systems use forecasting, simulations and decision-tree-like processes to present the data.

As with any emerging and progressive market, service providers are continually improving their products and offering new ways of doing business. Modern EIS systems can also present industry trend information and competitor behaviour trends if needed. They can filter and analyse data; create graphs, charts and scenario generations; and offer many other options for presenting data.

There are a number of ways to link decision making to organisational performance. From a decision maker's perspective these tools provide an excellent way of viewing data. Outcomes displayed include single metrics, trend analyses, demographics, market shares and a myriad of other options. The simple interface makes it quick and easy to navigate and call up the information required.

For a system that seems to offer business so much, it is used by relatively few organisations. Current estimates indicate that as few as 10% of businesses use EIS systems. One of the reasons for this is the complexity of the system and support infrastructure. It is difficult to create such a system and populate it effectively. Combining all the necessary systems and data sources can be a daunting task, and seems to put many businesses off implementing it. The system vendors have addressed this issue by offering turnkey solutions for potential clients. Companies like Actuate and Oracle are both offering complete out of the box Executive Information Systems, and these aren't the only ones. Expense is also an issue. Once the initial cost is calculated, there is the additional cost of support infrastructure, training, and the means of making the company data meaningful to the system.

Does EIS warrant all of this expense? Greene King certainly thinks so. They installed a Cognos system in 2003 and their first few reports illustrated business opportunities in excess of £250,000. The AA is also using a Business Objects variant of an EIS system and they expect a return of 300% in three years. (Guardian 31/7/03)

An effective Executive Information System isn't something you can just set up and leave to do its work. Its success depends on the support and the timely, accurate data it receives in order to provide something meaningful. It can provide the information executives need to make educated decisions quickly and effectively. An EIS can provide a competitive edge to business strategy that can pay for itself in a very short space of time.

Q6. Discuss the challenges involved in the data integration and consolidation process.

Ans: In general, most of the data that the warehouse gets is extracted from a combination of legacy mainframe systems, old minicomputer applications, and some client/server systems. But these source systems do not conform to the same set of business rules; they often follow different naming conventions and varied standards for data representation. Thus the process of data integration and consolidation plays a vital role. Here, data integration includes combining all relevant operational data into coherent data structures so as to make them ready for loading into the data warehouse. It standardizes names and data representations and resolves discrepancies. Some of the challenges involved in the data integration and consolidation process are as follows.

Identification of an Entity

Suppose there are three legacy applications in use in your organization: one is the order entry system, the second is the customer service support system, and the third is the marketing system. Each of these systems might have its own customer file to support the system. Even though most of the customers are common to all three files, the same customer may have a different unique identification number in each of these files.

As you need to keep a single record for each customer in a data warehouse, you need to get the transactions of each customer from the various source systems and then match them up before loading them into the data warehouse. This is an entity identification problem, in which you do not know which of the customer records relate to the same customer. This problem is prevalent wherever multiple sources exist for the same entities; other entities prone to this type of problem include vendors, suppliers, employees, and the various products manufactured by a company.

In the case of the three customer files, you have to design complex algorithms to match records from all three files and form groups of matching records. But this is a difficult exercise. If the matching criterion is too tight, some records might escape the groups. Similarly, a particular group may include records of more than one customer if the matching criterion is too loose. Also, you might have to involve your users or the respective stakeholders to understand the transactions accurately. Some companies attempt this problem in two phases. In the first phase, all records, irrespective of whether they are duplicates or not, are assigned unique identifiers; in the second phase, the duplicates are reconciled periodically, either through automatic algorithms or manually.
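A toy sketch of the matching idea using Python's standard difflib module; the record layouts, the normalization rule, and the 0.7 threshold (the 'tightness' of the criterion) are all assumptions for illustration.

```python
import difflib

# Hypothetical customer records from three source systems, each with its
# own identifier; the names differ slightly in formatting.
records = [
    {"source": "order_entry", "id": "OE-101", "name": "John A. Smith"},
    {"source": "service",     "id": "SV-77",  "name": "Smith, John"},
    {"source": "marketing",   "id": "MK-9",   "name": "Jon Smith"},
]

def normalize(name):
    """Crude normalization: lowercase, strip punctuation, sort the name parts."""
    parts = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(parts))

def is_match(a, b, threshold=0.7):
    """Fuzzy comparison; the threshold sets how tight the criterion is."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

base = records[0]
for other in records[1:]:
    print(base["id"], "~", other["id"], "->", is_match(base["name"], other["name"]))
```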

Existence of Multiple Sources

Another major challenge in the area of data integration and consolidation results from a single data element having more than one source. For instance, cost values are calculated and updated at specific intervals in the standard costing application. Similarly, your order processing application also carries the unit costs for all products. Thus there are two sources available for the unit cost of a product, and there could be a slight variation between their values. Which of these systems should supply the unit cost stored in the data warehouse becomes an important question. One easy way of handling this situation is to prioritize the two sources; alternatively, you may select the source on the basis of the last update date.
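Both strategies can be expressed in a few lines; in the sketch below the source names, costs, and dates are invented.

```python
from datetime import date

# Two hypothetical sources for the same data element (unit cost of a product).
candidates = [
    {"source": "standard_costing", "unit_cost": 12.40, "last_update": date(2002, 5, 1),  "priority": 1},
    {"source": "order_processing", "unit_cost": 12.55, "last_update": date(2002, 5, 20), "priority": 2},
]

# Strategy 1: take the value from the highest-priority source (priority 1 is highest).
by_priority = min(candidates, key=lambda c: c["priority"])

# Strategy 2: take the most recently updated value.
by_recency = max(candidates, key=lambda c: c["last_update"])

print("by priority:", by_priority["source"], by_priority["unit_cost"])
print("by recency: ", by_recency["source"], by_recency["unit_cost"])
```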

Implementation of Transformation

The implementation of data transformation is a complex exercise. You may have to go beyond the manual methods, the usual methods of writing conversion programs used while deploying operational systems. You need to consider several other factors to decide which methods to adopt. Suppose you are considering automating the data transformation functions: you have to identify, configure and install the tools, train the team on these tools, and integrate them into the data warehouse environment. In practice, a combination of both methods proves to be effective. The issues you may face in using manual methods and transformation tools are discussed below.

Manual Methods

These are the traditional methods that have been in practice until the recent past. These methods are adequate in the case of smaller data warehouses. They include manually coded programs and scripts that are mainly executed in the data staging area. These methods call for elaborate coding and testing, and only programmers and analysts who possess specialized knowledge in this area can produce the programs and scripts.

Although the initial cost may be reasonable, ongoing maintenance may escalate the cost of these methods. Moreover, these methods are always prone to errors. Another disadvantage concerns the creation of metadata: even if the in-house programs record the metadata initially, the metadata needs to be updated every time the transformation rules change.

Transformation Tools

The difficulties involved in using manual methods can be eliminated by using the sophisticated and comprehensive set of transformation tools that are now available. Use of these automated tools certainly improves efficiency and accuracy. If the inputs provided to the tools are accurate, then the rest of the work is performed efficiently by the tool. So you have to carefully specify the required parameters, the data definitions and the rules to the transformation tool.

Also, the transformation tools enable the recording of metadata. When you specify the transformation parameters and rules, these values are stored as metadata by the tool, and this metadata becomes a part of the overall metadata component of the data warehouse. When changes occur to business rules or data definitions, you just have to enter the changes into the tool and the metadata for the transformations gets adjusted automatically. But relying on the transformation tools alone, without any manual methods, is not practically possible either.
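A rough sketch of the metadata-driven approach: the transformation rules live in a data structure rather than in code, so a rule change is a metadata change rather than a program rewrite. All field names and rules here are hypothetical.

```python
# Hypothetical transformation rules held as metadata rather than hard-coded;
# changing a rule means editing this table, not rewriting a program.
TRANSFORM_RULES = {
    "gender":  {"type": "code_map", "map": {"m": "M", "f": "F", "0": "M", "1": "F"}},  # codes illustrative
    "name":    {"type": "function", "fn": lambda v: v.strip().title()},
    "premium": {"type": "function", "fn": float},
}

def apply_rules(record):
    """Apply each metadata-driven rule to the matching field of a record."""
    out = {}
    for field, value in record.items():
        rule = TRANSFORM_RULES.get(field)
        if rule is None:
            out[field] = value                                   # no rule: pass through
        elif rule["type"] == "code_map":
            out[field] = rule["map"].get(str(value).lower(), value)
        else:
            out[field] = rule["fn"](value)
    return out

print(apply_rules({"name": " alice smith ", "gender": "f", "premium": "1200"}))
```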

Transformation for Dimension Attributes

Now consider the updating of the dimension tables. The dimension tables are more stable in nature and so are less volatile than the fact tables. The fact tables change through an increase in the number of rows, whereas the dimension tables change through changes to their attributes. For instance, consider a product dimension table. Every year, rows are added as new models become available. But what about the attributes within the dimension table? You might face a situation where the product dimension table must change because a particular product was moved into a different product category, so the corresponding values must be changed in the product dimension table. Though most dimensions are generally constant over a period of time, they may change slowly.
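One common way to handle such a slowly changing attribute, often described as a Type 2 change in dimensional modeling literature, is to expire the old dimension row and insert a new one so that history is preserved. The sketch below uses invented rows and field names.

```python
from datetime import date

# A product dimension row whose 'category' attribute changes slowly over time.
product_dim = [
    {"product_key": 1, "product_id": "P-100", "category": "Accessories",
     "valid_from": date(2001, 1, 1), "valid_to": None},
]

def change_category(dim, product_id, new_category, change_date):
    """Expire the current row for the product and add a row with the new value."""
    for row in dim:
        if row["product_id"] == product_id and row["valid_to"] is None:
            row["valid_to"] = change_date            # close the old version
    dim.append({"product_key": len(dim) + 1, "product_id": product_id,
                "category": new_category, "valid_from": change_date, "valid_to": None})

change_category(product_dim, "P-100", "Components", date(2002, 6, 1))
for row in product_dim:
    print(row)
```

In this way, the slowly changing attribute values are captured without losing the history that time-variant warehouse data requires.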

