Download - Data Warehousing

Transcript
Page 1: Data Warehousing

SEMINAR ON

DATA WAREHOUSING

PRESENTED BY

T.SURESH, REGD.NO: Y1MC09053

Page 2: Data Warehousing

CONTENTS

1. What is Data Warehousing?2. Why to build a Data Warehouse?3. Microsoft Data warehousing framework.4. Data warehousing framework components.5. Data warehousing architecture.6. Data Marts.7. Building a Data Warehouse from Data Marts.8. OLAP.9. Integrated OLAP Analytical capabilities.10. Data Warehousing and OLAP.11. Data Warehousing components.12. Data Warehousing characteristics.13. Designing and building a Data Warehouse.14. Data Granularity.15. Building the OLAP Data model.16. Process flow within a data warehouse.17. Data Warehousing and Online Analytical Processing.

Page 3: Data Warehousing

What is Data Warehousing?

Online analytical processing (OLAP) and Data mining represents some of the latest trends in computing environment and I.T applications to large scale processing and analysis of data. Data Warehouse along with OLAP tools are being increasingly developed to analyze historical data to identify past patterns or trends which may be useful in forecasting future.

A data warehouse is an integrated store of information collected from other systems that becomes the foundation for decision support and data analysis. Although there are many types of data warehouses, based on different design methodologies and philosophical approaches, they all have these common traits:

Information is organized around the major subjects of the enterprise (for example, customers, products, sales, or vendors), reflecting a data-driven design.

Raw data is gathered from nonintegrated operational and legacy applications, cleansed, and then summarized and presented in a way that makes sense to end users.

Based on feedback from end users and discoveries in the data warehouse, the data warehouse architecture will change over time, reflecting the iterative nature of the process.

The data warehousing process is inherently complex and, as a result, is costly and time-consuming.

A copy of the data in a database, created specifically to allow users to query the database. Creating a data warehouse allows you to structure the data for easiest reporting and for best security–for example, pre-joining tables to assist with queries by novice users, or removing fields that contain sensitive data. It also lets you move the data to another server to minimize the performance impact that queries can cause. Because data in a data warehouse is a snapshot of a database, it must be refreshed periodically; the exact interval depends on your application needs.

Why to build a data warehouse: Business strategy requires answers to questions in business and future strategy. This means that the decisions are to be taken quickly and correctly using all the available data. As the data size increases continuosly and data is required to

Page 4: Data Warehousing

be processsed faster and faster the need for data warehousing technology arises in terms of ability to organise ,maintain large data and also be able to analyse in few seconds in the manner and depth required.There are conventional information systems which didn’t succeed in meeting these requirements.Conventional and data warehousing tackle two different activity domains-OLTP & OLAP. Besides this conventional information system is not capable of analyzing a large number of past transactions or large number of data records. The cost of processing the data also increases as the volume of data increases .As a result the analyst finds data mart extremely useful for fast and easy analysis. Since the data flows into data mart (from the Data Warehouse) ,the department which owns it can easily customize the data. Data marts are of 2 types:1.Multi dimensional OLAP (MOLAP) 2.Relational OLAP (ROLAP)

Data Warehousing FrameworkThe goal of the Data Warehousing Framework is to simplify the design, implementation, and management of data warehousing solutions. This framework has been designed to provide:

Open architecture that is easily integrated with and extended by third-party vendors.

Heterogeneous data import, export, validation, and cleansing services with optional data lineage.

Integrated metadata for data warehouse design, data extraction/transformation, server management, and end-user analysis tools.

Core management services for scheduling, storage management, performance monitoring, alerts/events, and notification.

The Data Warehousing Framework has been designed from the ground up to provide an open architecture that can be extended easily by Microsoft customers and third-party businesses using industry-standard technology. This allows organizations to choose best-of-breed components and still be assured of integration.Ease of use is a compelling reason for customers and independent software vendors (ISVs) to choose the Data Warehousing Framework. Microsoft provides an object-oriented set of components designed to manage information in the distributed environment. Microsoft also provides both entry-level and best-of-breed products to address the many steps in the data warehousing process.

Page 5: Data Warehousing

Determining Business, User, and Technical RequirementsBefore a data warehouse can be built, a detailed project and implementation plan should be written. The project and implementation plan includes:

Building a business case. Gathering user requirements. Determining the technical requirements. Defining standard reports required by users. Analyzing client application tools being used.

Building the business case is common at the beginning of any project. It involves determining the business needs solved by the project, the costs of the project, and the return on the investment.Gathering user requirements largely involves interviewing the intended users of the data warehouse. The user requirements determine:

Data requirements (level of granularity). Operational systems within the enterprise containing the

data. Business rules followed by the data. Queries required to provide the users with data.

The technical requirements may involve determining: Hardware architecture and infrastructure (for example,

links to remote geographical regions where data marts might be located).

Backup and recovery mechanisms. Security guidelines. Methods of loading and transforming data from

operational systems to the data warehouse. Standard reports required by users should be analyzed to determine the tables, columns, and selection criteria necessary to create the reports, and the frequency in which they are generated. Provisions should also be made for expanding or modifying the scope of reports as required.Client application tools should be analyzed to determine if they can provide enhanced processing capabilities that help in processing data, performing queries, or generating reports.

Data Warehousing Framework ComponentsBuilding the data warehouse requires a set of components for describing the logical and physical design of the data sources and their destinations in the enterprise data warehouse or data mart. To conform to definitions laid out during the design stage, operational data must pass through a cleansing and transformation stage before being placed in the enterprise data warehouse or data mart. This data staging process can

Page 6: Data Warehousing

be many levels deep, especially with enterprise data warehousing architectures, but is necessarily simplified in this illustration.

End-user tools, including desktop productivity products, specialized analysis products, and custom programs, are used to gain access to information in the data warehouse. Ideally, user access is through a directory facility that enables end-user searches for appropriate and relevant data to resolve questions and that provides a layer of security between the end users and the data warehouse systems.Finally, a variety of components can come into play for the management of the data warehousing environment, such as for scheduling repeated tasks and managing multiserver networks. The Data Warehousing Framework describes the relationships between the various components used in the process of building, using, and managing a data warehouse. Two enabling technologies comprise the core Data Warehousing Framework: the integrated metadata repository and the data transport layer (OLE DB). These technologies make possible the interoperability of many products and components involved in data warehousing.OLE DB provides for standardized, high-performance access to a wide variety of data, and allows for integration of multiple data types.Microsoft Repository provides an integrated metadata repository that is shared by the various components used in the data warehousing process. Shared metadata allows for the transparent integration of multiple products from a variety of vendors, without the need for specialized interfaces between each of the products:

Mainframe indexed sequential access method/virtual storage access method (ISAM/VSAM) and hierarchical databases

E-mail and file system stores Text, graphical, and geographical data Custom business objects

OLE DB defines a collection of COM interfaces that encapsulates various database management system services. These interfaces enable the creation of software components that implement such services. OLE DB

Page 7: Data Warehousing

components consist of data providers (which contain and expose data), data consumers (which use data), and service components (which process and transport data).OLE DB interfaces are designed to integrate components smoothly so that vendors can bring high-quality OLE DB components to the market quickly. In addition, OLE DB includes a bridge to ODBC that enables continued support for the broad range of ODBC relational database drivers available today.

Data Warehousing Architecture

Many methodologies have been proposed to simplify the information technology efforts required to support the data warehousing process on an ongoing basis. This has led to debates about the best architecture for delivering data warehouses in organizations.Two basic types of data warehouse architecture exist: enterprise data warehouses and data marts.The enterprise data warehouse contains enterprise-wide information integrated from multiple operational data sources for consolidated data analysis. Typically, it is composed of several subject areas, such as customers, products, and sales, and is used for both tactical and strategic decision making. The enterprise data warehouse contains both detailed point-in-time data and summarized information, and can range in size from 50 gigabytes (GB) to more than 1 terabyte. Enterprise data warehouses can be very expensive and time-consuming to build and manage. They are usually created from the top down by centralized information services organizations.The data mart contains a subset of enterprise-wide data that is built for use by an individual department or division in an organization. Unlike the enterprise data warehouse, the data mart is usually built from the bottom up by departmental resources for a specific decision-support application or group of users. Data marts contain summarized and often detailed data about a subject area. The information in the data mart can be a subset of an enterprise data warehouse (dependent data mart) or can come directly from the operational data sources (independent data mart).Enterprise data warehouses and data marts are constructed and maintained through the same iterative process described earlier. Furthermore, both approaches share a similar set of technological components.

The understandable preference for, and proliferation of, data marts instead of enterprise data warehouses is causing a future systems integration issue for some organizations. Creation of a data warehousing architecture for the enterprise saves each data mart project team from having to recreate the elements of the architecture. If an organization's structure is one that may require future data mart integration, it is less work to plan and architect for this before the data marts are created than after the fact.

Page 8: Data Warehousing

It behooves organizations that will operate multiple data marts to establish an information architecture or model. The goal of the data warehousing architecture is to identify the following common items:

source systems for data to populate the data marts

business dimensions

semantics (data definitions, measurements, and aggregation formulas)

If an organization requires, or will require, that multiple data marts each download similar data from production source systems, then the most maintainable architecture may be to move data first from the source systems to a well designed enterprise data warehouse or operational data store. Then, each data mart can be populated from the enterprise data warehouse, inheriting in the process all of the architectural elements that were implemented in the enterprise data warehouse. This approach also reduces the number of interface links between data marts and source systems that must be maintained.

Data Marts A data mart is typically defined as a subset of the contents of a data warehouse, stored within its own database. A data mart tends to contain data focused at the department level, or on a specific business area. The data can exist at both the detail and summary levels. The data mart can be populated with data taken directly from operational sources, similar to a data warehouse, or data taken from the data warehouse itself. Because the volume of data in a data mart is less than that in a data warehouse, query processing is often faster.:

Page 9: Data Warehousing

Characteristics of a data mart include

Quicker and simpler implementation. Lower implementation cost. Needs of a specific business unit or function met. Protection of sensitive information stored elsewhere in

the data warehouse. Faster response times due to lower volumes of data. Distribution of data marts to user organizations. Built from the bottom upward.

Departmental or regional divisions often determine whether data marts or data warehouses are used. For example, if managers in different sales regions require data from only their region, then it can be beneficial to build data marts containing specific regional data. If regional managers require access to all the organization’s data, then a larger data warehouse is usually necessary. Although data marts are often designed to contain data relating to a specific business function, there can be times when users need a broader level of business data. However, because this broader-level data is often only needed in summarized form, it is acceptable to store it within each data mart rather than implementing a full data warehouse.

Building a Data Warehouse from Data MartsData warehouses can be built using a top-down or bottom-up approach. Top-down describes the process of building a data

Page 10: Data Warehousing

warehouse for the entire organization, containing data from multiple, heterogeneous, operational sources. The bottom-up approach describes the process of building data marts for departments, or specific business areas, and then joining them to provide the data for the entire organization. Building a data warehouse from the bottom-up, by implementing data marts, is often simpler because it is less ambitious.A common approach to using data marts and data warehouses involves storing all detail data within the data warehouse, and summarized versions within data marts. Each data mart contains summarized data per functional split within the business, such as sales region or product group, further reducing the data volume per data mart.

Data Mart Considerations

Data marts can be useful additions or alternatives to the data warehouse, but issues to consider before implementation include:

Additional hardware and software. Time required to populate each data mart regularly. Consistency with other data marts and the data

warehouse. Network access (if each data mart is located in a

different geographical region).

Designing and Processing Aggregations

OLAP tools are typically used to create and manage summary data. OLAP Services allows aggregations to be stored in a variety of formats and locations, with dynamic connections to underlying details in the data warehouse. Summary data is often generated to satisfy the commonly executed queries in the data warehouse. Storing preaggregated data increases query performance, and reduces the load on the data warehouse.If a data warehouse is built so the data in it does not change, then preaggregating data in the fact table saves only the disk space required by the fact table. OLAP Services uses the processing time that would have been used to preaggregate in the fact table when it processes the fact table as it builds a cube. However, precalculated aggregations are stored in the cube and do not need to be recalculated for each query. If a hybrid OLAP (HOLAP) or relational OLAP (ROLAP) cube is used, the fact table is not copied into the cube as it is in multidimensional OLAP (MOLAP) cubes, so the overhead required to retain availability of the detail data is only the fact table size, not processing time or query response time.Preaggregation strategy when designing a data warehouse for use by OLAP Services depends on the following variables:

Stability of the data.

Page 11: Data Warehousing

If the source data changes, the preaggregations have to be performed each time, whether preaggregated in the fact table or in the OLAP cubes that have to be rebuilt from the fact table.

Query response time. With properly designed OLAP cubes, the granularity of detail in the fact table has no effect on query response time for queries that do not access detail facts.

Storage requirements. A finer level of granularity in the fact table requires more storage for the fact table and for MOLAP cubes. This is a trade-off against detail availability and choice of OLAP cube storage mode. OLAP cubes tend to be large regardless of the storage type; therefore the storage required to retain fine granularity in the fact table may not be particularly significant when compared to OLAP storage needs.

When designing the data warehouse for OLAP, the user's needs should drive the preaggregation strategy. The fact table should only be preaggregated to the level of granularity below which no user would want to access detail.

OLAP: - Online Analytical Processing (OLAP) systems, contrary to regular conventional OLTP systems are capable of analyzing online a large number of transactions or large number of data records(ranging from mega bytes to tera bytes).This type of data is usually multi dimensionality is the key driven for OLAP technology,which happens to be central to data warehousing. Any multidimensional data cant be processed by conventional SQL type DBMS. For complex real world problems the data is usually multi dimensional in nature. SQL will not be capable of handling it effectively even one can manage to put that data in a conventional relational database in normalised table

Integrated OLAP Analytical CapabilitiesOLAP is an increasingly popular technology that can dramatically improve business analysis. Historically, OLAP has been characterized by expensive tools, difficult implementation, and inflexible deployment. OLAP Services is a new, fully featured OLAP capability provided as a component of SQL Server 7.0. OLAP Services includes a middle-tier server that allows users to perform sophisticated analysis on large volumes of data with exceptional results. OLAP Services also includes a client-side cache and calculation engine called Microsoft PivotTable Service, which helps improve performance and reduce network traffic. PivotTable Service allows end users to conduct analyses while disconnected from the network.

Page 12: Data Warehousing

OLAP Services is a middle-tier OLAP server that simplifies user navigation and helps improve performance for queries against information in the data warehouse.

OLAP is a key component of data warehousing, and OLAP Services provides essential functionality for a wide array of applications ranging from reporting to advanced decision support. OLAP functionality within SQL Server 7.0 helps make multidimensional analysis much more affordable and bring the benefits of OLAP to a wider audience, from smaller organizations to groups and individuals within larger corporations. Coupled with the wide variety of tools and applications supporting OLAP applications through Microsoft OLE DB for OLAP, OLAP Services helps increase the number of organizations that have access to sophisticated analytical tools and can help reduce the costs of data warehousing.For more information about Microsoft SQL Server OLAP Services, see SQL Server Books Online.

Data Warehousing and OLAPDTS can function independent of SQL Server and can be used as a stand-alone tool to transfer data from Oracle to any other ODBC or OLE DB-compliant database. Accordingly, DTS can extract data from operational databases for inclusion in a data warehouse or data mart for query and analysis.

In the illustration, the transaction data resides on an IBM DB2 transaction server. A package is created using DTS to transfer and clean the data from the DB2 transaction server and to move it into the data warehouse or data mart. In this example, the relational database server is SQL Server 7.0, and the data warehouse is using OLAP Services to provide analytical capabilities. Client programs (such as Excel) access the OLAP Services server using the OLE DB for OLAP interface, which is exposed through a client-side component called Microsoft PivotTable service. Client programs using PivotTable service can manipulate data in the OLAP server and can even change individual cells.SQL Server OLAP Services is a flexible, scalable OLAP solution, providing high-performance access to information in the data warehouse. OLAP Services supports all implementations of OLAP equally well: multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and a hybrid (HOLAP). OLAP Services addresses the most significant challenges in scalability through partial preaggregation, smart client/server caching, virtual cubes, and partitioning.DTS and OLAP Services offer an attractive and cost-effective solution. Data warehousing and OLAP solutions using DTS and OLAP Services are developed with point-and-click graphical tools that are tightly integrated and easy to use. Furthermore, because the PivotTable service client is using OLE DB, the interface is more open to access by a variety of client applications.

Page 13: Data Warehousing

Data Warehousing ComponentsA data warehouse always consists of a number of components, including:

Operational data sources. Design/development tools. Data extraction and transformation tools. Database management system (DBMS). Data access and analysis tools. System management tools.

Several years ago, Microsoft recognized the need for a set of technologies that would integrate these components. This led to the creation of the Microsoft Data Warehousing Framework, a roadmap not only for the development of Microsoft products such as SQL Server 7.0, but also for the technologies necessary to integrate products from other vendors.

Data Warehouse CharacteristicsA data warehouse can assist decision support and online analytical processing (OLAP) applications because it provides data that is:

Consolidated and consistent. Subject-oriented. Historical.

Consolidated and Consistent Data

A data warehouse consolidates operational data from a variety of sources with consistent naming conventions, measurements, physical attributes, and semantics. For example, in many organizations, applications can often use similar data in different formats: dates can be stored in Julian or Gregorian format; true/false data can be represented as one/zero, on/off, true/false, or positive/negative. Different applications can also use different terms to describe the same type of data. One application can use the term “balance” instead of “total amount” to represent the amount of money in a bank account.Data should be stored in the data warehouse in a single, acceptable format agreed to by business analysts, despite variations in the external operational sources. This allows data from across the organization, such as legacy data on mainframes, data in spreadsheets, or even data from the Internet, to be consolidated in the data warehouse, and effectively cross-referenced, giving the analysts a better understanding of the business.

Subject-oriented Data

Operational data sources across an organization tend to hold a large amount of data about a variety of business-related functions, such as customer records, product information, and

Page 14: Data Warehousing

so on. However, most of this information is also interspersed with data that has no relevance to business or executive reporting, and is organized in a way that makes querying the data awkward. The data warehouse organizes only the key business information from operational sources so that it is available for business analysis.

Historical Data

Data in OLTP systems correctly represents the current value at any moment in time. For example, an order-entry application always shows the current value of stock inventory; it does not show the inventory at some time in the past. Querying the stock inventory a moment later may return a different response. However, data stored in a data warehouse is accurate as of some past point in time because the data stored represents historical information.The data stored in a data warehouse typically represents data over a long period of time; perhaps up to ten years or more. OLTP systems often contain only current data, because maintaining large volumes of data used to represent ten years of information in an OLTP system can affect performance. In effect, the data warehouse stores snapshots of a business’s operational data generated over a long period of time. It is accurate for a specific moment in time and cannot change. This contrasts with an OLTP system where data is always accurate and can be updated when necessary.

Read-only Data

After data has been moved to the data warehouse successfully, it typically does not change unless the data was incorrect in the first place. Because the data stored in a data warehouse represents a point in time, it must never be updated. Deletes, inserts, and updates (other than those involved in the data loading process) are not applicable in a

Page 15: Data Warehousing

data warehouse. The only operations that occur in a data warehouse, when it has been set up, are loading and querying data.

Designing and Building a Data Warehouse and OLAP SystemThe steps required to build a data warehouse include:

Determining business, user, and technical requirements. Designing and building the database. Extracting and loading data into the data warehouse. Designing and processing aggregations using OLAP

tools. Querying and maintaining the data warehouse and

OLAP databases.

Data GranularityA significant difference between an OLTP or operational system and a data warehouse is the granularity of the data stored. An operational system typically stores data at the lowest level of granularity: the maximum level of detail. However, because the data warehouse contains data representing a long period in time, simply storing all detail data from an operational system can result in an overworked system that takes too long to query.A data warehouse typically stores data in different levels of granularity or summarization, depending on the data requirements of the business. If an enterprise needs data to assist strategic planning, then only highly summarized data is required. The lower the level of granularity of data required by the enterprise, the higher the number of resources (specifically data storage) required to build the data warehouse. The different levels of summarization in order of increasing granularity are:

Current operational data Historical operational data Aggregated data Metadata

Current and historical operational data are taken, unmodified, directly from operational systems. Historical data is operational level data no longer queried on a regular basis, and is often archived onto secondary storage.Aggregated, or summary, data is a filtered version of the current operational data. The design of the data warehouse affects how the current data is aggregated. Considerations for generating summary data include the period of time used to aggregate the data (for example, weekly, monthly, and so on),

Page 16: Data Warehousing

and the parts of the operational data to be summarized. For example, an organization can choose to aggregate at the part level the quantity of parts sold per sales representative per week.There may be several levels of summary data. It may be necessary to create summary level data based on an aggregated version of existing summary data. This can give an organization an even higher level view of the business. For example, an organization can choose to aggregate summary level data further by generating the quantity of parts sold per month.Metadata does not contain any operational data, but is used to document the way the data warehouse is constructed. Metadata can describe the structure of the data warehouse, source of the data, rules used to summarize the data at each level, and any transformations of the data from the operational systems.

Building the OLAP Data ModelA fundamental challenge in OLAP implementation is mapping the initial database schema to the multidimensional model. This requires a significant programming effort with many of the products on the market today. In the evolution of OLAP products, OLAP database design has become a specialized and arcane process, intricately linked to the specific OLAP technology being deployed. Consequently, OLAP database developers are specialized, which has led to high costs in developing applications, concentrated at the data design stage. In most OLAP implementations, it is assumed that the data has been prepared for analysis through data warehousing, whereby information has been extracted from operational systems, cleansed, validated, and summarized prior to incorporation into an OLAP application. This is a vital step in the process, which ensures that the data being viewed by the OLAP user is correct, consistent, and matches organizational definitions for the data. Increasingly, information in a data warehouse is organized in star (or snowflake) schemas, which simplify user understanding of the data, maximize performance for decision support applications, and require less storage for large databases.The following illustration is an example of a star schema. In this database schema, a central fact table is linked to related dimension tables.

A star (snowflake) schema is a relational approximation of the OLAP data model and can be an excellent starting point for building OLAP cube definitions. Few OLAP products, however, have taken advantage of this trend. Generally, they have not provided easy tools to map a star schema to an OLAP model, and as a result keep the cost of building OLAP models extremely high and the development time unnecessarily long.

Page 17: Data Warehousing

Intuitive User InterfacesOne of the key differences in Microsoft SQL Server OLAP Services version 7.0 is the OLAP Manager user interface, which has been created with the infrequent OLAP database administrator in mind. The OLAP Manager is delivered as a snap-in to the Microsoft Management Console (MMC), and it shares the same administrative user interface as the entire Microsoft BackOffice family of products. The obvious benefit is that the OLAP database administrator is better able to translate skills from SQL Server and other Microsoft products.More value becomes apparent when the power and flexibility of MMC are understood. OLAP Services includes a full range of taskpads that guide the novice or infrequent user through common tasks. OLAP Services also includes a full tutorial on OLAP concepts and a step-by-step guide to building an OLAP cube. A full complement of wizards is available for automating the most common activities, such as creating a dimension definition. Furthermore, OLAP Services is optimized for developing data warehouses in which star or snowflake schemas have been designed. The Cube Wizard is especially suited to these prebuilt schemas, and translation to the multidimensional model is extremely rapid. OLAP Services can easily accommodate other source schemas should they be encountered.To ensure successful interpretation of the OLAP Services user interface concepts, Microsoft conducted usability tests. Finally, large-scale beta testing has provided broad exposure and customer input to OLAP Services. As a result of the energy spent on the database administrator requirements, most users are able to build their first cube in less than 30 minutes.

PROCESS FLOW WITHIN A DATA WAREHOUSE

The processes are:

1.Extract and load data.2.Clean and transform data into a form that can cope with large data volumes.3.Back up and archive data.4.Manage queries and direct them to appropriate data source.

Page 18: Data Warehousing

Data Warehousing and Online Analytical Processing

A data warehouse is often used as the basis for a decision-support system (also referred to from an analytical perspective as a business intelligence system). It is designed to overcome some of the problems encountered when an organization attempts to perform strategic analysis using the same database that is used to perform online transaction processing (OLTP).

A typical OLTP system is characterized by having large numbers of concurrent users actively adding and modifying data. The database represents the state of a particular business function at a specific point in time, such as an airline reservation system. However, the large volume of data maintained in many OLTP systems can overwhelm an organization. As databases grow larger with more complex data, response time can deteriorate quickly due to competition for available resources. A typical OLTP system has many users adding new data to the database while fewer users generate reports from the database. As the volume of data increases, reports take longer to generate.

As organizations collect increasing volumes of data by using OLTP database systems, the need to analyze data becomes more acute. Typically, OLTP systems are designed specifically to manage transaction processing and minimize disk storage requirements by a series of related, normalized tables. However, when users need to analyze their data, a myriad of problems often prohibits the data from being used:

Users may not understand the complex relationships among the tables, and therefore cannot generate ad hoc queries. Application databases may be segmented across multiple servers, making it difficult for users to find the tables in the first place. Security restrictions may prevent users from accessing the detail data they need. Database administrators prohibit ad hoc querying of OLTP systems, to prevent analytical users from running queries that

Page 19: Data Warehousing

could slow down the performance of mission-critical production databases. By copying an OLTP system to a reporting server on a regularly scheduled basis, an organization can improve response time for reports and queries. Yet a schema optimized for OLTP is often not flexible enough for decision support applications, largely due to the volume of data involved and the complexity of normalized relational tables.

For example, each regional sales manager in a company may wish to produce a monthly summary of the sales per region. Because the reporting server contains data at the same level of detail as the OLTP system, the entire month’s data is summarized each time the report is generated. The result is longer-running queries that lower user satisfaction.

Additionally, many organizations store data in multiple heterogeneous database systems. Reporting is more difficult because data is not only stored in different places, but in different formats.

Data warehousing and online analytical processing (OLAP) provide solutions to these problems. Data warehousing is an approach to storing data in which heterogeneous data sources (typically from multiple OLTP databases) are migrated to a separate homogenous data store. Data warehouses provide these benefits to analytical users:

Data is organized to facilitate analytical queries rather than transaction processing.

Differences among data structures across multiple heterogeneous databases can be resolved.

Data transformation rules can be applied to validate and consolidate data when data is moved from the OLTP database into the data warehouse.

Security and performance issues can be resolved without requiring changes in the production systems.

Sometimes organizations maintain smaller, more topic-oriented data stores called data marts. In contrast to a data warehouse which typically encapsulates all of an enterprise’s analytical data, a data mart is typically a subset of the enterprise data targeted at a smaller set of users or business functions.

Whereas a data warehouse or data mart are the data stores for analytical data, OLAP is the technology that enables client applications to efficiently access the data. OLAP provides these benefits to analytical users:

Page 20: Data Warehousing

Pre-aggregation of frequently queried data, enabling a very fast response time to ad hoc queries.

An intuitive multidimensional data model that makes it easy to select, navigate, and explore the data.

A powerful tool for creating new views of data based upon a rich array of ad hoc calculation functions.

Technology to manage security, client/server query management and data caching, and facilities to optimize system performance based upon user needs.

The terms data warehousing and OLAP are sometimes used interchangeably. However, it is important to understand their differences because each represents a unique set of technologies, administrative issues, and user implications.

CONLUSION: So, Data Warehousing with OLAP technology is useful for handling large amounts of data .It is also useful for analyzing past data for forecasting , and also for graphical analysis and handling multidimensional data .This is useful for fast analysis rather than conventional databases with OLTP .

.

Page 21: Data Warehousing

References:

Data warehousing in Real World --DENNIS MURRAY --SAM ANAHORY Data warehousing with Microsoft SQL server --JAKE STURM Data warehousing with oracle

--Sima yazdani --Shirley s.wons

www.olapreport.com

Page 22: Data Warehousing

Top Related