![Page 1: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/1.jpg)
DATA WAREHOUSING
![Page 2: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/2.jpg)
Introduction Modern organizations have huge amounts of data
but are starving for information – facing information gap!
Reasons for information gap: Fragmented information systems, uncoordinated
(sometimes inconsistent) databases /islands of information due to time and resources constraints
Most systems support operational /transaction processing rather than informational processing.
Bridging the information gap are data warehouses!
![Page 3: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/3.jpg)
Introduction (Contd.)
A data warehouse bridges this information gap as it consolidates and integrates information from
internal and external sources and arranges it in a meaningful format for making accurate
business decisions
The two noticeable pioneers in the DWH field are Ralph Kimball and Bill Inmon.
The term Data Warehouse was coined by Bill Inmon in 1990
![Page 4: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/4.jpg)
What is a Data Warehouse? Inmon defines a data warehouse as follows:
"a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".
Description of the above terms according to Inmon Subject Oriented:
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of
sources and merged into a coherent whole.
![Page 5: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/5.jpg)
Time-variant: All data in the data warehouse is identified with a
particular time dimension. The time factor can be used to study trends and changes.
Non-volatile Data is stable in a data warehouse. More data is added
but data is never removed. This enables management to gain a consistent picture of the business.
![Page 6: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/6.jpg)
What is a Data Warehouse?
This definition by Inmon remains reasonably accurate almost ten years later. However, a single-subject data warehouse is typically referred to as
a data mart, while data warehouses are generally enterprise in scope.
data warehouses can be volatile. Due to the large amount of storage required for a data warehouse,
(multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse.
For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.
![Page 7: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/7.jpg)
What is a Data Warehouse?
Ralph Kimball provided a much simpler definition of a data warehouse. a data warehouse is "a copy of transaction data
specifically structured for query and analysis".
This definition provides less insight and depth than Mr. Inmon's, but is no less accurate.
![Page 8: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/8.jpg)
The Need for Data warehousing
Two factors drive the need for data warehousing
Need of company-wide view of data
Need to separate operational* and informational* systems
* See slide notes
![Page 9: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/9.jpg)
Data warehousing?
It is the process of creating, populating, and then querying a data warehouse
Components of data warehousing are Source Systems Identification Data warehouse Design and Creation Data Acquisition Changed data capture Data Cleansing Data Aggregation Business Intelligence
Multidimensional Analysis Tools Query Tools Data Mining Tools Data Visualization Tools
Metadata Management
![Page 10: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/10.jpg)
Components of Data warehousing Source System Identification
Appropriate data must be located to build a data warehouse. This data comes from
Current OLTP systems (providing day-to-day information) `Legacy systems (providing historical data for prior periods)
Data Warehouse Design and Creation The design is often an iterative process Requires effort to understand database schema and a great deal
of user interaction
![Page 11: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/11.jpg)
Components of Data warehousing Data Acquisition
Involves moving company data from source systems into warehouse
Time consuming activity Performed using ETL (Extract/Transform/ Load) Tools Data acquisition is an ongoing, scheduled process.
Warehouse is refreshed monthly Changed data capture
Periodic update of warehouse from transactional systems is complicated as it is difficult to identify which records in source have changed since last update.
Some technologies used in this area are Replication servers, Publish/Subscribe, Triggers and Stored procedures, and Database Log analysis.
![Page 12: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/12.jpg)
Components of Data warehousing Data Cleansing
Typically performed in conjunction with data acquisition (part of “T” in “ETL”)
A complicated process that validates and if required corrects data before its loaded into warehouse
Example The entries for "Customer Name" may appear differently in
various source systems for the same customer. one entered as "IBM", one as "I.B.M.", and one as
"International Business Machines". A decision must be made as to which is correct, and then the
data cleansing tool will change the others to match the rule. This process is also referred to as "data scrubbing" or "data
quality assurance". It can be an extremely complex process, especially if some of the warehouse inputs are from older mainframe file systems (commonly referred to as "flat files"
or "sequential files").
![Page 13: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/13.jpg)
Components of Data warehousing
Data Aggregation If required, it is performed during “T” phase of “ETL” Data warehouse can be designed to store data at detail level (each
individual transaction), at some aggregate level (summary data) or a combination of both.
The advantage of summarized data is that typical queries against the warehouse run faster.
The disadvantage is that information, which may be needed to answer a query, is lost during aggregation
Business Intelligence (BI) Contains technologies such as Decision Support Systems (DSS),
Executive Information Systems (EIS), On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multi-Dimensional OLAP (MOLAP), Hybrid OLAP (HOLAP, a combination of MOLAP and ROLAP), and more.
![Page 14: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/14.jpg)
Components of Data warehousing
BI can be broken down into four broad fields Multidimensional Analysis Tools
Allow user to look at data from various different angles Often use a multidimensional database called cube
Query Tools Allow user to issue SQL queries against warehouse and get results
Data Mining Tools Automatically search for patterns in data Driven by complex statistical formulas
Data Visualization Tools Graphically represent data including 3D data pictures
![Page 15: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/15.jpg)
Components of Data warehousing
Metadata management Throughout the entire process of identifying,
acquiring, and querying the data, metadata management takes place
The datatype (e.g., string or integer) of the column is metadata. The name of the column is another. The actual value in the column for a particular row is not metadata - it is data
Metadata is stored in metadata repository Metadata is useful in almost all components of data
warehousing discussed earlier
![Page 16: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/16.jpg)
Data Mining
Definition: Knowledge discovery using a sophisticated blend of techniques
from traditional statistics, artificial intelligence and computer graphics
Goals To explain observed events or conditions, such as why sales of
a product have increased in a particular area To confirm hypothesis, such as, whether two-income families are
more likely to buy family medical cover than single-income families
To analyze data for new or unexpected relationships, such as what spending patterns are likely to accompany credit card fraud.
![Page 17: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/17.jpg)
Data Mining Techniques Case-based reasoning
Derives rules from real world case examples Rule discovery
Searches for patterns and correlations in large data sets Signal Processing
Identifies clusters of observations with similar characteristics
Neural nets Develops predictive models based on principles
modeled after the human brain Fractals
Compresses large databases without losing information
![Page 18: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/18.jpg)
Some Data Mining Applications
Analysis of business trends Identifying markets with above average or below average growth
Target marketing Identifying customers for promotional activity
Usage Analysis Identifying usage patterns of products and services
Product Affinity Identifying products that are purchased concurrently or
characteristics of shoppers For further application types - refer to book page 437
table 11.5
![Page 19: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/19.jpg)
Reading Assignment
Read these concepts from Book The ETL process Definitions
@ctive data warehouse Data mart
Independent data mart dependent data mart EDW
Star Schema Snowflake Schema OLAP MOLAP ROLAP
![Page 20: DATA WAREHOUSING. Introduction Modern organizations have huge amounts of data but are starving for information – facing information gap! Reasons for information](https://reader035.vdocument.in/reader035/viewer/2022072013/56649e535503460f94b48a9d/html5/thumbnails/20.jpg)
Resources
http://www.intranetjournal.com/features/datawarehousing.html
Modern Database Management, 6/e