wilson data warehouse

Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this proposal or quotation. An Introduction to Data Warehousing Presented by Joseph M. Wilson EPA

Post on 13-Sep-2015

Data Warehouse Technologies
An Introduction to Data Warehousing
Presented by Joseph M. Wilson
In the Beginning, life was simple…
Data

But…
Data
Our information needs…
Kept growing. (The Spider web)
SOURCE: William H. Inmon
Purpose
Briefing Contents

So What Is a Data Warehouse?
Definition: A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.
By comparison: an OLTP (online transaction processing) system, also called an operational system, handles the everyday running of one aspect of an enterprise.
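The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and the sample rows are hypothetical and do not come from the briefing. The first query is the kind of single-record work an operational system does; the second is the kind of summary over many rows that a warehouse is built to answer.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("East", 2001, 100.0),
        ("East", 2002, 150.0),
        ("West", 2001, 80.0),
        ("West", 2002, 120.0),
    ],
)

# Operational (OLTP) style: read and update one current record.
conn.execute("UPDATE orders SET amount = 110.0 WHERE order_id = 1")
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 1"
).fetchone()

# Decision-support style: scan many rows and summarize them.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(totals_by_region)
```

In practice the two workloads usually run on separate databases, so that long summary queries do not slow down day-to-day transactions.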
Why Do We Need Data Warehouses?
Consolidation of information resources
What Is a Data Warehouse Used for?
Knowledge discovery
Medical research
Goals
Structure
Size
Comparison Chart of Database Types
Data warehouse | Operational system
Subject oriented | Transaction oriented
(size not listed) | Small (MB up to several GB)
Historic data | Current data
Batch updates | Continuous updates
Design Differences
Data Warehouse: Star Schema
Operational System: ER Diagram
Supporting a Complete Solution
Data Warehouses, Data Marts, and Operational Data Stores
Data Warehouse – The queryable source of data in the enterprise. It consists of the union of all of its constituent data marts.
Data Mart – A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
Operational Data Store (ODS) – A point of integration for operational systems that were developed independently of each other. Since an ODS supports day-to-day operations, it needs to be continually updated.
SOURCE: Ralph Kimball
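One way to picture "a logical subset of the complete data warehouse" is as a view restricted to a single business process. The sketch below uses Python's built-in sqlite3 module; the warehouse_facts table, the two mart views, and the sample rows are hypothetical and are not taken from the briefing or from Kimball.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding measurements for several processes.
conn.execute(
    "CREATE TABLE warehouse_facts ("
    "business_process TEXT, measure_name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 500.0),
        ("sales", "units", 20.0),
        ("inventory", "on_hand", 75.0),
    ],
)

# Each data mart is modeled here as a view limited to one business process.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT * FROM warehouse_facts WHERE business_process = 'sales'"
)
conn.execute(
    "CREATE VIEW inventory_mart AS "
    "SELECT * FROM warehouse_facts WHERE business_process = 'inventory'"
)

sales_rows = conn.execute(
    "SELECT measure_name, value FROM sales_mart ORDER BY measure_name"
).fetchall()

# The union of the marts covers the same rows as the warehouse table.
union_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "SELECT * FROM sales_mart UNION ALL SELECT * FROM inventory_mart)"
).fetchone()[0]
total_count = conn.execute("SELECT COUNT(*) FROM warehouse_facts").fetchone()[0]

print(sales_rows, union_count, total_count)
```

Real data marts are often separate physical databases rather than views; the view form is used here only because it keeps the subset relationship visible in a few lines.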
Briefing Contents

Building a Data Warehouse
Stage 1: Analysis
Dimensional analysis
Stage 2: Design
Dimensional Modeling
Fact Table – The primary table in a dimensional model that is meant to contain measurements of the business.
Dimension Table – One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.
SOURCE: Ralph Kimball
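A minimal star schema along these lines might look like the following sketch, written with Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), their columns, and the sample rows are assumptions made for illustration; they are not from the briefing. The fact table holds the numeric measurements, and the query constrains and groups on textual attributes in the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes used to constrain and group.
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, year INTEGER, month_name TEXT);
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

    -- Fact table: one row per measurement, keyed by the dimensions.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL);

    INSERT INTO dim_date VALUES (1, 2001, 'Jan'), (2, 2002, 'Jan');
    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'),
        (2, 'Gadget', 'Hardware'),
        (3, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 20.0),
        (2, 1, 3, 30.0),
        (2, 2, 1, 15.0),
        (2, 3, 4, 40.0);
    """
)

# Constrain on one dimension attribute (year), group on another (category).
sales_2002_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    WHERE d.year = 2002
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

print(sales_2002_by_category)
```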
Stage 3: Import Data
Identify data sources
Extract the needed data from existing systems to a data staging area
Transform and Clean the data
Resolve data type conflicts
Remove, correct, or flag bad data
Conform Dimensions
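The import steps above can be sketched in plain Python. Everything specific here is hypothetical: the two source systems, the field names, the sample rows, and the STATE_CODES lookup are invented for illustration and are not part of the briefing. The sketch resolves a data type conflict (readings arriving as text or numbers), flags rows it cannot repair, and maps each source's spelling of a state onto one shared code as a stand-in for conforming a dimension.

```python
# Rows as extracted to a staging area from two hypothetical source systems.
staged = [
    {"source": "A", "state": "va", "reading": "7.2"},
    {"source": "B", "state": "Virginia", "reading": 6.9},
    {"source": "B", "state": "MD", "reading": "n/a"},
]

# Conformed dimension values: every source spelling maps to one shared code.
STATE_CODES = {"va": "VA", "virginia": "VA", "md": "MD", "maryland": "MD"}


def transform(row):
    """Return a cleaned copy of one staged row, flagged if it is bad."""
    clean = {"source": row["source"], "bad": False}

    # Conform the dimension value; unknown spellings become None.
    clean["state"] = STATE_CODES.get(str(row["state"]).strip().lower())
    if clean["state"] is None:
        clean["bad"] = True

    # Resolve the type conflict: readings may be strings or numbers.
    try:
        clean["reading"] = float(row["reading"])
    except (TypeError, ValueError):
        clean["reading"] = None
        clean["bad"] = True

    return clean


cleaned = [transform(row) for row in staged]
loadable = [row for row in cleaned if not row["bad"]]  # ready for the warehouse
flagged = [row for row in cleaned if row["bad"]]       # held back for review

print(loadable)
print(flagged)
```

Flagging bad rows rather than silently dropping them keeps a record of what the source systems actually sent.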
Importing Data Into the Warehouse
Operational Systems
(source systems)
Stage 4: Install Front-end Tools
Reporting tools
Stage 5: Test and Deploy
Usability tests
Software installation
User training
Special Concerns
Briefing Contents

Goals of the STORET Central Warehouse
Improved performance and faster data retrieval
Ability to produce larger reports
Ability to provide more data query options
Streamlined application navigation
Old Web Application Flow
Central Warehouse Application Flow
Web Application Demo
STORET Central Warehouse – Potential Future Enhancements
More query functionality
Additional report types
Data Warehouse Components
SOURCE: Ralph Kimball
End User Data Access
Data Warehouse Components – Detailed
Storage: flat file (fastest); RDBMS; other
Processing: clean; prune; combine; remove duplicates; household; standardize; conform dimensions; store awaiting replication; archive; export to data marts
No user query services
Data Mart #1: OLAP (ROLAP and/or MOLAP) query services; dimensional; subject oriented; locally implemented; user group driven; may store atomic data; may be frequently refreshed; conforms to DW Bus
Data Mart #2
Data Mart #3
End User Applications
End User Data Access
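The note that each data mart "conforms to DW Bus" is usually understood to mean that the marts share the same dimension definitions, so their results can be lined up side by side. The sketch below illustrates that idea in plain Python; the product codes, the two mart summaries, and the drill_across helper are all hypothetical and do not describe any tool named in the briefing.

```python
# Hypothetical summaries from two data marts that use the same
# (conformed) product codes as their dimension key.
sales_mart = {"P1": 120.0, "P2": 80.0}          # revenue by product code
inventory_mart = {"P1": 15, "P2": 0, "P3": 40}  # units on hand by product code


def drill_across(left, right):
    """Line up two marts' results on the shared dimension key."""
    keys = sorted(set(left) | set(right))
    return {key: (left.get(key), right.get(key)) for key in keys}


combined = drill_across(sales_mart, inventory_mart)
print(combined)
```

The combination only works because both marts mean the same thing by "P1"; without conformed dimensions the keys would not match and the marts could not be compared.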
Briefing Contents
Data Warehouse Concepts