1 Reviewing Data Warehouse Basics
Lessons
1. Reviewing Data Warehouse Basics
2. Defining the Business and Logical Models
3. Creating the Dimensional Model
4. Creating the Physical Model
5. Storage Considerations for the Physical Model
6. Strategies for Extracting, Transforming, and Loading
7. Summary Management
8. Analytical Capabilities
Definition of a Data Warehouse
“A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions.”
- Bill Inmon
“A system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.”
- Ralph Kimball
Basic Elements of the Data Warehouse
• Source: Source database or other source form
• Data staging area: Intermediate area
• Target: Presentation server for the new data warehouse or data mart
Source -> Data staging area -> Target
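The three elements above can be sketched as a tiny extract, stage, and load pipeline. This is only an illustration under stated assumptions: the CSV layout, the column names, and the function names `extract`, `transform`, and `load` are hypothetical and do not come from the slides.

```python
import csv
import io

# Hypothetical source: a CSV export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,,5.00
"""

def extract(text):
    """Copy raw source rows into the staging area (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(staged_rows):
    """Clean and conform staged rows; rows without a customer are dropped."""
    cleaned = []
    for row in staged_rows:
        customer = row["customer"].strip().title()
        if not customer:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": customer,
            "amount": float(row["amount"]),
        })
    return cleaned

def load(rows, target):
    """Append conformed rows to the target (the presentation store)."""
    target.extend(rows)
    return target

staging = extract(SOURCE_CSV)              # source -> staging area
warehouse = load(transform(staging), [])   # staging area -> target
```

The staging list keeps the raw rows untouched, so cleaning rules can change without re-reading the source.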
Diagram of a Data Warehouse System
Basic Form of the Data Warehouse
Star schema (dimensional model)
Fact table: Sales
Dimension tables: Customer, Location, Supplier, Product
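A star schema like the one named on this slide could look roughly as follows in SQLite. The column names, keys, and sample rows are assumptions made up for illustration; only the table names (Sales, Customer, Location, Supplier, Product) come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (descriptive attributes; columns are hypothetical)
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per sale, with a foreign key to each dimension
CREATE TABLE sales (
    customer_id INTEGER REFERENCES customer(customer_id),
    product_id  INTEGER REFERENCES product(product_id),
    location_id INTEGER REFERENCES location(location_id),
    supplier_id INTEGER REFERENCES supplier(supplier_id),
    amount      REAL
);

INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO location VALUES (1, 'Lisbon'), (2, 'Oslo');
INSERT INTO supplier VALUES (1, 'Acme');
INSERT INTO sales VALUES (1, 1, 1, 1, 10.0), (1, 2, 2, 1, 25.0), (2, 1, 2, 1, 10.0);
""")

# A typical warehouse query: aggregate the fact table by a dimension attribute.
totals = conn.execute("""
    SELECT l.city, SUM(s.amount)
    FROM sales AS s
    JOIN location AS l ON l.location_id = s.location_id
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
```

Each query joins the central fact table to only the dimensions it needs, which is what keeps the design focused on queries.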
Data Warehouse and OLTP Database Design Differences
Unlike an OLTP database design, a warehouse database design must:
• Focus on queries
• Allow incremental development
• Be a nonvolatile structure
• Provide historical data
Data Warehouse Features
A data warehouse:
• Is a repository for information
• Improves access to integrated data
• Ensures integrity and quality
• Provides a historical perspective
• Records results
• Is used by a broad spectrum of end users for a variety of purposes
• Reduces the reporting and analysis impact on operational systems
• Requires a major systems integration effort
Exploring Data Warehouse Characteristics
• Subject-oriented
• Integrated
• Nonvolatile
• Time-variant
Subject-Oriented
Data is categorized and stored by business subject rather than by application.
OLTP applications: Equity plans, Shares, Insurance, Loans, Savings
Data warehouse subject: Customer financial information
Integrated
Data on a given subject is integrated.
Savings, Current account, Loans -> Customer
Nonvolatile
Operational: Read, Insert, Update, Delete
Warehouse: Load, Read
Time-Variant
Data warehouse
Time    Data
01/01   January
02/01   February
03/01   March
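One way to picture the nonvolatile and time-variant properties together is an append-only history in which every load is stamped with its date. The sketch below is an assumption-laden illustration: the year, the customer, the balances, and the helper names `load_snapshot` and `balance_as_of` are all hypothetical.

```python
from datetime import date

history = []  # nonvolatile store: rows are only ever appended, never updated

def load_snapshot(store, as_of, balances):
    """Append one dated snapshot; earlier snapshots stay untouched."""
    for customer, balance in balances.items():
        store.append({"as_of": as_of, "customer": customer, "balance": balance})

def balance_as_of(store, customer, when):
    """Return the most recent balance loaded on or before `when`, or None."""
    rows = [r for r in store if r["customer"] == customer and r["as_of"] <= when]
    if not rows:
        return None
    return max(rows, key=lambda r: r["as_of"])["balance"]

# Three monthly loads, mirroring the 01/01, 02/01, 03/01 rows on the slide.
load_snapshot(history, date(2024, 1, 1), {"alice": 100})
load_snapshot(history, date(2024, 2, 1), {"alice": 120})
load_snapshot(history, date(2024, 3, 1), {"alice": 90})

mid_february = balance_as_of(history, "alice", date(2024, 2, 15))
```

Because old rows are never overwritten, a question about any past date can still be answered after later loads.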
Load from Many Sources
Nonrelational systems
Relational databases
External data
External formats
Archive data
Internal data
Decision Support System (DSS)
Profile of DSS Queries
Diagram labels: DSS; Storage; Analytic; ODS, DW, DDS, OLAP, DM
Data Warehousing Process
Diagram labels: Extraction; RDBMS; ETL; Federated Data Warehouse; Transformation/Load; Transformations; Publish; Data marts; DDS; Subscribe; Portal; Access layer(s); Metadata Repository; Flat files; Operational; External; Server log files; NDS; ETL staging area(s)
Comparing Warehouses and Data Marts
Data warehouse versus data mart
Property              Data Warehouse     Data Mart
Scope                 Enterprise         Department
Subjects              Multiple           Single, LOB
Data source           Many               Few
Implementation time   Months to years    Months
Flow of Data
Feed: Operational data, External data
Store: Raw data, Summary data, Metadata
Access: Relational tools, OLAP tools, Applications
Dependent Data Mart Model
Diagram labels: Systems, Legacy, Operational, Internal, External, Enterprise, ODS, Data marts
Independent Data Mart Model
Diagram labels: Systems, Legacy, Operational, Internal, External, Enterprise, ODS, Data marts
Data Warehousing Today
• Business Intelligence
– To help business users understand their business better
– To help them make better operational, tactical, and strategic business decisions
– To help them improve business performance
Data Warehousing Today
• Customer Relationship Management
– Consists of applications that support CRM activities
– Single customer view
– Campaign segmentation
– Customer analysis
– Personalization
– Customer loyalty scheme
Data Warehousing Today
• Data Mining
– Known as Knowledge Discovery
– Trying to find meaningful and useful information from a large amount of data
– Interactive or automated process to find patterns describing the data and to predict the future behavior of the data based on these patterns
• Usage
– Analyzing the shopping data
– Finding out the pattern between crime and location
– Customer scoring in CRM in terms of loyalty
– Credit Scoring in the credit card industry
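As a very small illustration of finding patterns in shopping data, the sketch below counts which pairs of items are bought together most often. The baskets are invented sample data, and pair counting is just one simple technique chosen for the example; the slides do not name a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is a candidate pattern worth investigating.
top_pair, top_count = pair_counts.most_common(1)[0]
```

Real data mining tools apply far more elaborate models, but the idea of scanning many records for recurring combinations is the same.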
Data Warehousing Today
• Master Data Management (MDM)
– Consolidates the master data and processes the data through predefined data quality rules.
– Any changes to master data in OLTP systems are sent to MDM
– Publishes data to other systems
• Customer Data Integration
– Is an MDM for customer data
– The process of retrieving, cleaning, storing, maintaining, and distributing customer data
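The retrieve, clean, and store steps of customer data integration might be sketched like this. The record fields, the sample records, and the rule "match on normalized e-mail, first record wins" are assumptions chosen for illustration, not rules stated in the slides.

```python
def clean(record):
    """Normalize whitespace and case so equivalent records compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

def consolidate(records):
    """Build a master list keyed by e-mail; the first record seen wins."""
    master = {}
    for record in map(clean, records):
        master.setdefault(record["email"], record)
    return master

# Hypothetical records retrieved from two source systems.
crm_records = [{"name": "  alice   smith", "email": "Alice@Example.com "}]
billing_records = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "bob jones", "email": "bob@example.com"},
]

master = consolidate(crm_records + billing_records)
```

The resulting `master` dictionary is the single customer view that could then be distributed to other systems.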
Future Trends in Data Warehousing
• Unstructured Data
– Documents, images, audio, video, e-mails
• Search
– Search engine
• Service-Oriented Architecture (SOA)
• Real-Time Data Warehouse