Data Warehousing, Tyler Helmle. Introduction, definition, history, processes and terms, management...
Definition
Collection of data that is subject-oriented, integrated, time-variant, and non-volatile
To support management's decision-making process
Definition
Bill Inmon 1990
Single-subject data mart
Possible to be volatile, size constraints, rolling lengths
Definition
Ralph Kimball: “a copy of transaction data specifically structured for query and analysis.”
Less insight, less depth, not less accurate
History
Concept originated in the late 1980s
IBM researchers Barry Devlin and Paul Murphy: “business data warehouse”
From operational systems to decision support
History
Trends: corporations had multiple decision support systems, which were expensive and redundant
Get to one efficient warehouse
Data marts
Design
Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers, and tools
Integrate the servers, storage, and client tools
Design the warehouse schema and views
Define the physical warehouse organization, data placement, partitioning, and access methods
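To make "design the warehouse schema and views" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_date, dim_region, fact_sales, v_sales) are hypothetical, and the fact-plus-dimension layout is one common choice assumed for illustration; the slides do not prescribe a schema style.

```python
import sqlite3

# Hypothetical schema: one fact table keyed to two dimension tables
# (a star-schema layout chosen for illustration only).
SCHEMA = """
CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_id   INTEGER REFERENCES dim_date(date_id),
    region_id INTEGER REFERENCES dim_region(region_id),
    amount    REAL
);
-- View that hides the joins from report writers.
CREATE VIEW v_sales AS
SELECT d.year, d.month, r.name AS region, f.amount
FROM fact_sales AS f
JOIN dim_date   AS d ON d.date_id = f.date_id
JOIN dim_region AS r ON r.region_id = f.region_id;
"""


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory warehouse containing the schema and view above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The physical organization step (placement, partitioning, access methods such as indexes) depends on the chosen database server and is left out of this sketch.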
Design (Cont.)
Connect the sources using gateways, ODBC drivers, or other wrappers
Design and implement scripts for data extraction, cleaning, transformation, load, and refresh
Populate the repository with the schema and view definitions, scripts, and other metadata
Data Extracting
Extract from source systems: databases, flat files
Convert into a format for transformation
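A small sketch of extraction from the two source types named above, assuming a hypothetical `orders` table in a database and a CSV flat file with the same columns. Both paths convert rows into one common format (a list of dictionaries with normalized types) so the transformation step sees uniform input.

```python
import csv
import io
import sqlite3


def extract_from_database(conn: sqlite3.Connection) -> list[dict]:
    """Pull rows from a hypothetical 'orders' table as plain dictionaries."""
    conn.row_factory = sqlite3.Row  # lets each row be read by column name
    rows = conn.execute("SELECT order_id, customer, amount FROM orders")
    return [dict(row) for row in rows]


def extract_from_flat_file(text: str) -> list[dict]:
    """Parse CSV text into the same dictionary format as the database rows."""
    reader = csv.DictReader(io.StringIO(text))
    # CSV fields arrive as strings; convert them so both sources match.
    return [
        {"order_id": int(r["order_id"]), "customer": r["customer"], "amount": float(r["amount"])}
        for r in reader
    ]
```

Concatenating the two result lists gives a single staging set ready for transformation.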
Data Transforming
Selection of columns
Translation of data
Making all data consistent
Deriving new calculated values
Filtering
Sorting
Joining of data
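The transformation steps listed above can be sketched in a single pass over extracted rows. The field names, the cancelled-order filter, and the `REGION_CODES` lookup are hypothetical examples chosen to show one instance of each step.

```python
# Hypothetical lookup used to translate coded values into one consistent label.
REGION_CODES = {"N": "North", "S": "South"}


def transform(orders: list[dict], customers: list[dict]) -> list[dict]:
    """Apply selection, translation, derivation, filtering, sorting, and a join."""
    # Joining: index customers by id so each order can pick up its region code.
    region_by_customer = {c["customer_id"]: c["region_code"] for c in customers}
    result = []
    for o in orders:
        # Filtering: drop cancelled orders.
        if o["status"] == "cancelled":
            continue
        result.append({
            # Selection of columns: keep only what the warehouse needs.
            "order_id": o["order_id"],
            # Translation / consistency: map codes to a standard label.
            "region": REGION_CODES.get(region_by_customer.get(o["customer_id"]), "Unknown"),
            # Derived calculated value.
            "total": round(o["quantity"] * o["unit_price"], 2),
        })
    # Sorting: largest orders first.
    result.sort(key=lambda r: r["total"], reverse=True)
    return result
```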
Data Transforming Tools
Data migration: transformation of data
Data scrubbing: uses domain-specific knowledge to fix or eliminate bad data
Data auditing: finds patterns and inconsistencies
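A hedged sketch of scrubbing and auditing on one column. The domain rule (a US ZIP code is exactly five digits, and short numeric ZIPs lost their leading zeros) is an assumed example of "domain-specific knowledge"; real tools apply many such rules.

```python
import re
from collections import Counter

# Assumed domain rule for illustration: a valid ZIP code is exactly five digits.
ZIP_PATTERN = re.compile(r"^\d{5}$")


def scrub(rows: list[dict]) -> list[dict]:
    """Fix what the domain rule can fix; eliminate rows it cannot fix."""
    cleaned = []
    for row in rows:
        zip_code = row["zip"].strip()
        # Fix: restore leading zeros lost when the ZIP was stored as a number.
        if zip_code.isdigit() and len(zip_code) < 5:
            zip_code = zip_code.zfill(5)
        # Eliminate: rows that still break the rule are dropped.
        if ZIP_PATTERN.match(zip_code):
            cleaned.append({**row, "zip": zip_code})
    return cleaned


def audit(rows: list[dict]) -> Counter:
    """Count the 'shape' of each ZIP value so inconsistent patterns stand out."""
    def shape(value: str) -> str:
        # Digits become 9 and letters become A, e.g. '0213' -> '9999'.
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))
    return Counter(shape(r["zip"].strip()) for r in rows)
```

Running `audit` before `scrub` shows which patterns exist (for example four-digit values), which is how an auditing pass can suggest the scrubbing rules to write.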
Data Load
Batch processing
Visibility and management for the administrator: start, cancel, suspend, resume
Load large volumes of data quickly and efficiently
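The batch-load controls on this slide can be sketched as a small state machine. `BatchLoader` and the two-column `facts` table are hypothetical; the point is that loading happens in fixed-size batches, each committed as its own transaction, and that the administrator can start, suspend, resume, or cancel between batches.

```python
import sqlite3
from typing import Optional


class BatchLoader:
    """Hypothetical batch loader with start/suspend/resume/cancel controls."""

    def __init__(self, conn: sqlite3.Connection, rows: list[tuple], batch_size: int = 1000):
        self.conn = conn
        self.rows = rows
        self.batch_size = batch_size
        self.position = 0     # index of the next row to load
        self.state = "idle"   # idle | running | suspended | cancelled | done

    def start(self) -> None:
        if self.state == "idle":
            self.state = "running"

    def suspend(self) -> None:
        if self.state == "running":
            self.state = "suspended"

    def resume(self) -> None:
        if self.state == "suspended":
            self.state = "running"

    def cancel(self) -> None:
        if self.state != "done":
            self.state = "cancelled"

    def run(self, max_batches: Optional[int] = None) -> int:
        """Load batches while running; return how many batches were loaded."""
        loaded = 0
        while self.state == "running" and self.position < len(self.rows):
            if max_batches is not None and loaded >= max_batches:
                break
            batch = self.rows[self.position:self.position + self.batch_size]
            with self.conn:  # commits this batch as one transaction
                self.conn.executemany("INSERT INTO facts VALUES (?, ?)", batch)
            self.position += len(batch)
            loaded += 1
        if self.state == "running" and self.position >= len(self.rows):
            self.state = "done"
        return loaded
```

Because `position` survives a suspend, a resumed load continues where it stopped instead of reloading rows.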
Data Load
Parallel processing
Data: splitting data files to provide parallel access
Pipeline: simultaneous running of several components
Component: simultaneous running of multiple processes
All three types usually operate at the same time
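A minimal sketch of the first type (data parallelism): split one input into partitions and transform each partition concurrently with the standard library's `ThreadPoolExecutor`. The per-row transformation is a hypothetical placeholder, and threads are used only to keep the sketch simple; CPU-heavy work would usually use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows: list, n: int) -> list[list]:
    """Split one input into at most n roughly equal partitions."""
    if not rows:
        return []
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def transform_partition(partition: list[dict]) -> list[dict]:
    """Hypothetical per-row transformation, independent across partitions."""
    return [{**r, "total": r["qty"] * r["price"]} for r in partition]


def parallel_transform(rows: list[dict], workers: int = 4) -> list[dict]:
    partitions = split_into_partitions(rows, workers)
    # Each partition is handled by its own worker at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform_partition, partitions))
    # map() returns results in partition order, so input order is preserved.
    return [row for part in results for row in part]
```

Pipeline and component parallelism would add concurrently running stages or independent jobs on top of this; they are not shown here.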
Data Mart
Subset of data covering one major data subject
Increases performance
Separate security
Proves ROI before adding to the data warehouse
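A data mart as a single-subject subset can be sketched as a separate table copied out of the warehouse. The `warehouse_sales` and `mart_sales` names and the region filter are hypothetical; a separate table is what allows its own security settings and tuning.

```python
import sqlite3


def build_sales_mart(conn: sqlite3.Connection, region: str) -> None:
    """Copy the sales subject for one region into a separate mart table."""
    conn.execute("DROP TABLE IF EXISTS mart_sales")
    # Create an empty table with the selected columns' layout.
    conn.execute(
        "CREATE TABLE mart_sales AS "
        "SELECT order_date, product, amount FROM warehouse_sales WHERE 0"
    )
    # Fill it with only the rows for the requested region.
    conn.execute(
        "INSERT INTO mart_sales "
        "SELECT order_date, product, amount FROM warehouse_sales WHERE region = ?",
        (region,),
    )
    conn.commit()
```

Because the mart is a copy, it also shows where the duplication and inconsistency risks on the next slide come from: the copy must be rebuilt whenever the warehouse changes.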
Data Mart
Cons: limited scalability, duplication of data, data inconsistency, low ability to leverage enterprise sources of data
Metadata
“Data about data”
Clarifies data, explains data, and helps navigate through data
Facilitates understanding, usage, and management (for both humans and computers)
Metadata
Different from data, but sometimes can be both, depending on point of view
Types: descriptive, administrative, structural, technical, use
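One way to picture the metadata types listed above is a record describing a single warehouse table. The field names and the grouping of fields under each type are illustrative assumptions, not a standard metadata model.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical metadata record ("data about data") for one table."""
    # Descriptive: what the data is, for discovery and understanding.
    title: str
    description: str
    # Administrative: who owns it and how it is managed.
    owner: str
    refresh_schedule: str
    # Structural: how the data is organized (column name -> SQL type).
    columns: dict[str, str] = field(default_factory=dict)
    # Technical: where and how the data is produced.
    source_system: str = ""
    # Use: how the data is actually being used.
    query_count: int = 0

    def describe(self) -> str:
        """Human-readable summary built from the structural and administrative fields."""
        cols = ", ".join(f"{name} {typ}" for name, typ in self.columns.items())
        return f"{self.title} ({cols}) owned by {self.owner}, refreshed {self.refresh_schedule}"
```

The same record serves humans (through `describe`) and programs (through the typed fields), matching the "both humans and computers" point.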
Metadata
Content: describes the resource, describes the content
Mutability: immutable, mutable
Logical function: sub-symbolic, symbolic, logical
Data Mining
Requires a large amount of data
Determine patterns in data
Determine relationships in data
Analyze results and draw conclusions
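As one small example of finding relationships in data, the sketch below counts which item pairs occur together in a share of transactions at or above a threshold. This is the pair-counting step of frequent-itemset mining (as in Apriori), chosen here for illustration; it is not the only data mining technique. The basket data in any usage would be hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions: list[set[str]], min_support: float) -> dict[tuple[str, str], float]:
    """Return item pairs whose support (share of transactions) is >= min_support."""
    counts: Counter = Counter()
    for items in transactions:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same pair.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Support is a fraction of all transactions, which is why mining needs a large amount of data: with only a handful of rows, a pair can clear the threshold by chance.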
Standard Reporting
Weekly/monthly/yearly
Same constant data, recent information
Arrives at the expected time on the expected medium
Configurable Reports
Select parameters to show on the report, e.g., time period, region
The data supporting the report remains the same
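A configurable report can be sketched as one function whose parameters (time period and, optionally, region) change while the underlying rows stay the same. The row layout and the function name `sales_report` are hypothetical.

```python
from datetime import date
from typing import Optional


def sales_report(rows: list[dict], start: date, end: date,
                 region: Optional[str] = None) -> dict[str, float]:
    """Total sales per region for an inclusive date range, optionally one region."""
    totals: dict[str, float] = {}
    for r in rows:
        # Time-period parameter.
        if not (start <= r["date"] <= end):
            continue
        # Region parameter (None means all regions).
        if region is not None and r["region"] != region:
            continue
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals
```

A standard report would fix `start`, `end`, and `region` in advance; exposing them as parameters is what makes this one configurable.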
Ad Hoc Reporting
End users pick and choose data
Requires training and knowledge of the data
Do not want users misusing the data
Data Warehouse Management
Data warehouse administrator (DWA)
Visibility: knowledge of what is happening (loading, extracting, etc.)
Data Warehouse Management
Responsibilities: What is the data being used for? Who is using the data? What is the response time? What kinds of activities are being submitted?
Data Warehouse Management
Data management (data monitor): counts data, analyzes profiles of data, determines thresholds of data
Makes the DWA’s job easier
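A minimal sketch of what a data monitor does for one column: count values, build a simple profile, and raise alerts when the profile crosses a threshold. The function names and the threshold values passed in are hypothetical; in practice thresholds would be derived from the column's normal history.

```python
def profile_column(values: list) -> dict:
    """Counts and a simple profile (nulls, distinct values, range) for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def check_thresholds(profile: dict, max_null_ratio: float, min_rows: int) -> list[str]:
    """Return alert messages for the DWA when a profile crosses a threshold."""
    alerts = []
    if profile["count"] < min_rows:
        alerts.append(f"row count {profile['count']} below {min_rows}")
    if profile["count"] and profile["nulls"] / profile["count"] > max_null_ratio:
        ratio = profile["nulls"] / profile["count"]
        alerts.append(f"null ratio {ratio:.0%} above {max_null_ratio:.0%}")
    return alerts
```

Running this after each load lets the administrator review a short list of alerts instead of inspecting every table by hand.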
Pros
One common data source
Ease of reporting and analysis
Inconsistencies identified right away
Retrieve data without slowing down operational systems
Show actual performance
Summary
Data warehousing can prove to be a success
Common knowledge of a database technology
As technology and research continue, the possibilities are endless