data warehousing concepts2
TRANSCRIPT
-
8/4/2019 DATA Warehousing Concepts2
1/22
4/14/2012 1
The Goals of a Data Warehouse
The data warehouse is a place where people can access their data.
The goals of a data warehouse are as follows:
Access: warehouse retrievals must be fast.
The data in a data warehouse is consistent.
Users must be able to slice and dice the data.
A warehouse must provide easy-to-use browsing tools.
The data warehouse is a place where we publish used data.
The quality of the data in the data warehouse is a driver of business reengineering.
-
Two Different Worlds
On-line transaction processing (OLTP) is profoundly different from dimensional data warehousing (DDW).
The users, data content, data structures, hardware, software, administration, management, and daily rhythms are all different.
OLTP design techniques and methods are inappropriate for, and even destructive to, information warehousing.
-
Consistency
Both OLTP and data warehouse systems are greatly concerned with data consistency.
OLTP consistency is microscopic. The point of transaction processing is to process a very large number of tiny, atomic transactions without losing any of them.
In a data warehouse, consistency is measured globally. We don't care about individual transactions, but we care enormously that the current load of new data is a full and consistent set of data.
-
What is a Transaction
A serious OLTP system processes thousands or even millions of transactions per day.
A serious data warehouse will often process only one transaction per day, but this transaction contains millions of records. It is called a production data load.
What we care about is the consistent state of the system we started with before the production data load.
If we are forced to stop the production data load before it is complete, we will not roll back the inserted records. Instead, we will overwrite the entire system with a snapshot of the system taken before the production data load.
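The restore-from-snapshot policy can be sketched in a few lines. This is only an illustration using Python's sqlite3 module and in-memory databases; the table name sales_fact, the helper functions, and the simulated failure are all hypothetical, and a real warehouse would rely on its own backup tooling.

```python
import sqlite3

def snapshot(conn):
    """Copy the whole database into a fresh in-memory snapshot."""
    snap = sqlite3.connect(":memory:")
    conn.backup(snap)
    return snap

def production_load(conn, rows, fail_at=None):
    """Insert rows one at a time, committing each (so rollback is not an option)."""
    for i, row in enumerate(rows):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("load interrupted")  # simulated failure
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", row)
        conn.commit()

# Hypothetical warehouse holding one row before the load starts.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (product_key INTEGER, units INTEGER)")
warehouse.execute("INSERT INTO sales_fact VALUES (1, 10)")
warehouse.commit()

before_load = snapshot(warehouse)
try:
    production_load(warehouse, [(2, 5), (3, 7), (4, 1)], fail_at=2)
except RuntimeError:
    # Do not undo individual inserts: overwrite everything with the snapshot.
    before_load.backup(warehouse)

row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

After the interrupted load, row_count reflects only the data that was present when the snapshot was taken.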
-
Users and Managers
The users of an OLTP system turn the wheels of an organization, whereas the users of a data warehouse watch the wheels of the organization.
Users of an OLTP system almost always deal with one account at a time.
OLTP users perform the same tasks many, many times.
Performance is the absolute king of an OLTP system. No optional activity is allowed to slow down an OLTP system.
-
Dimensions in Data Analysis
In the world of data warehousing, a summarizable numerical value that you use to monitor your business is called a measure.
When looking for numeric information, your first question will be: what measure do you want to see?
You could look at, say, sales units, sales dollars, defects, etc.
Suppose that you ask to see a report of your company's Units Sold.
Here's what you get:
113
-
Fact Table
A fact table is a table in the relational data warehouse that stores the detailed values for measures, or facts.
Example: a fact table that stores Dollars and Units by State, by Product, and by Month has five columns.
The first three columns are key columns; the remaining two are measure values.
State Product Month Units Dollars
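A five-column fact table of this shape can be sketched with Python's sqlite3 module. The table name sales_fact and every row are invented for illustration, and descriptive key values are used here only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        state   TEXT,     -- key column
        product TEXT,     -- key column
        month   TEXT,     -- key column
        units   INTEGER,  -- measure
        dollars REAL      -- measure
    )
""")
# Invented sample rows, one per state/product/month combination.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("WA", "Sweet Muffins", "2012-01", 40, 80.0),
        ("WA", "Salt Bread", "2012-01", 25, 62.5),
        ("OR", "Sweet Muffins", "2012-02", 48, 96.0),
    ],
)

# A measure with no dimension applied collapses to a single number.
total_units = conn.execute("SELECT SUM(units) FROM sales_fact").fetchone()[0]

# Applying one dimension (state) slices the same measure.
units_by_state = dict(
    conn.execute("SELECT state, SUM(units) FROM sales_fact GROUP BY state")
)
```

Here total_units is the kind of single figure a bare "Units Sold" report returns, while units_by_state shows the same measure broken out by one key column.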
-
Fact Table
Each column in the fact table should be either a key or a measure.
The fact table must contain a column for each measure.
The fact table must contain rows at the lowest level of detail you might want to retrieve for a measure.
A fact table almost always uses an integer key for each member rather than a descriptive name.
The key column for a date dimension might be either an integer key or a date.
-
4/14/2012 CHRIS 9
Dimension Tables
A dimension table contains one row for each leaf-level member of the dimension.
Ex. A product dimension table with 3 products will have 3 rows.
In most cases a dimension table also contains one numeric key column that uniquely identifies each member.
This column is the primary key of the dimension table, and the matching foreign key column in the fact table references it.
-
Dimension Tables
If the dimension is involved in a balanced hierarchy, it will have an additional column that gives the parent for each member.
Ex. If you have 3 products in a dimension table that each belong to a product subcategory, your table will look like this.
PROD_ID  Prod_Name        SubCategory
589      Sweet Muffins    Muffins
592      Coconut Muffins  Muffins
1218     Salt Bread       Bread
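The parent column is what lets a query roll leaf-level facts up to the subcategory level. The sketch below loads the three products from the table above into a hypothetical product_dim table; the sales_fact table and its unit counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        prod_id     INTEGER PRIMARY KEY,  -- numeric key, one row per leaf member
        prod_name   TEXT,
        subcategory TEXT                  -- parent of each member
    );
    CREATE TABLE sales_fact (
        prod_id INTEGER REFERENCES product_dim(prod_id),  -- foreign key
        units   INTEGER                                   -- measure
    );
    INSERT INTO product_dim VALUES
        (589,  'Sweet Muffins',   'Muffins'),
        (592,  'Coconut Muffins', 'Muffins'),
        (1218, 'Salt Bread',      'Bread');
    -- Invented fact rows.
    INSERT INTO sales_fact VALUES (589, 10), (592, 4), (1218, 6), (589, 3);
""")

# Join the fact table to the dimension and aggregate at the parent level.
rollup = dict(conn.execute("""
    SELECT d.subcategory, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.prod_id = f.prod_id
    GROUP BY d.subcategory
"""))
```

The fact table stores only the integer key; the product name and subcategory are looked up through the join.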
-
Star Schema
When each dimension is stored in a single table, the database's organization is called a star schema design.
When a database's dimensions are stored in a chain of tables, the database's design is called a snowflake design.
A relational database must perform time-consuming joins each time a report executes, and a star design for a dimension requires fewer joins than a snowflake design.
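The difference in join count can be seen by storing the same product dimension both ways. This is a sketch with invented table names (product_star, product_snow, subcategory_snow) and invented fact rows; it shows only that the two designs answer the same question with a different number of joins, not how either performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (prod_id INTEGER, units INTEGER);
    INSERT INTO sales_fact VALUES (589, 10), (592, 4), (1218, 6);

    -- Star: the whole product dimension lives in a single table.
    CREATE TABLE product_star (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT, subcategory TEXT);
    INSERT INTO product_star VALUES
        (589, 'Sweet Muffins', 'Muffins'),
        (592, 'Coconut Muffins', 'Muffins'),
        (1218, 'Salt Bread', 'Bread');

    -- Snowflake: the same dimension stored as a chain of two tables.
    CREATE TABLE subcategory_snow (subcat_id INTEGER PRIMARY KEY, subcategory TEXT);
    CREATE TABLE product_snow (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT,
        subcat_id INTEGER REFERENCES subcategory_snow(subcat_id));
    INSERT INTO subcategory_snow VALUES (1, 'Muffins'), (2, 'Bread');
    INSERT INTO product_snow VALUES
        (589, 'Sweet Muffins', 1),
        (592, 'Coconut Muffins', 1),
        (1218, 'Salt Bread', 2);
""")

# Units by subcategory against the star design: one join.
STAR_QUERY = """
    SELECT p.subcategory, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_star AS p ON p.prod_id = f.prod_id
    GROUP BY p.subcategory
"""
# The same report against the snowflake design: two joins.
SNOWFLAKE_QUERY = """
    SELECT s.subcategory, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_snow AS p ON p.prod_id = f.prod_id
    JOIN subcategory_snow AS s ON s.subcat_id = p.subcat_id
    GROUP BY s.subcategory
"""
star_result = sorted(conn.execute(STAR_QUERY).fetchall())
snowflake_result = sorted(conn.execute(SNOWFLAKE_QUERY).fetchall())
```

Both queries return the same rows; the snowflake version simply has to walk one more table in the chain to reach the subcategory name.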
-
Basic Elements - Data Warehouse
Source System - An operational system of record whose function is to capture the transactions of the business.
Data Staging Area - A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.
Presentation Server - The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.
-
Basic Elements - Data Warehouse
Dimensional Model - A specific discipline for modeling data that is an alternative to entity-relationship (E/R) modeling.
Business Process - A coherent set of business activities that make sense to the business users of our data warehouses.
Data Mart - A logical subset of the complete data warehouse.
Data Warehouse - The queryable source of data in the enterprise.
-
Basic Elements - Data Warehouse
Operational Data Store (ODS) - A term that has taken on too many definitions to be useful to the data warehouse.
OLAP (On-line Analytic Processing) - The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.
-
Basic Elements - Data Warehouse
ROLAP (Relational OLAP) - A storage option or set of user interfaces and applications that give a relational database a dimensional flavor.
MOLAP (Multidimensional OLAP) - A storage option or set of user interfaces, applications, and proprietary database technology that have a strongly dimensional flavor.
HOLAP (Hybrid OLAP) - A storage option that combines relational and proprietary structures.
-
Basic Elements - Data Warehouse
End User Application - A collection of tools that query, analyze, and present information targeted to support a business need.
End User Data Access Tool - A client of the data warehouse.
Ad Hoc Query Tool - A specific kind of end user data access tool that invites users to form their own queries by directly manipulating relational tables and their joins.
-
Basic Elements - Data Warehouse
Modeling Applications - A sophisticated kind of data warehouse client with analytic capabilities that transform or digest the output from the data warehouse.
Modeling applications include:
Forecasting models
Behavior scoring models
Allocation models
Data mining tools
Metadata - All the information in the data warehouse environment that is not the actual data itself.
-
Basic Processes - Data Warehouse
Extracting - The first step of getting data into the data warehouse.
Transformation - Once the data is extracted into the data staging area, there are many possible transformation steps, including cleaning the data, correcting misspellings, purging selected fields, creating surrogate keys for each dimension, building aggregates, etc.
Loading and Indexing - Loading the prepared data into the data warehouse and indexing it.
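The three steps above can be sketched end to end as follows. Everything here is a simplified assumption: the source rows are hard-coded, the only cleaning applied is whitespace and capitalization repair, and the function and table names (extract, transform, load, product_dim, sales_fact) are hypothetical.

```python
import sqlite3

def extract():
    """Stand-in for reading rows from a source system (invented data)."""
    return [
        {"product": " sweet muffins ", "units": "10"},
        {"product": "Salt Bread", "units": "6"},
        {"product": "Sweet Muffins", "units": "3"},
    ]

def transform(rows):
    """Clean product names and assign a surrogate key to each distinct product."""
    surrogate_keys = {}
    facts = []
    for row in rows:
        name = row["product"].strip().title()  # cleaning step
        key = surrogate_keys.setdefault(name, len(surrogate_keys) + 1)
        facts.append((key, int(row["units"])))
    return surrogate_keys, facts

def load(conn, surrogate_keys, facts):
    """Load the dimension and fact rows, then index the fact table's key."""
    conn.execute(
        "CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, prod_name TEXT)"
    )
    conn.execute("CREATE TABLE sales_fact (product_key INTEGER, units INTEGER)")
    conn.executemany(
        "INSERT INTO product_dim VALUES (?, ?)",
        [(key, name) for name, key in surrogate_keys.items()],
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", facts)
    conn.execute("CREATE INDEX idx_sales_product ON sales_fact (product_key)")
    conn.commit()

conn = sqlite3.connect(":memory:")
keys, facts = transform(extract())
load(conn, keys, facts)
```

Note that the two differently spelled "Sweet Muffins" source rows end up sharing one surrogate key once they have been cleaned.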
-
Basic Processes - Data Warehouse
Quality Assurance Checking - Quality can be checked by running a comprehensive exception report over the entire set of newly loaded data.
Release/Publishing - The user community must be notified that the new data is ready.
Updating - Modern data marts may well be updated, sometimes frequently. Updates include changes in labels, changes in hierarchies, changes in status, and changes in corporate ownership.
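As one possible shape for the exception report mentioned under Quality Assurance Checking, the sketch below scans every newly loaded row and lists each rule it breaks. The three rules, the row layout, and the function name exception_report are assumptions chosen for illustration; a real report would encode the checks that matter to the business.

```python
def exception_report(rows, known_product_keys):
    """Return (row_index, problem) pairs for every rule a loaded row violates."""
    problems = []
    for i, row in enumerate(rows):
        if row["product_key"] not in known_product_keys:
            problems.append((i, "unknown product key"))
        if row["units"] is None:
            problems.append((i, "missing units"))
        elif row["units"] < 0:
            problems.append((i, "negative units"))
    return problems

# Invented batch of newly loaded fact rows.
loaded = [
    {"product_key": 1, "units": 10},
    {"product_key": 9, "units": 4},     # key 9 is not in the dimension
    {"product_key": 2, "units": None},  # measure is missing
    {"product_key": 2, "units": -3},    # measure is out of range
]
report = exception_report(loaded, known_product_keys={1, 2})
```

An empty report would be the signal that the load is ready to be released to the user community.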
-
Basic Processes - Data Warehouse
Querying - Querying is a broad term that encompasses all the activities of requesting data from a data mart.
Data Feedback/Feeding in Reverse - The data can also flow in the opposite direction, uphill from the traditional flow we have discussed.
Auditing - At times it is critically important to know where the data came from and what calculations were performed. For this you can create special audit records.
-
Basic Processes - Data Warehouse
Securing - Every data warehouse has an exquisite dilemma: publishing the data as widely as possible, to as many users as possible, with the easiest of user interfaces, while at the same time protecting the data from misuse and snoopers.
Backing Up and Recovering - Since data warehouse data is a flow of data from the legacy systems on through to the data marts and eventually onto the users' desktops, a real question arises about where to take the necessary snapshots.
-
Steps in the Design Process
Choose a business process to model
Choose the grain of the business process
Choose the dimensions that will apply for each business process and the attributes/members for each dimension
Choose the measured facts that will populate each fact table record.
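The four choices can be written down as a small data structure before any table is built. This is only a sketch: the class FactTableDesign, its field names, and the "_key" naming convention are hypothetical, and the example values reuse the State/Product/Month fact table from earlier in the deck.

```python
from dataclasses import dataclass, field

@dataclass
class FactTableDesign:
    business_process: str                           # step 1
    grain: str                                      # step 2
    dimensions: list = field(default_factory=list)  # step 3
    facts: list = field(default_factory=list)       # step 4

    def columns(self):
        """Key columns (one per dimension) followed by the measured facts."""
        return [f"{d}_key" for d in self.dimensions] + list(self.facts)

design = FactTableDesign(
    business_process="retail sales",
    grain="one row per product per state per month",
    dimensions=["state", "product", "month"],
    facts=["units", "dollars"],
)
column_names = design.columns()
```

Writing the grain down explicitly makes it easier to check that every chosen dimension and fact is actually available at that level of detail.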