DATA WAREHOUSING Multi Dimensional Data Modeling. Facts and Dimensions
2
While an entity-relationship modeling approach from relational database design could be used, the dimensional modeling approach to logical design is more often used for a data warehouse.
3
End users cannot understand, remember, or navigate an E/R model (not even with a GUI).
One reason is that an enterprise-level ERM would be too complex to understand.
4
Software cannot usefully query an E/R model
5
E/R modeling does not serve the purpose of a data warehouse: intuitive, high-performance querying.
7
Star schema (figure):
Fact table:
Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, $ . . .)
Dimension tables:
Time_Dim (TimeKey, TheDate . . .)
Employee_Dim (EmployeeKey, EmployeeID . . .)
Product_Dim (ProductKey, ProductID . . .)
Customer_Dim (CustomerKey, CustomerID . . .)
Shipper_Dim (ShipperKey, ShipperID . . .)
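A minimal sketch of how such a star schema is queried, using Python's built-in sqlite3 module. The table and key names follow the figure; the "amount" column (standing in for the "$" measure), the sample rows, and the choice to model only two of the five dimensions are assumptions made to keep the example short.

```python
import sqlite3

# In-memory database; names follow the figure, data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
    amount     REAL  -- assumed name for the "$" measure
);
INSERT INTO Time_Dim VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO Product_Dim VALUES (10, 'P-10'), (20, 'P-20');
INSERT INTO Sales_Fact VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 2.5);
""")

# A typical star join: group by a dimension attribute, sum the fact.
sales_by_product = conn.execute("""
    SELECT p.ProductID, SUM(f.amount)
    FROM Sales_Fact AS f
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductID
    ORDER BY p.ProductID
""").fetchall()
print(sales_by_product)  # [('P-10', 7.5), ('P-20', 7.5)]
```

Every query has the same shape: the fact table in the middle, one join per dimension used, and an aggregate over the measure.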
8
Figure: a fact table holding dimension keys (Geographic, Product, Time) and measures (Units, $), linked to the Geographic, Product, and Time dimension tables.
Several distinct dimensions, combined with facts, enable you to answer business questions.
Dimensions are normally textual descriptions of the business.
9
Dimensions
Dimension tables contain relatively small amounts of relatively static data.
10
Dimensions
Dimension tables are usually not normalized (denormalized).
11
Dimensions
Dimensions are independent of each other, not hierarchically related.
12
Dimensions
Dimensional attributes (the non-key attributes) help to describe the dimensional value.
13
Dimensional attributes
Facts are (usually numerical) measures of the business.
14
Facts
The fact table is the largest table in the star schema and holds large volumes of data.
15
Facts
The fact table is (often) normalized.
16
Facts
The fact table has a composite primary key made up of foreign keys.
17
Facts
PK = (FK1, FK2, ..., FKn)
A fact table usually contains one or more numerical facts that occur for the combination of keys that define each record.
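The composite key can be sketched with sqlite3 as follows. This is an illustration only: the two-key Sales_Fact table and the "amount" column are assumptions, and a real fact table would carry one foreign key per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER NOT NULL,
    ProductKey INTEGER NOT NULL,
    amount     REAL,                      -- assumed measure
    PRIMARY KEY (TimeKey, ProductKey)     -- the key is the combination of FKs
)""")
conn.execute("INSERT INTO Sales_Fact VALUES (1, 10, 5.0)")
conn.execute("INSERT INTO Sales_Fact VALUES (1, 20, 7.5)")  # same day, other product: accepted

# A second row for the same combination of keys violates the primary key.
try:
    conn.execute("INSERT INTO Sales_Fact VALUES (1, 10, 9.9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Each combination of dimension keys identifies exactly one record, and the numerical facts are the values observed for that combination.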
18
Facts
measures
A fact table contains either detail-level facts or facts that have been aggregated (summary tables).
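As a rough sketch, a summary table can be derived from detail-level facts by grouping on a coarser level of a dimension. The rows and the (date, product, units) layout below are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail-level facts: (date, product, units)
detail = [
    ("2024-01-01", "P-10", 3),
    ("2024-01-02", "P-10", 4),
    ("2024-02-01", "P-10", 5),
    ("2024-02-01", "P-20", 1),
]

# Roll the time dimension up from day to month.
monthly = defaultdict(int)
for date, product, units in detail:
    month = date[:7]  # "YYYY-MM"
    monthly[(month, product)] += units

summary = sorted((m, p, u) for (m, p), u in monthly.items())
print(summary)
# [('2024-01', 'P-10', 7), ('2024-02', 'P-10', 5), ('2024-02', 'P-20', 1)]
```

The summary answers monthly questions faster, but the daily detail can no longer be recovered from it.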
19
Facts
Σ
Facts are:
additive
semi-additive
non-additive
20
Facts
Non-additive facts cannot be added at all. An example of this is averages.
Semi-additive facts can be aggregated along some of the dimensions and not along others:
current_Balance is a semi-additive fact. It makes sense to add balances up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).
The most useful measures are numeric and additive.
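The difference can be checked with a few lines of arithmetic. The balances and store sales below are made-up numbers, and the variable names are only illustrative.

```python
# Hypothetical daily balance snapshots: (day, account, current_balance)
balances = [
    (1, "A", 100.0), (1, "B", 50.0),
    (2, "A", 100.0), (2, "B", 70.0),
]

# Semi-additive: summing across accounts on one day is meaningful.
total_day_2 = sum(b for day, _, b in balances if day == 2)          # 170.0

# Summing one account across days is not a balance anyone ever held.
sum_over_time_a = sum(b for _, acct, b in balances if acct == "A")  # 200.0
# Along time, an average (or the last value) is the usual substitute.
a_values = [b for _, acct, b in balances if acct == "A"]
avg_a = sum(a_values) / len(a_values)                               # 100.0

# Non-additive: averages cannot be combined without the underlying rows.
store_sales = {"north": [10.0, 20.0, 30.0], "south": [100.0]}
avg_of_avgs = sum(sum(v) / len(v) for v in store_sales.values()) / len(store_sales)
all_sales = [x for v in store_sales.values() for x in v]
true_avg = sum(all_sales) / len(all_sales)
print(avg_of_avgs, true_avg)  # 60.0 40.0
```

Storing the additive components (sum and count) instead of the average keeps the fact usable at every level of aggregation.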
21
Facts
Grain: the atomic level of data of the business process.
It defines the highest level of detail that is supported in a data warehouse.
22
A fact table usually contains facts with the same level of aggregation.
A proper dimensional design allows only facts of a uniform grain (the same dimensionality) to coexist in a single fact table.
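A small sketch of what goes wrong when grains are mixed. The rows are hypothetical: three daily sales rows and one monthly total stored in the same table.

```python
# Hypothetical fact rows. The last row is a monthly total that was
# (wrongly) stored alongside the daily rows it already summarizes.
mixed = [
    {"period": "2024-01-01", "grain": "day",   "amount": 10.0},
    {"period": "2024-01-02", "grain": "day",   "amount": 15.0},
    {"period": "2024-01-03", "grain": "day",   "amount": 5.0},
    {"period": "2024-01",    "grain": "month", "amount": 30.0},
]

# A plain SUM over the table counts every sale twice.
naive_total = sum(r["amount"] for r in mixed)
# Only rows of a single grain can be added safely.
daily_total = sum(r["amount"] for r in mixed if r["grain"] == "day")
print(naive_total, daily_total)  # 60.0 30.0
```

Keeping the monthly totals in a separate summary table removes the need for every query to filter on grain.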
23
Some perfectly good fact tables represent measurements that have no facts! This kind of measurement is often called an event. The classic example of such a factless fact table is a record representing a student attending a class on a specific day. The dimensions are Day, Student, Professor, Course, and Location, but there are no obvious numeric facts. The tuition paid and grade received are good facts but not at the grain of the daily attendance.
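A factless fact table is still queried like any other; the questions are answered by counting rows. A minimal sqlite3 sketch follows, where the table name Attendance_Fact, the key columns, the choice of primary key, and the rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attendance_Fact (
    DayKey INTEGER, StudentKey INTEGER, ProfessorKey INTEGER,
    CourseKey INTEGER, LocationKey INTEGER,   -- dimension keys only, no measures
    PRIMARY KEY (DayKey, StudentKey, CourseKey)
);
INSERT INTO Attendance_Fact VALUES
    (1, 100, 7, 500, 3),
    (1, 101, 7, 500, 3),
    (2, 100, 7, 500, 3);
""")

# "How many students attended on each day?" is a row count per DayKey.
attendance_by_day = conn.execute(
    "SELECT DayKey, COUNT(*) FROM Attendance_Fact GROUP BY DayKey ORDER BY DayKey"
).fetchall()
print(attendance_by_day)  # [(1, 2), (2, 1)]
```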
24
Degenerate dimensions: dimensions without attributes (such as a transaction number or order number).
Put the value into the fact table even though it is not an additive fact.
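One use of such a value is grouping the rows that belong to the same transaction. In the sketch below the line items, the field names, and the order numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical line-item facts. order_number has no dimension table of its
# own; it is stored directly in the fact row.
line_items = [
    {"order_number": "SO-1001", "product_key": 10, "amount": 5.0},
    {"order_number": "SO-1001", "product_key": 20, "amount": 7.5},
    {"order_number": "SO-1002", "product_key": 10, "amount": 2.5},
]

# The order number is never summed; it only groups the rows of one order.
order_totals = defaultdict(float)
for item in line_items:
    order_totals[item["order_number"]] += item["amount"]
print(dict(order_totals))  # {'SO-1001': 12.5, 'SO-1002': 2.5}
```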
27
Star schema (figure):
Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, $ . . .)
Multipart key (dimensional keys): TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey
Measures: $ . . .
Each dimensional key joins to its dimension table:
TimeKey: Time_Dim (TimeKey, TheDate . . .)
EmployeeKey: Employee_Dim (EmployeeKey, EmployeeID . . .)
ProductKey: Product_Dim (ProductKey, ProductID . . .)
CustomerKey: Customer_Dim (CustomerKey, CustomerID . . .)
ShipperKey: Shipper_Dim (ShipperKey, ShipperID . . .)
The fact table provides statistics for sales broken down by the product, time, employee, shipper, and customer dimensions.
28
1. Choose the data mart for the small group of end users we deal with.
Choose a business process to model, e.g., orders, invoices, etc.
29
2. Determine the fact table granularity (the smallest defined level of data in the table).
30
3. Select the fact table dimensions.
Choose the dimensions that will apply to each fact table record
Add dimensions for "everything you know" about this grain.
31
4. Determine the facts for the table. In most cases, the granularity is at the transaction level, so the fact is the amount.
Choose the measure that will populate each fact table record
Add numeric measured facts true to the grain
32
Ralph Kimball, Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. Second Edition.