UNIT-II
• Principles of dimensional modeling
• Dimensional modeling: advanced topics
• ETL
• OLAP
Principles of dimensional modeling
• From requirements to data design
• STAR schema
• STAR schema keys
• Advantages of the STAR schema
From requirements to data design
• Requirements gathering
• Requirements definition document (with information packages)
• Data design
• Dimensional model
(figure 10-1)
Figure 10-1
Design decisions
• Choosing the process (Subjects)
• Choosing the grain (level of detail)
• Identifying and conforming the dimensions
• Choosing the facts
• Choosing the duration of the database (duration of historical data)
Dimensional modeling basics
• From the information package diagram:
– The metrics or facts become the fact table (figure 10-2)
– The dimensions become dimension tables with attributes (figure 10-3)
Figure 10-2
Figure 10-3
Dimensional modeling
• Dimensional model with fact table in the middle and the dimension tables around
• Called a STAR schema (figure 10-4)
Figure 10-4
Dimensional Data Modeling (DDM)
• A DDM comprises one or more dimension tables and fact tables.
• A dimension table stores records related to that particular dimension, e.g. location, product, time.
• A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns.
• These dimension columns are foreign keys that reference the respective dimension tables.
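The fact-to-dimension relationship described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (dim_product, dim_location, fact_sales), the columns, and the sample rows are assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: two dimension tables and one fact table.
conn.executescript("""
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    units_sold   INTEGER,   -- measure
    sales_dollar REAL       -- measure
);
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO dim_location VALUES (10, 'Pune')")
conn.execute("INSERT INTO fact_sales VALUES (1, 10, 5, 50.0)")

# A fact row whose dimension key has no matching dimension record is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 10, 1, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Joining through the foreign keys resolves the descriptive attributes.
row = conn.execute("""
    SELECT p.product_name, l.city, f.sales_dollar
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_location l ON l.location_key = f.location_key
""").fetchone()
```

The fact table holds only keys and measures; every descriptive attribute lives in a dimension table and is reached through a join.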
Example of Dimensional Data Model:
• In the figure, the sales fact table is connected to four dimensions (location, product, time and organization).
• It shows that data can be sliced across all dimensions.
• Data can also be aggregated across multiple dimensions.
• ‘Sales Dollar’ in the sales fact table can be calculated across each dimension independently or across several dimensions combined, for example:
– Sales Dollar value for a particular product
– Sales Dollar value for a product in a location
– Sales Dollar value for a product in a year within a location
– Sales Dollar value for a product in a year within a location, sold or serviced by an employee
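The four combinations listed above can be illustrated with a small grouping helper. This is a minimal sketch: the rows, the column names, and the helper sales_dollar_by are hypothetical, and dimension values are shown already resolved (no keys) for readability.

```python
from collections import defaultdict

# Hypothetical fact rows with dimension values already looked up.
fact_sales = [
    {"product": "Widget", "location": "Pune",   "year": 2023, "employee": "Asha", "sales_dollar": 100.0},
    {"product": "Widget", "location": "Pune",   "year": 2024, "employee": "Ravi", "sales_dollar": 150.0},
    {"product": "Widget", "location": "Mumbai", "year": 2024, "employee": "Asha", "sales_dollar": 200.0},
    {"product": "Gadget", "location": "Pune",   "year": 2024, "employee": "Ravi", "sales_dollar": 75.0},
]

def sales_dollar_by(rows, *dimensions):
    """Sum sales_dollar grouped by the given dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales_dollar"]
    return dict(totals)

by_product = sales_dollar_by(fact_sales, "product")
by_product_location = sales_dollar_by(fact_sales, "product", "location")
by_product_year_location = sales_dollar_by(fact_sales, "product", "year", "location")
by_all = sales_dollar_by(fact_sales, "product", "year", "location", "employee")
```

Each additional dimension in the grouping narrows the slice; the same measure is summed every time.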
Uses of DDM
• DDM is used for calculating summarized data.
• For example, sales data could be collected on a daily basis and then aggregated to the week level; the week data could be aggregated to the month level, and so on.
• The data can then be referred to as aggregate / summarized data.
• Query performance on a DDM can be improved significantly by using materialized views.
• A materialized view is a pre-computed table comprising aggregated or joined data from fact and possibly dimension tables; it is also known as a summary or aggregate table.
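The idea of a pre-computed summary table can be sketched as follows. SQLite has no native materialized views, so this example emulates one with CREATE TABLE ... AS SELECT; the table names and sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_daily_sales (sale_date TEXT, sales_dollar REAL)")
conn.executemany(
    "INSERT INTO fact_daily_sales VALUES (?, ?)",
    [
        ("2024-01-01", 100.0),
        ("2024-01-02", 50.0),
        ("2024-01-31", 25.0),
        ("2024-02-01", 40.0),
    ],
)

# Pre-compute the month-level aggregate once and store it as a summary table.
conn.execute("""
    CREATE TABLE agg_monthly_sales AS
    SELECT strftime('%Y-%m', sale_date) AS month,
           SUM(sales_dollar)            AS sales_dollar
    FROM fact_daily_sales
    GROUP BY month
""")

# Month-level queries now read the small summary table instead of the daily facts.
monthly = dict(conn.execute("SELECT month, sales_dollar FROM agg_monthly_sales"))
```

A summary table built this way goes stale when new daily rows arrive, so it has to be rebuilt or refreshed as part of the load process.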
Dimension Table
• A dimension table describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products.
• Dimension tables are sometimes called lookup or reference tables.
Relational vs Dimensional
• The Relational Data Model (RDM) is used in OLTP systems, which are transaction oriented; DDM is used in OLAP systems, which are analysis oriented.
• In an OLTP environment, lookups are stored in detail as independent tables, whereas in a DW these independent tables are merged into a single dimension.
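Merging independent lookup tables into a single dimension can be sketched like this. The lookup tables (products, categories, departments), their contents, and the helper build_product_dimension are hypothetical and chosen only to show the flattening step.

```python
# Normalized OLTP-style lookups (hypothetical data), each keyed by its own id.
departments = {100: {"name": "Grocery"}}
categories = {1: {"name": "Beverages", "department_id": 100}}
products = {
    501: {"name": "Green Tea", "category_id": 1},
    502: {"name": "Cola", "category_id": 1},
}

def build_product_dimension(products, categories, departments):
    """Flatten the product -> category -> department chain into one wide row per product."""
    rows = []
    for product_id, product in sorted(products.items()):
        category = categories[product["category_id"]]
        department = departments[category["department_id"]]
        rows.append({
            "product_id": product_id,
            "product_name": product["name"],
            "category_name": category["name"],
            "department_name": department["name"],
        })
    return rows

dim_product = build_product_dimension(products, categories, departments)
```

The resulting dimension repeats the category and department names on every product row, which is the denormalization that lets analytical queries avoid the chain of joins.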
Relational:
• Data is stored in RDBMS
• Tables are units of storage
• Data is normalized and used for OLTP. Optimized for OLTP processing
• Several tables and chains of relationships among them
• Volatile (several updates)
• Detailed level of transactional data
• Normal reports
Dimensional:
• Data is stored in RDBMS or multidimensional databases
• Cubes are units of storage
• Data is denormalized and used in DW and data mart. Optimized for OLAP
• Few tables, and fact tables are connected to dimension tables
• Non-volatile and time variant
• Summary of bulky transactional data (aggregates and measures) used in business decisions
• User-friendly, interactive, drag-and-drop multidimensional OLAP reports
DM Versus E-R modeling (figure 10-5, 10-6)
The STAR Schema
• A star schema is a database schema for representing multidimensional data.
• It is the simplest form of DW schema, containing one or more dimension tables and fact tables.
• It is called a star schema because the relationship between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions.
• The center of the star consists of a large fact table, which points towards the dimension tables.
• Simple STAR schema (figure 10-7)
Figure 10-7
Steps in designing Star Schema
• Identify a business process for analysis (like sales).
• Identify measures or facts.
• Identify dimensions for facts.
• List the columns that describe each dimension.
• Determine the lowest level of summary (the grain) in the fact table.
Characteristics of Dimension Table
• Dimension Table Key (PK)
• Table is Wide
• Textual Attributes
• Attributes not directly related
• Not Normalized
• Drilling-down, rolling-up
• Multiple Hierarchies
• Fewer records than the fact table
Inside a dimension table (figure 10-10)
Characteristics of Fact Table
• Concatenated key
• Data granularity
• Measure types
– Fully additive: measures that can be added across all dimensions.
– Non-additive: measures that cannot be added across any dimension.
– Semi-additive: measures that can be added across some dimensions but not others.
• Table deep, not wide
• Sparse data
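The three measure types can be illustrated with small, made-up numbers. In this sketch, deposits stand in for a fully additive measure, an account balance for a semi-additive one, and a margin percentage for a non-additive one; all of the data and names are assumptions.

```python
# Hypothetical daily account snapshots: (day, account, balance, deposits).
snapshots = [
    ("Mon", "A", 100.0, 100.0),
    ("Mon", "B", 50.0, 50.0),
    ("Tue", "A", 120.0, 20.0),
    ("Tue", "B", 50.0, 0.0),
]

# Fully additive: deposits can be summed across accounts and across days.
total_deposits = sum(deposits for _, _, _, deposits in snapshots)

# Semi-additive: balances can be summed across accounts for a single day...
tuesday_balance = sum(balance for day, _, balance, _ in snapshots if day == "Tue")
# ...but summing across days counts the same money more than once.
balance_summed_over_days = sum(balance for _, _, balance, _ in snapshots)

# Non-additive: a ratio such as margin percent cannot be summed at all.
sales = [(200.0, 150.0), (100.0, 50.0)]  # hypothetical (revenue, cost) pairs
margin_pcts = [(revenue - cost) / revenue * 100 for revenue, cost in sales]
sum_of_pcts = sum(margin_pcts)  # not a meaningful figure
# The overall ratio is recomputed from its additive components instead.
overall_margin_pct = (
    sum(revenue - cost for revenue, cost in sales)
    / sum(revenue for revenue, _ in sales)
    * 100
)
```

Here the total deposits and the Tuesday balance agree, while the balance summed over both days overstates the money held; the summed percentages are likewise different from the true overall margin.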
Inside the fact table (figure 10-11)
Factless fact table (figure 10-12)
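A factless fact table holds only dimension keys and no measures; each row records that an event occurred. The sketch below uses student attendance as an assumed example, which is not necessarily the case shown in figure 10-12; the keys and rows are made up.

```python
from collections import Counter

# Hypothetical attendance events: (date_key, student_key, course_key).
# There is no measure column; a row only records that the event happened.
fact_attendance = [
    (20240101, 1, 10),
    (20240101, 2, 10),
    (20240102, 1, 10),
    (20240102, 1, 20),
]

# Analysis is done by counting rows rather than summing a measure.
attendance_per_course = Counter(course_key for _, _, course_key in fact_attendance)
attendance_per_day = Counter(date_key for date_key, _, _ in fact_attendance)
```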
Data granularity
• Fact table at the lowest grain
Star schema keys
• Primary key (dimension table)
• Surrogate keys (system-generated sequence keys)
– Avoid built-in meanings in keys
– Do not use production system keys
• Foreign key in fact table
• Concatenated primary key in fact table
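Surrogate key assignment can be sketched as a simple sequence generator that maps each production (natural) key to a meaningless integer. The class SurrogateKeyGenerator and the sample SKU strings are hypothetical; real warehouses typically rely on database sequences or identity columns for this.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Assigns a system-generated sequence key to each production key."""

    def __init__(self):
        self._next_key = count(1)
        self._by_production_key = {}

    def key_for(self, production_key):
        # Reuse the surrogate key if this production key was seen before.
        if production_key not in self._by_production_key:
            self._by_production_key[production_key] = next(self._next_key)
        return self._by_production_key[production_key]

product_keys = SurrogateKeyGenerator()
k1 = product_keys.key_for("SKU-EU-0042")  # hypothetical production system key
k2 = product_keys.key_for("SKU-US-0007")
k3 = product_keys.key_for("SKU-EU-0042")  # same product, same surrogate key
```

The surrogate key carries no built-in meaning, so a change in the production key format does not ripple into the fact table's foreign keys.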
Advantages of STAR schema
The STAR schema is a relational model, but it is not a normalized model:
• Easy for user to understand
• Optimizes navigation
• Most suitable for query processing