dimensional_modeling[1]
TRANSCRIPT
![Page 1: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/1.jpg)
1
Dimensional Design
Presented by Dr. Debashis Parida
![Page 2: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/2.jpg)
2
Course Agenda
• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts
![Page 3: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/3.jpg)
3
Rationale for Dimensional Modeling
![Page 4: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/4.jpg)
4
OLTP Design Characteristics
Focus of OLTP Design
• Individual data elements
• Data relationships
Design goals
• Accurately model business
• Remove redundancy
![Page 5: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/5.jpg)
5
OLTP Design Shortcomings
• Complex
• Unfamiliar to business people
• Incomplete history
• Slow query performance
![Page 6: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/6.jpg)
6
Emergence of Dimensional Model
Logical modeling technique
• For designing relational database structures
Addresses OLTP design shortcomings
• For use in analytic systems
First developed in the early 1980s
• Packaged goods industry
Popularized by Ralph Kimball, PhD.
• 1996 book: 'The Data Warehouse Toolkit'
![Page 7: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/7.jpg)
7
Dimensional Modeling Basics
![Page 8: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/8.jpg)
8
Facts
Process Measurement

Coffee Maker Fulfillment Report
Brand: Captain Coffee

| Product | Units Sold | Units Shipped | % Shipped |
|---|---|---|---|
| Standard Coffee Maker | 5,000 | 3,800 | 76% |
| Thermal Coffee Maker | 2,400 | 1,632 | 68% |
| Deluxe Coffee Maker | 2,073 | 1,658 | 80% |
| All Products | 9,473 | 7,090 | 75% |

Measures
• Metrics or indicators by which people evaluate a business process
• Referred to as “Facts”
Examples
• Margin
• Inventory Amount
• Sales Dollars
• Receivable Dollars
• Return Rate
![Page 9: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/9.jpg)
9
Perspective Focus
Process-oriented business perspectives
[Diagram: business areas (Operations, Sales and Marketing, Customer Services, Product Development) with example perspectives (category, product, warehouse, G/L account, supplier)]
![Page 10: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/10.jpg)
10
Dimensions
Process Perspectives

Coffee Maker Fulfillment Report
Brand: Captain Coffee

| Product | Units Sold | Units Shipped | % Shipped |
|---|---|---|---|
| Standard Coffee Maker | 5,000 | 3,800 | 76% |
| Thermal Coffee Maker | 2,400 | 1,632 | 68% |
| Deluxe Coffee Maker | 2,073 | 1,658 | 80% |
| All Products | 9,473 | 7,090 | 75% |

Dimensions
• The parameters by which measures are viewed
• Used to break out, filter or roll up measures
• Often found after the word “by” in a business question
• Descriptive business terms
Examples
• Product
• Warehouse
• Customer
• Supplier
![Page 11: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/11.jpg)
11
Dimensional Model
Definition
• Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
• Dimensional Model = Star Schema
• Serves as basis for the design of a relational database schema
• Can easily translate into multi-dimensional database design if required
• Overcomes OLTP design shortcomings
![Page 12: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/12.jpg)
12
Dimensional Model Advantages
• Understandable
• Systematically represents history
• Reliable join paths
• High performance query
• Enterprise scalability
![Page 13: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/13.jpg)
13
Star Schema
[Diagram: Facts table at the center, joined to Store, Time and Product dimensions]
Schema Simplicity
Fewer tables
• Denormalized
• Consolidated
Dimensional
• Familiar to users
• Facts go in the fact tables
• Dimensions in dimension tables
Increases understandability
![Page 14: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/14.jpg)
14
Data Familiarity
[Diagram: source field ord_date expanded into a Time Dimension with attributes year, quarter, month, date, day of the week, holiday flag]
Adding business context
• Single source field
• Expanded into parts
• Decoded into business terms
• Add special indicators and flags
• e.g. time dimension
Increases understandability
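The expansion of a single source field into business-friendly attributes can be sketched in a few lines of Python. This is only an illustration: the function name `expand_order_date`, the output column names and the `HOLIDAYS` set are assumptions, not part of the original slides.

```python
from datetime import date

# Hypothetical holiday calendar; a real warehouse would load this from a reference table.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def expand_order_date(ord_date: date) -> dict:
    """Expand one source date into descriptive time-dimension attributes."""
    return {
        "date": ord_date.isoformat(),
        "year": ord_date.year,
        "quarter": f"Q{(ord_date.month - 1) // 3 + 1}",
        "month": ord_date.strftime("%B"),
        "day_of_week": ord_date.strftime("%A"),
        "holiday_flag": "Y" if ord_date in HOLIDAYS else "N",
    }


time_row = expand_order_date(date(2024, 12, 25))
print(time_row)
```

Each attribute becomes a column that users can filter or group by without having to decode the raw date themselves.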
![Page 15: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/15.jpg)
15
Time Dimension
Representing History
[Diagram: star schema with Store, Product and Time dimensions around Facts; Time Dimension attributes: year, quarter, month, date, day of the week, holiday flag]
Time dimension
• Part of every star schema
• Marks the date when the facts (process measurements) occurred
• Allows the schema to easily add and query data over time
• Especially useful for performing comparison queries
![Page 16: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/16.jpg)
16
Fewer Join Paths
Star schema joins
• Defined during schema design - not runtime
• Business people can easily understand these relationships
• One-to-many relations between dimensions and facts
• Referential integrity always enforced
![Page 17: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/17.jpg)
17
High Performance Design
Fewer joins mean less 'expensive' queries
Deterministic query patterns
Star schema query optimization supported by all major RDBMS vendors
![Page 18: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/18.jpg)
18
Subject Area Models
Subject area dimensional models
• Manufacturing and Process Control
• Sales Order Entry and Campaign Management
• Customer Support and Relationship Management
• Shipping and Inventory Management
Subject area E/R models
• Operations
• Sales and Marketing
• Customer Services
• Product Development
![Page 19: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/19.jpg)
19
Enterprise Models
Enterprise Scope E/R model
Enterprise scope dimensional model
![Page 20: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/20.jpg)
20
Dimensional Design Details
![Page 21: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/21.jpg)
21
Star Schema Dimension Tables
[Diagram: three dimension tables]
Dimension tables
• Store dimension values
• Textual content
• Dimension tables usually referred to simply as 'dimensions'
• Spend extra effort to add dimensional attributes
![Page 22: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/22.jpg)
22
Dimension Keys
[Diagram: three dimension tables, each with a key column]
Synthetic keys
• Each table assigned a unique primary key, specifically generated for the data warehouse
• Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
![Page 23: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/23.jpg)
23
Dimension Columns
[Diagram: three dimension tables, each with a key and several attribute columns]
Dimension attributes
• Specify the way in which measures are viewed: rolled up, broken out or summarized
• Often follow the word “by” as in “Show me Sales by Region and Quarter”
• Frequently referred to as 'Dimensions'
![Page 24: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/24.jpg)
24
Star Schema Fact Table
[Diagram: fact table with columns fact1, fact2, fact3]
Process measures
• Start by assigning one fact table per business subject area
• Fact tables store the process measures (aka Facts)
• Compared to dimension tables, fact tables usually have a very large number of rows
![Page 25: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/25.jpg)
25
Fact Table Primary Key
[Diagram: fact table with three key columns followed by fact1, fact2, fact3]
Every fact table
• Multi-part primary key added
• Made up of foreign keys referencing dimensions
![Page 26: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/26.jpg)
26
Fact Table Sparsity
Sparsity
• Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
• Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse"
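One way to make sparsity concrete is to compare the number of rows actually stored with the number of possible dimension combinations. The sketch below uses invented dimension sizes and row counts purely for illustration; `fact_table_density` is a hypothetical helper name.

```python
def fact_table_density(fact_rows: int, dimension_sizes: list[int]) -> float:
    """Return the share of all possible dimension combinations that have a fact row."""
    possible_combinations = 1
    for size in dimension_sizes:
        possible_combinations *= size
    return fact_rows / possible_combinations


# Assumed example: 500 products x 100 stores x 365 days, with 2,000,000 stored rows.
density = fact_table_density(2_000_000, [500, 100, 365])
print(f"Populated combinations: {density:.1%}")
```

A low density is normal and expected; it simply means most product, store and day combinations had no activity to record.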
![Page 27: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/27.jpg)
27
Fact Table Grain
Grain
• The level of detail represented by a row in the fact table
• Must be identified early
• Cause of greatest confusion during design process
Example
• Each row in the fact table represents the daily item sales total
![Page 28: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/28.jpg)
28
Designing a Star Schema
Five initial design steps
• Based on Kimball's six steps
• Start designing in order
• Re-visit and adjust over project life
![Page 29: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/29.jpg)
29
Step One
1. Identify fact table
• Start by naming the fact table with the name of the business subject area
![Page 30: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/30.jpg)
30
Step Two
2. Identify fact table grain
• Describe what a row in the fact table represents - in business terms
![Page 31: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/31.jpg)
31
Step Three
3. Identify dimensions
![Page 32: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/32.jpg)
32
Step Four
4. Select facts
![Page 33: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/33.jpg)
33
Step Five
5. Identify dimensional attributes
![Page 34: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/34.jpg)
34
Fact Table Details
![Page 35: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/35.jpg)
35
Example Fact Table
Sales Facts
• model_key
• dealer_key
• time_key
• revenue
• quantity
![Page 36: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/36.jpg)
36
Facts
Fully additive
• Can be summed across any and all dimensions
• Stored in fact table
• Examples: revenue, quantity
![Page 37: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/37.jpg)
37
Facts
Semi-additive
• Can be summed across most dimensions but not all
• Anything that measures a “level”
• Must be careful with ad-hoc reporting
• Often aggregated across the “forbidden dimension” by averaging
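A small sketch may help show why a "level" measure is only semi-additive. The inventory snapshot data and function names below are invented for illustration; here time is assumed to be the "forbidden dimension".

```python
from collections import defaultdict

# Hypothetical daily inventory snapshots: (day, product, quantity_on_hand).
snapshots = [
    ("2024-01-01", "Standard", 100),
    ("2024-01-01", "Deluxe", 40),
    ("2024-01-02", "Standard", 90),
    ("2024-01-02", "Deluxe", 50),
]


def total_on_hand_by_day(rows):
    """Summing across products within one day is valid."""
    totals = defaultdict(int)
    for day, _product, quantity in rows:
        totals[day] += quantity
    return dict(totals)


def average_daily_on_hand(rows):
    """Across time, average the daily totals instead of summing them."""
    by_day = total_on_hand_by_day(rows)
    return sum(by_day.values()) / len(by_day)


print(total_on_hand_by_day(snapshots))
print(average_daily_on_hand(snapshots))
```

Summing every row would report 280 units, even though no more than 140 units were ever on hand on a single day.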
![Page 38: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/38.jpg)
38
Facts
Non-Additive
• Cannot be summed across any dimension
• All ratios are non-additive
• Break down to fully additive components, store them in fact table
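The figures from the Coffee Maker Fulfillment Report can illustrate the point. In the sketch below (variable names are illustrative), the additive components are summed first and the ratio is computed last; averaging the per-product percentages gives a different, misleading figure.

```python
# (product, units_sold, units_shipped) taken from the Coffee Maker Fulfillment Report.
rows = [
    ("Standard Coffee Maker", 5000, 3800),
    ("Thermal Coffee Maker", 2400, 1632),
    ("Deluxe Coffee Maker", 2073, 1658),
]

total_sold = sum(sold for _product, sold, _shipped in rows)
total_shipped = sum(shipped for _product, _sold, shipped in rows)

# Correct: compute the ratio from the summed additive components.
pct_shipped_all = total_shipped / total_sold

# Misleading: treat the stored ratio as if it could be aggregated directly.
average_of_ratios = sum(shipped / sold for _product, sold, shipped in rows) / len(rows)

print(f"Ratio of sums:     {pct_shipped_all:.2%}")
print(f"Average of ratios: {average_of_ratios:.2%}")
```

The ratio of sums rounds to the 75% shown on the "All Products" line of the report, which is why the components, not the percentage, belong in the fact table.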
![Page 39: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/39.jpg)
39
Factless Fact Table
• A fact table with no measures in it
• Nothing to measure...
• ...Except the convergence of dimensional attributes
• Sometimes store a “1” for convenience
• Examples: Attendance, Customer Assignments, Coverage
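As a rough illustration of the attendance example, a factless fact table can be modeled as rows of keys only, and analysis becomes a matter of counting rows. The keys and names below are made up for the sketch.

```python
from collections import Counter

# Hypothetical attendance rows: (date_key, student_key, class_key). No measures are stored.
attendance = [
    ("2024-03-01", "alice", "math"),
    ("2024-03-01", "bob", "math"),
    ("2024-03-01", "alice", "history"),
    ("2024-03-02", "alice", "math"),
]


def attendance_by_class(rows):
    """Count the events per class; each row stands for one occurrence."""
    return Counter(class_key for _date_key, _student_key, class_key in rows)


print(attendance_by_class(attendance))
```

Storing a constant "1" in each row, as the slide suggests, would let the same question be answered with a plain sum instead of a count.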
![Page 40: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/40.jpg)
40
Dimension Table Details
![Page 41: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/41.jpg)
41
Example Dimension Tables
Dealer: dealer_key, region, state, city, dealer
Model: model_key, brand, category, line, model
Time: time_key, year, quarter, month, date
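The example fact and dimension tables can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table names (`dealer_dim`, `model_dim`, `time_dim`, `sales_facts`), the column types and all sample rows are assumptions; only the column names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE model_dim  (model_key INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE sales_facts (
    model_key  INTEGER REFERENCES model_dim(model_key),
    dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (model_key, dealer_key, time_key)  -- multi-part key made of foreign keys
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO dealer_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "East", "NY", "Albany", "Dealer A"),
    (2, "West", "CA", "Fresno", "Dealer B"),
])
conn.executemany("INSERT INTO model_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Brand X", "Sedan", "Line 1", "Model S"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
    (1, 2024, "Q1", "January", "2024-01-15"),
    (2, 2024, "Q2", "April", "2024-04-10"),
])
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 20000.0, 1),
    (1, 2, 1, 42000.0, 2),
    (1, 1, 2, 21000.0, 1),
])

# "Show me revenue and quantity by region and quarter."
result = conn.execute("""
    SELECT d.region, t.quarter, SUM(f.revenue), SUM(f.quantity)
    FROM sales_facts f
    JOIN dealer_dim d ON d.dealer_key = f.dealer_key
    JOIN time_dim   t ON t.time_key   = f.time_key
    GROUP BY d.region, t.quarter
    ORDER BY d.region, t.quarter
""").fetchall()
print(result)
```

Note how the attributes that follow the word "by" in the business question become the `GROUP BY` columns, and each dimension is reached with a single join from the fact table.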
![Page 42: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/42.jpg)
42
Dimension Tables
Characteristics
• Hold the dimensional attributes
• Usually have a large number of attributes (“wide”)
• Add flags and indicators that make it easy to perform specific types of reports
• Have small number of rows in comparison to fact tables (most of the time)
![Page 43: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/43.jpg)
43
Don’t Normalize Dimensions
• Saves very little space
• Impacts performance
• Can confuse matters when multiple hierarchies exist
• A star schema with normalized dimensions is called a "snowflake schema"
• Usually advocated by software vendors whose products require snowflake for performance
![Page 44: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/44.jpg)
44
Slowly Changing Dimensions
• Dimension source data may change over time
• Relative to fact tables, dimension records change slowly
• Allows dimensions to have multiple 'profiles' over time to maintain history
• Each profile is a separate record in a dimension table
![Page 45: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/45.jpg)
45
Slowly Changing Dimension Example
Example: A woman gets married
Possible changes to customer dimension
• Last Name
• Marriage Status
• Address
• Household Income
Existing facts need to remain associated with her single profile
New facts need to be associated with her married profile
![Page 46: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/46.jpg)
46
Slowly Changing Dimension Types
Three types of slowly changing dimensions
Type 1
• Updates existing record with modifications
• Does not maintain history
Type 2
• Adds new record
• Does maintain history
• Maintains old record
Type 3
• Keep old and new values in the existing row
• Requires a design change
![Page 47: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/47.jpg)
47
Designing Loads to Handle SCD
Design and implementation guidelines
• Gather SCD requirements when designing data mapping and loading
• SCD needs to be defined and implemented at the dimensional attribute level
• Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
• If one Type 1 column changes, then all Type 1 columns will be updated
• If one Type 2 column changes, then a new record will be inserted into the dimension table
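The column-level rules above might be sketched as follows, using the marriage example. This is an in-memory illustration, not a load routine: the column classification, the `is_current` flag, the key sequence and the choice to overwrite Type 1 columns on every existing profile are all assumptions made for the sketch.

```python
from itertools import count

# Assumed classification; the guidelines say each column must be Type 1 or Type 2.
TYPE1_COLUMNS = ("household_income",)
TYPE2_COLUMNS = ("last_name", "marital_status")

_key_sequence = count(1)  # stand-in for a warehouse-generated synthetic key


def apply_change(dimension, customer_id, new_values):
    """Apply one source record to the dimension; new_values must supply every SCD column."""
    profiles = [row for row in dimension if row["customer_id"] == customer_id]
    current = next((row for row in profiles if row["is_current"]), None)

    if current is None:
        # First time this customer is seen: insert the initial profile.
        dimension.append({"customer_key": next(_key_sequence), "customer_id": customer_id,
                          "is_current": True, **new_values})
        return

    if any(current[col] != new_values[col] for col in TYPE1_COLUMNS):
        # Type 1: overwrite all Type 1 columns in place; no history is kept.
        for row in profiles:
            for col in TYPE1_COLUMNS:
                row[col] = new_values[col]

    if any(current[col] != new_values[col] for col in TYPE2_COLUMNS):
        # Type 2: keep the old record and insert a new profile with a new synthetic key.
        current["is_current"] = False
        dimension.append({"customer_key": next(_key_sequence), "customer_id": customer_id,
                          "is_current": True, **new_values})


customers = []
apply_change(customers, "C-100", {"last_name": "Smith", "marital_status": "Single", "household_income": 50000})
apply_change(customers, "C-100", {"last_name": "Jones", "marital_status": "Married", "household_income": 50000})
apply_change(customers, "C-100", {"last_name": "Jones", "marital_status": "Married", "household_income": 90000})
for row in customers:
    print(row)
```

Existing facts keep pointing at the first `customer_key` (the single profile), while new facts would be loaded with the second key (the married profile).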
![Page 48: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/48.jpg)
48
Designing Loads to Handle SCD
Design and implementation guidelines
• For large dimension tables, change data capture techniques may be used to minimize the data volume
• For smaller dimension tables, compare all OLTP records with dimension table records
• Balance data volume with change data capture logic complexities
![Page 49: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/49.jpg)
49
Degenerate Dimensions
• Dimensions with no other place to go
• Stored in the fact table
• Are not facts
• Common examples include invoice numbers or order numbers
![Page 50: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/50.jpg)
50
Dimensional Design Process
Project Context
![Page 51: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/51.jpg)
51
Data Mart Development
[Diagram: Design Phase, Development Phase, Deployment Phase]
Dimensional modeling is a critical part of the data mart development effort
![Page 52: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/52.jpg)
52
Data Mart Development
Design phase
• Determine requirements and design schema
Development phase
• Iterative build and feedback
Deployment phase
• Automate load, document, train users
![Page 53: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/53.jpg)
53
Project Deliverables
Design
• Project definition document
• Project plan
• Schema design
• Mapping document
• Report design
Development
• Populated data mart
• Load routines (Sagent “Plans”)
• Query and reporting environment
Deployment
• Automation
• Documentation
• Training materials
![Page 54: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/54.jpg)
54
Project Approach
[Diagram: Design Phase, Development Phase, Deployment Phase]
• The dimensional model is developed during the design stage
• Scope of the project has already been determined
![Page 55: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/55.jpg)
55
Design Stage Activities
[Diagram: Design Phase, Development Phase, Deployment Phase]
• Gather requirements through requirements workshops
• Develop star schema
• Conduct design review
![Page 56: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/56.jpg)
56
Gather Requirements
Requirements definition
• User workshops
• Spreadsheets
• Sample reports
Source systems analysis
• DBA interviews
• Copybooks
• E/R diagrams
![Page 57: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/57.jpg)
57
Design Deliverables
Deliverables
• The star schema itself
• Load mapping document
How these primary components are delivered will depend on needs and format chosen
• Modeling tools
• Spreadsheets
• Text documents
![Page 58: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/58.jpg)
58
Notation
• No recognized standard
• ER semantics unnecessary
• Clarity is the only characteristic that really matters
![Page 59: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/59.jpg)
59
Design Naming Standards
Responsibility of data administration
• Extended to the data warehouse
• Important to start early in the project
Suggested conventions
• Fact tables
• Dimension tables
• Aggregate tables
• Keys
![Page 60: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/60.jpg)
60
Data Element Definitions
Clear descriptions
• Facts
• Calculated formulae
• Dimensional attributes
• Multiple meanings/synonymous terms
• Aliases
![Page 61: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/61.jpg)
61
Data Element Instances
Example of Data
As it will exist in the warehouse
After decoding
Adds to model understanding
Removes ambiguity/uncertainty
![Page 62: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/62.jpg)
62
Data Element Mapping
Where is the data coming from
Source system
Table
Column
Record
Field
![Page 63: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/63.jpg)
63
Data Transformation
Changing the data
Serves as spec for ETL process
Decodes
Type conversion
Conditional logic
Handling of NULLs
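A transformation spec of this kind might translate into code along the following lines. The source field names (`status_cd`, `qty`), the decode table and the 100-unit threshold are hypothetical and exist only to show one decode, one type conversion, one conditional rule and one NULL rule.

```python
# Hypothetical code table used to decode a source-system status code.
STATUS_DECODE = {"A": "Active", "I": "Inactive"}


def transform(source_row: dict) -> dict:
    """Turn one raw source record into warehouse-ready values."""
    # Type conversion with NULL handling: missing or empty quantities become 0.
    quantity = int(source_row.get("qty") or 0)
    return {
        # Decode: unknown or NULL codes map to an explicit "Unknown" value.
        "status": STATUS_DECODE.get(source_row.get("status_cd"), "Unknown"),
        "quantity": quantity,
        # Conditional logic: derive a descriptive band from the quantity.
        "order_size": "Bulk" if quantity >= 100 else "Standard",
    }


print(transform({"status_cd": "A", "qty": "250"}))
print(transform({"status_cd": None, "qty": None}))
```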
![Page 64: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/64.jpg)
64
Aggregate Schemas
![Page 65: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/65.jpg)
65
Aggregate Designs
Aggregates
• Pre-stored fact summaries
• Along one or more dimensions
• The most effective tool for improving performance
Examples
• Summary of sales by region, by product, by category
• Monthly sales
![Page 66: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/66.jpg)
66
Aggregate Background
Aggregate rationale
• Improve end user query performance
• Reduce required CPU cycles
• Powerful cost saving tool
Restrictions
• Additive facts only
• Must use dimensional design
![Page 67: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/67.jpg)
67
Aggregate Guidelines
• Don’t start with aggregates
• Design and build based on usage
• Sooner or later you'll need to build aggregates
![Page 68: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/68.jpg)
68
Aggregate Types
Level field
Separate fact tables
![Page 69: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/69.jpg)
69
Aggregate Types
Level field
• Old technique
• Requires “level” attribute in appropriate dimensions
• Aggregates and base-level facts stored in same table
• Same number of total fact records as separate table approach
Drawbacks
• Every query must constrain on the level field
• Possibility of double counting
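The double-counting risk is easy to reproduce. In this sketch (invented rows, illustrative field names) daily facts and a monthly aggregate share one table, distinguished only by a `level` attribute.

```python
# Hypothetical fact rows: two base-level days plus the monthly aggregate of those days.
facts = [
    {"level": "day", "period": "2024-01-01", "sales": 100},
    {"level": "day", "period": "2024-01-02", "sales": 150},
    {"level": "month", "period": "2024-01", "sales": 250},
]

# Forgetting the level constraint adds the aggregate on top of its own detail rows.
unconstrained_total = sum(row["sales"] for row in facts)

# Constraining on the level field gives the true total.
constrained_total = sum(row["sales"] for row in facts if row["level"] == "day")

print(unconstrained_total, constrained_total)
```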
![Page 70: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/70.jpg)
70
Aggregate Types
Separate Tables
• Separate fact table for every aggregate
• Separate dimension table for every aggregate dimension
• Same number of fact records as level field tables
Advantage
• Removes possibility of double counting
• Schema clarity
Caveat
• Requires software with aggregate navigation capability
![Page 71: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/71.jpg)
71
Aggregate Pitfalls
Sparsity failure
• Term used to describe the result of building too many aggregate fact tables that do not summarize enough rows.
• When sparsity failure occurs, a relatively small star schema can grow (in terms of disk size) thousands of times.
• Sparsity failure = aggregate explosion
![Page 72: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/72.jpg)
72
Aggregate Design Guidelines
Rule of twenty
• To avoid aggregate explosion
• Make sure each aggregate record summarizes 20 or more lower-level records
Remember
• Total number of possible fact tables in any given dimensional model = cartesian product of all levels in all the dimensions
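To get a feel for how quickly the number of possible fact tables grows, the levels of each dimension can be enumerated. The sketch assumes the Time, Model and Dealer hierarchies from the example dimension tables, with four levels each, and counts the base-level combination as one of the possibilities.

```python
from itertools import product
from math import prod

# Assumed hierarchy levels, lowest level last, based on the example dimension tables.
levels = {
    "time": ["year", "quarter", "month", "date"],
    "model": ["brand", "category", "line", "model"],
    "dealer": ["region", "state", "city", "dealer"],
}

# One possible fact table per combination of one level from each dimension.
possible_fact_tables = prod(len(dimension_levels) for dimension_levels in levels.values())
combinations = list(product(*levels.values()))

print(possible_fact_tables)
print(combinations[:3])
```

Even this small three-dimension model yields dozens of candidates, which is why aggregates should be chosen selectively rather than built exhaustively.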
![Page 73: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/73.jpg)
73
Hierarchies & Aggregate Design
[Diagram: Time hierarchy with cardinalities within one year: Year (1), Quarter (4), Month (12), Date (365); over 5 years: 5 years, 20 quarters, 60 months, 1825 days]
Hierarchy diagram
• Helps visualize options for building aggregates
• Adding cardinalities helps ensure the rule of 20 is followed
• Not required to build initial star schema
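Using the cardinalities from the Time hierarchy, a quick check of the rule of twenty could look like the sketch below. It compares dimension-level cardinalities only; the real compression depends on how many fact rows actually exist at each level, so treat the result as a first approximation. The function names are illustrative.

```python
# Cardinalities over five years, as shown in the hierarchy diagram.
cardinalities = {"date": 1825, "month": 60, "quarter": 20, "year": 5}


def compression_ratio(lower_level: str, higher_level: str) -> float:
    """Average number of lower-level members summarized by one higher-level member."""
    return cardinalities[lower_level] / cardinalities[higher_level]


def meets_rule_of_twenty(lower_level: str, higher_level: str) -> bool:
    return compression_ratio(lower_level, higher_level) >= 20


for lower, higher in [("date", "month"), ("month", "quarter"), ("date", "quarter")]:
    print(lower, "->", higher, compression_ratio(lower, higher), meets_rule_of_twenty(lower, higher))
```

Rolling days up to months passes the check, while building a quarterly aggregate on top of an existing monthly one summarizes only three rows per record and would not.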
![Page 74: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/74.jpg)
74
Aggregate Navigation
Description
• Function provided by software layer: Aggregate Navigator
• Directs user queries to the most favorable available aggregate
• Transparent to the end user
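The core decision an aggregate navigator makes can be approximated in a few lines: among the tables that carry every attribute a query needs, pick the one with the fewest rows. This is a simplified sketch with made-up table names and row counts, and it interprets "most favorable" as "smallest"; commercial navigators are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTable:
    name: str
    attributes: frozenset  # dimension attributes available at this table's grain
    row_count: int


def choose_table(tables, needed_attributes):
    """Return the smallest table that can answer a query on the needed attributes."""
    candidates = [table for table in tables if needed_attributes <= table.attributes]
    if not candidates:
        raise LookupError("no table can answer this query")
    return min(candidates, key=lambda table: table.row_count)


# Hypothetical base table and two aggregates.
tables = [
    FactTable("sales_facts",
              frozenset({"date", "month", "quarter", "year", "model", "brand", "dealer", "region"}),
              10_000_000),
    FactTable("sales_by_month_brand", frozenset({"month", "quarter", "year", "brand", "region"}), 50_000),
    FactTable("sales_by_year_region", frozenset({"year", "region"}), 25),
]

print(choose_table(tables, {"year", "region"}).name)
print(choose_table(tables, {"quarter", "brand"}).name)
print(choose_table(tables, {"date", "model"}).name)
```

Because the choice happens in this layer, users keep asking questions against the business view and never need to know which physical table served the answer.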
![Page 75: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/75.jpg)
75
Aggregate Framework
[Diagram: Business View and Designer View]
![Page 76: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/76.jpg)
76
Aggregate Deployment
Incremental
Based on usage
Transparent to users
Typically warehouse DBA responsibility
![Page 77: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/77.jpg)
77
Aggregate Deployment
[Diagram: incremental deployment sequence]
• Build Subject Area 1, no aggregates
• Build Subject Area 2, no aggregates; build aggregates for Subject Area 1
• Build Subject Area 3, no aggregates; build aggregates for Subject Area 2
• Build Subject Area 4, no aggregates; build aggregates for Subject Area 3
Some re-work required
![Page 78: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/78.jpg)
78
Multiple Fact Tables
![Page 79: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/79.jpg)
79
Multiple Fact Tables
Different business processes usually require different fact tables
There are also several cases where a single business process will require multiple fact tables
• Core and custom
• Snapshot and transaction
• Coverage
• Aggregates
![Page 80: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/80.jpg)
80
Different Business Processes
Different business processes usually require different fact tables
In practice, it may be hard to identify what a “process” is
Sometimes you can spot different processes because measures are recorded
• With different dimensions
• At differing grains
![Page 81: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/81.jpg)
81
Different Dimensions or Grain
Don’t take shortcuts with grain
The 'not applicable' dimension value
• Using a 'not applicable' row in a dimension confuses the grain and can introduce reporting difficulty
![Page 82: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/82.jpg)
82
Different Points in Time
Sometimes, it is not easy to identify the discrete business processes
All measures may have the same dimensionality or grain
Different measures are recorded at different times
• Quantity sold is not recorded at the same time as quantity shipped
![Page 83: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/83.jpg)
83
Different Timing
Building a single fact table would require recording zero or null for measures that are not applicable at a point in time
Reports would contain a confusing combination of zeros, nulls, and absence of data
![Page 84: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/84.jpg)
84
Identifying Different Processes
Look at the measures in question
Sort them into fact tables based on
• Dimensions
• Grain
• Differing timings of events measured
![Page 85: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/85.jpg)
85
Design Tools for Multiple Tables
Create a set of matrices
• Facts vs dimension
• Facts vs dimensional attributes
Mark where facts apply to dimensions
Mark where facts apply to dimensional attributes
When facts don't apply, assume separate fact table
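A facts-versus-dimensions matrix lends itself to a simple grouping exercise: facts that apply to exactly the same dimensions are candidates for the same fact table. The matrix contents and fact names below are invented, loosely following the sales, shipments and inventory example in this deck.

```python
from collections import defaultdict

# Hypothetical matrix: each fact is marked against the dimensions it applies to.
matrix = {
    "quantity_sold": {"product", "store", "day"},
    "sales_dollars": {"product", "store", "day"},
    "quantity_shipped": {"product", "warehouse", "day"},
    "quantity_on_hand": {"product", "warehouse", "month"},
}


def group_into_fact_tables(fact_matrix):
    """Group facts by identical dimensionality; each group suggests one fact table."""
    groups = defaultdict(list)
    for fact, dimensions in fact_matrix.items():
        groups[frozenset(dimensions)].append(fact)
    return dict(groups)


for dimensions, grouped_facts in group_into_fact_tables(matrix).items():
    print(sorted(dimensions), "->", grouped_facts)
```

Matching dimensionality is only the first test; as the earlier slides note, facts recorded at different times may still need separate tables.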
![Page 86: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/86.jpg)
86
Multiple Fact Table Summary
Different processes need different tables
Identified with
• Grain
• Dimensionality
• Timing
Same process may need multiple fact tables
• Heterogeneous attributes
• Coverage
• Snapshot and transaction
• Aggregates
![Page 87: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/87.jpg)
87
Architected Data Marts
![Page 88: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/88.jpg)
88
Data Mart
Meaning of the term 'data mart' has shifted over the last several years...
![Page 89: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/89.jpg)
89
Data Mart Architecture 1993
[Diagram: Operational Systems, E.T.L. Software, Data Warehouse, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]
![Page 90: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/90.jpg)
90
Data Mart Architecture 1997
[Diagram: Operational Systems, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]
![Page 91: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/91.jpg)
91
Architected Data Marts
[Diagram labels: Operational Systems, E.T.L. Software, Data Warehouse, Data Mart, Query & Reporting Software, Analysis Users]
![Page 92: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/92.jpg)
92
Data Mart
Warehouse Subject Area
Incremental warehouse development
Centralized architecture
Not new
Well-suited to star schemas
![Page 93: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/93.jpg)
93
“Stovepipe” Data Marts
[Diagram labels: Sales Facts with Store, Product, Time (Day); Shipments Facts with Product, Time (Day), Warehouse; Inventory Facts with Warehouse, Product, Month]
“Stovepipe” data marts
Inconsistent and overlapping data
Difficult and costly to maintain
Redundant data load
Can’t drill across
Integration requires starting over
Dimensions not conformed
![Page 94: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/94.jpg)
94
Conformed Dimensions
Definition
Dimensions are conformed when they are the same
-or-
When one dimension is a strict rollup of another
![Page 95: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/95.jpg)
95
Conformed Dimensions
Same dimensions must:
1. ... have exactly the same set of primary keys
and
2. ... have the same number of records
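The two conditions for "same" dimensions can be checked mechanically. The sketch below assumes each dimension table is available as a list of row dictionaries and that the key column is called `product_key`; both the layout and the names are assumptions made for illustration.

```python
def are_same_dimension(dim_a, dim_b, key="product_key"):
    """Return True when two copies of a dimension satisfy both conditions:
    the same set of primary keys and the same number of records.

    `key` is a hypothetical column name chosen for this example.
    """
    keys_a = {row[key] for row in dim_a}
    keys_b = {row[key] for row in dim_b}
    same_keys = keys_a == keys_b
    same_count = len(dim_a) == len(dim_b)
    return same_keys and same_count


# Illustrative data: the product dimension as loaded in two data marts.
sales_mart_product = [
    {"product_key": 1, "product_name": "Widget"},
    {"product_key": 2, "product_name": "Gadget"},
]
shipping_mart_product = [
    {"product_key": 1, "product_name": "Widget"},
    {"product_key": 2, "product_name": "Gadget"},
]
stovepipe_product = [
    {"product_key": 1, "product_name": "Widget"},
    {"product_key": 7, "product_name": "Gadget"},  # key assigned independently
]

print(are_same_dimension(sales_mart_product, shipping_mart_product))
print(are_same_dimension(sales_mart_product, stovepipe_product))
```

The record-count test matters on its own: a table with a duplicated key can have the same key set as its counterpart and still hold a different number of rows.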
![Page 96: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/96.jpg)
96
Conformed Dimensions
Rolled up dimension
When one dimension is a strict rollup of another
Which means
Two conformed dimensions can be combined into a single logical dimension by creating a union of the attributes
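One way to read "strict rollup" is that every attribute of the summary dimension also exists in the detailed dimension, and the distinct values obtained by projecting the detailed rows onto those attributes match the summary rows exactly. The sketch below tests that reading for a hypothetical Day and Month dimension. The attribute names are assumptions, and surrogate keys are left out to keep the comparison simple.

```python
def is_strict_rollup(detail_rows, rollup_rows):
    """Check whether `rollup_rows` is a strict rollup of `detail_rows`.

    Assumed interpretation: the rollup's attributes are a subset of the
    detail's attributes, and projecting the detail rows onto those
    attributes yields exactly the rollup's rows.
    """
    if not detail_rows or not rollup_rows:
        return False
    rollup_attrs = sorted(rollup_rows[0].keys())
    if not set(rollup_attrs) <= set(detail_rows[0].keys()):
        return False

    def project(row):
        return tuple(row[attr] for attr in rollup_attrs)

    return {project(r) for r in detail_rows} == {project(r) for r in rollup_rows}


# Illustrative data only.
DAY_DIM = [
    {"date": "2024-01-30", "month": "2024-01", "quarter": "Q1", "year": 2024},
    {"date": "2024-01-31", "month": "2024-01", "quarter": "Q1", "year": 2024},
    {"date": "2024-02-01", "month": "2024-02", "quarter": "Q1", "year": 2024},
]
MONTH_DIM = [
    {"month": "2024-01", "quarter": "Q1", "year": 2024},
    {"month": "2024-02", "quarter": "Q1", "year": 2024},
]

print(is_strict_rollup(DAY_DIM, MONTH_DIM))
```

When the check passes, the union of the two attribute sets is simply the Day dimension's attribute set, which is why the pair can be treated as one logical dimension.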
![Page 97: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/97.jpg)
97
Conformed Dimensions
Description
Shared common dimensions
Integrates logical design
Ensures consistency between data marts
Allows incremental development
Independent of physical location
Some re-work may be required
![Page 98: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/98.jpg)
98
Conformed Dimensions
Advantages
Enables an incremental development approach
Easier and cheaper to maintain
Drastically reduces extraction and loading complexity
Answers business questions that cross data marts
Supports both centralized and distributed architectures
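A question that crosses data marts is usually answered by "drilling across": each fact table is summarized separately by a shared dimension attribute, and the summaries are then combined on that attribute. The sketch below is a minimal illustration with made-up tables and column names, not a prescribed implementation. It works only because both fact tables use the same product keys, which is what a conformed Product dimension provides.

```python
from collections import defaultdict

# Hypothetical conformed Product dimension: product_key -> product_name.
PRODUCT_DIM = {1: "Widget", 2: "Gadget"}

# Hypothetical fact rows from two different data marts.
SALES_FACTS = [
    {"product_key": 1, "units_sold": 10},
    {"product_key": 2, "units_sold": 4},
    {"product_key": 1, "units_sold": 5},
]
SHIPMENT_FACTS = [
    {"product_key": 1, "units_shipped": 20},
    {"product_key": 2, "units_shipped": 6},
]


def summarize(facts, measure):
    """Total one measure by product name, looked up in the shared dimension."""
    totals = defaultdict(int)
    for row in facts:
        totals[PRODUCT_DIM[row["product_key"]]] += row[measure]
    return dict(totals)


def drill_across(sales, shipments):
    """Summarize each fact table separately, then combine on product name."""
    sold = summarize(sales, "units_sold")
    shipped = summarize(shipments, "units_shipped")
    return {
        name: {
            "units_sold": sold.get(name, 0),
            "units_shipped": shipped.get(name, 0),
        }
        for name in sorted(set(sold) | set(shipped))
    }


report = drill_across(SALES_FACTS, SHIPMENT_FACTS)
for name, measures in report.items():
    print(name, measures)
```

Summarizing each fact table before combining avoids the double counting that would result from joining the two fact tables row by row.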
![Page 99: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/99.jpg)
99
Conformed Dimensions
Interlocking Star Schemas
[Diagram labels: Store Dimension, Sales Facts, Product Dimension, Time Dimension, Shipment Facts, Warehouse Dimension, Inventory Facts, Month Dimension]
![Page 100: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/100.jpg)
100
Kimball’s Data Warehouse Bus
[Diagram labels: dimensions Store, Product, Day, Warehouse, Month; fact tables Sales Facts, Shipment Facts, Inventory Facts]
![Page 101: Dimensional_Modeling[1]](https://reader034.vdocument.in/reader034/viewer/2022051513/5472390eb4af9fe52c8b4682/html5/thumbnails/101.jpg)
101
Course Review
Rationale for dimensional modeling
Dimensional modeling basics
Dimensional modeling details
Fact table details
Dimension table details
Design process
Aggregate schemas
Multiple fact tables
Architected data marts