dimensional model
1 MAP-I / DWS / Orlando Belo, Maribel Santos, Gabriel David
Data Warehouse Systems
Dimensional Model
Gabriel David
Building a data warehouse
  • Build the whole DW at once
    - A huge task, requiring knowledge of all the legacy systems, the meaning of all columns, and all the management goals
  • Build one fraction at a time, independently
    - Easier, but leads to isolated data marts
  • Dimensional Bus Architecture
    - Step-by-step method
    - Global initial design
    - Implementation one connectable data mart at a time
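The Dimensional Bus Architecture is commonly summarized as a bus matrix, with one row per data mart and one column per conformed dimension. The minimal Python sketch below uses hypothetical mart names (`orders`, `shipments`, `payments`) and hypothetical dimensions. It only shows how the matrix tells which conformed dimensions two data marts share, i.e., along which dimensions they can be connected.

```python
# Hypothetical bus matrix: each data mart lists the conformed dimensions it uses.
BUS_MATRIX: dict[str, set[str]] = {
    "orders": {"time", "client", "product"},
    "shipments": {"time", "client", "product", "warehouse"},
    "payments": {"time", "client"},
}


def shared_dimensions(mart_a: str, mart_b: str) -> set[str]:
    """Conformed dimensions along which two data marts can be connected."""
    return BUS_MATRIX[mart_a] & BUS_MATRIX[mart_b]


def dimension_usage() -> dict[str, list[str]]:
    """Column view of the matrix: for each conformed dimension, the marts that use it."""
    usage: dict[str, list[str]] = {}
    for mart, dims in BUS_MATRIX.items():
        for dim in dims:
            usage.setdefault(dim, []).append(mart)
    return {dim: sorted(marts) for dim, marts in sorted(usage.items())}


print(sorted(shared_dimensions("orders", "payments")))  # ['client', 'time']
for dim, marts in dimension_usage().items():
    print(f"{dim:10} {', '.join(marts)}")
```

Every data mart is built against the same dimension columns. Adding a new mart therefore means adding a row, not redesigning the existing ones.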
Data Mart
No longer: a highly aggregated subset of a DW that is too large to be queried
But now: a natural (area, subject, process) and complete (atomic data) subset of the global DW
Must not be isolated
  • Non-connectable data marts are the curse of the DW
  • Worse than losing an opportunity for a deep analysis of the organization
  • They perpetuate incompatible views of the organization
Method
Short initial phase
  • Global planning of the overall architecture
    - Conformed dimensions
    - Normalized facts
Supervision of data mart building
  • Only conformed dimensions and normalized facts are used
  • Extracting data from operational sources
  • Transforming data
  • Loading the data mart
Result
  • A puzzle which will become an integrated DW
Conformed dimension
Means the same in all fact tables
  • Across data marts
  • Ex: client, product, location, time
Well-defined user key
Managed data (cleaned, consolidated)
Consistent interfaces and contents
Consistent interpretation of attributes and aggregations
Anonymous DW key
  • Different from the production key
  • Avoids key collisions
  • Allows the creation of new records
Establish a dictionary of conformed dimensions
  • Approved by the manager and by the information manager
  • Process reengineering is a possibility
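The anonymous DW key (often called a surrogate key) can be sketched as a lookup from (source system, production key) to an integer assigned by the DW. This is a minimal sketch with hypothetical names (`SurrogateKeyMap`, `billing`, `crm`). It does not cover consolidation, i.e., recognizing that records from two sources describe the same client. That work belongs to the cleaning step.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical sketch: assigns anonymous DW keys to production keys from several sources."""

    def __init__(self) -> None:
        self._next_key = count(1)  # DW keys are plain integers with no business meaning
        self._keys: dict[tuple[str, str], int] = {}

    def key_for(self, source: str, production_key: str) -> int:
        """Return the DW key for a production key, creating one the first time it is seen."""
        lookup = (source, production_key)
        if lookup not in self._keys:
            self._keys[lookup] = next(self._next_key)
        return self._keys[lookup]

    def new_record(self) -> int:
        """Create a DW key with no production counterpart (e.g. an 'unknown client' row)."""
        return next(self._next_key)


clients = SurrogateKeyMap()
a = clients.key_for("billing", "C-17")
b = clients.key_for("crm", "C-17")      # same production key, other source: no collision
c = clients.key_for("billing", "C-17")  # same source and key: same DW key
unknown = clients.new_record()
print(a, b, c, unknown)  # 1 2 1 3
```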
Ex1: Accounting operational system
  • Cost_center(number, description, owner)
  • Classification(account, current_budget)
  • Person(number, name, category, department)
  • Record(ref, date, cost_center, person, classification, amount)
  • Department(acronym, name, budget)
  • Category(category_number, category_desc)
Ex1: Accounting data warehouse
Fact table
  • Record(person_id, classification_id, time_id, cost_center_id, ref, amount)
    - Keys: person_id, classification_id, time_id, cost_center_id
    - Fact: amount
Dimensions
  • Cost_center(cost_center_id, number, description, owner_id, owner_number, owner_name)
  • Classification(classification_id, account, current_budget)
  • Person(person_id, number, name, category_id, category_number, category_desc, department_id, acronym, department_name, budget)
    - Person hierarchy: Category and Department are folded into the Person dimension
  • Time(time_id, date, day, week_day, month, month_name, year)
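The star above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch. The tables keep only a few of the columns listed above, and the sample rows are made up. The query is a star join: the `record` fact table is joined to the dimensions it needs, and the measure is grouped by dimension attributes.

```python
import sqlite3

# Ex1 star schema, trimmed to a few columns per table.
SCHEMA = """
CREATE TABLE time (time_id INTEGER PRIMARY KEY, date TEXT, month INTEGER, year INTEGER);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT,
                     category_desc TEXT, department_name TEXT);
CREATE TABLE cost_center (cost_center_id INTEGER PRIMARY KEY, description TEXT, owner_name TEXT);
CREATE TABLE classification (classification_id INTEGER PRIMARY KEY, account TEXT);
CREATE TABLE record (
    person_id INTEGER REFERENCES person,
    classification_id INTEGER REFERENCES classification,
    time_id INTEGER REFERENCES time,
    cost_center_id INTEGER REFERENCES cost_center,
    ref TEXT,
    amount REAL
);
"""


def build_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?)",
                     [(1, "2024-01-15", 1, 2024), (2, "2024-02-03", 2, 2024)])
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                     [(1, "Ana", "Lecturer", "Informatics"), (2, "Rui", "Technician", "Civil")])
    conn.execute("INSERT INTO cost_center VALUES (1, 'Lab', 'Ana')")
    conn.executemany("INSERT INTO classification VALUES (?, ?)", [(1, "Travel"), (2, "Equipment")])
    conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, "R1", 100.0), (1, 2, 2, 1, "R2", 250.0), (2, 1, 2, 1, "R3", 40.0)])
    return conn


# Star join: fact table joined to the dimensions it needs, grouped by dimension attributes.
STAR_JOIN = """
SELECT p.department_name, t.month, SUM(r.amount)
FROM record AS r
JOIN person AS p ON r.person_id = p.person_id
JOIN time AS t ON r.time_id = t.time_id
WHERE t.year = 2024
GROUP BY p.department_name, t.month
ORDER BY p.department_name, t.month
"""


def star_join(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(STAR_JOIN).fetchall()


rows = star_join(build_warehouse())
print(rows)  # [('Civil', 2, 40.0), ('Informatics', 1, 100.0), ('Informatics', 2, 250.0)]
```

Because Category and Department were folded into Person, grouping by department needs a single join instead of the chain Record → Person → Department of the operational model.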
Ex2: Pedagogical inquiry DW
See PDF for the operational model
Ex2: Pedagogical inquiry
Conclusions
  • Process analysis
    - Relevant facts: the students' answers to each question about each subject/lecturer
    - Relevant dimensions: inquiry, question (dimension, scope), subject (year, program), lecturer (department), quiz
  • Several tables discarded
    - Visual quiz configuration
    - Non-relevant attributes of lecturers and subject occurrences
    - Dereferencing of answer values
    - Data filtering to keep only the relevant rows
ER vs dimensional model
Entity relationship
  • Operational systems
  • Transaction oriented
    - Recording single facts
  • Highly normalized
    - Consistently updatable
    - Tables represent entities or associations; user keys
  • Hard to query directly
    - Many tables; arbitrary links
Dimensional model
  • Decision support systems
  • Analysis oriented
  • Pre-computed aggregations
  • Denormalized
    - No update after load
    - Tables represent numeric facts and complex dimensions; anonymous keys
  • Systematic, efficient querying
    - Star schema, star join
Equivalent?
One ER model may correspond to many stars
  • There is no loss of information (include as many stars as necessary)
Simplicity in entities and complexity in relations is traded for complexity in dimensions and simplicity in the star schema
  • However, it is common to discard certain operational details
Fact tables in 4 steps
Method to design a dimensional model
  1. The data mart
  2. Fact table granularity
  3. The dimensions
  4. The facts
1. The data mart
A data mart is a subset of a DW
  • It is not a mini-DW which, together with other isolated mini-DWs, “by chance” makes up an integrated DW
To choose a data mart is to choose a data source
  • Single source: orders, shipments, payments
  • Multiple sources: client revenue (profits + costs)
Start with a single source
  • The idea is to reduce the data cleaning and consolidation tasks
  • In the context of the conformed dimensions
  • Data marts are combined in a second phase
2. Fact table granularity
Clearly define the meaning of a fact record
Rule: granularity should be as fine as possible
  • To avoid losing information
  • To get a more robust design
    - With respect to future, unanticipated queries
    - With respect to the addition of new data elements
  • Choosing month as the granularity of a data mart for product sales in a store means it is not possible to accurately analyze the impact of a 15-day promotion
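A small sketch with made-up numbers shows why the monthly grain loses the promotion. It assumes sales of 25 units per day during a 15-day promotion and 10 units per day otherwise. Daily facts can always be rolled up to months. The monthly total alone cannot be split back into promotion days and non-promotion days.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up daily sales of one product in one store: 25 units/day during a
# 15-day promotion (1 to 15 March), 10 units/day otherwise.
PROMO_START, PROMO_END = date(2024, 3, 1), date(2024, 3, 15)


def daily_sales() -> dict[date, int]:
    """Fact table at daily granularity: one record per day of March."""
    sales: dict[date, int] = {}
    day = date(2024, 3, 1)
    while day.month == 3:
        sales[day] = 25 if PROMO_START <= day <= PROMO_END else 10
        day += timedelta(days=1)
    return sales


def monthly_total(sales: dict[date, int]) -> dict[tuple[int, int], int]:
    """Roll daily facts up to monthly granularity (always possible; the reverse is not)."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for day, qty in sales.items():
        totals[(day.year, day.month)] += qty
    return dict(totals)


sales = daily_sales()
promo = [qty for day, qty in sales.items() if PROMO_START <= day <= PROMO_END]
other = [qty for day, qty in sales.items() if not PROMO_START <= day <= PROMO_END]
print(sum(promo) / len(promo), sum(other) / len(other))  # 25.0 10.0
print(monthly_total(sales))  # {(2024, 3): 535}
```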
A fact record is …
  • Each sales transaction
  • Each compensation claim made to the insurance company
  • Each ATM transaction
  • Each daily total of product sales
  • Each monthly account balance
  • Each order line
  • Each delivery note line
  • Each risk covered by an individual insurance policy
Granularity levels
Individual transactions (first three)
  • Atomic facts, simple structure
  • Arbitrary number (possibly zero)
  • The measure is a single amount
Summary, balance, snapshot (next two)
  • Wait for the end of the period (day, month, …)
  • Several measures: total sales, number of transactions (additive), final balance (semi-additive)
  • In the daily case, the snapshot may coincide with an aggregation (this is redundant, kept for performance reasons)
  • In the monthly balance there may be information that is meaningful only for the whole month and thus is not dispensable
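The difference between additive and semi-additive measures in a periodic snapshot can be sketched with hypothetical monthly rows for two accounts. The number of transactions can be summed over any dimension. The final balance can be summed across accounts but not across months. For example, summing account A's balances over two months would give 250.0, which is meaningless.

```python
# Hypothetical monthly snapshot rows: (account, month, number_of_transactions, final_balance)
SNAPSHOT = [
    ("A", 1, 3, 100.0),
    ("A", 2, 5, 150.0),
    ("B", 1, 2, 40.0),
    ("B", 2, 1, 60.0),
]


def transactions_per_account() -> dict[str, int]:
    """Additive measure: summing over months is meaningful."""
    totals: dict[str, int] = {}
    for account, _month, n_tx, _balance in SNAPSHOT:
        totals[account] = totals.get(account, 0) + n_tx
    return totals


def balance_per_account_at_end() -> dict[str, float]:
    """Semi-additive measure: over time, take the last period's value instead of summing."""
    latest: dict[str, tuple[int, float]] = {}
    for account, month, _n_tx, balance in SNAPSHOT:
        if account not in latest or month > latest[account][0]:
            latest[account] = (month, balance)
    return {account: balance for account, (_month, balance) in latest.items()}


def total_balance_in_month(month: int) -> float:
    """Summing balances across accounts, within a single month, is valid."""
    return sum(balance for _account, m, _n_tx, balance in SNAPSHOT if m == month)


print(transactions_per_account())    # {'A': 8, 'B': 3}
print(balance_per_account_at_end())  # {'A': 150.0, 'B': 60.0}
print(total_balance_in_month(2))     # 210.0
```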
Granularity levels (cont.)
Control document items (last three)
  • A fact table record follows the whole life of an item
  • Several temporal keys for the several item phases
  • A “status” dimension tracks the evolution
  • Due to the duration of the represented processes, these records are more subject to change than other kinds of facts
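A minimal sketch of a control document item (an order line) record, using hypothetical field names. In a real fact table the dates would be keys to the time dimension and `status` a key to the status dimension. Plain `date` values and a string are used here to keep the example short. The same record is updated as the item moves through its phases, which is why these facts change more than others.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrderLineFact:
    """Hypothetical sketch: one record follows an order line through its whole life."""
    order_number: str                     # degenerate dimension
    product_key: int
    quantity: int
    order_date: date                      # first temporal key
    shipping_date: Optional[date] = None  # filled in when the item is shipped
    delivery_date: Optional[date] = None  # filled in when the item is delivered
    status: str = "ordered"               # would reference a 'status' dimension

    def ship(self, when: date) -> None:
        self.shipping_date = when
        self.status = "shipped"

    def deliver(self, when: date) -> None:
        self.delivery_date = when
        self.status = "delivered"

    def days_to_deliver(self) -> Optional[int]:
        if self.delivery_date is None:
            return None
        return (self.delivery_date - self.order_date).days


line = OrderLineFact("ORD-1", product_key=7, quantity=2, order_date=date(2024, 5, 2))
line.ship(date(2024, 5, 4))
line.deliver(date(2024, 5, 9))
print(line.status, line.days_to_deliver())  # delivered 7
```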
3. Dimensions
Choice determined by the choice of granularity
Usually there is a minimum set of dimensions for a fact table to be understood
  • Ex. Order item: order date, client, product, and order number (degenerate dimension)
Many other dimensions may be added
  • Each extra dimension takes just one value in the context of the primary dimensions
  • They do not affect granularity
  • Ex. Shipping date, terms of contract, promotions, meteorology
Characteristics of dimensions
Best dimension to associate with a set of measures: the one with the coarsest granularity at which each measure still has a single value
  • For daily facts, choose the day as the dimension, not the year, for which there would be many values
  • Do not choose the hour, which would be too fine and would repeat the value
Multi-valued dimensions
  • Possible, but they complicate questions and reports
  • They require a way to make them additive, weighting each hypothesis (ex. several possible diagnoses for a single treatment)
A new dimension just adds a key to the fact table; applications remain untouched
  • Ex: add a weather status dimension
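One common way to make a multi-valued dimension additive is a bridge (group) table with a weighting factor per member, where the weights of each group sum to 1. The sketch below uses the diagnosis example with hypothetical group identifiers, weights and costs. After the weighting, the allocated amounts still add up to the total cost.

```python
# Hypothetical bridge between treatment facts and a multi-valued 'diagnosis' dimension.
# The weights within each diagnosis group sum to 1.
BRIDGE = {
    "G1": [("flu", 0.5), ("bronchitis", 0.5)],
    "G2": [("fracture", 1.0)],
}
FACTS = [  # (treatment_id, diagnosis_group, cost)
    ("T1", "G1", 200.0),
    ("T2", "G2", 300.0),
    ("T3", "G1", 100.0),
]


def cost_by_diagnosis() -> dict[str, float]:
    """Allocate each treatment's cost to its possible diagnoses, weighted by the bridge."""
    totals: dict[str, float] = {}
    for _treatment, group, cost in FACTS:
        for diagnosis, weight in BRIDGE[group]:
            totals[diagnosis] = totals.get(diagnosis, 0.0) + cost * weight
    return totals


print(cost_by_diagnosis())  # {'flu': 150.0, 'bronchitis': 150.0, 'fracture': 300.0}
```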
Granularity of dimensions
Dimension granularity cannot be finer than fact granularity
  • If facts are monthly, the time dimension cannot be at the day level
  • It may be coarser, with no contradiction
  • Ex. use the ‘brand’ for the product dimension, instead of the specific reference
    - Information is lost, but without logical incoherence
4. The facts
The facts must be specific to the chosen granularity, which determines their scope
Individual transaction tables
  • One fact (one column, besides the keys): the amount
Snapshot tables
  • Several facts, several measures, extendable to new summaries
Item tracking tables
  • Several facts (ex. quantities, gross and net amounts)
Do not mix aggregate facts or facts with other granularities
  • Aggregations are kept in separate records and tables
  • This avoids misleading the analysis tools
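Made-up rows show why aggregates must not share a table with detail facts. A tool that simply sums the measure column counts the monthly total twice.

```python
from typing import Optional

Row = tuple[int, Optional[str], float]  # (month, product, amount)

# Made-up rows; product None marks a monthly total stored next to the detail rows.
MIXED: list[Row] = [
    (1, "p1", 10.0),
    (1, "p2", 20.0),
    (1, None, 30.0),  # different granularity in the same table
]


def naive_total(rows: list[Row]) -> float:
    """What a generic analysis tool does: sum the measure column."""
    return sum(amount for _month, _product, amount in rows)


detail = [row for row in MIXED if row[1] is not None]
aggregates = [row for row in MIXED if row[1] is None]  # belongs in a separate aggregate table
print(naive_total(MIXED), naive_total(detail))  # 60.0 30.0
```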
Completing the selection
Fact table: set of simultaneous measures at a certain granularity
  • Numeric measures are more useful, but measures may also be textual
Define the measures, sometimes imposed by the operational system, and the respective dimensions
Add all the available dimensions
  • Especially if they take a single value in the context of the measure
  • Do not take the “user needs” as the starting point
  • Instead, study the “reality” of the organization (physical perspective) to become less dependent on subjectivity
Evolution of the dimensions
Situation: a person may change name but does not change ID card number (user key)
Three answers
  • Type 1: overwrite the name attribute in the dimension
    - History is lost
    - Suitable for error correction
  • Type 2: create a new record in the dimension with a new anonymous key
    - “Uniqueness” of the natural key is lost
    - Detailed tracking of evolution (start and end dates)
  • Type 3: add an old name attribute to the dimension, which keeps the previous value
    - Limited history
    - Partition on time is fuzzy
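The three answers can be sketched on a minimal person dimension. Field names and data are hypothetical. New type 2 keys are taken as `max + 1` only to keep the example short; a real DW would draw them from a sequence.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class PersonRow:
    dw_key: int                     # anonymous DW key
    id_card: str                    # user (natural) key
    name: str
    old_name: Optional[str] = None  # only used by type 3
    start: Optional[date] = None    # only used by type 2
    end: Optional[date] = None      # type 2: None marks the current version


def type1(rows: list[PersonRow], id_card: str, new_name: str) -> None:
    """Overwrite the attribute in place: history is lost."""
    for row in rows:
        if row.id_card == id_card:
            row.name = new_name


def type2(rows: list[PersonRow], id_card: str, new_name: str, when: date) -> None:
    """Close the current version and append a new one with a new anonymous key."""
    current = next(r for r in rows if r.id_card == id_card and r.end is None)
    current.end = when
    new_key = max(r.dw_key for r in rows) + 1  # simplification: a sequence in practice
    rows.append(replace(current, dw_key=new_key, name=new_name, start=when, end=None))


def type3(rows: list[PersonRow], id_card: str, new_name: str) -> None:
    """Keep only the previous value in an extra attribute; older values are lost."""
    for row in rows:
        if row.id_card == id_card:
            row.old_name, row.name = row.name, new_name


# Made-up example: the person with ID card "123" changes name.
t1 = [PersonRow(1, "123", "Maria Silva")]
type1(t1, "123", "Maria Costa")
t2 = [PersonRow(1, "123", "Maria Silva", start=date(2020, 1, 1))]
type2(t2, "123", "Maria Costa", date(2024, 6, 1))
t3 = [PersonRow(1, "123", "Maria Silva")]
type3(t3, "123", "Maria Costa")
print(t1[0].name, [(r.dw_key, r.name, r.end) for r in t2], (t3[0].name, t3[0].old_name))
```

Type 2 is the only variant in which facts loaded before the change keep pointing at the old version (key 1), while new facts use key 2.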