Dimensional Model


Page 1: Dimensional Model

1 MAP-I / DWS / Orlando Belo, Maribel Santos, Gabriel David

Data Warehouse Systems

Dimensional Model

Gabriel David

[email protected]

Page 2: Dimensional Model

Building a data warehouse
• Build the whole DW at once
  - Huge task requiring knowledge of
    - All the legacy systems
    - The meaning of all columns
    - All the management goals
• Build a fraction at a time, independently
  - Easier but leads to isolated data marts
• Dimensional Bus Architecture
  - Step by step method
  - Global initial design
  - Implementation one connectable data mart at a time

Page 3: Dimensional Model

Data Mart
No longer: a highly aggregated subset of a DW that is too large to be queried
But now: a natural (area, subject, process) and complete (atomic data) subset of the global DW
Must not be isolated
• Non-connectable data marts are the curse of the DW
• Worse than losing an opportunity for a deep analysis of the organization
• Perpetuates incompatible views of the organization

Page 4: Dimensional Model

Method
Short initial phase
• Global planning of the overall architecture
  - Conformed dimensions
  - Normalized facts
Supervision of data mart building
  - Only conformed dimensions and normalized facts are used
• Extracting data from operational sources
• Transforming data
• Loading the data mart
Result
• Puzzle which will become an integrated DW

Page 5: Dimensional Model

Conformed dimension
Means the same in all fact tables
• Across data marts
• Ex: client, product, location, time
Well defined user key
Managed data (cleaned, consolidated)
Consistent interfaces and contents
Consistent interpretation of attributes and aggregations
Anonymous DW key
• Different from production key
• Avoids key collisions
• Allows the creation of new records
Establish a dictionary of conformed dimensions
• Approved by the manager and by the information manager
• Process reengineering is a possibility
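
The anonymous DW key can be sketched as follows. This is a minimal illustration, assuming two hypothetical source systems ("crm" and "billing") whose production keys collide; the class name SurrogateKeyMap and the sample keys are not from the slides. It only shows collision avoidance, not the cleaning and consolidation of client data.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns anonymous DW keys to (source system, production key) pairs."""

    def __init__(self):
        self._next_id = count(1)   # anonymous keys are plain integers
        self._by_source_key = {}   # (source, production_key) -> DW key

    def dw_key(self, source, production_key):
        """Return the existing DW key for this pair, or allocate a new one."""
        pair = (source, production_key)
        if pair not in self._by_source_key:
            self._by_source_key[pair] = next(self._next_id)
        return self._by_source_key[pair]


clients = SurrogateKeyMap()
# The same production key "100" in two systems denotes different clients here.
crm_key = clients.dw_key("crm", "100")
billing_key = clients.dw_key("billing", "100")
# Asking again for a known pair returns the same DW key.
crm_key_again = clients.dw_key("crm", "100")
```

Because the DW key carries no meaning of its own, new dimension records can be created at any time without negotiating key ranges with the operational systems.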

Page 6: Dimensional Model

Ex1: Accounting operational system

Cost_center (number, description, owner)
Classification (account, current_budget)
Person (number, name, category, department)
Record (ref, date, cost_center, person, classification, amount)
Department (acronym, name, budget)
Category (category_number, category_desc)

Page 7: Dimensional Model

Ex1: Accounting data warehouse

Fact table
Record (person_id, classification_id, time_id, cost_center_id, ref, amount)
• keys: person_id, classification_id, time_id, cost_center_id
• fact: amount

Dimensions
Cost_center (cost_center_id, number, description, owner_id, owner_number, owner_name)
Classification (classification_id, account, current_budget)
Person (person_id, number, name, category_id, category_number, category_desc, department_id, acronym, department_name, budget)
Time (time_id, date, day, week_day, month, month_name, year)

Person hierarchy: Category, Department
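
A minimal sketch of this star schema and of a star join over it, using Python's sqlite3 module. Table and column names follow Ex1, but only two of the four dimensions are created and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,   -- anonymous DW key
    number TEXT,
    name TEXT,
    category_desc TEXT,              -- Category folded into Person
    department_name TEXT             -- Department folded into Person
);
CREATE TABLE time (
    time_id INTEGER PRIMARY KEY,
    date TEXT,
    month INTEGER,
    year INTEGER
);
CREATE TABLE record (                -- fact table
    person_id INTEGER REFERENCES person(person_id),
    time_id INTEGER REFERENCES time(time_id),
    ref TEXT,
    amount REAL                      -- the fact
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", [
    (1, "P01", "Ana", "Professor", "Informatics"),
    (2, "P02", "Rui", "Assistant", "Informatics"),
    (3, "P03", "Eva", "Professor", "Physics"),
])
conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?)", [
    (1, "2016-01-15", 1, 2016),
    (2, "2016-02-10", 2, 2016),
])
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", [
    (1, 1, "R1", 100.0),
    (2, 1, "R2", 50.0),
    (3, 2, "R3", 70.0),
    (1, 2, "R4", 30.0),
])

# Star join: the fact table is joined to each dimension and the facts
# are aggregated by dimension attributes.
rows = conn.execute("""
    SELECT p.department_name, t.month, SUM(r.amount)
    FROM record r
    JOIN person p ON p.person_id = r.person_id
    JOIN time t ON t.time_id = r.time_id
    GROUP BY p.department_name, t.month
    ORDER BY p.department_name, t.month
""").fetchall()
```

Every analysis query has this same shape: one fact table in the middle, one join per dimension used, and a GROUP BY over dimension attributes.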

Page 8: Dimensional Model

Ex2: Pedagogical inquiry DW
See PDF for the operational model

Page 9: Dimensional Model

Ex2: Pedagogical inquiry
Conclusions
• Process analysis
  - Relevant facts: the student answers to each question on each subject/lecturer
  - Relevant dimensions: inquiry, question (dimension, scope), subject (year, program), lecturer (department), quiz
• Several tables discarded
  - Visual quiz configuration
  - Non-relevant attributes on lecturers and subject occurrence
  - Dereferencing of answer values
  - Data filtering to keep just the relevant rows

Page 10: Dimensional Model

ER vs dimensional model

Entity relationship
Operational systems
Transaction oriented
• Recording single facts
Highly normalized
• Consistently updatable
• Tables represent entities or associations; user keys
Not easily queryable
• Many tables; arbitrary links

Dimensional model
Decision support systems
Analysis oriented
• Pre-computed aggregations
Denormalized
• No update after load
• Tables represent numeric facts and complex dimensions; anonymous keys
Systematic efficient query
• Star schema, star join

Page 11: Dimensional Model

Equivalent?
One ER may correspond to many stars
• There is no loss of information (include as many stars as necessary)
Simplicity in entities and complexity in relations is traded for complexity in dimensions and simplicity in the star schema
• However, it is common to discard certain operational details

Page 12: Dimensional Model

Fact tables in 4 steps
Method to design a dimensional model
1. The data mart
2. Fact table granularity
3. The dimensions
4. The facts

Page 13: Dimensional Model

1. The data mart
A data mart is a subset of a DW
• It is not a mini-DW which, together with other isolated mini-DWs, “by chance” makes up an integrated DW
To choose a data mart is to choose a data source
• Single source: orders, shipments, payments
• Multiple sources: client revenue (profits + costs)
One should start with a single source
• The idea is to reduce the data cleaning and consolidation tasks
• In the context of the conformed dimensions
• Data marts are combined in a second phase

Page 14: Dimensional Model

2. Fact table granularity
Clearly define the meaning of a fact record
Rule: granularity should be as fine as possible
• Not to lose information
• Get a more robust design
  - Wrt future non-anticipated queries
  - Wrt the addition of new data elements
• Choosing month as the granularity of a data mart for product sales in a store implies that it is not possible to accurately analyze the impact of a 15-day promotion
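
The promotion example can be made concrete with a small sketch. The figures are invented: 10 units sold per day in June 2016, rising to 25 during a hypothetical promotion from 8 June to 22 June.

```python
from collections import defaultdict
from datetime import date, timedelta

promo_start, promo_end = date(2016, 6, 8), date(2016, 6, 22)  # 15 days

# Daily grain: one fact per day of June.
daily_sales = {}
day = date(2016, 6, 1)
while day.month == 6:
    in_promo = promo_start <= day <= promo_end
    daily_sales[day] = 25 if in_promo else 10
    day += timedelta(days=1)

# With daily facts the promotion period can be separated from the rest.
promo_total = sum(v for d, v in daily_sales.items()
                  if promo_start <= d <= promo_end)
other_total = sum(v for d, v in daily_sales.items()
                  if not (promo_start <= d <= promo_end))

# Monthly grain: only one number per month survives, so the split above
# can no longer be computed.
monthly_sales = defaultdict(int)
for d, v in daily_sales.items():
    monthly_sales[(d.year, d.month)] += v
```

The monthly table can always be derived from the daily one, but not the other way round, which is why the finer grain is the more robust choice.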

Page 15: Dimensional Model

A fact record is …
Each sales transaction
Each claim for compensation made to the insurance company
Each ATM transaction
Each daily total product sales
Each monthly account balance
Each order line
Each delivery note line
Each risk covered by an individual insurance policy

Page 16: Dimensional Model

Granularity levels
Individual transactions (first three)
• Atomic facts, simple structure
• Arbitrary number (possibly zero)
• The measure is a single amount
Summary, balance, snapshot (next two)
• Wait for the end of the period (day, month, …)
• Several measures: total sales, number of transactions (additive), final balance (semi-additive)
• In the daily case, the snapshot may coincide with an aggregation (this is redundant, kept for performance reasons)
• In the monthly balance there may be information meaningful just for the whole month and thus not dispensable
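
The difference between additive and semi-additive measures can be sketched with a few invented monthly snapshot rows. The tuple layout (account, month, deposits, final_balance) is an assumption made for this illustration.

```python
# Hypothetical monthly snapshot rows: (account, month, deposits, final_balance)
snapshots = [
    ("A", 1, 100, 100),
    ("A", 2, 50, 150),
    ("B", 1, 200, 200),
    ("B", 2, 0, 200),
]

# Additive measure: deposits can be summed over accounts and over months.
total_deposits = sum(dep for _, _, dep, _ in snapshots)

# Semi-additive measure: balances can be summed over accounts within one month.
balance_month_2 = sum(bal for _, month, _, bal in snapshots if month == 2)

# Over time a sum of balances is meaningless; take the last value per
# account instead (an average would be another option).
last_balance = {}
for account, month, _, bal in sorted(snapshots, key=lambda row: row[1]):
    last_balance[account] = bal
closing_total = sum(last_balance.values())

# What a tool would report if it blindly summed the balance column.
naive_sum_over_time = sum(bal for _, _, _, bal in snapshots)
```

Here the naive sum is 650, while the bank actually holds 350 at the end of month 2.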

Page 17: Dimensional Model

Granularity levels (cont.)
Control document items (last three)
• A fact table record following the whole life of an item
• Several temporal keys for the several item phases
• “Status” dimension tracking the evolution
• Due to the duration of the represented processes, these records are more subject to change than other kinds of facts
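
A minimal sketch of such an item tracking record, assuming an order line with three phases (ordered, shipped, delivered). The class OrderLineFact, its field names and the integer date keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderLineFact:
    """One fact record that follows an order line through its whole life."""
    order_number: str                        # degenerate dimension
    product_id: int
    quantity: int
    order_date_id: int                       # temporal key, known at insert time
    ship_date_id: Optional[int] = None       # temporal key, filled in later
    delivery_date_id: Optional[int] = None   # temporal key, filled in later
    status: str = "ordered"                  # key into a "Status" dimension

    def ship(self, date_id: int) -> None:
        self.ship_date_id = date_id
        self.status = "shipped"

    def deliver(self, date_id: int) -> None:
        self.delivery_date_id = date_id
        self.status = "delivered"


line = OrderLineFact(order_number="O-17", product_id=42, quantity=3,
                     order_date_id=20160601)
line.ship(20160603)       # the same record is updated, no new record is inserted
line.deliver(20160607)
```

Unlike transaction and snapshot facts, this record is revisited each time the item enters a new phase.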

Page 18: Dimensional Model

3. Dimensions
Choice determined by the choice of granularity
Usually there is a minimum set of dimensions for a fact table to be understood
• Ex. Order item: order date, client, product, and order number (degenerate dimension)
Many other dimensions may be added
• Each extra dimension gets just one value in the primary dimensions context
• Do not affect granularity
• Ex. Shipping date, terms of contract, promotions, weather

Page 19: Dimensional Model

Characteristics of dimensions
Best dimension to associate to a set of measures: the one with the coarsest granularity which still gets a single value
• For daily facts, choose the day as dimension and not the year, which could have many values
• Do not choose the hour, which would be too fine, repeating the value
Multi-valued dimensions
• Possible, but complicate queries and reports
• Require the definition of a way to make them additive, weighting each hypothesis (ex. several possible diagnoses for a single treatment)
New dimension just adds a key in the fact table; applications untouched
• Ex: add a weather status dimension
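
One way to make a multi-valued dimension additive is to attach a weight to each hypothesis, with the weights for one fact summing to 1. The sketch below assumes this weighting scheme; the treatments, diagnoses, costs and weights are invented.

```python
# Hypothetical facts: cost of each treatment.
treatment_cost = {"T1": 300.0, "T2": 100.0}

# Hypothetical bridge: (treatment, possible diagnosis, weight).
# The weights of each treatment sum to 1.
diagnosis_bridge = [
    ("T1", "flu", 0.5),
    ("T1", "asthma", 0.5),
    ("T2", "flu", 1.0),
]

# Allocate each treatment cost to its diagnoses in proportion to the weights.
cost_by_diagnosis = {}
for treatment, diagnosis, weight in diagnosis_bridge:
    allocated = weight * treatment_cost[treatment]
    cost_by_diagnosis[diagnosis] = cost_by_diagnosis.get(diagnosis, 0.0) + allocated

total_allocated = sum(cost_by_diagnosis.values())
```

With the weights in place, the per-diagnosis figures add up to the true total (400), so reports by diagnosis do not double count a treatment.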

Page 20: Dimensional Model

Granularity of dimensions
Dimension granularity cannot be finer than fact granularity
• If facts are monthly, the time dimension cannot be the day
• May be coarser, with no contradiction
• Ex. use the ‘brand’ for the dimension product, instead of the specific reference
  - Lose information but without logical incoherence

Page 21: Dimensional Model

4. The facts
The facts must be specific to the chosen granularity, which determines their scope
Individual transaction tables
• One fact (one column, besides keys), the amount
Snapshot tables
• Several facts, several measures, extendable to new summaries
Item tracking tables
• Several facts (ex. quantities, gross and net amounts)
Do not mix aggregate facts or facts with other granularities
• Aggregations are kept in separate records and tables
• Avoid misleading analysis tools
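
A small sketch of why granularities should not be mixed, using an invented sales table in which a monthly total sits next to the daily rows it summarizes. The row layout (period, grain, amount) is an assumption for this illustration.

```python
# Hypothetical table that wrongly mixes daily rows with a monthly total row.
mixed = [
    ("2016-06-01", "day", 10),
    ("2016-06-02", "day", 15),
    ("2016-06", "month", 25),   # aggregate of the two rows above
]

# An analysis tool that simply sums the amount column counts everything twice.
naive_total = sum(amount for _, _, amount in mixed)

# Keeping each granularity in its own table removes the ambiguity.
daily = [row for row in mixed if row[1] == "day"]
monthly = [row for row in mixed if row[1] == "month"]
daily_total = sum(amount for _, _, amount in daily)
monthly_total = sum(amount for _, _, amount in monthly)
```

The naive sum gives 50, while either separate table gives the correct 25.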

Page 22: Dimensional Model

Completing the selection
Fact table: set of simultaneous measures at a certain granularity
• Numeric measures are more useful but may be textual
Define the measures, sometimes imposed by the operational system, and the respective dimensions
Add all the available dimensions
• Especially if they get a single value for the measure context
• Do not take the “user needs” as the starting point
• Instead, study the “reality” of the organization (physical perspective) to become less dependent on subjectivity

Page 23: Dimensional Model

Evolution of the dimensions
Situation: a person may change name but does not change ID card number (user key)
Three answers
• Type 1 – change the name attribute in the dimension
  - History is lost
  - Error correction
• Type 2 – create a new record in the dimension with a new anonymous key
  - “Uniqueness” of natural key is lost
  - Detailed tracking of evolution (start and end dates)
• Type 3 – create an old name attribute in the dimension and keep the last value
  - Limited history
  - Partition on time is fuzzy
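
The three answers can be sketched on a Person dimension held as a list of dictionaries. This is an illustration under assumptions: the function names, the start/end columns, the old_name column and the sample person are all hypothetical.

```python
from datetime import date


def type1(rows, number, new_name):
    """Type 1: overwrite the name in place; the old value is lost."""
    for row in rows:
        if row["number"] == number:
            row["name"] = new_name


def type2(rows, number, new_name, change_date):
    """Type 2: close the current row and add one with a new anonymous key."""
    current = next(r for r in rows if r["number"] == number and r["end"] is None)
    current["end"] = change_date
    new_key = max(r["person_id"] for r in rows) + 1
    rows.append(dict(current, person_id=new_key, name=new_name,
                     start=change_date, end=None))


def type3(rows, number, new_name):
    """Type 3: keep only the previous value in an extra attribute."""
    for row in rows:
        if row["number"] == number:
            row["old_name"] = row["name"]
            row["name"] = new_name


def sample_dimension():
    # "number" plays the role of the ID card number (user key).
    return [{"person_id": 1, "number": "ID123", "name": "Ana Silva",
             "start": date(2010, 1, 1), "end": None}]


t1 = sample_dimension()
type1(t1, "ID123", "Ana Costa")

t2 = sample_dimension()
type2(t2, "ID123", "Ana Costa", date(2016, 7, 1))

t3 = sample_dimension()
type3(t3, "ID123", "Ana Costa")
```

After Type 2 the user key "ID123" appears in two rows, which is the loss of natural key uniqueness noted above; facts loaded after the change reference the new person_id.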