Posted on 31-Dec-2015


Dimensional Modeling

Chapter 2

The Dimensional Data Model

An alternative to the normalized data model

Present information as simply as possible (easier to understand)

Return queries as quickly as possible (efficient for queries)

Track the underlying business processes (process focused)

The Dimensional Data Model

• Contains the same information as the normalized model
• Has far fewer tables, grouped in coherent business categories
• Pre-joins hierarchies and lookup tables, resulting in fewer join paths and fewer intermediate tables
• Normalized fact table with denormalized dimension tables

GB Video E-R Diagram (# marks the identifier)

• Customer: #Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire
• Rental: #Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval
• Line: #Line No, Due Date, Return Date, OD charge, Pay type
• Video: #Video No, One-day fee, Extra days, Weekend
• Title: #Title No, Name, Vendor No, Cost
• Relationships: Requestor of, Owner of, Name for, Holder of

GB Video Data Mart

• Customer: CustID, Cust No, F Name, L Name
• Rental: RentalID, Rental No, Clerk No, Store, Pay Type
• Line (fact table): LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, CustID, AddressID, RentalId, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID
• Video: VideoID, Video No
• Title: TitleID, TitleNo, Name, Cost, Vendor Name
• Rental Date: RentalDateID, SQLDate, Day, Week, Quarter, Holiday
• Due Date: DueDateID, SQLDate, Day, Week, Quarter, Holiday
• Return Date: ReturnDateID, SQLDate, Day, Week, Quarter, Holiday
• Address: AddressID, Address1, Address2, City, State, Zip, AreaCode, Phone

Fact Table

Measurements associated with a specific business process

• Grain: the level of detail of the table
• Process events produce fact records
• Facts (attributes) are usually:
    • Numeric
    • Additive
• Derived facts are included
• Foreign (surrogate) keys refer to dimension tables (entities)
• Classification values help define subsets
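A minimal Python sketch of a transaction-grain fact table, loosely modeled on the GB Video Line table. The class name, field names, and values are hypothetical. It shows surrogate foreign keys, additive facts, and one derived fact computed from them.

```python
from dataclasses import dataclass


# Hypothetical fact row at the grain "one row per rental line".
# The *_key fields are foreign (surrogate) keys into dimension tables;
# the charges are numeric, additive facts.
@dataclass(frozen=True)
class RentalLineFact:
    rental_date_key: int  # yyyymmdd date key
    customer_key: int
    video_key: int
    one_day_charge: float
    extra_days_charge: float
    overdue_charge: float

    @property
    def total_charge(self) -> float:
        # Derived fact: computed from the stored additive facts.
        return self.one_day_charge + self.extra_days_charge + self.overdue_charge


facts = [
    RentalLineFact(20060926, 1, 10, 3.00, 0.00, 0.00),
    RentalLineFact(20060926, 2, 11, 3.00, 1.50, 0.00),
    RentalLineFact(20060927, 1, 12, 3.00, 0.00, 2.00),
]


def total_by_customer(rows):
    """Additive facts can be summed along any dimension, e.g. per customer."""
    totals = {}
    for r in rows:
        totals[r.customer_key] = totals.get(r.customer_key, 0.0) + r.total_charge
    return totals


print(total_by_customer(facts))  # {1: 8.0, 2: 4.5}
```

Because every charge is additive, the same summation works per date, per video, or across the whole table without double counting.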

Dimension Tables

• Entities describing the objects of the process
• Conformed dimensions cross processes
• Attributes are descriptive:
    • Text
    • Numeric
• Surrogate keys
• Less volatile than facts (1:m with the fact table)
• Null entries
• Date dimensions
• Produce “by” questions
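A “by” question (for example, “amount by customer state”) is answered by looking up each fact's dimension row and grouping on a descriptive attribute. In this sketch the dimension contents and the reserved key 0 for an “Unknown” row are hypothetical; the reserved row is one common way to avoid NULL foreign keys in the fact table.

```python
from collections import defaultdict

# Hypothetical customer dimension keyed by surrogate key.
# Key 0 is a hypothetical "Unknown" row so facts never carry a NULL key.
customer_dim = {
    0: {"name": "Unknown", "state": "Unknown"},
    1: {"name": "Customer A", "state": "OR"},
    2: {"name": "Customer B", "state": "WA"},
    3: {"name": "Customer C", "state": "OR"},
}

# Fact rows: (customer_key, amount). One dimension row relates to many facts (1:m).
sales_facts = [(1, 20.0), (2, 5.0), (3, 7.5), (1, 2.5), (0, 1.0)]


def amount_by(attribute, facts, dim):
    """Answer a 'by' question: total amount grouped by a dimension attribute."""
    totals = defaultdict(float)
    for cust_key, amount in facts:
        totals[dim[cust_key][attribute]] += amount
    return dict(totals)


print(amount_by("state", sales_facts, customer_dim))
# {'OR': 30.0, 'WA': 5.0, 'Unknown': 1.0}
```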

Bus Architecture

An architecture that permits aggregating data across multiple marts

• Conformed dimensions and attributes
• Drill Down vs. Drill Across
• Bus matrix
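Drilling across means querying two fact tables separately and then combining the answer sets on a conformed attribute. This sketch assumes a hypothetical product dimension shared by a rentals mart and a purchases mart; all data is made up for illustration.

```python
# Hypothetical conformed product dimension shared by two marts.
product_dim = {1: {"category": "Drama"}, 2: {"category": "Comedy"}, 3: {"category": "Drama"}}

# Fact tables from two different business processes (hypothetical data).
rental_facts = [(1, 4), (2, 3), (3, 1)]  # (product_key, rentals)
purchase_facts = [(1, 2), (2, 5)]        # (product_key, units_sold)


def by_category(facts):
    """Summarize one fact table by the conformed 'category' attribute."""
    totals = {}
    for product_key, qty in facts:
        cat = product_dim[product_key]["category"]
        totals[cat] = totals.get(cat, 0) + qty
    return totals


def drill_across(left, right):
    # Combine the two answer sets on the conformed attribute;
    # the fact tables themselves are never joined row by row.
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k, 0), right.get(k, 0)) for k in keys}


print(drill_across(by_category(rental_facts), by_category(purchase_facts)))
# {'Comedy': (3, 5), 'Drama': (5, 2)}
```

This only works because both marts use the same category values; that is what conforming the dimension buys.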

Keys and Surrogate Keys

A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)

• Protect against changes in source systems
• Allow integration from multiple sources
• Enable rows that do not exist in source data
• Track changes over time (e.g., new customer instances when addresses change)
• Replace text keys with integers for efficiency
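A minimal sketch of surrogate key assignment, assuming a hypothetical in-memory registry keyed by (source system, natural key). It reserves key 0 for a warehouse-only row and hands out integers in place of text keys. Matching the same real-world customer across sources is out of scope here.

```python
class SurrogateKeyMap:
    """Hypothetical key registry: maps (source system, natural key) to an integer."""

    UNKNOWN = 0  # reserved key for a warehouse-only "unknown" row

    def __init__(self):
        # Seed a row that does not exist in any source system.
        self._keys = {("warehouse", "UNKNOWN"): self.UNKNOWN}
        self._next = 1

    def key_for(self, source, natural_key):
        """Return the existing surrogate key, or assign the next integer."""
        pair = (source, natural_key)
        if pair not in self._keys:
            self._keys[pair] = self._next
            self._next += 1
        return self._keys[pair]


km = SurrogateKeyMap()
a = km.key_for("store_pos", "CUST-31421")
b = km.key_for("web_shop", "CUST-31421")   # same text key, different source
c = km.key_for("store_pos", "CUST-31421")  # repeat lookup reuses the key
print(a, b, c)  # 1 2 1
```

Keying on the source as well as the natural key keeps two systems that happen to reuse the same text key from colliding in the warehouse.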

Slowly Changing Dimensions

Attributes in a dimension that change more slowly than the fact granularity

• Type 1: Current only
• Type 2: All history
• Type 3: Most recent few (rare)

Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table

Dimension with a slowly changing attribute (before the change):

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
1552     31421     Jane Rider  3         F       N

Fact Table:

Date       CustKey          ProdKey  Item Count  Amount
1/7/2004   1552             95       1           1,798.00
3/2/2004   1552             37       1           27.95
5/7/2005   1552             87       2           320.26
2/21/2006  2387 (was 1552)  42       1           19.95

Dimension with a slowly changing attribute (Type 2, after CommDist changes from 3 to 31):

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
1552     31421     Jane Rider  3         F       N        1/7/2004  1/1/2006
2387     31421     Jane Rider  31        F       N        1/2/2006  12/31/9999
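The Jane Rider example can be reproduced with a small Python sketch of Type 1 and Type 2 updates. The function names are hypothetical, and the new surrogate key (2387) is passed in rather than generated; the dates, keys, and CommDist values come from the tables above. Type 2 closes the current row the day before the change and adds a new row with an open end date.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date


@dataclass(frozen=True)
class CustomerRow:
    cust_key: int    # surrogate key
    bk_cust_id: int  # business (natural) key from the source
    name: str
    comm_dist: int
    eff: date
    end: date


def scd_type1(rows, bk, **changes):
    """Type 1: overwrite in place; no history is kept."""
    return [replace(r, **changes) if r.bk_cust_id == bk else r for r in rows]


def scd_type2(rows, bk, change_date, new_key, **changes):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    out = []
    for r in rows:
        if r.bk_cust_id == bk and r.end == OPEN_END:
            out.append(replace(r, end=change_date - timedelta(days=1)))
            out.append(replace(r, cust_key=new_key, eff=change_date, end=OPEN_END, **changes))
        else:
            out.append(r)
    return out


def current_key(rows, bk, on):
    """Surrogate key that a fact row dated `on` should carry."""
    return next(r.cust_key for r in rows if r.bk_cust_id == bk and r.eff <= on <= r.end)


dim = [CustomerRow(1552, 31421, "Jane Rider", 3, date(2004, 1, 7), OPEN_END)]
dim = scd_type2(dim, 31421, date(2006, 1, 2), 2387, comm_dist=31)
for row in dim:
    print(row.cust_key, row.comm_dist, row.eff, row.end)
print(current_key(dim, 31421, date(2005, 5, 7)), current_key(dim, 31421, date(2006, 2, 21)))
# 1552 2387
```

The lookup by effective date is why the 2/21/2006 fact row carries 2387 while the earlier purchases keep 1552.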

Date Dimensions

One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported)

Usually use a meaningful integer surrogate key, such as yyyymmdd (20060926 for Sep. 26, 2006). Note: keys in this format sort in chronological order.

Include rows for missing or future dates to be added later.
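Instead of a spreadsheet, the date dimension can also be generated in code. This Python sketch builds one row per day with a yyyymmdd integer key and columns named after the GB Video date tables (SQLDate, Day, Week, Quarter, Holiday). Using the ISO week number and a caller-supplied holiday set are assumptions, not part of the original design.

```python
from datetime import date, timedelta


def date_key(d):
    """yyyymmdd integer key, e.g. date(2006, 9, 26) -> 20060926."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(start, end, holidays=frozenset()):
    """One row per calendar day from start to end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": date_key(d),
            "sql_date": d,
            "day": d.strftime("%A"),         # weekday name
            "week": d.isocalendar()[1],      # ISO week number (assumption)
            "quarter": (d.month - 1) // 3 + 1,
            "holiday": d in holidays,
        })
        d += timedelta(days=1)
    return rows


dim = build_date_dimension(date(2006, 9, 25), date(2006, 10, 1))
print(len(dim), dim[1]["date_key"], dim[1]["quarter"])  # 7 20060926 3
```

Because the key is year, then month, then day, integer order matches calendar order even across month boundaries (20060930 < 20061001).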

Degenerate Dimensions

Dimensions without attributes, such as a transaction number or order number.

Store the value directly in the fact table, even though it is not an additive fact.
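A short Python sketch of a degenerate dimension: the order number sits in each fact row with no dimension table behind it. The order numbers and amounts are hypothetical. Summing the order number would be meaningless, but grouping and counting on it is useful.

```python
from collections import Counter

# Hypothetical order-line fact rows. "order_number" is a degenerate dimension:
# it has no dimension table of its own, so the value lives in the fact row.
order_lines = [
    {"order_number": "SO-1001", "product_key": 95, "amount": 100.00},
    {"order_number": "SO-1001", "product_key": 37, "amount": 25.00},
    {"order_number": "SO-1002", "product_key": 87, "amount": 40.00},
]

# Not additive, but it lets us group line items back into their orders.
lines_per_order = Counter(row["order_number"] for row in order_lines)
distinct_orders = len(lines_per_order)
print(distinct_orders, lines_per_order["SO-1001"])  # 2 2
```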

Snowflaking (Outrigger Dimensions or Reference Dimensions)

Connects entities to dimension tables rather than the fact table

Complicates coding and requires additional processing for retrievals

Makes type 2 slowly changing dimensions harder to maintain

Useful for seldom-used lookups

M:N Multivalued Dimensions

• Fact to Dimension
• Dimension to Dimension

Try to avoid these. Solutions can be very misleading.

Multivalued Dimensions

• ORDERS (FACT): SalesRepKey, ProductKey, SalesRepGrpKey, CustomerKey, OrderQty
• SALESREP: SalesRepKey, Name, Address
• SALESREP-ORDER-BRIDGE: SalesRepKey, SalesrepGroupKey, Weight = 1/NumReps
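This Python sketch applies the bridge weight (1/NumReps) to allocate each order's quantity across the reps in its group. The group memberships and order quantities are hypothetical, and the sketch reaches reps only through the group key. Exact fractions are used so the allocated totals add back to the original quantity.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical sales-rep groups: group key -> member rep keys.
group_members = {100: [1, 2], 101: [2], 102: [1, 2, 3]}

# Bridge rows: (group key, rep key, weight = 1/NumReps).
bridge = [
    (grp, rep, Fraction(1, len(reps)))
    for grp, reps in group_members.items()
    for rep in reps
]

# Hypothetical order facts: (group key, order quantity).
orders = [(100, 10), (101, 4), (102, 6)]


def weighted_qty_by_rep(orders, bridge):
    """Allocate each order's quantity across its reps using the bridge weights."""
    by_group = defaultdict(list)
    for grp, rep, w in bridge:
        by_group[grp].append((rep, w))
    totals = defaultdict(Fraction)
    for grp, qty in orders:
        for rep, w in by_group[grp]:
            totals[rep] += qty * w
    return dict(totals)


alloc = weighted_qty_by_rep(orders, bridge)
print({rep: int(q) for rep, q in sorted(alloc.items())})  # {1: 7, 2: 11, 3: 2}

# Ignoring the weights credits every rep with the full quantity: 42 instead of 20.
naive = sum(qty * len(group_members[grp]) for grp, qty in orders)
print(naive)  # 42
```

The unweighted total shows how an M:N relationship silently inflates results, which is why these structures can be misleading.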

Hierarchies

• Group data within dimensions, e.g., SalesRep:
    • Region
    • State
    • County
    • Neighborhood
• Problem structures:
    • Variable depth
    • Frequently changing

Heterogeneous Products

Several different kinds of entries, each with different attributes (the sub-class problem)

Aggregate Dimensions

Dimensions that represent data at different levels of granularity:
• Remove a dimension
• Roll up the hierarchy (provide a new shrunken dimension, with a new surrogate key, that represents the rolled-up data)
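A Python sketch of rolling up a hierarchy into a shrunken dimension. It assumes a hypothetical location dimension at neighborhood level (mirroring the Region, State, County, Neighborhood hierarchy) and builds a state-level dimension with its own new surrogate keys. It also returns a base-key to shrunken-key map that could be used to roll up fact rows.

```python
# Hypothetical base dimension at the finest level of the hierarchy.
location_dim = {
    1: {"region": "West", "state": "OR", "county": "C1", "neighborhood": "N1"},
    2: {"region": "West", "state": "OR", "county": "C1", "neighborhood": "N2"},
    3: {"region": "West", "state": "WA", "county": "C2", "neighborhood": "N3"},
}


def shrink(dim, levels):
    """Build a rolled-up dimension over `levels` with new surrogate keys.

    Returns (shrunken_dim, mapping from base key to shrunken key).
    """
    shrunken, by_value, mapping = {}, {}, {}
    for base_key in sorted(dim):
        value = tuple(dim[base_key][lvl] for lvl in levels)
        if value not in by_value:
            new_key = len(by_value) + 1  # new surrogate key for the rolled-up row
            by_value[value] = new_key
            shrunken[new_key] = dict(zip(levels, value))
        mapping[base_key] = by_value[value]
    return shrunken, mapping


state_dim, to_state = shrink(location_dim, ("region", "state"))
print(state_dim)  # {1: {'region': 'West', 'state': 'OR'}, 2: {'region': 'West', 'state': 'WA'}}
print(to_state)   # {1: 1, 2: 1, 3: 2}
```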

Junk Dimensions

Miscellaneous attributes that don’t belong to another entity, usually representing processing levels:
• Flags
• Categories
• Types
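One common way to build a junk dimension is to enumerate every combination of the miscellaneous flags and types and give each combination a surrogate key. The fact row then stores a single key instead of several low-cardinality columns. The attribute names and values in this Python sketch are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality attributes that belong to no other dimension.
PAY_TYPES = ("cash", "credit")
RUSH_FLAGS = (False, True)
ORDER_TYPES = ("walk-in", "phone")

# Junk dimension: one row per combination, each with its own surrogate key.
junk_dim = {
    key: {"pay_type": p, "rush": r, "order_type": o}
    for key, (p, r, o) in enumerate(product(PAY_TYPES, RUSH_FLAGS, ORDER_TYPES), start=1)
}

# Reverse lookup used while loading facts: attribute values -> junk key.
junk_key = {tuple(v.values()): k for k, v in junk_dim.items()}

# A fact row stores one junk key instead of three separate columns.
fact_row = {"amount": 12.5, "junk_key": junk_key[("credit", False, "phone")]}
print(len(junk_dim), fact_row["junk_key"])  # 8 6
```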

Fact Tables

Transaction
• Track processes at discrete points in time, when they occur

Periodic snapshot
• Cumulative performance over specific time intervals

Accumulating snapshot
• Constantly updated over time; may include multiple dates representing stages
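An accumulating snapshot keeps one row per item and revisits it as each stage occurs. This Python sketch follows a rental line through rental, due, and return dates, echoing the RentalDateID, DueDateID, ReturnDateID, and DaysOverdue columns of the GB Video Line table. The event functions and the placeholder date key 0 for “not returned yet” are hypothetical.

```python
from datetime import date


def date_key(d):
    """yyyymmdd integer key, e.g. date(2006, 9, 28) -> 20060928."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(k):
    """Inverse of date_key."""
    return date(k // 10000, k // 100 % 100, k % 100)


NOT_YET = 0  # hypothetical date-dimension row meaning "stage not reached yet"

# One row per rental line, updated in place as each stage occurs.
snapshot = {}


def on_rental(line_id, rented, due):
    snapshot[line_id] = {
        "rental_date_key": date_key(rented),
        "due_date_key": date_key(due),
        "return_date_key": NOT_YET,
        "days_overdue": 0,
    }


def on_return(line_id, returned):
    row = snapshot[line_id]
    row["return_date_key"] = date_key(returned)
    late = (returned - key_to_date(row["due_date_key"])).days
    row["days_overdue"] = max(0, late)


on_rental(1, date(2006, 9, 26), date(2006, 9, 28))
on_return(1, date(2006, 10, 1))
print(snapshot[1])
```

A transaction fact table would instead insert a new row for each event and never modify earlier rows.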

Aggregates

Precalculated summary tables:
• Improve performance
• Record data at coarser granularity
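A Python sketch of an aggregate table: daily facts are summarized once at month grain, and a query answered from the aggregate matches the same query over the base table. The data is hypothetical. Deriving the month as date_key // 100 relies on the yyyymmdd key format.

```python
from collections import defaultdict

# Hypothetical base facts at daily grain: (date_key yyyymmdd, product_key, amount).
daily_sales = [
    (20060926, 95, 100.0),
    (20060927, 95, 50.0),
    (20060927, 37, 20.0),
    (20061002, 95, 10.0),
]


def build_monthly_aggregate(facts):
    """Precalculated summary at month grain: (yyyymm, product_key) -> total amount."""
    agg = defaultdict(float)
    for date_key, product_key, amount in facts:
        agg[(date_key // 100, product_key)] += amount
    return dict(agg)


monthly = build_monthly_aggregate(daily_sales)

# The same question answered from the base table and from the aggregate.
from_base = sum(a for d, p, a in daily_sales if d // 100 == 200609 and p == 95)
from_agg = monthly[(200609, 95)]
print(from_base, from_agg)  # 150.0 150.0
```

The aggregate answers month-level questions with one lookup, but it cannot answer day-level questions; those still go to the base table.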
