Presentation # 506: Design Tips for the Warehouse Architect
David Stanford, President, Red Sky Data Inc., [email protected]
IOUG Live! 2005


Page 1: Presentation # 506

1

Presentation # 506

Design Tips for the Warehouse Architect

David Stanford
President, Red Sky Data Inc.
[email protected]

IOUG Live! 2005

Page 2: Presentation # 506

2

Objectives

• Obtain a clear understanding of data warehouse design ‘hot points’
• Identify solutions and alternatives for these ‘hot points’
• See how real world solutions are implemented

Page 3: Presentation # 506

3

Agenda

• Top 10 Gotchya’s
• Design Traps
  – Loading Dirty Fact Data
  – Surrogate Keys
  – The Staging Area
• Slowly Changing Dimensions
• Tracking All History
• Audit Considerations
• Bad & Missing Data
• Administrative Fields
• Other Tidbits of Advice

Page 4: Presentation # 506

4

Dave’s Top 10 Gotchya’s

1. Failing to model for both a) view of the data when the event occurred and b) view of the data as of today’s reality
2. Limiting the number of dimensions
3. Failing to model and populate a meta data repository
4. Failing to provide sufficient audit capabilities to verify loads against source systems
5. Not using surrogate keys for everything

Page 5: Presentation # 506

5

Dave’s Top 10 Gotchya’s

6. Failing to design an error correction process
7. Normalizing too much
8. Not using a staging area
9. Failing to load ALL of the fact data
10. Failing to classify incorrect data
11. Making it too complex!

Page 6: Presentation # 506

6

Design Traps

• Design Review
• Staging Area
• Surrogate Keys
• Facts – Surrogates and Dirty Data

Page 7: Presentation # 506

7

Data Warehouse Process

[Diagram: data flows from Source OLTP Systems through the Staging Area and the Data Warehouse to the Data Marts. Process stages: Design, Mapping; Extract, Scrub, Transform; Load, Index, Aggregation; Replication, Data Set Distribution; Access & Analysis, Resource Scheduling & Distribution. Supporting layers: Meta Data, System Monitoring. Data characteristics: Raw Detail, No/Minimal History; Integrated, Scrubbed; History, Summaries; Targeted, Specialized (OLAP). Source: Enterprise Group]

Page 8: Presentation # 506

8

Where The Work Is

[Diagram: the data warehouse process flow (Source OLTP Systems, Staging Area, Data Warehouse, Data Marts; stages from Design and Mapping through Access & Analysis; Meta Data and System Monitoring layers) with the callout "Over 80% of the work is here". Source: Enterprise Group]

Page 9: Presentation # 506

9

Warehouse Design

• Normalized (Relational) Design
• Dimensional Design – Star and Snowflake
• Hybrid Design
  – In reality, the DW is more normalized but has elements of dimensional design
  – The data marts are star schemas but have elements of normalization

Page 10: Presentation # 506

10

Modelling is not straight forward

[Diagram entities: Donation, Member, Income, Campaign, Time, Gender, Marital Status, Location, Age]

Page 11: Presentation # 506

11

Should These Be Combined?

[Diagram entities: Donation, Member, Income, Campaign, Time, Gender, Marital Status, Location, Age]

Page 12: Presentation # 506

12

Behind The Scenes

There are several aspects of a design that users don’t directly see:
  – Meta Data
  – Error Correction
  – Audit
  – Load Control (if not using a scheduling tool)
  – Transformation Tables (used for transforming the data prior to being loaded into the DW)

Page 13: Presentation # 506

13

Behind The Scenes

[Diagram: Source OLTP Systems, Staging Area, Data Warehouse, and Data Marts, shown together with the behind-the-scenes components: Error Correction, Meta Data, Audit, Load Control, Transform Tables]

Page 14: Presentation # 506

14

A 10 Step Design Process

1. Identify major subject areas or topics
2. Declare the Grain
3. Add element of time to the tables
4. Create appropriate names for tables, columns, and views
5. Add derived fields where applicable
6. Add administrative fields
7. Consider security and privacy in design
8. Make sure data model answers the critical business questions
9. Consider meta data
10. Consider error correction
11. Performance considerations: Tune, Tune, Tune

Page 15: Presentation # 506

15

Independent of Approach…

…the goal of the data model is to satisfy two primary criteria:

1. Meet Business Objectives
2. Provide Good Performance

Page 16: Presentation # 506

16

Staging Area

Page 17: Presentation # 506

17

Staging Area

[Diagram: the data warehouse process flow, with the Staging Area sitting between the Source OLTP Systems and the Data Warehouse, ahead of the Data Marts. Process stages: Design, Mapping; Extract, Scrub, Transform; Load, Index, Aggregation; Replication, Data Set Distribution; Access & Analysis, Resource Scheduling & Distribution. Supporting layers: Meta Data, System Monitoring. Data characteristics: Raw Detail, No/Minimal History; Integrated, Scrubbed; History, Summaries; Targeted, Specialized (OLAP). Source: Enterprise Group]

Page 18: Presentation # 506

18

The Staging Area

• Holds a mirror copy of the extract files
• Allows pre-processing of the data before loading
• Allows easier reloading (you WILL do this)
• Keeps more control with the DW team, rather than with an external group (the extract team)

Page 19: Presentation # 506

19

The Staging Area

• Facilitates easier audit processes
• Can facilitate error correction processes
• Helps identify the Record Type (translates into easier ETL processing and logic)

Page 20: Presentation # 506

20

Surrogate Keys

Page 21: Presentation # 506

21

Surrogate Keys

• A surrogate key is a system generated, unintelligent, single column, unique identifier for each row within a table
• Always use surrogate keys for dimensions
• Always use surrogate keys for the time dimension
• Always use surrogate keys for transformation tables
• Always use surrogate keys for EVERY table... and this includes FACT tables

Page 22: Presentation # 506

22

Surrogate Keys Avoid…

• Duplicate keys from different source systems
• Recycling of primary keys
• Use of the same key for different business rows
• Lengthy composite key joins
• Space in fact tables
• Application changes or upgrades in source systems
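
As an illustration of the first point, here is a minimal Python sketch of surrogate key assignment. The class name `SurrogateKeyMap`, the starting value 100, and the source system names "billing" and "claims" are assumptions made for this example; they do not come from the presentation.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out one meaningless integer key per (source system, natural key)."""

    def __init__(self, start=100):
        self._next = count(start)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        business_key = (source_system, natural_key)
        if business_key not in self._keys:
            self._keys[business_key] = next(self._next)
        return self._keys[business_key]


keys = SurrogateKeyMap()
billing_member = keys.key_for("billing", 1)  # natural key 1 in one system
claims_member = keys.key_for("claims", 1)    # same natural key, different system
billing_again = keys.key_for("billing", 1)   # repeat lookup returns the same key
```

Because the warehouse key carries no meaning, two source systems that both use Id 1 no longer collide, and a source system that changes its own keys only affects the lookup map.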

Page 23: Presentation # 506

23

Using Surrogates In Fact Tables

• You will need a surrogate key on the fact table if you allow ‘unknown’ values into the fact table (which is recommended, by the way)
• The Primary Key of a fact table is typically the combination of the base dimensions

Page 24: Presentation # 506

24

Surrogates In Fact Tables

FCT_CLAIMS
  Product_Key: NUMBER(10,0)
  Primary_Diagnosis_key: NUMBER(10,0)
  Date_Of_First_Service_Key: NUMBER(10,0)
  Provider_key: NUMBER(10,0)
  Contract_key: NUMBER(10,0)
  Member_key: NUMBER(10,0)
  Benefit_package_key: NUMBER(10,0)

DIM_DATES_OF_FIRST_SERVICE
  Date_Of_First_Service_Key: NUMBER(10,0)

DIM_ICD9_PRIMARY_DIAGNOSES
  Primary_Diagnosis_key: NUMBER(10,0)

DIM_BENEFIT_PACKAGES
  Benefit_package_key: NUMBER(10,0)

DIM_MEMBERS
  Member_key: NUMBER(10,0)

DIM_SERVICE_PROVIDERS
  Provider_key: NUMBER(10,0)

DIM_CONTRACTS
  Contract_key: NUMBER(10,0)

DIM_PRODUCTS
  Product_Key: NUMBER(10,0)

Page 25: Presentation # 506

25

Surrogates In Fact Tables

Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: Broken Arm
Amount: $123.34

Page 26: Presentation # 506

26

Surrogates In Fact Tables

Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: MISSING (Broken Arm)
Amount: $16,239.00

Page 27: Presentation # 506

27

Surrogates In Fact Tables

Claim Line Key: 999999999
Date Of First Service: 15-Jan-2001
Benefit Package: Family, Eye Coverage
Contract: 123456789
Product: ExtendaGroup
Member: David Stanford
Service Provider: Dr. Walters
Primary Diagnosis: MISSING (Heart Attack)
Amount: $16,239.00

This results in a duplicate primary key in the table

Page 28: Presentation # 506

28

Surrogates In Fact Tables

FCT_CLAIMS
  Claim_Line_Key: NUMBER(10,0)

DIM_DATES_OF_FIRST_SERVICE
  Date_Of_First_Service_Key: NUMBER(10,0)

DIM_ICD9_PRIMARY_DIAGNOSES
  Primary_Diagnosis_key: NUMBER(10,0)

DIM_BENEFIT_PACKAGES
  Benefit_package_key: NUMBER(10,0)

DIM_MEMBERS
  Member_key: NUMBER(10,0)

DIM_SERVICE_PROVIDERS
  Provider_key: NUMBER(10,0)

DIM_CONTRACTS
  Contract_key: NUMBER(10,0)

DIM_PRODUCTS
  Product_Key: NUMBER(10,0)

Thus the need for a surrogate primary key
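
The collision described on these slides can be reproduced in a few lines. The sketch below uses Python's built-in `sqlite3` module rather than Oracle, and trims the fact table to two dimension keys; the lowercase table names, the member key 7, and the constant `MISSING` are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
MISSING = -99  # dummy diagnosis key used when the dimension lookup fails

# Design 1: the primary key is the combination of the dimension keys.
conn.execute("""CREATE TABLE fct_claims_composite (
    member_key INTEGER NOT NULL,
    primary_diagnosis_key INTEGER NOT NULL,
    amount REAL,
    PRIMARY KEY (member_key, primary_diagnosis_key))""")

# Design 2: a surrogate claim_line_key is the primary key.
conn.execute("""CREATE TABLE fct_claims_surrogate (
    claim_line_key INTEGER PRIMARY KEY,
    member_key INTEGER NOT NULL,
    primary_diagnosis_key INTEGER NOT NULL,
    amount REAL)""")

# Two different claims whose diagnoses both failed the lookup.
claims = [(7, MISSING, 16239.00), (7, MISSING, 16239.00)]

composite_rejected = False
try:
    conn.executemany("INSERT INTO fct_claims_composite VALUES (?, ?, ?)", claims)
except sqlite3.IntegrityError:
    composite_rejected = True  # the second row collides with the first

conn.executemany(
    "INSERT INTO fct_claims_surrogate (member_key, primary_diagnosis_key, amount) "
    "VALUES (?, ?, ?)",
    claims,
)
surrogate_rows = conn.execute(
    "SELECT COUNT(*) FROM fct_claims_surrogate").fetchone()[0]
```

With the composite key the second claim is rejected; with the surrogate key both rows load and can be corrected later.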

Page 29: Presentation # 506

29

Load “Dirty” Data Into The Fact

• Ties out to source systems
• Gains credibility with end users
• Requires a few design resolutions:
  – Bad & Missing (BAM) Logic
  – Surrogate Keys in the Fact tables
• Still 100% accurate – we don’t load the bad values, we identify the bad values for correction later
• Empowers the End Users to decide if the “dirty” data will invalidate their analysis

Page 30: Presentation # 506

30

Tracking History

Page 31: Presentation # 506

31

Tracking History in Dimensions

• Type 1 – No history
• Type 2 – All history
• Type 3 – Some history

Page 32: Presentation # 506

32

Type 1 – No History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.
Date: 01-Jan-2001

Page 33: Presentation # 506

33

Type 1 – No History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #2
Key: 100
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.
Date: 15-Mar-2001
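
The Type 1 behaviour above amounts to an overwrite in place. A minimal Python sketch, assuming the dimension is held as a dictionary keyed by the source Id (the function name `apply_type1` and the lowercase field names are hypothetical):

```python
def apply_type1(dimension, source_row, load_date):
    """Overwrite the existing warehouse row in place; no history is kept."""
    row = dimension[source_row["id"]]
    row.update(source_row)
    row["date"] = load_date
    return row


dimension = {1: {"key": 100, "id": 1, "name": "Sandy Rubble",
                 "address": "23 Boulder Rd", "city": "Bedrock",
                 "salutation": "Ms.", "date": "01-Jan-2001"}}

apply_type1(dimension,
            {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
             "city": "GravelPit", "salutation": "Mrs."},
            "15-Mar-2001")
```

The surrogate key stays at 100 and the earlier address and salutation are gone.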

Page 34: Presentation # 506

34

Type 2 – All History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.
Date: 01-Jan-2001

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #2
Key: 101
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.
Date: 15-Mar-2001
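
In Type 2, every change becomes a new row under a new surrogate key. A minimal Python sketch, assuming the dimension is a list of row dictionaries and that the caller supplies the next key (the function name `apply_type2` is hypothetical):

```python
def apply_type2(rows, source_row, load_date, next_key):
    """Keep every earlier version and append the change as a new row."""
    new_row = {**source_row, "key": next_key, "date": load_date}
    rows.append(new_row)
    return new_row


rows = [{"key": 100, "id": 1, "name": "Sandy Rubble",
         "address": "23 Boulder Rd", "city": "Bedrock",
         "salutation": "Ms.", "date": "01-Jan-2001"}]

apply_type2(rows,
            {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
             "city": "GravelPit", "salutation": "Mrs."},
            load_date="15-Mar-2001", next_key=101)
```

Both versions of Id 1 now exist, so facts loaded before 15-Mar-2001 can keep pointing at key 100.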

Page 35: Presentation # 506

35

Type 3 – Some History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Original Salutation: Ms.
Salutation: Ms.
Date: 01-Jan-2001

Page 36: Presentation # 506

36

Type 3 – Some History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Original Salutation: Ms.
Salutation: Mrs.
Date: 15-Mar-2001
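
Type 3 keeps a single row but preserves the original value of a chosen column. A minimal Python sketch, assuming `salutation` is the tracked column (the function name `apply_type3` and the `original_` prefix are hypothetical):

```python
def apply_type3(row, source_row, load_date, tracked="salutation"):
    """Overwrite the row but keep the first value seen for the tracked column."""
    original = f"original_{tracked}"
    row.setdefault(original, row[tracked])  # only set if not already stored
    row.update(source_row)
    row["date"] = load_date
    return row


row = {"key": 100, "id": 1, "name": "Sandy Rubble",
       "address": "23 Boulder Rd", "city": "Bedrock",
       "original_salutation": "Ms.", "salutation": "Ms.",
       "date": "01-Jan-2001"}

apply_type3(row,
            {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
             "city": "GravelPit", "salutation": "Mrs."},
            "15-Mar-2001")
```

Only the original and the latest salutation survive; any values in between are lost, which is why this is "some" history.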

Page 37: Presentation # 506

37

More Dimension Types…Combinations

• Type 3 Prime – Types 1 and 2 (the most common)
• Type 4 – Types 1 and 3
• Type 5 – Types 2 & 3
• Type 6 – Types 1, 2, and 3 (the second most common)

Page 38: Presentation # 506

38

Trigger Fields

• Trigger Fields are fields within a table for which you want to track history
• Non-Trigger Fields are those for which you do not want to track history

Page 39: Presentation # 506

39

Type 3 Prime – All and No History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.
Expiry Date: Null

Page 40: Presentation # 506

40

Non Trigger Field Update

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Ms.
Date: 15-Mar-2001

Page 41: Presentation # 506

41

Trigger & Non Trigger Field Update

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.
Date: 01-Jan-2001

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #2
Key: 101
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.
Date: 15-Mar-2001

Page 42: Presentation # 506

42

Changes One At A Time

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Ms.
Date: 01-Jan-2001

Source Transaction #3
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #2
Key: 101
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.
Date: 15-Mar-2001
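
The Type 3 Prime slides combine the two behaviours: a change to a trigger field adds a row, and a change to a non-trigger field overwrites the current row. A minimal Python sketch, assuming `salutation` is the only trigger field and that a non-trigger update also refreshes the row date, as on the "Non Trigger Field Update" slide (the names `TRIGGER_FIELDS` and `apply_change` are hypothetical):

```python
TRIGGER_FIELDS = {"salutation"}  # assumption: history is tracked for salutation only


def apply_change(rows, source_row, load_date, next_key):
    """New row on a trigger-field change, in-place overwrite otherwise."""
    current = rows[-1]  # latest version of this business key
    changed = {f for f, v in source_row.items() if current.get(f) != v}
    if changed & TRIGGER_FIELDS:
        rows.append({**current, **source_row, "key": next_key, "date": load_date})
    elif changed:
        current.update(source_row)
        current["date"] = load_date
    return rows


rows = [{"key": 100, "id": 1, "name": "Sandy Rubble",
         "address": "23 Boulder Rd", "city": "Bedrock",
         "salutation": "Ms.", "date": "01-Jan-2001"}]

# Address change only: non-trigger field, so key 100 is overwritten.
apply_change(rows, {"id": 1, "address": "42 Slate Ave", "city": "GravelPit",
                    "salutation": "Ms."}, "01-Feb-2001", next_key=101)
# Salutation change: trigger field, so a new row with key 101 is added.
apply_change(rows, {"id": 1, "address": "42 Slate Ave", "city": "GravelPit",
                    "salutation": "Mrs."}, "15-Mar-2001", next_key=101)
```

The date 01-Feb-2001 is invented for the example so that the two changes arrive one at a time.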

Page 43: Presentation # 506

43

Expect To Track Everything

Users want to view the data as it was when the transaction or event occurred

AND…

Users want to view the data in the context of today’s realities

THUS, model for both!

Page 44: Presentation # 506

44

Add ‘Current’ Columns

• In order to provide these two views, consider adding ‘current’ columns to tables. This is a special Type 6.
• These fields get updated in historical records when a trigger field changes value in the current record.
• This simplifies the use of the DW by the users
• It’s easier to understand than having to write complex SQL

Page 45: Presentation # 506

45

Type 6 – All, Some, and No History

Source Transaction #1
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.

Warehouse Transaction #1
Key: 100
Id: 1
Name: Sandy Rubble
Address: 23 Boulder Rd
City: Bedrock
Salutation: Ms.
Current Sal’n: Mrs.
Date: 01-Jan-2001

Source Transaction #2
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.

Warehouse Transaction #2
Key: 101
Id: 1
Name: Sandy Rubble
Address: 42 Slate Ave
City: GravelPit
Salutation: Mrs.
Current Sal’n: Mrs.
Date: 15-Mar-2001
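
The ‘current’ column in this Type 6 example can be maintained by touching every historical row of the same business key whenever a new version is added. A minimal Python sketch under that assumption (the function name and the field name `current_salutation` are hypothetical):

```python
def add_version_with_current(rows, source_row, load_date, next_key):
    """Append a new version, then refresh the 'current' column on all versions."""
    rows.append({**source_row, "key": next_key, "date": load_date})
    for row in rows:
        if row["id"] == source_row["id"]:
            row["current_salutation"] = source_row["salutation"]
    return rows


rows = [{"key": 100, "id": 1, "name": "Sandy Rubble",
         "address": "23 Boulder Rd", "city": "Bedrock",
         "salutation": "Ms.", "current_salutation": "Ms.",
         "date": "01-Jan-2001"}]

add_version_with_current(
    rows,
    {"id": 1, "name": "Sandy Rubble", "address": "42 Slate Ave",
     "city": "GravelPit", "salutation": "Mrs."},
    load_date="15-Mar-2001", next_key=101)
```

Row 100 still shows the salutation as it was at the time (Ms.) next to today's value (Mrs.), so users can group by either view without a self-join.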

Page 46: Presentation # 506

46

Most Recent Flag

• Tracks the Most Recent record in time (not loaded, but based on a time series)
• Should be added to the dimensions as a Yes/No (1/0) field
• The most recently loaded record is set to Yes, all other records are set to No
• Allows user to restrict on the Most Recent Flag to get a view of the world today
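
One possible way to maintain such a flag is sketched below in Python. It assumes each dimension row carries an `effective_date` and that "most recent" is decided by that date within a business key rather than by load order; the function and field names are hypothetical.

```python
from datetime import date


def set_most_recent_flags(rows):
    """Flag (1/0) the row with the latest effective date within each business key."""
    latest = {}
    for row in rows:
        best = latest.get(row["id"])
        if best is None or row["effective_date"] > best["effective_date"]:
            latest[row["id"]] = row
    for row in rows:
        row["most_recent_flg"] = 1 if latest[row["id"]] is row else 0
    return rows


rows = [
    {"key": 101, "id": 1, "salutation": "Mrs.", "effective_date": date(2001, 3, 15)},
    # This older version arrived in a later load.
    {"key": 100, "id": 1, "salutation": "Ms.", "effective_date": date(2001, 1, 1)},
]
set_most_recent_flags(rows)
today = [r for r in rows if r["most_recent_flg"] == 1]
```

Restricting on the flag returns only key 101, the view of the world today.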

Page 47: Presentation # 506

47

Double Keying Type 2 Dimensions

• Double surrogate key in Type 2 dimensions
• 1 key is unique for each individual row
• 1 key is unique for each individual business key
• Protects against:
  – Authoritative source system changes / duplication

Page 48: Presentation # 506

48

Rapidly Changing Dimensions

Rapidly Changing Dimensions (RCD’s) need to be partitioned
  – Use Oracle partitioning
  – Include the native partition key in the dimension
  – Or split into several tables

Page 49: Presentation # 506

49

BAM Rules, Audit & Administrative Fields

Page 50: Presentation # 506

50

Bad & Missing Fact Data

• Bad and/or missing data will always be an issue
• The source data is never completely clean
• There are always exceptions
• Recall that you need to tie back into the source systems for your audit, thus you must load this ‘incorrect’ data
• Put the decisions into the hands of your users – don’t decide for them whether the data is good enough or not
• Need to develop Bad & Missing (BAM) Rules

Page 51: Presentation # 506

51

BAM Rules

• Used in the ETL process when loading data that references other tables (e.g. loading a fact table and looking up the dimension record)
• Need a series of rules to follow if the lookup fails
• Create a set of ‘dummy’ records for each referenced table (for Referential Integrity purposes)
• In snapshots, may need a set of dummy records per snapshot period

Page 52: Presentation # 506

52

BAM Rules – Dummy Records

-99 Error/Missing
-88 Not Available
-77 Acceptable Error
-66 Temporarily Not Available
-1 Not Applicable

A great hockey team! Gretzky, Lindros, Coffey, Lemieux, Bunny Larocque

Page 53: Presentation # 506

53

Dummy Record Meanings

-99 A data element is missing or a lookup into another table cannot find a matching value (e.g. Missing foreign key). The source record is still loaded and the column value is set to -99.

-88 ‘Data not available’. This data element is not available from the source record.

-77 ‘Acceptable Errors’ that will not be corrected. This data element was invalid (set to -99) during the initial load and will not be corrected or reloaded.

-66 Data is temporarily not available. Usually used in a multiple pass loading process.

-1 ‘Not Applicable’. This data element is not required in the context of the record.
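
A minimal Python sketch of how two of these rules might be applied during a fact load. It covers only the -99 and -1 cases; the lookup table contents, the row layout, and the names `lookup_dimension_key`, `load_fact`, and `errors` are assumptions for illustration.

```python
ERROR_MISSING = -99   # lookup failed or value missing
NOT_APPLICABLE = -1   # element not required for this record


def lookup_dimension_key(key_by_business_id, business_id, applicable=True):
    """Return the dimension key, or a BAM dummy key when the lookup cannot succeed."""
    if not applicable:
        return NOT_APPLICABLE
    return key_by_business_id.get(business_id, ERROR_MISSING)


diagnosis_keys = {"BROKEN_ARM": 501}  # hypothetical dimension lookup
errors = []                           # feeds the error correction process


def load_fact(source_row):
    key = lookup_dimension_key(diagnosis_keys, source_row["diagnosis"])
    if key == ERROR_MISSING:
        # Keep the original value so the row can be corrected and reloaded.
        errors.append((source_row["row_id"], "primary_diagnosis",
                       source_row["diagnosis"]))
    return {"primary_diagnosis_key": key, "amount": source_row["amount"]}


good = load_fact({"row_id": 1, "diagnosis": "BROKEN_ARM", "amount": 123.34})
bad = load_fact({"row_id": 2, "diagnosis": "BRKN ARM", "amount": 16239.00})
```

The bad row is still loaded, so fact totals tie out to the source, while the error list records what has to be fixed.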

Page 54: Presentation # 506

54

Error Correction Process

• An area that you can report from and reload from
• Hold or point to the original source record and be able to recreate it (the DW has lost the original value once tagged to a BAM rule)
• Can be one summary table with standard error types
• For more detail, create one error table for each target table
• Create a series of error flag columns in the error table indicating what went wrong

Page 55: Presentation # 506

55

Error Correction Model – Summary Mode

Error_type
  error_type_cd: VARCHAR(2) NOT NULL
  error_type_desc: VARCHAR(255)
  last_update_ts: TIMESTAMP NOT NULL
  record_expiry_ts: TIMESTAMP

Severity_Level
  severity_cd: VARCHAR(3) NOT NULL
  severity_desc: VARCHAR(255)
  last_update_ts: TIMESTAMP NOT NULL
  record_expiry_ts: TIMESTAMP

ETL_ERROR
  etl_load_key: INTEGER NOT NULL (FK)
  sys_load_col_name: VARCHAR(30) NOT NULL
  source_name: VARCHAR(80) NOT NULL (FK)
  error_type_cd: VARCHAR(2) NOT NULL (FK)
  source_row_id: INTEGER
  severity_cd: VARCHAR(3) NOT NULL (FK)

Page 56: Presentation # 506

56

Error Correction Model – Detail Mode

[Diagram: error correction flow in detail mode, with the elements Source, Stage, Load Process, Target, Error Exists, and Reload]

Page 57: Presentation # 506

57

Audit Considerations

• A key area that is quite often ignored
• You must match to the source systems or be able to explain the differences
• Auditing data loads (when did we start a load and what is the status?)

Without proof, you will not get the credibility!

Page 58: Presentation # 506

58

Audit Model

ETL_AUDIT
  etl_load_key: INTEGER NOT NULL
  academic_yr: CHAR(9)
  prev_etl_load_key: INTEGER
  most_rcnt_fy_ind: CHAR NOT NULL
  system_cd: VARCHAR(5) NOT NULL (FK)
  load_status_flg: VARCHAR(12)
  load_type_flg: CHAR
  stage_archvd_date: DATE
  wh_archvd_date: DATE
  stage_start_ts: TIMESTAMP
  warehouse_start_ts: TIMESTAMP
  num_rows_read: INTEGER
  fct_cleanup_ind: CHAR
  acad_yr_transt_ind: CHAR

ETL_Source_System
  system_cd: VARCHAR(5) NOT NULL
  system_name: VARCHAR(20)
  system_desc: VARCHAR(255)
  sys_req_file_cnt: INTEGER

ETL_AUDIT_TABLE_LOADS
  etl_load_key: INTEGER NOT NULL (FK)
  source_name: VARCHAR(80) NOT NULL
  num_rows_read: INTEGER
  num_records_reqd: INTEGER
  load_status_flg: VARCHAR(12)
  extract_num: INTEGER
  extract_ts: TIMESTAMP
  stop_source_row_id: INTEGER
  load_session_name: VARCHAR(80)
  load_start_ts: TIMESTAMP
  load_stop_ts: TIMESTAMP

Page 59: Presentation # 506

59

Pulling Audit & Error Correction Together

[Diagram: the audit model tables (ETL_AUDIT, ETL_Source_System, ETL_AUDIT_TABLE_LOADS) shown together with the error correction tables (Error_type, Severity_Level, ETL_ERROR). The column definitions are the same as on the Audit Model and Error Correction Model slides; ETL_ERROR carries etl_load_key and source_name as foreign keys.]

Page 60: Presentation # 506

60

Administrative Fields

• Supports the ‘behind the scenes’ aspects
  – Loading
  – Querying
• Different requirements for dimensions and facts
• But try to standardize across all tables, even if the fields aren’t utilized today

Page 61: Presentation # 506

61

Dimension Tables

• Record Type – indicates New, Trigger Field Modify, Non-Trigger Field Modify, Delete, Correction
• Active Flg – indicates a business key is active
• Most Recent Flg – indicates the most recent row loaded within a business key
• Effective Date – for the instance of that row
• End Date – for the instance of that row
• Create Date
• Update Date
• Create User
• Update User

Page 62: Presentation # 506

62

Fact Tables

• Record Type
• Active Flg
• Most Recent Flg
• Row Cnt
• Partition Date – store the actual date value
• Create Date
• Update Date
• Create User
• Update User

Page 63: Presentation # 506

63

Administrative Field Values

• Use 1’s and 0’s in flag and count fields – they’re easier to add (but it really depends on what the user can best understand)
• Always fill in date fields (use dummy start and end dates in time if needed)
• Use triggers to populate the create/update dates and users

Page 64: Presentation # 506

64

Other Tips “In The Bag”

Page 65: Presentation # 506

65

Random Thoughts

• Ensure you secure…
  – Budget
  – Top management commitment
• Have focus (scope definition)
• Develop incrementally
• Have a business driven solution
• Use experienced designers and implementers
• Use industry tools for development

Page 66: Presentation # 506

66

…More Random Thoughts

• Generally, make all of your column names unique across tables
• Conform fact table measures (same name)
• Don’t normalize too much – jump right into a dimensional design
• Avoid retroactive changes
• Don’t be afraid of many dimensions

Page 67: Presentation # 506

67

Too Many Dimensions?

1 Fact, 41 CONFORMED dimensions

DIM_PRODUCTS

DIM_ICD9_ADMITTING_DIAGNOSES

DIM_AUTHORIZING_PROVIDERS

DIM_PROVIDER_ROLES

DIM_HCP_CODES

DIM_CONTRACTS

DIM_PCP_PANELS

DIM_AGES

DIM_MR_CLASSIFICATIONS

DIM_PLACES_OF_SERVICE

DIM_SEXES

DIM_MARITAL_STATUSES

FCT_CLAIMS

DIM_SERVICE_PROVIDERS

DIM_MEMBERS

DIM_BENEFIT_PACKAGES

DIM_TIER_PLAN_TYPES

DIM_CORPORATIONS

DIM_EMPLOYER_GROUPS

DIM_MODIFIERS

DIM_ICD9_PRIMARY_DIAGNOSES

DIM_ICD9_SECONDARY_DIAGNOSES

DIM_MEMBER_LOCATIONS

DIM_PROVIDER_LOCATIONS

DIM_DATES_OF_FIRST_SERVICE

DIM_DATES_OF_LAST_SERVICE

DIM_PAID_DATES

DIM_CLAIM_RECEIVED_DATES

DIM_ADMISSION_DATES

DIM_CLAIM_INVOICE_DATES

DIM_DISCHARGE_DATES

DIM_REFERRING_PROVIDERS

DIM_PCPS

DIM_CPT4_CODES

DIM_HCPCS_CODES

DIM_REVENUE_CODES

DIM_ICD9_PROCEDURE_CODES

DIM_PCP_NETWORKS

DIM_MEDICAL_PCP_NETWORKS

DIM_SERV_PROV_NETWORKS

CLAIMS_DETAIL

DIM_DRG_CODES

Page 68: Presentation # 506

68

12 Common DW Design Mistakes
(Intelligent Enterprise: Ralph Kimball Oct 2001)

1. Place text attributes in a fact table when you want to use them as constraints and groupings
2. Limit the use of verbose descriptions in your dimensions to save space
3. Split hierarchy and hierarchy levels into multiple dimension tables
4. Delay dealing with slowly changing dimensions
5. Use smart keys to join dimension and fact tables
6. Add dimensions to fact tables before declaring the grain

Page 69: Presentation # 506

69

12 Common DW Design Mistakes
(Intelligent Enterprise: Ralph Kimball Oct 2001)

7. Declare that the dimensional model is based on a specific report
8. Mix different grains in one fact table
9. Leave lowest-level atomic data in non-dimensional format
10. Avoid building aggregates and use hardware for performance improvements
11. Fail to conform fact data
12. Fail to conform dimension data

Page 70: Presentation # 506

70

In Summary

• Avoid “Dave’s Gotchya’s”
• Be careful in your design
• Meet business requirements
• Address the ‘behind the scenes’ issues
• Remember: DW design is not a science, it is an art… so be creative!

Page 71: Presentation # 506

71

Q & A: Questions and Answers

David [email protected]

Thank You!