presentation # 506
IOUG Live! 2005. Design Tips for the Warehouse Architect. Presentation # 506. David Stanford, President, Red Sky Data Inc. [email protected]. Objectives: Obtain a clear understanding of data warehouse design ‘hot points’. Identify solutions and alternatives for these ‘hot points’.
1
Presentation # 506
David Stanford
President
Red Sky Data Inc.
[email protected]
Design Tips for the Warehouse Architect
IOUG Live! 2005
2
Objectives
Obtain a clear understanding of data warehouse design ‘hot points’
Identify solutions and alternatives for these ‘hot points’
See how real world solutions are implemented
3
Agenda
Top 10 Gotchya’s
Design Traps
– Loading Dirty Fact Data
– Surrogate Keys
– The Staging Area
Slowly Changing Dimensions
Tracking All History
Audit Considerations
Bad & Missing Data
Administrative Fields
Other Tidbits of Advice
4
Dave’s Top 10 Gotchya’s
1. Failing to model for both a) view of the data when the event occurred and b) view of the data as of today’s reality
2. Limiting the number of dimensions
3. Failing to model and populate a meta data repository
4. Failing to provide sufficient audit capabilities to verify loads against source systems
5. Not using surrogate keys for everything
5
Dave’s Top 10 Gotchya’s
6. Failing to design an error correction process
7. Normalizing too much
8. Not using a staging area
9. Failing to load ALL of the fact data
10. Failing to classify incorrect data
11. Making it too complex!
6
Design Traps
Design ReviewStaging AreaSurrogate KeysFacts – Surrogates and Dirty Data
7
Data Warehouse Process
[Diagram: Source OLTP Systems → Staging Area → Data Warehouse → Data Marts]
Process steps:
• Design, Mapping
• Extract, Scrub, Transform
• Load, Index, Aggregation
• Replication, Data Set Distribution
• Access & Analysis, Resource Scheduling & Distribution
Supporting layers: Meta Data, System Monitoring
Data Characteristics:
• Raw Detail, No/Minimal History
• Integrated, Scrubbed
• History, Summaries
• Targeted, Specialized (OLAP)
Source: Enterprise Group
8
Where The Work Is
[Diagram: the Data Warehouse Process diagram (Source OLTP Systems, Staging Area, Data Warehouse, Data Marts, with the same process steps, Meta Data, and System Monitoring), with a callout]
Over 80% of the work is here
Source: Enterprise Group
9
Warehouse Design
Normalized (Relational) Design
Dimensional Design – Star and Snowflake
Hybrid Design
– In reality, the DW is more normalized but has elements of dimensional design
– The data marts are star schemas but have elements of normalization
10
Modelling Is Not Straightforward
[Diagram entities: Donation, Member, Income, Campaign, Time, Gender, Marital Status, Location, Age]
11
Should These Be Combined?
[Diagram entities: Donation, Member, Income, Campaign, Time, Gender, Marital Status, Location, Age]
12
Behind The Scenes
There are several aspects of a design that users don’t directly see:
– Meta Data
– Error Correction
– Audit
– Load Control (if not using a scheduling tool)
– Transformation Tables (used for transforming the data prior to being loaded into the DW)
13
Behind The Scenes
[Diagram: Source OLTP Systems, Staging Area, Data Warehouse, and Data Marts, together with the behind-the-scenes components: Error Correction, Meta Data, Audit, Load Control, Transform Tables]
14
A 10 Step Design Process
1. Identify major subject areas or topics
2. Declare the Grain
3. Add element of time to the tables
4. Create appropriate names for tables, columns, and views
5. Add derived fields where applicable
6. Add administrative fields
7. Consider security and privacy in design
8. Make sure data model answers the critical business questions
9. Consider meta data
10. Consider error correction
11. Performance considerations: Tune, Tune, Tune
15
Independent of Approach…
…the goal of the data model is to satisfy two primary criteria:
1. Meet Business Objectives
2. Provide Good Performance
16
Staging Area
17
Staging Area
[Diagram: the Data Warehouse Process diagram (Source OLTP Systems, Staging Area, Data Warehouse, Data Marts), repeated to locate the Staging Area]
Source: Enterprise Group
18
The Staging Area
Holds a mirror copy of the extract files
Allows pre-processing of the data before loading
Allows easier reloading (you WILL do this)
Keeps more control with the DW team, rather than with an external group (the extract team)
19
The Staging Area
Facilitates easier audit processes
Can facilitate error correction processes
Helps identify the Record Type (translates into easier ETL processing and logic)
20
Surrogate Keys
21
Surrogate Keys
A surrogate key is a system-generated, unintelligent, single-column, unique identifier for each row within a table
Always use surrogate keys for dimensions
Always use surrogate keys for the time dimension
Always use surrogate keys for transformation tables
Always use surrogate keys for EVERY table... and this includes FACT tables
22
Surrogate Keys Avoid…
Duplicate keys from different source systems
Recycling of primary keys
Use of the same key for different business rows
Lengthy composite key joins
Space in fact tables
Application changes or upgrades in source systems
23
Using Surrogates In Fact Tables
You will need a surrogate key on the fact table if you allow ‘unknown’ values into the fact table (which is recommended, by the way)
The Primary Key of a fact table is typically the combination of the base dimensions
24
Surrogates In Fact Tables
DIM_DATES_OF_FIRST_SERVICE
Date_Of_First_Service_Key: NUMBER(10,0)
DIM_ICD9_PRIMARY_DIAGNOSES
Primary_Diagnosis_key: NUMBER(10,0)
DIM_BENEFIT_PACKAGES
Benefit_package_key: NUMBER(10,0)
DIM_MEMBERS
Member_key: NUMBER(10,0)
DIM_SERVICE_PROVIDERS
Provider_key: NUMBER(10,0)
FCT_CLAIMS
Product_Key: NUMBER(10,0)
Primary_Diagnosis_key: NUMBER(10,0)
Date_Of_First_Service_Key: NUMBER(10,0)
Provider_key: NUMBER(10,0)
Contract_key: NUMBER(10,0)
Member_key: NUMBER(10,0)
Benefit_package_key: NUMBER(10,0)
DIM_CONTRACTS
Contract_key: NUMBER(10,0)
DIM_PRODUCTS
Product_Key: NUMBER(10,0)
25
Surrogates In Fact Tables
Date Of First Service   15-Jan-2001
Benefit Package         Family, Eye Coverage
Contract                123456789
Product                 ExtendaGroup
Member                  David Stanford
Service Provider        Dr. Walters
Primary Diagnosis       Broken Arm
Amount                  $123.34
26
Surrogates In Fact Tables
Date Of First Service   15-Jan-2001
Benefit Package         Family, Eye Coverage
Contract                123456789
Product                 ExtendaGroup
Member                  David Stanford
Service Provider        Dr. Walters
Primary Diagnosis       MISSING (Broken Arm)
Amount                  $16,239.00
27
Surrogates In Fact Tables
Date Of First Service   15-Jan-2001
Benefit Package         Family, Eye Coverage
Contract                123456789
Product                 ExtendaGroup
Member                  David Stanford
Service Provider        Dr. Walters
Primary Diagnosis       MISSING (Heart Attack)
Amount                  $16,239.00
This results in a duplicate primary key in the table
Claim Line Key          999999999
28
Surrogates In Fact Tables
DIM_DATES_OF_FIRST_SERVICE
Date_Of_First_Service_Key: NUMBER(10,0)
DIM_ICD9_PRIMARY_DIAGNOSES
Primary_Diagnosis_key: NUMBER(10,0)
DIM_BENEFIT_PACKAGES
Benefit_package_key: NUMBER(10,0)
DIM_MEMBERS
Member_key: NUMBER(10,0)
DIM_SERVICE_PROVIDERS
Provider_key: NUMBER(10,0)
FCT_CLAIMS
Claim_Line_Key: NUMBER(10,0)
DIM_CONTRACTS
Contract_key: NUMBER(10,0)
DIM_PRODUCTS
Product_Key: NUMBER(10,0)
Thus the need for a surrogate primary key
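The duplicate-key problem on the preceding slides can be reproduced in a few lines. This is a minimal sketch, assuming SQLite in place of Oracle, a reduced set of dimension columns, and the -99 dummy key from the BAM rules; the table names fct_claims_composite and fct_claims_surrogate and the key values are hypothetical.

```python
import sqlite3

MISSING = -99  # dummy dimension key assigned when a diagnosis lookup fails

conn = sqlite3.connect(":memory:")

# Design A: the primary key is the combination of the dimension keys.
conn.execute("""
    CREATE TABLE fct_claims_composite (
        date_key              INTEGER NOT NULL,
        member_key            INTEGER NOT NULL,
        provider_key          INTEGER NOT NULL,
        primary_diagnosis_key INTEGER NOT NULL,
        amount                REAL,
        PRIMARY KEY (date_key, member_key, provider_key, primary_diagnosis_key)
    )""")

# Design B: a system-generated surrogate key identifies each claim line.
conn.execute("""
    CREATE TABLE fct_claims_surrogate (
        claim_line_key        INTEGER PRIMARY KEY,
        date_key              INTEGER NOT NULL,
        member_key            INTEGER NOT NULL,
        provider_key          INTEGER NOT NULL,
        primary_diagnosis_key INTEGER NOT NULL,
        amount                REAL
    )""")

# Two different claims whose diagnoses both failed the dimension lookup.
claims = [
    (20010115, 1, 7, MISSING, 16239.00),  # source said "Broken Arm"
    (20010115, 1, 7, MISSING, 16239.00),  # source said "Heart Attack"
]

def load(table, rows):
    """Insert rows one at a time and return how many were accepted."""
    loaded = 0
    for row in rows:
        try:
            conn.execute(
                f"INSERT INTO {table} (date_key, member_key, provider_key, "
                "primary_diagnosis_key, amount) VALUES (?, ?, ?, ?, ?)",
                row,
            )
            loaded += 1
        except sqlite3.IntegrityError:
            pass  # rejected: duplicate primary key
    return loaded

composite_loaded = load("fct_claims_composite", claims)
surrogate_loaded = load("fct_claims_surrogate", claims)
```

With the composite key only one of the two claims is accepted, so the fact table no longer ties out to the source; with the surrogate key both rows load.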
29
Load “Dirty” Data Into The Fact
Ties out to source systems
Gains credibility with end users
Requires a few design resolutions:
– Bad & Missing (BAM) Logic
– Surrogate Keys in the Fact tables
Still 100% accurate – we don’t load the bad values, we identify the bad values for correction later
Empowers the End Users to decide if the “dirty” data will invalidate their analysis
30
Tracking History
31
Tracking History in Dimensions
Type 1 – No history
Type 2 – All history
Type 3 – Some history
32
Type 1 – No History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Date 01-Jan-2001
33
Type 1 – No History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #2
Key 100
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Date 15-Mar-2001
34
Type 2 – All History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Date 01-Jan-2001
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #2
Key 101
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Date 15-Mar-2001
35
Type 3 – Some History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Original Salutation Ms.
Salutation Ms.
Date 01-Jan-2001
36
Type 3 – Some History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Original Salutation Ms.
Salutation Mrs.
Date 15-Mar-2001
37
More Dimension Types…Combinations
Type 3 Prime – Types 1 and 2 (the most common)
Type 4 – Types 1 and 3
Type 5 – Types 2 and 3
Type 6 – Types 1, 2, and 3 (the second most common)
38
Trigger Fields
Trigger Fields are fields within a table whose history you want to track
Non-Trigger Fields are those whose history you do not want to track
39
Type 3 Prime – All and No History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Expiry Date Null
40
Non Trigger Field Update
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Ms.
Date 15-Mar-2001
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Ms.
41
Trigger & Non Trigger Field Update
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Date 01-Jan-2001
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #2
Key 101
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Date 15-Mar-2001
42
Changes One At A Time
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Ms.
Date 01-Jan-2001
Source Transaction #3
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #2
Key 101
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Date 15-Mar-2001
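The trigger / non-trigger behaviour of a Type 3 Prime dimension can be sketched as one small function. This is an illustration only: it assumes Salutation is the sole trigger field, that the row date is refreshed on an in-place update, and that keys start at 100 as on the slides; apply_change, TRIGGER_FIELDS, and the 01-Feb-2001 load date are hypothetical.

```python
from itertools import count

TRIGGER_FIELDS = {"salutation"}  # assumption: history is tracked for these fields only
_next_key = count(100)           # surrogate key generator, starting at 100 as on the slides

def apply_change(dimension, source_row, load_date):
    """Apply one source row to a dimension held as a list of dict rows."""
    # The latest warehouse row for this business key, if there is one.
    current = next(
        (r for r in reversed(dimension) if r["id"] == source_row["id"]), None
    )
    if current is None or any(
        current[f] != source_row[f] for f in TRIGGER_FIELDS
    ):
        # New business key, or a trigger field changed: add a new row (Type 2).
        dimension.append({"key": next(_next_key), **source_row, "date": load_date})
    else:
        # Only non-trigger fields changed: overwrite in place (Type 1).
        current.update(source_row)
        current["date"] = load_date
    return dimension

dim = []
base = {"id": 1, "name": "Sandy Rubble"}
apply_change(dim, {**base, "address": "23 Boulder Rd", "city": "Bedrock",
                   "salutation": "Ms."}, "01-Jan-2001")
# Non-trigger change: the address is overwritten on key 100.
apply_change(dim, {**base, "address": "42 Slate Ave", "city": "GravelPit",
                   "salutation": "Ms."}, "01-Feb-2001")
# Trigger change: a new row with key 101 is created.
apply_change(dim, {**base, "address": "42 Slate Ave", "city": "GravelPit",
                   "salutation": "Mrs."}, "15-Mar-2001")
```

Feeding the changes one at a time leaves key 100 with the new address and the old salutation, and key 101 with the new salutation.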
43
Expect To Track Everything
Users want to view the data as it was when the transaction or event occurred
AND…
Users want to view the data in the context of today’s realities
THUS, model for both!
44
Add ‘Current’ Columns
In order to provide these two views, consider adding ‘current’ columns to tables. This is a special Type 6.
These fields get updated in historical records when a trigger field changes value in the current record.
This simplifies the use of the DW by the users
It’s easier to understand than having to write complex SQL
45
Type 6 – All, Some, and No History
Source Transaction #1
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Warehouse Transaction #1
Key 100
Id 1
Name Sandy Rubble
Address 23 Boulder Rd
City Bedrock
Salutation Ms.
Current Sal’n Mrs.
Date 01-Jan-2001
Source Transaction #2
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Warehouse Transaction #2
Key 101
Id 1
Name Sandy Rubble
Address 42 Slate Ave
City GravelPit
Salutation Mrs.
Current Sal’n Mrs.
Date 15-Mar-2001
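The ‘current’ column update shown on this slide can be sketched as follows. This is a minimal illustration, assuming rows are held as dicts and that current_salutation is the only ‘current’ column; add_version is a hypothetical helper name.

```python
def add_version(rows, new_row):
    """Append a new Type 2 version, then refresh the 'current' column
    on every version of the same business key, historical ones included."""
    rows.append(dict(new_row))
    for row in rows:
        if row["id"] == new_row["id"]:
            row["current_salutation"] = new_row["salutation"]
    return rows

history = [
    {"key": 100, "id": 1, "salutation": "Ms.",
     "current_salutation": "Ms.", "date": "01-Jan-2001"},
]
add_version(history, {"key": 101, "id": 1, "salutation": "Mrs.",
                      "date": "15-Mar-2001"})
```

Key 100 keeps its original Salutation (the view when the event occurred), while its Current Salutation now matches key 101 (the view in the context of today).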
46
Most Recent Flag
Tracks the Most Recent record in time (not loaded, but based on a time series)
Should be added to the dimensions as a Yes/No (1/0) field
The most recently loaded record is set to Yes, all other records are set to No
Allows user to restrict on the Most Recent Flag to get a view of the world today
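One way to maintain such a flag is sketched below. It assumes each row carries an effective date and that "most recent" is decided by that date within a business key rather than by load order; set_most_recent_flags and the third sample row are hypothetical.

```python
from datetime import date

def set_most_recent_flags(rows):
    """Set most_recent_flg to 1 on the latest row per business key, 0 elsewhere."""
    latest = {}
    for row in rows:
        best = latest.get(row["id"])
        if best is None or row["effective_date"] > best["effective_date"]:
            latest[row["id"]] = row
    for row in rows:
        row["most_recent_flg"] = 1 if latest[row["id"]] is row else 0
    return rows

members = [
    {"key": 100, "id": 1, "salutation": "Ms.", "effective_date": date(2001, 1, 1)},
    {"key": 101, "id": 1, "salutation": "Mrs.", "effective_date": date(2001, 3, 15)},
    {"key": 102, "id": 2, "salutation": "Mr.", "effective_date": date(2001, 2, 1)},
]
set_most_recent_flags(members)

# Restricting on the flag gives the "view of the world today".
today_view = [r["key"] for r in members if r["most_recent_flg"] == 1]
```

Because the flag is stored as 1/0, summing it also gives the number of distinct business keys.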
47
Double Keying Type 2 Dimensions
Double surrogate key in Type 2 dimensions
1 key is unique for each individual row
1 key is unique for each individual business key
Protects against:
– Authoritative source system changes / duplication
48
Rapidly Changing Dimensions
Rapidly Changing Dimensions (RCDs) need to be partitioned
– Use Oracle partitioning
– Include the native partition key in the dimension
– Or split into several tables
49
BAM Rules, Audit & Administrative Fields
50
Bad & Missing Fact Data
Bad and/or missing data will always be an issue
The source data is never completely clean
There are always exceptions
Recall that you need to tie back into the source systems for your audit, thus you must load this ‘incorrect’ data
Put the decisions into the hands of your users – don’t decide for them whether the data is good enough or not
Need to develop Bad & Missing (BAM) Rules
51
BAM Rules
Used in the ETL process when loading data that references other tables (e.g. loading a fact table and looking up the dimension record)
Need a series of rules to follow if the lookup fails
Create a set of ‘dummy’ records for each referenced table (for Referential Integrity purposes)
In snapshots, may need a set of dummy records per snapshot period
52
BAM Rules – Dummy Records
-99 Error/Missing (Gretzky)
-88 Not Available (Lindros)
-77 Acceptable Error (Coffey)
-66 Temporarily Not Available (Lemieux)
-1 Not Applicable (Bunny Larocque)
A great hockey team!
53
Dummy Record Meanings
-99 A data element is missing or a lookup into another table cannot find a matching value (e.g. Missing foreign key). The source record is still loaded and the column value is set to -99.
-88 ‘Data not available’. This data element is not available from the source record.
-77 ‘Acceptable Errors’ that will not be corrected. This data element was invalid (set to -99) during the initial load and will not be corrected or reloaded.
-66 Data is temporarily not available. Usually used in a multiple pass loading process.
-1 ‘Not Applicable’. This data element is not required in the context of the record.
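A failed-lookup rule of this kind might look like the sketch below. It covers only the -99 case and assumes the dimension is available as an in-memory mapping; lookup_key, the error record layout, and the key value 501 are hypothetical.

```python
ERROR_MISSING = -99  # dummy key: lookup failed or value missing

def lookup_key(dim_index, value, column, source_row_id, errors):
    """Return the dimension key for value, or -99 while logging the original."""
    key = dim_index.get(value)
    if key is None:
        # Keep the original value: the warehouse loses it once the row
        # is tagged with the dummy key.
        errors.append({"source_row_id": source_row_id,
                       "column": column,
                       "original_value": value})
        return ERROR_MISSING
    return key

diagnoses = {"Broken Arm": 501}  # hypothetical dimension contents
source = [("Broken Arm", 123.34), ("Heart Attack", 16239.00)]

errors, fact_rows = [], []
for row_id, (diagnosis, amount) in enumerate(source, start=1):
    fact_rows.append({
        "primary_diagnosis_key": lookup_key(
            diagnoses, diagnosis, "primary_diagnosis_key", row_id, errors),
        "amount": amount,
    })

loaded_total = sum(r["amount"] for r in fact_rows)
```

Every source row is still loaded, so the amount total ties out to the source, and the error list records what needs correcting later.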
54
Error Correction Process
An area that you can report from and reload from
Hold or point to the original source record and be able to recreate it (the DW has lost the original value once tagged to a BAM rule)
Can be one summary table with standard error types
For more detail, create one error table for each target table
Create a series of error flag columns in the error table indicating what went wrong
55
Error Correction Model – Summary Mode
Error_type
error_type_cd: VARCHAR(2) NOT NULL
error_type_desc: VARCHAR(255)
last_update_ts: TIMESTAMP NOT NULL
record_expiry_ts: TIMESTAMP
Severity_Level
severity_cd: VARCHAR(3) NOT NULL
severity_desc: VARCHAR(255)
last_update_ts: TIMESTAMP NOT NULL
record_expiry_ts: TIMESTAMP
ETL_ERROR
etl_load_key: INTEGER NOT NULL (FK)
sys_load_col_name: VARCHAR(30) NOT NULL
source_name: VARCHAR(80) NOT NULL (FK)
error_type_cd: VARCHAR(2) NOT NULL (FK)
source_row_id: INTEGER
severity_cd: VARCHAR(3) NOT NULL (FK)
56
Error Correction Model – Detail Mode
[Diagram labels: Source, Stage, Load Process, Target, Error Exists, Reload]
Audit Considerations
A key area that is quite often ignored
You must match to the source systems or be able to explain the differences
Auditing data loads (when did we start a load and what is the status?)
Without proof, you will not get the credibility!
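A tie-out check against the source can be as simple as comparing row counts and a control total. The sketch below is one possible shape, not the presenter's implementation; reconcile, num_rows_loaded, and the MATCHED / MISMATCH status values are hypothetical, and amounts are passed as strings so Decimal arithmetic stays exact.

```python
from decimal import Decimal

def reconcile(source_amounts, warehouse_amounts):
    """Compare row counts and control totals between source and warehouse."""
    src_total = sum((Decimal(a) for a in source_amounts), Decimal("0"))
    wh_total = sum((Decimal(a) for a in warehouse_amounts), Decimal("0"))
    result = {
        "num_rows_read": len(source_amounts),
        "num_rows_loaded": len(warehouse_amounts),
        "total_difference": src_total - wh_total,
    }
    matched = (
        result["num_rows_read"] == result["num_rows_loaded"]
        and result["total_difference"] == 0
    )
    result["load_status_flg"] = "MATCHED" if matched else "MISMATCH"
    return result

# All rows loaded, including the 'dirty' one: the load ties out.
ok = reconcile(["123.34", "16239.00"], ["123.34", "16239.00"])
# The 'dirty' row was dropped: the difference must now be explained.
short = reconcile(["123.34", "16239.00"], ["123.34"])
```

The second call shows why rejected rows hurt credibility: the warehouse is one row and $16,239.00 short of the source.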
58
Audit Model
ETL_AUDIT
etl_load_key: INTEGER NOT NULL
academic_yr: CHAR(9)
prev_etl_load_key: INTEGER
most_rcnt_fy_ind: CHAR NOT NULL
system_cd: VARCHAR(5) NOT NULL (FK)
load_status_flg: VARCHAR(12)
load_type_flg: CHAR
stage_archvd_date: DATE
wh_archvd_date: DATE
stage_start_ts: TIMESTAMP
warehouse_start_ts: TIMESTAMP
num_rows_read: INTEGER
fct_cleanup_ind: CHAR
acad_yr_transt_ind: CHAR
ETL_Source_System
system_cd: VARCHAR(5) NOT NULL
system_name: VARCHAR(20)
system_desc: VARCHAR(255)
sys_req_file_cnt: INTEGER
ETL_AUDIT_TABLE_LOADS
etl_load_key: INTEGER NOT NULL (FK)
source_name: VARCHAR(80) NOT NULL
num_rows_read: INTEGER
num_records_reqd: INTEGER
load_status_flg: VARCHAR(12)
extract_num: INTEGER
extract_ts: TIMESTAMP
stop_source_row_id: INTEGER
load_session_name: VARCHAR(80)
load_start_ts: TIMESTAMP
load_stop_ts: TIMESTAMP
59
Pulling Audit & Error Correction Together
[Diagram: the Audit Model tables (ETL_AUDIT, ETL_Source_System, ETL_AUDIT_TABLE_LOADS) shown together with the Error Correction Model tables (Error_type, Severity_Level, ETL_ERROR). Column definitions are the same as on the Error Correction Model – Summary Mode and Audit Model slides; ETL_ERROR carries etl_load_key and source_name as foreign keys.]
60
Administrative Fields
Supports the ‘behind the scenes’ aspects
– Loading
– Querying
Different requirements for dimensions and facts
But try to standardize across all tables, even if the fields aren’t utilized today
61
Dimension Tables
Record Type – indicates New, Trigger Field Modify, Non-Trigger Field Modify, Delete, Correction
Active Flg – indicates a business key is active
Most Recent Flg – indicates the most recent row loaded within a business key
Effective Date – for the instance of that row
End Date – for the instance of that row
Create Date
Update Date
Create User
Update User
62
Fact Tables
Record Type
Active Flg
Most Recent Flg
Row Cnt
Partition Date – store the actual date value
Create Date
Update Date
Create User
Update User
63
Administrative Field Values
Use 1’s and 0’s in flag and count fields – they’re easier to add (but it really depends on what the user can best understand)
Always fill in date fields (use dummy start and end dates in time if needed)
Use triggers to populate the create/update dates and users
64
Other Tips “In The Bag”
65
Random Thoughts
Ensure you secure…
– Budget
– Top management commitment
Have focus (scope definition)
Develop incrementally
Have a business driven solution
Use experienced designers and implementers
Use industry tools for development
66
…More Random Thoughts
Generally, make all of your column names unique across tables
Conform fact table measures (same name)
Don’t normalize too much – jump right into a dimensional design
Avoid retroactive changes
Don’t be afraid of many dimensions
67
Too Many Dimensions?
1 Fact, 41 CONFORMED dimensions
DIM_PRODUCTS
DIM_ICD9_ADMITTING_DIAGNOSES
DIM_AUTHORIZING_PROVIDERS
DIM_PROVIDER_ROLES
DIM_HCP_CODES
DIM_CONTRACTS
DIM_PCP_PANELS
DIM_AGES
DIM_MR_CLASSIFICATIONS
DIM_PLACES_OF_SERVICE
DIM_SEXES
DIM_MARITAL_STATUSES
FCT_CLAIMS
DIM_SERVICE_PROVIDERS
DIM_MEMBERS
DIM_BENEFIT_PACKAGES
DIM_TIER_PLAN_TYPES
DIM_CORPORATIONS
DIM_EMPLOYER_GROUPS
DIM_MODIFIERS
DIM_ICD9_PRIMARY_DIAGNOSES
DIM_ICD9_SECONDARY_DIAGNOSES
DIM_MEMBER_LOCATIONS
DIM_PROVIDER_LOCATIONS
DIM_DATES_OF_FIRST_SERVICE
DIM_DATES_OF_LAST_SERVICE
DIM_PAID_DATES
DIM_CLAIM_RECEIVED_DATES
DIM_ADMISSION_DATES
DIM_CLAIM_INVOICE_DATES
DIM_DISCHARGE_DATES
DIM_REFERRING_PROVIDERS
DIM_PCPS
DIM_CPT4_CODES
DIM_HCPCS_CODES
DIM_REVENUE_CODES
DIM_ICD9_PROCEDURE_CODES
DIM_PCP_NETWORKS
DIM_MEDICAL_PCP_NETWORKS
DIM_SERV_PROV_NETWORKS
CLAIMS_DETAIL
DIM_DRG_CODES
68
12 Common DW Design Mistakes (Intelligent Enterprise: Ralph Kimball, Oct 2001)
1. Place text attributes in a fact table when you want to use them as constraints and groupings
2. Limit the use of verbose descriptions in your dimensions to save space
3. Split hierarchy and hierarchy levels into multiple dimension tables
4. Delay dealing with slowly changing dimensions
5. Use smart keys to join dimension and fact tables
6. Add dimensions to fact tables before declaring the grain
69
12 Common DW Design Mistakes (Intelligent Enterprise: Ralph Kimball, Oct 2001)
7. Declare that the dimensional model is based on a specific report
8. Mix different grains in one fact table
9. Leave lowest-level atomic data in non-dimensional format
10. Avoid building aggregates and use hardware for performance improvements
11. Fail to conform fact data
12. Fail to conform dimension data
70
In Summary
Avoid “Dave’s Gotchya’s”
Be careful in your design
Meet business requirements
Address the ‘behind the scenes’ issues
Remember: DW design is not a science, it is an art… so be creative!
71
Q&A: Questions & Answers
David Stanford
[email protected]
Thank You!