INFS 6510 – Competitive Intelligence Systems
Normalization vs. Denormalization
The Road to Denormalization
Starring Various “Denormalized” Celebrities
The Road to Denormalization
Before transactional data can be loaded into a Data Warehouse, the data must be Denormalized
[Diagram: transaction data (“Transx Data”) flowing into the Data Warehouse]
Normalization
But before you can understand Denormalization, you must understand Normalization . . .
And to understand Normalization, you must understand Relational Databases
I’ve been Denormalized!
Relational Databases
Collection of linked tables
Tables linked by Primary Key / Foreign Key relationships (Referential Integrity)
Primary Key – column whose values make each record unique in a parent table (e.g., Customer Number)
Foreign Key – column in child table that links to the Primary Key in the parent table
Relational DB Example
CUSTOMER TABLE (“Parent” table; Cust # is the Primary Key)
Cust #   Cust Name
100      Moe
101      Larry
102      Curly
ORDER TABLE (“Child” table; Cust # is the Foreign Key)
Order #   Prod #   Qty   Cust #
1         QR22     1     100
2         QR22     25    100
3         SB56     3     102
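The parent/child link above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names (customer, orders, cust_no, and so on) are made up for the sketch, and SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: cust_no is the Primary Key.
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT)")
# Child table: cust_no is a Foreign Key pointing at the parent.
conn.execute("""
    CREATE TABLE orders (
        order_no INTEGER PRIMARY KEY,
        prod_no  TEXT,
        qty      INTEGER,
        cust_no  INTEGER REFERENCES customer(cust_no)
    )
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(100, "Moe"), (101, "Larry"), (102, "Curly")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)])

# Follow the Foreign Key back to the parent to get the customer name.
rows = conn.execute("""
    SELECT o.order_no, c.cust_name, o.prod_no, o.qty
    FROM orders o JOIN customer c ON c.cust_no = o.cust_no
    ORDER BY o.order_no
""").fetchall()

# Referential Integrity: an order for a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO orders VALUES (4, 'QR22', 1, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here rows holds one tuple per order with the customer name filled in from the parent table, and rejected records whether the orphan order was refused.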
Database Structure & Design
2 Approaches:
1. Optimize for Data Capture (i.e., Capturing Transactions)
2. Optimize for Data Access (i.e., Queries & Reporting)
Conflict
I love conflict!
Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
• Eliminate redundancy of data (or else wasted space & processing occurs)
• Ensure data integrity (or else data anomalies)
• Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization – process by which a database is optimized for data capture
• All data “redundancy” is removed from Database
• Has multiple forms (0, 1st, 2nd, 3rd, et al.)
Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., cannot be broken apart any further)
CUSTOMER DATA (0NF)
Cust #          CustName
100, 101, 102   Moe Howard, Larry Fine, Curly Howard
CUSTOMER TABLE (1NF)
Cust #   FName   LName
100      Moe     Howard
101      Larry   Fine
102      Curly   Howard
I’M NOT MOVING!
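The 0NF to 1NF step can be sketched in plain Python. The function name to_first_normal_form and the dictionary keys are hypothetical, and the sketch assumes every name is exactly "First Last" separated by one space.

```python
# 0NF: one record whose fields each pack several values together.
zero_nf = {
    "cust_no": "100, 101, 102",
    "cust_name": "Moe Howard, Larry Fine, Curly Howard",
}

def to_first_normal_form(record):
    """Split packed fields so every row holds one customer and every field is atomic."""
    numbers = [n.strip() for n in record["cust_no"].split(",")]
    names = [n.strip() for n in record["cust_name"].split(",")]
    rows = []
    for number, full_name in zip(numbers, names):
        # Assumption: a name is "First Last", so one split yields two atomic fields.
        first, last = full_name.split(" ", 1)
        rows.append({"cust_no": int(number), "fname": first, "lname": last})
    return rows

customer_table = to_first_normal_form(zero_nf)
```

Each entry of customer_table is now one customer with separate cust_no, fname, and lname fields, matching the 1NF table.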
Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign)
TABLE X (1NF)
Cust #   FName   Order #
100      Moe     1
100      Moe     2
101      Larry   3
Repeating values: “100 Moe” appears twice (Dependency on Primary Key)
CUSTOMER TABLE (2NF)
Cust #   FName
100      Moe
101      Larry
102      Curly
ORDER TABLE (2NF)
Order #   Cust #
1         100
2         100
3         101
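Splitting TABLE X into a customer table and an order table might look like the sketch below. The variable names are assumptions, and only customers that actually appear in TABLE X end up in the result (a customer with no orders, such as Curly, would have to be loaded separately).

```python
# TABLE X (1NF): the customer's name repeats on every one of that customer's orders.
table_x = [
    {"cust_no": 100, "fname": "Moe", "order_no": 1},
    {"cust_no": 100, "fname": "Moe", "order_no": 2},
    {"cust_no": 101, "fname": "Larry", "order_no": 3},
]

# CUSTOMER TABLE: FName depends only on Cust #, so store it once per customer.
customer_table = {}
for row in table_x:
    customer_table[row["cust_no"]] = row["fname"]

# ORDER TABLE: each order keeps only the Foreign Key back to its customer.
order_table = [
    {"order_no": row["order_no"], "cust_no": row["cust_no"]}
    for row in table_x
]
```

After the split, "Moe" is stored once, and both of his orders point to him through cust_no.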
Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column)
TABLE X (2NF)
Cust #   City   Order #   ShipTime
100      NY     1         2 days
101      NY     2         2 days
102      LA     3         5 days
Repeating values: “NY 2 days” appears twice (Dependency b/t 2 non-key columns)
SHIP TIME TABLE (3NF)
City #   City   ShipTime
10       NY     2 days
20       LA     5 days
CUSTOMER TABLE (3NF)
Cust #   City #
100      10
101      10
102      20
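A rough sketch of the 2NF to 3NF split follows. The variable names are hypothetical, and the City # values are generated as 10, 20, ... purely to mirror the slide. The last line shows the payoff: a changed ship time is edited in one place (the new value "1 day" is invented for the illustration).

```python
# TABLE X (2NF): ShipTime depends on City, a non-key column, so "NY / 2 days" repeats.
table_x = [
    {"cust_no": 100, "city": "NY", "order_no": 1, "ship_time": "2 days"},
    {"cust_no": 101, "city": "NY", "order_no": 2, "ship_time": "2 days"},
    {"cust_no": 102, "city": "LA", "order_no": 3, "ship_time": "5 days"},
]

ship_time_table = {}  # city -> {"city_no": ..., "ship_time": ...}
customer_table = {}   # cust_no -> city_no

for row in table_x:
    city = row["city"]
    if city not in ship_time_table:
        # Assumption: number cities 10, 20, ... in order of first appearance.
        city_no = 10 * (len(ship_time_table) + 1)
        ship_time_table[city] = {"city_no": city_no, "ship_time": row["ship_time"]}
    customer_table[row["cust_no"]] = ship_time_table[city]["city_no"]

# One update now covers every NY customer (illustrative value).
ship_time_table["NY"]["ship_time"] = "1 day"
```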
Normalized DB Example
MANY database tables ensure against redundant data (and help prevent data integrity issues)
Am I a good example of “Normalized?”
Database Structure & Design
2 Approaches:
1. Optimize for Data Capture (i.e., Capturing Transactions)
2. Optimize for Data Access (i.e., Queries & Reporting)
Conflict
I like conflict too!
Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
• Change the data layout to a different structure
• Allow data redundancy
• Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing – Adding redundancy & reducing joins in a relational database
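As a small illustration of "combining tables", the sketch below pre-joins the 3NF CUSTOMER and SHIP TIME tables from the earlier example into one wider table, using Python's sqlite3 module. The table name customer_dim and the column names are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (3NF) source tables
    CREATE TABLE ship_time (city_no INTEGER PRIMARY KEY, city TEXT, ship_time TEXT);
    CREATE TABLE customer  (cust_no INTEGER PRIMARY KEY,
                            city_no INTEGER REFERENCES ship_time(city_no));
    INSERT INTO ship_time VALUES (10, 'NY', '2 days'), (20, 'LA', '5 days');
    INSERT INTO customer  VALUES (100, 10), (101, 10), (102, 20);

    -- Denormalized copy: the join is done once, when the table is built.
    CREATE TABLE customer_dim AS
        SELECT c.cust_no, s.city, s.ship_time
        FROM customer c JOIN ship_time s ON s.city_no = c.city_no;
""")

# Reports read the wide table directly, with no join at query time.
customer_dim = conn.execute(
    "SELECT cust_no, city, ship_time FROM customer_dim ORDER BY cust_no"
).fetchall()
```

The redundancy is back ("NY / 2 days" is stored twice), which is the trade made for simpler and faster reads.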
Denormalizing – Most Common Approach
Star Schema (Clustering)
• Fact (core or transaction) Tables in middle of star
• Dimensional (structural or “lookup”) Tables around “points” of star
SALES ORDER (FACT) TABLE
Order #   Date       Cust #   Prod #   Loc #
1         06/15/XX   100      QR22     1000
2         07/19/XX   100      QR22     1000
3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION TABLE
Cust #   CustName
100      Moe
101      Larry
102      Curly
PRODUCT DIMENSION TABLE
Prod #   ProdName
QR22     Rake
SR56     Spade
TW43     Mulch
LOC DIMENSION TABLE
Loc #   LocName
1000    NY
2000    LA
3000    PGH
DATE DIMENSION TABLE
Date       Quarter
06/29/XX   2
06/30/XX   2
07/01/XX   3
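A typical star-schema query starts at the fact table and joins outward to whichever dimensions the report needs. The sketch below loads the sample rows into an in-memory SQLite database; the table and column names (sales_fact, product_dim, and so on) are assumptions, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE product_dim  (prod_no TEXT PRIMARY KEY, prod_name TEXT);
    CREATE TABLE loc_dim      (loc_no INTEGER PRIMARY KEY, loc_name TEXT);
    -- Fact table in the middle of the star: one row per order, keys to each dimension.
    CREATE TABLE sales_fact (
        order_no   INTEGER PRIMARY KEY,
        order_date TEXT,
        cust_no    INTEGER REFERENCES customer_dim(cust_no),
        prod_no    TEXT    REFERENCES product_dim(prod_no),
        loc_no     INTEGER REFERENCES loc_dim(loc_no)
    );
    INSERT INTO customer_dim VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
    INSERT INTO product_dim  VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
    INSERT INTO loc_dim      VALUES (1000, 'NY'), (2000, 'LA'), (3000, 'PGH');
    INSERT INTO sales_fact   VALUES (1, '06/15/XX', 100, 'QR22', 1000),
                                    (2, '07/19/XX', 100, 'QR22', 1000),
                                    (3, '08/30/XX', 101, 'SR56', 2000);
""")

# Count orders by product and location: one join per dimension used, never more.
orders_by_product_and_location = conn.execute("""
    SELECT p.prod_name, l.loc_name, COUNT(*) AS order_count
    FROM sales_fact f
    JOIN product_dim p ON p.prod_no = f.prod_no
    JOIN loc_dim     l ON l.loc_no  = f.loc_no
    GROUP BY p.prod_name, l.loc_name
    ORDER BY p.prod_name
""").fetchall()
```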
These 2 tables become the “SALES FACT” table in the Data Warehouse
These 3 tables become the “Customer Dimension”
These 5 tables become the “Product Dimension”
This Date Field helps build the “Date Dimension”
Resulting Star Schema Data Warehouse
SALES ORDER (FACT) TABLE
Order #   Date       Cust #   Prod #   Rep #
1         06/15/XX   100      QR22     1000
2         07/19/XX   100      QR22     1000
3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION
Cust #   CustName
100      Moe
101      Larry
102      Curly
PRODUCT DIMENSION
Prod #   ProdName
QR22     Rake
SR56     Spade
TW43     Mulch
DATE DIMENSION
Date       Quarter
06/29/XX   2
06/30/XX   2
07/01/XX   3
It’s a STAR, Like me!
Common (Conformed) Dimensions
Denormalizing (continued)
Stars are linked via common (i.e., Conformed) Dimensions to form Data Warehouse
INVENTORY (FACT) TABLE
Prod #   ProdName   Stock Date   Units
QR22     Rake       03/23/XX     150
TW43     Mulch      04/15/XX     1452
SR56     Spade      05/01/XX     997
SALES ORDER (FACT) TABLE
Order #   Date       Cust #   Prod #   Loc #
1         06/15/XX   100      QR22     1000
2         07/19/XX   100      QR22     1000
3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION
Cust #   CustName
100      Moe
101      Larry
102      Curly
PRODUCT DIMENSION
Prod #   ProdName
QR22     Rake
SR56     Spade
TW43     Mulch
LOC DIMENSION
Loc #   LocName
1000    NY
2000    LA
3000    PGH
DATE DIMENSION
Date       Quarter
06/29/XX   2
06/30/XX   2
07/01/XX   3
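Because both stars share the Product Dimension, a single report can pull from both fact tables. The sketch below is one way to write such a query in SQLite: each fact table is summarized on its own first, then both summaries are joined to the shared dimension. Table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The conformed dimension shared by both stars
    CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sales_fact (order_no INTEGER PRIMARY KEY, order_date TEXT,
                             prod_no TEXT REFERENCES product_dim(prod_no));
    CREATE TABLE inventory_fact (prod_no TEXT REFERENCES product_dim(prod_no),
                                 stock_date TEXT, units INTEGER);
    INSERT INTO product_dim VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
    INSERT INTO sales_fact VALUES (1, '06/15/XX', 'QR22'),
                                  (2, '07/19/XX', 'QR22'),
                                  (3, '08/30/XX', 'SR56');
    INSERT INTO inventory_fact VALUES ('QR22', '03/23/XX', 150),
                                      ('TW43', '04/15/XX', 1452),
                                      ('SR56', '05/01/XX', 997);
""")

# Summarize each fact table separately, then line the results up by product.
orders_vs_stock = conn.execute("""
    SELECT p.prod_name,
           COALESCE(s.order_count, 0) AS order_count,
           COALESCE(i.units, 0)       AS units_in_stock
    FROM product_dim p
    LEFT JOIN (SELECT prod_no, COUNT(*) AS order_count
               FROM sales_fact GROUP BY prod_no) s ON s.prod_no = p.prod_no
    LEFT JOIN (SELECT prod_no, SUM(units) AS units
               FROM inventory_fact GROUP BY prod_no) i ON i.prod_no = p.prod_no
    ORDER BY p.prod_no
""").fetchall()
```

Summarizing before joining keeps a product with several orders from multiplying its inventory rows.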
Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)
EXTRACT: These are 2 Normalized Transaction Tables
TRANSFORM: The data are “Transformed” in these steps
LOAD: This is the resulting, Denormalized Product Dimension
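The three ETL stages can be mimicked in a few lines of plain Python. This is not how SSIS is configured; it is only a sketch of the idea. The two source tables, the category names, and the functions extract, transform, and load are all hypothetical, since the slides show the actual tables only as images.

```python
# Hypothetical normalized source: products point to a separate category table.
source_db = {
    "product": [
        {"prod_no": "QR22", "prod_name": "Rake", "category_no": 1},
        {"prod_no": "SR56", "prod_name": "Spade", "category_no": 1},
        {"prod_no": "TW43", "prod_name": "Mulch", "category_no": 2},
    ],
    "category": [
        {"category_no": 1, "category_name": "Tools"},
        {"category_no": 2, "category_name": "Garden Supplies"},
    ],
}

def extract(source):
    """EXTRACT: copy the two normalized tables out of the transaction system."""
    return list(source["product"]), list(source["category"])

def transform(products, categories):
    """TRANSFORM: fold the category name into each product row (the join happens here)."""
    category_names = {c["category_no"]: c["category_name"] for c in categories}
    return [
        {
            "prod_no": p["prod_no"],
            "prod_name": p["prod_name"],
            "category_name": category_names[p["category_no"]],
        }
        for p in products
    ]

def load(rows, warehouse):
    """LOAD: write the denormalized rows into the warehouse as the Product Dimension."""
    warehouse["product_dim"] = rows

warehouse = {}
products, categories = extract(source_db)
load(transform(products, categories), warehouse)
```

After the run, warehouse["product_dim"] holds one wide row per product, with the category name repeated wherever it applies.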
The End
That’s all! Bye, bye!