ist722 data warehousing - syracuse...
TRANSCRIPT
![Page 1: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/1.jpg)
IST722 Data Warehousing
ETL Design and Development
Michael A. Fudge, Jr.
![Page 2: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/2.jpg)
Recall: Kimball Lifecycle
![Page 3: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/3.jpg)
Objective:Outline ETL design and development
process.A “Recipe” for ETL
![Page 4: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/4.jpg)
Before You BeginBefore you begin, you’ll need 1. Physical Design –
Star Schema implementation in ROLAP.
2. Architecture Plan – understanding of your DW/BI architecture.
3. Source to Target Mapping –Part of the detailed design process.
![Page 5: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/5.jpg)
The Plan• How the 34
subsystems map /related to the 10 step plan.
![Page 6: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/6.jpg)
Step 1 – Draw The High Level Plan
![Page 7: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/7.jpg)
Step 2 – Choose an ETL Tool
• Your ETL tool is responsible for moving data from the various sources into the data warehouse.
• Programming language vs. Graphical tool.• PL Flexibility, Customizable• GT Self Documenting, Easy for beginners
![Page 8: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/8.jpg)
ETL: Code vs Tool
Which of these is easier to understand?
![Page 9: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/9.jpg)
Step 3 – Develop Detailed Strategies
• Data Extraction & Archival of Extracted Data• Data quality checks on dimensions & facts• Manage changes to dimensions• Ensure the DW and ETL meet systems
availability requirements• Design a data auditing subsystem• Organize the staging data
![Page 10: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/10.jpg)
The Role of theStaging Database
• Staging database stores copies of source extracts• Can create a history when none exists.• Reduces unnecessary processing of data source.
Data Sources
StagingDatabase
Data Warehouse
EXTRACT LOAD
ETL:TRANSFORM
ELT:TRANSFORM
![Page 11: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/11.jpg)
Step 4 – Drill Down by Target Table
• Start drilling down into the detailed source to target flow for each target dimension and fact table
• Flowcharts and pseudo code are useful for building out your transformation logic.
• ETL Tools allow you to build and document the data flow at the same time:
![Page 12: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/12.jpg)
Step 5 – Populate Dimension Tables w/ Historic Data
• Part of the one-time historic processing step.• Start with the simplest dimension table
(usually type 1 SCD’s)• Transformations
o Combine from separate sourceso Convert data ex. EBCDIC ASCIIo Decode production codes ex. TTT Track-Type Tractoro Verify rollups ex: Category Producto Ensure a “Natural” or “Business” key exists for SCD’so Assign Surrogate Keys to Dimension table
![Page 13: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/13.jpg)
Step 6 – Perform the Fact Table Historic Load
• Part of the one-time historic processing step.• Transformations:
o Replace special codes (eg. -1) with NULL on additive and semi- additive facts
o Calculate and store complex derived facts ex: shipping amount is divided among the number of items on the order.
o Pivot rows into columns ex: account type, amount checking amount, savings amount
o Associate with Audit Dimensiono Lookup Dimension Keys using Natural/Business
Keys….
![Page 14: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/14.jpg)
Example Surrogate Key Pipeline
Handles SCD’s
![Page 15: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/15.jpg)
Step 7 – Dimension Table Incremental Processing
• Oftentimes the same logic used in the Historic load can be used.
• Identify New/ Changed data based on different attributes for the same natural keyo ETL tools usually can assist with this logic.
![Page 16: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/16.jpg)
Step 8 – Fact Table Incremental Processing
• A complex ETL:o Can be difficult to determine which facts need
to be processed?o What happens to a fact when it is re-processed?o What if a dimension key lookup fails?
• Some ETL tool assist with processing this logic.o Degenerate dimensions can be used ex:
transaction number in order summaryo A combination of dimension keys ex: StudentKey
and ClassKey for grade processing.
![Page 17: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/17.jpg)
Step 9 – Aggregate Table and OLAP Loads
• Further processing beyond the ROLAP star schema.
• Refresh / Reprocess o MOLAP cubeso INDEXED / MATERIALIZED viewso Aggregate summary tables
![Page 18: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/18.jpg)
Step 10 – ETL System Operation & Automation
• Schedule jobs• Catch and Log errors / exceptions• Database management tasks:
o Cleanup old datao Shrink Databaseo Rebuild indexeso Update Statistics
![Page 19: IST722 Data Warehousing - Syracuse Universityclasses.ischool.syr.edu/ist722fudge/old/units/09/ETL-Design-And... · Step 2 – Choose an ETL Tool • Your ETL tool is responsible for](https://reader031.vdocument.in/reader031/viewer/2022030407/5a85be837f8b9a001c8c87e4/html5/thumbnails/19.jpg)
IST722 Data Warehousing
ETL Design and Development
Michael A. Fudge, Jr.