Components and Architecture CS 543 – Data Warehousing

Post on 20-Dec-2015


Components and Architecture

CS 543 – Data Warehousing

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

Architecture

- What are the key components of a data warehouse?
- Architecture is the structure that binds the components into an integrated whole
- DW architecture provides the overall framework for developing and deploying DW solutions

Architectural Components

Distinguishing Characteristics

- Different objectives and scope
- Data content
- Complex analysis and quick response
- Flexible and dynamic
- Metadata driven

Architecture Supporting Flow of Data

Technical Architecture

- The technical architecture of a DW is the complete set of functions and services provided within its components
  - Functions
  - Services
  - Rules and procedures
  - Data stores
- Tools are the means to implement an architecture
  - Architecture comes first, then the tools; select the appropriate tools based on the architecture

Data Acquisition (1)

This component includes
- Extraction
- Transfer into staging area
- Preparation for loading (transformation, cleansing, and integration)

Data Acquisition (2)

Data Acquisition – Functions and Services (1)

Data extraction
- Select data sources and determine the types of filters to apply to individual sources
- Generate automatic extract files from operational systems using replication and other techniques
- Create intermediary files to store selected data to be merged later
- Transport extracted files from multiple platforms
- Provide automated job control services for creating extract files
- Reformat input from outside sources, departmental files, databases, and spreadsheets
- Resolve inconsistencies for common data elements from multiple sources
- Generate common application code for data extraction
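
As a rough illustration of the extraction services listed above (source selection, per-source filters, reformatting, and resolving inconsistent coding into an intermediary file), here is a minimal Python sketch. The two source layouts, the column names, and the helper names `extract` and `build_intermediary_file` are all hypothetical and are not part of the course material.

```python
import csv
import io

# Hypothetical extracts from two operational systems that describe the same
# kind of customer data with different delimiters, column names, and coding.
ORDERS_SYSTEM = "cust_id,cust_name,region\n101,Acme Ltd,north\n102,Bolt Inc,south\n"
BILLING_SYSTEM = "CustomerNo;Name;Region\n103;Crate Co;NORTH\n104;Dune LLC;EAST\n"


def extract(raw, delimiter, column_map, row_filter):
    """Read one source extract, rename its columns to common names, and filter rows."""
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    rows = []
    for record in reader:
        common = {target: record[source].strip() for source, target in column_map.items()}
        # Resolve an inconsistency between sources: region codes differ in case.
        common["region"] = common["region"].lower()
        if row_filter(common):
            rows.append(common)
    return rows


def build_intermediary_file(sources):
    """Combine the filtered rows of every source into one intermediary data set."""
    rows = []
    for source in sources:
        rows.extend(extract(**source))
    return rows


def only_north(row):
    """Example filter applied to each individual source."""
    return row["region"] == "north"


sources = [
    {
        "raw": ORDERS_SYSTEM,
        "delimiter": ",",
        "column_map": {"cust_id": "customer_id", "cust_name": "name", "region": "region"},
        "row_filter": only_north,
    },
    {
        "raw": BILLING_SYSTEM,
        "delimiter": ";",
        "column_map": {"CustomerNo": "customer_id", "Name": "name", "Region": "region"},
        "row_filter": only_north,
    },
]

intermediary = build_intermediary_file(sources)
```

Keeping the per-source differences in a small mapping, and the extraction logic in one shared function, is one way to read the "common application code for data extraction" item.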

Data Acquisition – Functions and Services (2)

Data transformation
- Map input data to data for DW repository
- Clean data, remove duplicates, merge/purge
- De-normalize extracted data structures as required by the dimensional model of the DW
- Convert data types
- Calculate and derive attribute values
- Check for referential integrity
- Aggregate data as needed
- Resolve missing values
- Consolidate and integrate data
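
Several of the transformation functions above can be sketched in a few lines of Python. The sample rows, the `product_dimension` lookup, and the default quantity of 1 for missing values are assumptions made for illustration only; a real DW would choose its own rules.

```python
from datetime import date

# Hypothetical extracted rows: every value arrives as text.
extracted_sales = [
    {"sale_id": "1", "product_id": "P1", "qty": "2", "unit_price": "9.50", "sale_date": "2008-01-15"},
    {"sale_id": "1", "product_id": "P1", "qty": "2", "unit_price": "9.50", "sale_date": "2008-01-15"},
    {"sale_id": "2", "product_id": "P2", "qty": "", "unit_price": "4.00", "sale_date": "2008-01-16"},
    {"sale_id": "3", "product_id": "P9", "qty": "1", "unit_price": "7.25", "sale_date": "2008-01-16"},
]

# Hypothetical product dimension used for the referential integrity check.
product_dimension = {"P1": "Widget", "P2": "Gadget"}


def transform(rows, products, default_qty=1):
    """Return (clean_rows, rejected_rows) after basic transformation steps."""
    seen_ids = set()
    clean, rejected = [], []
    for row in rows:
        # Remove duplicates on the business key.
        if row["sale_id"] in seen_ids:
            continue
        seen_ids.add(row["sale_id"])
        # Referential integrity: the product must exist in the dimension.
        if row["product_id"] not in products:
            rejected.append(row)
            continue
        # Resolve missing values and convert data types.
        qty = int(row["qty"]) if row["qty"] else default_qty
        price = float(row["unit_price"])
        clean.append({
            "sale_id": int(row["sale_id"]),
            "product_id": row["product_id"],
            "product_name": products[row["product_id"]],  # de-normalized attribute
            "qty": qty,
            "amount": round(qty * price, 2),  # derived attribute
            "sale_date": date.fromisoformat(row["sale_date"]),
        })
    return clean, rejected


clean_sales, rejected_sales = transform(extracted_sales, product_dimension)
```

Rejected rows are returned rather than silently dropped so that they can be reviewed or corrected before the next load.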

Data Acquisition – Functions and Services (3)

Data staging
- Provide backup and recovery for staging area repository
- Sort and merge files
- Create files as input to make changes to dimension tables
- If staging area storage is a relational database, create and populate database
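
The "sort and merge files" service can be pictured with a small Python sketch. The two in-memory extracts and the function name `sort_and_merge` are hypothetical stand-ins for real staging files.

```python
import heapq

# Hypothetical extract files, each a list of (date, source, value) records
# that may arrive unsorted.
extract_a = [("2008-01-03", "A", 30), ("2008-01-01", "A", 10)]
extract_b = [("2008-01-02", "B", 20), ("2008-01-04", "B", 40)]


def sort_and_merge(*extracts):
    """Sort each extract, then merge the sorted runs into one ordered list."""
    sorted_runs = [sorted(extract) for extract in extracts]
    # heapq.merge expects already-sorted inputs and merges them lazily.
    return list(heapq.merge(*sorted_runs))


merged = sort_and_merge(extract_a, extract_b)
```

Sorting each file first and then merging mirrors the classic external sort-merge pattern, which matters when staging files are too large to sort in memory as one unit.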

Data Storage

This architectural component covers the process of loading the prepared data from the data staging area into the data warehouse repository

Data Storage – Functions and Services

- Load data for full refreshes of DW tables
- Perform incremental loads at regular prescribed intervals
- Support loading into multiple tables at the detailed and summarized levels
- Optimize the loading process
- Provide automated job control services for loading the data warehouse
- Provide backup and recovery for the DW database
- Provide security
- Monitor and fine-tune the database
- Periodically archive data from the database according to preset conditions
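
The difference between a full refresh and an incremental load can be shown with Python's built-in `sqlite3` module. The table `sales_fact`, its columns, and the sample rows are hypothetical; a production DW would normally rely on the bulk-load utilities of its own DBMS rather than row inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL, load_date TEXT)"
)


def full_refresh(conn, rows):
    """Replace the entire table contents with the prepared load image."""
    with conn:  # one transaction: either the whole refresh succeeds or none of it
        conn.execute("DELETE FROM sales_fact")
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


def incremental_load(conn, rows):
    """Apply only new or changed rows, keyed on sale_id."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?)", rows)


# Initial full refresh, followed by a later incremental load that changes
# sale 2 and adds sale 3.
full_refresh(conn, [(1, 19.0, "2008-01-15"), (2, 4.0, "2008-01-16")])
incremental_load(conn, [(2, 5.0, "2008-01-17"), (3, 7.25, "2008-01-17")])

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
total_amount = conn.execute("SELECT SUM(amount) FROM sales_fact").fetchone()[0]
```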

Information Delivery (1)

This architectural component spans a broad spectrum of many different methods of making information available to the users of the DW

To the users, information delivery is the DW; it is the front-end through which the users retrieve information from the DW

Information is delivered through
- Online queries and interactive analyses
- Regular and ad-hoc reports
- Specialized applications (e.g. executive information system)
- Data mining

Information Delivery (2)

Information Delivery – Functions and Services

- Provide security to control information access
- Monitor user access to improve service and for future enhancements
- Allow users to browse data warehouse content
- Simplify access by hiding internal complexities of data storage from users
- Automatically reformat queries for optimal execution
- Enable queries to be aware of aggregate tables for faster results
- Govern queries and control runaway queries
- Provide self-service report generation for users
- Store result sets for queries and reports for future use
- Provide multiple levels of data granularity
- Provide event triggers to monitor data loading
- Make provision for the users to perform complex analysis
- Enable data feeds to downstream, specialized data support systems such as EIS and data mining
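
One item in the list above, making queries aware of aggregate tables, can be sketched as a tiny query router. This is only an illustration using Python's built-in `sqlite3` module; the tables `sales_detail` and `sales_by_date`, the `AGGREGATES` registry, and the helper names are hypothetical, and real tools implement aggregate navigation in far more general ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_detail (sale_date TEXT, product TEXT, amount REAL);
    INSERT INTO sales_detail VALUES
        ('2008-01-15', 'Widget', 19.0),
        ('2008-01-15', 'Gadget', 4.0),
        ('2008-01-16', 'Widget', 9.5);
    -- Pre-computed summary table at the date level of granularity.
    CREATE TABLE sales_by_date AS
        SELECT sale_date, SUM(amount) AS amount
        FROM sales_detail
        GROUP BY sale_date;
    """
)

# Registry of available aggregate tables, keyed by the grouping they support.
AGGREGATES = {frozenset({"sale_date"}): "sales_by_date"}


def choose_table(group_by_columns):
    """Pick a matching aggregate table if one exists, else fall back to detail."""
    return AGGREGATES.get(frozenset(group_by_columns), "sales_detail")


def total_by(conn, group_by_columns):
    """Return (table_used, rows) for a SUM(amount) query grouped by the given columns."""
    table = choose_table(group_by_columns)
    # Column names come from trusted code here, not from end-user input.
    cols = ", ".join(group_by_columns)
    sql = f"SELECT {cols}, SUM(amount) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return table, conn.execute(sql).fetchall()


by_date = total_by(conn, ["sale_date"])
by_product = total_by(conn, ["product"])
```

The user asks the same kind of question in both calls; only the router knows that one of them can be answered from the smaller summary table.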

Infrastructure Supporting Architecture

The architecture defines the functions and services; the infrastructure defines the elements to support the architecture

Infrastructure is the foundation supporting the architecture
- Hardware servers
- Operating systems
- Data management systems
- Networking elements
- Supporting tools and applications
- People
- Procedures

Operational Infrastructure

Operational infrastructure includes
- People
- Procedures
- Training
- Management software

Operational infrastructure consists of the people and procedures that keep the DW functioning, not those that develop it

Physical Infrastructure (1)

Physical Infrastructure (2)

Physical infrastructure includes
- Computing hardware (e.g. server)
- OS and utilities
- Networking hardware and software
- Software tools

Decisions about the physical infrastructure are critical for a DW. Two principles:
- Leverage as much of the existing physical infrastructure as possible
- Keep the infrastructure as modular as possible

Hardware and Operating System

Hardware
- Scalability
- Support
- Vendor reference
- Vendor stability

Operating system
- Compatibility
- Scalability
- Security
- Reliability
- Availability
- Preemptive multitasking
- Multi-threaded approach
- Memory protection

Single Platform Option

Simplest option, where all functions and services are performed by a single computing platform

Typically used by small to medium sized companies who have mainframes or large Unix servers already in use with capacity to spare

Some shortcomings of using mainframes
- Stretched to capacity
- Non-availability of tools
- Multiple legacy platforms
- Company’s migration policy

Hybrid Option

Most companies opt for the hybrid option where multiple platforms are used for data warehousing (data acquisition, data storage, information delivery)

Data Extraction

- Data extraction
  - Best performed on each source system’s own computing platform
- Initial reformatting and merging
  - Best performed on each source system’s own computing platform
  - Extract files are reformatted and merged into a smaller number of files, with verification performed against the source system
- Initial data cleansing
  - Also performed on the source system platform
- Transformation and consolidation
  - Performed on the staging area platform
- Validation and final quality check
  - Performed on the staging area platform
- Creation of load images
  - Performed on the staging area platform
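
The assignment of acquisition steps to platforms described above can be captured as a small lookup structure. The step and platform names are taken from the list; the names `ACQUISITION_STEPS` and `steps_on` are hypothetical.

```python
# Ordered acquisition steps and the platform each one runs on in the hybrid option.
ACQUISITION_STEPS = [
    ("data extraction", "source"),
    ("initial reformatting and merging", "source"),
    ("initial data cleansing", "source"),
    ("transformation and consolidation", "staging"),
    ("validation and final quality check", "staging"),
    ("creation of load images", "staging"),
]


def steps_on(platform):
    """Return the steps assigned to a platform, in execution order."""
    return [step for step, where in ACQUISITION_STEPS if where == platform]


source_steps = steps_on("source")
staging_steps = steps_on("staging")
```

Laid out this way, the hand-off point is easy to see: everything up to initial cleansing stays on the source platforms, and the staging area takes over from transformation onward.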

Options for the Data Staging Area

- In one of the legacy platforms
- On the data storage platform
- On a separate optional platform
  - You can optimize the platform for complex transformations and cleaning
  - Install specialized tools for transformations and cleaning
  - Keep track of entire data content in the staging area

Data Movement

Client/Server Architecture (1)

Client/Server Architecture (2)

Application server (middle tier)
- To run middleware and establish connectivity
- To execute management and control software
- To handle data access from the Web
- To manage metadata
- For authentication
- As front end
- For managing and running standard reports
- For sophisticated query management
- For OLAP applications

Maturing of the Infrastructure