
Post on 29-Mar-2021


Page 1: Data Warehouse and Data Mining - Mahoto, Naeem Ahmed

Naeem Ahmed

Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro

Email: [email protected]

Data Warehouse and Data Mining Lecture No. 04-06

Data Warehouse Architecture

Page 2

Data Warehouse

[Figure: Operational Data, Data Warehouse, Access Tools, End Users]

Page 3

Data Warehouse

•  A data warehouse is a central, enterprise-wide database that contains information extracted from the operational data stores.

•  Operational system: a system used to process the day-to-day transactions of an organization.

Page 4

Data Warehouse Architecture

Page 5

Data Warehouse Architecture

Page 6

Operational Source Systems

•  These are the operational systems of record that capture the transactions of the business.

•  These systems are outside the data warehouse, which has no control over the content and format of their data.

•  The source systems maintain little historical data.

•  These systems generate operational data that is detailed, current and subject to change.

Page 7

Data Staging Area

•  Work in the data staging area can be divided into three phases:
  –  Extraction (E)
  –  Transformation (T)
  –  Loading (L)

•  Extraction: reading and understanding the source data, and copying the data needed for the data warehouse into the staging area for further manipulation (i.e. transformation).

Page 8

Data Staging Area

•  Loading: populating the data warehouse with data that has been extracted from the operational systems.

•  Two types of load generally take place in a data warehouse environment:
  –  Initial load
  –  Incremental load
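The difference between the two load types can be sketched as follows. This is only an illustration, not a method taken from the lecture: the warehouse table is a plain dict keyed by a hypothetical `order_id`, and the `updated_at` watermark used to detect changed rows is an assumption.

```python
from datetime import date

def initial_load(source_rows):
    """Populate an empty warehouse table with every source row."""
    return {row["order_id"]: dict(row) for row in source_rows}

def incremental_load(warehouse, source_rows, last_load):
    """Apply only rows changed after the previous load (the watermark)."""
    changed = [r for r in source_rows if r["updated_at"] > last_load]
    for row in changed:
        warehouse[row["order_id"]] = dict(row)  # insert new or overwrite changed
    return len(changed)

source = [
    {"order_id": 1, "amount": 100, "updated_at": date(2021, 3, 1)},
    {"order_id": 2, "amount": 250, "updated_at": date(2021, 3, 2)},
]
warehouse = initial_load(source)

# Later, the operational system changes order 2 and adds order 3.
source[1] = {"order_id": 2, "amount": 300, "updated_at": date(2021, 3, 5)}
source.append({"order_id": 3, "amount": 75, "updated_at": date(2021, 3, 6)})
applied = incremental_load(warehouse, source, last_load=date(2021, 3, 2))
```

After the incremental load the warehouse holds three orders, and only the two rows changed since the watermark were touched.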

Page 9

Data Staging Area

•  Transformation: the transformation phase applies a series of rules or functions to the extracted/loaded data.

•  This may include some or all of the following:
  –  Select only certain columns to load (or, if you prefer, null columns not to load)
  –  Translate coded values
  –  Derive a new calculated value (e.g. sale_amount = qty * unit_price)
  –  Denormalize in order to fit the data warehouse schema
  –  Summarize multiple rows of data (e.g. total sales for each region)
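Several of the transformation rules listed above can be sketched in a few lines. The column names (`region_code`, `internal_note`) and the code-to-name lookup table are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical lookup table for translating coded values.
REGION_NAMES = {"N": "North", "S": "South"}

def transform(rows):
    """Apply column selection, code translation and a derived value."""
    out = []
    for row in rows:
        # Only the columns the warehouse needs are copied; internal_note is dropped.
        out.append({
            "region": REGION_NAMES[row["region_code"]],     # translate coded value
            "sale_amount": row["qty"] * row["unit_price"],  # derive calculated value
        })
    return out

def summarize_by_region(rows):
    """Summarize multiple rows: total sales for each region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["sale_amount"]
    return dict(totals)

extracted = [
    {"region_code": "N", "qty": 2, "unit_price": 10, "internal_note": "x"},
    {"region_code": "N", "qty": 1, "unit_price": 5, "internal_note": "y"},
    {"region_code": "S", "qty": 3, "unit_price": 4, "internal_note": "z"},
]
clean = transform(extracted)
totals = summarize_by_region(clean)
```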

Page 10

ETL versus ELT

•  ETL (the traditional approach): ETL (extract, transform, and load) is a process in data warehousing that involves:
  –  Extracting data from outside sources
  –  Transforming it to fit business needs, and ultimately
  –  Loading it into the data warehouse

•  ELT (the Teradata approach): the ELT (extract, load, and transform) strategy extracts and loads the data into a Teradata Database first, then uses the power and performance of the Teradata warehouse to perform the transformation.
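The two orderings can be contrasted in a small sketch. An in-memory SQLite database stands in for the warehouse purely for illustration (it is not Teradata), and the table names are hypothetical.

```python
import sqlite3

raw = [(2, 10.0), (1, 5.0), (3, 4.0)]  # (qty, unit_price) rows from a source system

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_etl (sale_amount REAL)")
db.execute("CREATE TABLE raw_sales (qty INTEGER, unit_price REAL)")
db.execute("CREATE TABLE sales_elt (sale_amount REAL)")

# ETL: transform outside the database, then load the finished rows.
transformed = [(qty * price,) for qty, price in raw]
db.executemany("INSERT INTO sales_etl VALUES (?)", transformed)

# ELT: load the raw rows first, then let the database engine transform them.
db.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw)
db.execute("INSERT INTO sales_elt SELECT qty * unit_price FROM raw_sales")

etl_total = db.execute("SELECT SUM(sale_amount) FROM sales_etl").fetchone()[0]
elt_total = db.execute("SELECT SUM(sale_amount) FROM sales_elt").fetchone()[0]
```

Both paths end with the same data in the warehouse; what differs is where the transformation work runs.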

Page 11

Data Presentation Area

•  Extended relational DBMS (ROLAP servers)
  –  Data stored in a relational database
  –  Star-join schemas
  –  Support for SQL extensions (CUBE)
  –  Index structures (bitmap, join)

•  Multidimensional DBMS (MOLAP servers)
  –  Data stored in arrays (n-dimensional array)
  –  Direct access to the array data structure
  –  Poor storage utilization, especially when the data is sparse
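The MOLAP trade-off between direct access and storage utilization can be made concrete with a small sketch. The three dimensions, their sizes and the sample values are all made up for illustration; a real MOLAP server uses far more sophisticated storage.

```python
# Three hypothetical dimensions: product x region x month.
N_PRODUCTS, N_REGIONS, N_MONTHS = 100, 10, 12

# Row-style storage: one entry per non-empty cell only.
facts = {(0, 0, 0): 120.0, (5, 3, 7): 80.0, (99, 9, 11): 45.5}

# MOLAP-style dense array: every cell is allocated, flattened into one list.
dense = [0.0] * (N_PRODUCTS * N_REGIONS * N_MONTHS)

def offset(p, r, m):
    """Map cell coordinates to a position in the flat array (row-major)."""
    return (p * N_REGIONS + r) * N_MONTHS + m

for (p, r, m), value in facts.items():
    dense[offset(p, r, m)] = value

# Direct access: a cell is found by arithmetic, without searching or joining.
cell = dense[offset(5, 3, 7)]

# Storage utilization: the fraction of allocated cells that actually hold data.
utilization = len(facts) / len(dense)
```

Here 12,000 cells are allocated to hold three values, which is the sparse-data problem named in the last sub-bullet.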

Page 12

Data Access Tools

•  Analysis / OLAP / DSS tools

•  Querying / reporting tools

•  Data mining

Page 13

Data Warehouse Bus Architecture

Page 14

Warehouse Components

Page 15

Component: Operational Data

•  The data for the data warehouse is supplied from:
  –  Mainframe systems that hold data in the traditional network and hierarchical formats
  –  Relational DBMSs such as Oracle and Informix
  –  In addition to this internal data, operational data also includes external data obtained from commercial databases and from databases associated with suppliers and customers

Page 16

Component: Load Manager

•  The load manager (also called the front end component) performs all the operations associated with extracting data and loading it into the data warehouse.

•  These operations include simple transformations that prepare the data for entry into the warehouse.

•  The size and complexity of this component vary between data warehouses. It may be constructed using a combination of vendor data loading tools and custom-built programs.

Page 17

Component: Warehouse Manager

•  The warehouse manager performs all the operations associated with the management of data in the warehouse. This component is built using vendor data management tools and custom-built programs.

•  The operations performed by the warehouse manager include:
  –  Analysis of data to ensure consistency
  –  Transformation and merging of the source data from temporary storage into data warehouse tables
  –  Creation of indexes and views on the base tables
  –  Generation of denormalizations
  –  Generation of aggregations
  –  Backing up and archiving of data
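A few of the warehouse manager operations listed above can be sketched against an in-memory SQLite database. The table, index and view names are hypothetical, and the consistency rule (no missing or negative amounts) is an assumption chosen for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE staging_sales (region TEXT, product TEXT, amount REAL);
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        ('North', 'pen', 20.0), ('North', 'ink', 5.0), ('South', 'pen', 12.0);
""")

# Consistency check before merging: count rows with missing or negative amounts.
bad = db.execute(
    "SELECT COUNT(*) FROM staging_sales WHERE amount IS NULL OR amount < 0"
).fetchone()[0]

if bad == 0:
    # Merge source data from temporary storage into the warehouse table.
    db.execute("INSERT INTO sales SELECT * FROM staging_sales")
    db.execute("DELETE FROM staging_sales")

# Create an index and a view on the base table.
db.execute("CREATE INDEX idx_sales_region ON sales (region)")
db.execute("CREATE VIEW north_sales AS SELECT * FROM sales WHERE region = 'North'")

# Generate an aggregation as a summary table.
db.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
summary = dict(db.execute("SELECT region, total FROM sales_by_region"))
```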

Page 18

Warehouse Manager: Detailed Data

•  This area of the warehouse stores all the detailed data in the database schema.

•  In most cases, detailed data is not stored online but is aggregated to the next level of detail.

•  However, detailed data is added regularly to the warehouse to supplement the aggregated data.

Page 19

Warehouse Manager: Lightly and Highly Summarized Data

•  This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.

•  This area of the warehouse is transient: it changes on an ongoing basis in response to changing query profiles.

•  The purpose of the summarized information is to speed up query performance.

•  The summarized data is updated continuously as new data is loaded into the warehouse.
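The relationship between detailed, lightly summarized and highly summarized data can be sketched as below. Treating "lightly summarized" as daily totals and "highly summarized" as monthly totals is an assumption made for this example; the lecture does not fix the levels.

```python
from collections import defaultdict

detail = [  # (date "YYYY-MM-DD", region, amount)
    ("2021-03-01", "North", 20.0),
    ("2021-03-01", "North", 5.0),
    ("2021-03-02", "North", 10.0),
    ("2021-04-01", "South", 12.0),
]

def summarize(rows, key):
    """Total the amount column for each group produced by `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Lightly summarized: totals per region per day.
light = summarize(detail, key=lambda r: (r[1], r[0]))
# Highly summarized: totals per region per month.
high = summarize(detail, key=lambda r: (r[1], r[0][:7]))

# When new detailed data is loaded, the summaries are regenerated.
detail.append(("2021-03-03", "North", 1.0))
high_after_load = summarize(detail, key=lambda r: (r[1], r[0][:7]))
```

A query for monthly totals can read `high` directly instead of scanning every detailed row, which is where the performance gain comes from.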

Page 20

Warehouse Manager: Archive and Back-up Data

•  This area of the warehouse stores detailed and summarized data for the purpose of archiving and back-up.

•  The data is transferred to storage archives such as magnetic tapes or optical disks.

Page 21

Warehouse Manager: Metadata

•  The data warehouse also stores all the metadata (data about data) definitions used by all processes in the warehouse.

•  It is used for a variety of purposes, including:
  –  The extraction and loading process: metadata is used to map data sources to a common view of information within the warehouse.
  –  The warehouse management process: metadata is used to automate the production of summary tables.
  –  The query management process: metadata is used to direct a query to the most appropriate data source.

•  The structure of the metadata differs in each process, because the purpose is different.
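The query management use of metadata, directing a query to the most appropriate data source, can be sketched as a lookup over table descriptions. The metadata layout, the table names and the "smallest covering table" rule are assumptions made for this example.

```python
# Hypothetical metadata: the columns each table can answer for, and its size.
TABLE_METADATA = [
    {"name": "sales_by_region", "columns": {"region"}, "rows": 10},
    {"name": "sales_by_region_month", "columns": {"region", "month"}, "rows": 120},
    {"name": "sales_detail",
     "columns": {"region", "month", "product", "day"}, "rows": 1_000_000},
]

def route_query(needed_columns):
    """Pick the smallest table whose columns cover what the query needs."""
    candidates = [t for t in TABLE_METADATA if needed_columns <= t["columns"]]
    if not candidates:
        raise LookupError(f"no table covers {sorted(needed_columns)}")
    return min(candidates, key=lambda t: t["rows"])["name"]

by_region = route_query({"region"})
by_month = route_query({"region", "month"})
by_product = route_query({"product"})
```

Queries that only need region totals are answered from the small summary table, while a query on product falls through to the detailed data.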

Page 22

Component: Query Manager

•  The query manager (also called the back end component) performs all operations associated with the management of user queries.

•  This component is usually constructed using vendor end-user access tools, data warehouse monitoring tools, database facilities and custom-built programs.

•  The complexity of a query manager is determined by the facilities provided by the end-user access tools and the database.

Page 23

Component: End-user Access Tools

•  The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making.

•  These users interact with the warehouse using end-user access tools.

•  Examples of end-user access tools include:
  –  Reporting and query tools
  –  Application development tools
  –  Executive information systems tools
  –  Online analytical processing tools
  –  Data mining tools

Page 24

Warehouse Models and Operators

•  Data models
  –  Relations
  –  Stars and snowflakes
  –  Cubes

•  Operators
  –  Slice and dice
  –  Roll-up, drill-down
  –  Pivoting
  –  Other
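The cube operators can be illustrated on a tiny cube held in a dict. The dimensions (product, region, quarter), the function names and every value are made up for this sketch; OLAP servers implement these operators very differently.

```python
from collections import defaultdict

# A tiny cube: (product, region, quarter) -> sales. All values are made up.
cube = {
    ("pen", "North", "Q1"): 10, ("pen", "North", "Q2"): 15,
    ("pen", "South", "Q1"): 7, ("ink", "North", "Q1"): 4,
    ("ink", "South", "Q2"): 9,
}
DIMS = ("product", "region", "quarter")

def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

def roll_up(cube, keep):
    """Roll-up: aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)

def pivot(table):
    """Pivot a two-dimensional result: swap its row and column axes."""
    return {(b, a): v for (a, b), v in table.items()}

q1 = slice_(cube, "quarter", "Q1")
north_pens = dice(cube, product={"pen"}, region={"North"})
by_region = roll_up(cube, keep=("region",))
by_product_region = roll_up(cube, keep=("product", "region"))
by_region_product = pivot(by_product_region)
```

Drill-down is the reverse of roll-up: moving from `by_region` back to the finer `by_product_region` view.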
