DWH Simplified

Upload: thomas-emmanuel

Posted on 19-Jul-2016

DESCRIPTION

DWH principles

TRANSCRIPT

Page 1: DWH Simplified

Overview of the Components

• Source data component
  • Production data
  • Internal data
  • Archived data
  • External data
• Data staging component
  • Extraction
  • Transformation
    • Cleaning
    • Standardization
  • Loading
• Data storage component
• Information delivery component
• Metadata component
• Management and control component

Architectural Framework

Data Acquisition

You are the data analyst on the project team building a DW for an insurance company. List the possible data sources from which you will bring data into the DW.

• Production data: data from the various operational systems
• External data: for finding trends and making comparisons against other organizations
• Internal data: private, confidential data important to the organization
• Archived data: for getting historical information

Architectural Framework

Data Staging: performs ETL (extraction, transformation and loading)

Extraction
• Select data sources and determine filters
• Replicate data automatically
• Create intermediary files
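As a rough illustration of extraction, the Python sketch below selects rows from one source through a filter and writes them to an intermediary CSV file. The policy rows, field names and the `extract` helper are hypothetical; in a real DW the rows would come from the operational database itself.

```python
import csv
import os
import tempfile

# Hypothetical operational source: rows from a policy system (illustrative data).
POLICY_SOURCE = [
    {"policy_id": "P1", "status": "active", "premium": "1200.50"},
    {"policy_id": "P2", "status": "lapsed", "premium": "800.00"},
    {"policy_id": "P3", "status": "active", "premium": "950.25"},
]
FIELDS = ["policy_id", "status", "premium"]


def extract(rows, fieldnames, keep, path):
    """Select the rows that pass the filter `keep` and write them to an
    intermediary CSV file at `path`. Returns the number of rows written."""
    selected = [row for row in rows if keep(row)]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(selected)
    return len(selected)


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "policy_extract.csv")
    n = extract(POLICY_SOURCE, FIELDS, lambda r: r["status"] == "active", out)
    print(n, "rows written to", out)
```

The filter is passed in as a function, so the same helper can be reused with a different filter per source.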

Transformation
• Clean, merge and de-duplicate data
• Convert data types
• Calculate derived data
• Resolve synonyms and homonyms
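A minimal sketch of these transformation steps in Python, assuming two hypothetical source systems that use different names for the same customer attributes (synonyms). The synonym map, field names and sample rows are invented for illustration, and homonyms are not handled here.

```python
from decimal import Decimal

# Hypothetical synonym map: two source systems name the same attributes differently.
FIELD_SYNONYMS = {
    "cust_no": "customer_id",
    "customer_id": "customer_id",
    "cust_name": "name",
    "name": "name",
    "monthly_prem": "monthly_premium",
    "monthly_premium": "monthly_premium",
}


def transform(*sources):
    """Merge rows from several sources, then clean, convert, derive and de-duplicate."""
    seen = {}
    for source in sources:                       # merge: process every source in turn
        for raw in source:
            # Resolve synonyms: rename source-specific fields to standard names.
            row = {FIELD_SYNONYMS[k]: v for k, v in raw.items()}
            # Clean and standardize: trim spaces, upper-case IDs, title-case names.
            row["customer_id"] = row["customer_id"].strip().upper()
            row["name"] = row["name"].strip().title()
            # Convert data types: premium (text or float) -> Decimal.
            row["monthly_premium"] = Decimal(str(row["monthly_premium"]).strip())
            # Calculate derived data: annual premium.
            row["annual_premium"] = row["monthly_premium"] * 12
            # De-duplicate on the business key; the first occurrence wins.
            seen.setdefault(row["customer_id"], row)
    return list(seen.values())


# Illustrative rows from two hypothetical source systems.
SYSTEM_A = [{"cust_no": " c1 ", "cust_name": "ada LOVELACE", "monthly_prem": "100.50"}]
SYSTEM_B = [
    {"customer_id": "C1", "name": "Ada Lovelace", "monthly_premium": 100.5},
    {"customer_id": "C2", "name": " alan turing", "monthly_premium": "80"},
]

if __name__ == "__main__":
    for r in transform(SYSTEM_A, SYSTEM_B):
        print(r)
```

Customer C1 appears in both systems under different field names; after synonym resolution and cleaning the two rows share one key, so only one survives.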

Loading
• Initial loading
• Incremental loading
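The difference between initial and incremental loading can be sketched with Python's built-in `sqlite3` module standing in for DW storage. The `dim_customer` table, the `updated_at` change column and the helper names are assumptions: the initial load empties the table and loads every row, while the incremental load applies only rows changed since the last load date.

```python
import sqlite3

INSERT_SQL = (
    "INSERT OR REPLACE INTO dim_customer "
    "VALUES (:customer_id, :name, :updated_at)"
)


def initial_load(conn, rows):
    """Initial load: create the table if needed, empty it, then load every row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.execute("DELETE FROM dim_customer")
    conn.executemany(INSERT_SQL, rows)
    conn.commit()


def incremental_load(conn, rows, last_load):
    """Incremental load: apply only rows changed after the previous load date.

    Dates are ISO strings (YYYY-MM-DD), so string comparison orders them correctly.
    INSERT OR REPLACE overwrites an existing row with the same customer_id.
    """
    changed = [r for r in rows if r["updated_at"] > last_load]
    conn.executemany(INSERT_SQL, changed)
    conn.commit()
    return len(changed)


# Illustrative source snapshots at two points in time.
INITIAL_ROWS = [
    {"customer_id": "C1", "name": "Ada", "updated_at": "2016-07-01"},
    {"customer_id": "C2", "name": "Alan", "updated_at": "2016-07-01"},
]
LATER_ROWS = [
    {"customer_id": "C1", "name": "Ada", "updated_at": "2016-07-01"},
    {"customer_id": "C2", "name": "Alan M.", "updated_at": "2016-07-10"},
    {"customer_id": "C3", "name": "Grace", "updated_at": "2016-07-12"},
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    initial_load(conn, INITIAL_ROWS)
    print(incremental_load(conn, LATER_ROWS, last_load="2016-07-01"), "rows applied")
    print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```

The incremental load touches only the changed C2 row and the new C3 row; the unchanged C1 row is skipped.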

Why is a separate data staging area required?
• Data is spread across various operational databases.
• Data in the DW should be subject-oriented.
• Hence, data staging is mandatory.

Architectural Framework

Characteristics of the data storage area
• Separate repository
• Data content:
  • Read only
  • Integrated
  • High volumes
  • Grouped by business subjects
• Metadata driven
• Data from the DW is aggregated in MDDBs (multidimensional databases)
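To show what aggregating DW data for an MDDB means, the sketch below pre-computes sales totals for every combination of dimensions, a simple data cube. A plain Python dict stands in for the MDDB here (an assumption), and the dimension names and fact rows are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "quarter")

# Illustrative fact rows: (region, product, quarter, sales amount).
FACTS = [
    ("East", "Auto", "Q1", 100),
    ("East", "Home", "Q1", 50),
    ("West", "Auto", "Q1", 70),
    ("West", "Auto", "Q2", 30),
]


def build_cube(facts):
    """Pre-aggregate sales for every combination of dimensions.

    Keys are tuples with one entry per dimension; "ALL" marks a dimension
    that has been rolled up (summed over).
    """
    n = len(DIMENSIONS)
    cube = defaultdict(int)
    for *coords, sales in facts:
        # Add this fact's sales to every subset of kept dimensions (2**n cells).
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(coords[i] if i in kept else "ALL" for i in range(n))
                cube[key] += sales
    return dict(cube)


if __name__ == "__main__":
    cube = build_cube(FACTS)
    print(cube[("East", "ALL", "ALL")])   # total sales for the East region
    print(cube[("ALL", "Auto", "Q1")])    # Auto sales in Q1 across all regions
```

Once the cube is built, a question such as "total East sales" is a dictionary lookup rather than a scan of the detail rows, which is the point of pre-aggregation.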

Architectural Framework

Information delivery component

The delivery method depends on the type of user:
• Novice user: prefabricated reports, preset queries
• Casual user: needs information only once in a while
• Business analyst: complex analysis
• Power user: picks up interesting data
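For novice users, preset queries can be thought of as named, prebuilt SQL that the user runs by supplying only parameters. This sketch uses an in-memory `sqlite3` database; the `fact_premium` table, the query names and the sample figures are hypothetical.

```python
import sqlite3

# Hypothetical preset queries a novice user can run by name, without writing SQL.
PRESET_QUERIES = {
    "premium_by_region": (
        "SELECT region, SUM(premium) FROM fact_premium "
        "GROUP BY region ORDER BY region"
    ),
    "premium_for_year": (
        "SELECT region, SUM(premium) FROM fact_premium "
        "WHERE year = :year GROUP BY region ORDER BY region"
    ),
}


def run_preset(conn, name, **params):
    """Run a prefabricated query by name; the user supplies only the parameters."""
    return conn.execute(PRESET_QUERIES[name], params).fetchall()


def demo_connection():
    """Build an in-memory sample warehouse table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_premium (region TEXT, year INTEGER, premium REAL)")
    conn.executemany(
        "INSERT INTO fact_premium VALUES (?, ?, ?)",
        [("East", 2015, 100.0), ("East", 2016, 150.0), ("West", 2016, 80.0)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(run_preset(conn, "premium_by_region"))
    print(run_preset(conn, "premium_for_year", year=2016))
```

Power users and business analysts would instead write their own queries against the same tables.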

Information delivery component

Architectural Framework

Metadata component

Metadata is data about the data in the data warehouse. It can be of three types:
• Operational metadata: information about the operational data sources
• Extraction and transformation metadata: details such as extraction frequencies, extraction methods and business rules for data extraction
• End-user metadata: a navigational map of the DW
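One way to picture the three metadata types is as entries in a small metadata repository, as in this Python sketch. The `MetadataEntry` and `MetadataRepository` classes and the example entries are hypothetical; DW tools keep metadata in their own repositories.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    OPERATIONAL = "operational"            # describes the operational data sources
    EXTRACTION_TRANSFORMATION = "etl"      # extraction frequencies, methods, business rules
    END_USER = "end_user"                  # navigational map of the DW for end-users


@dataclass
class MetadataEntry:
    name: str
    kind: MetadataType
    details: dict = field(default_factory=dict)


@dataclass
class MetadataRepository:
    entries: list = field(default_factory=list)

    def add(self, entry):
        self.entries.append(entry)

    def of_kind(self, kind):
        """Return all entries of one metadata type."""
        return [e for e in self.entries if e.kind is kind]


def example_repository():
    """Build a repository with one illustrative entry per metadata type."""
    repo = MetadataRepository()
    repo.add(MetadataEntry("policy_system", MetadataType.OPERATIONAL,
                           {"source": "policy administration database",
                            "tables": ["POLICY", "CLAIM"]}))
    repo.add(MetadataEntry("policy_extract", MetadataType.EXTRACTION_TRANSFORMATION,
                           {"frequency": "daily", "method": "incremental",
                            "rule": "extract active policies only"}))
    repo.add(MetadataEntry("Premium revenue", MetadataType.END_USER,
                           {"table": "fact_premium",
                            "meaning": "premium received per month"}))
    return repo


if __name__ == "__main__":
    for entry in example_repository().of_kind(MetadataType.END_USER):
        print(entry.name, "->", entry.details)
```

The end-user entry maps a business term ("Premium revenue") to where the data lives, which is what makes it a navigational map.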

Why is metadata especially important in a data warehouse?
• It acts as the glue that connects all parts of the data warehouse.
• It provides information about the contents and structures to the developers.
• It opens the door to the end-users and makes the contents recognizable in their own terms.

Management and Control
• Sits on top of all the other components
• Coordinates the services and activities within the DW
• Controls the data transformation and the data transfer into DW storage
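As a sketch of the coordinating role of management and control, the hypothetical `run_pipeline` function below runs named steps in order, logs progress and halts downstream steps when one fails. Real DW environments use scheduling and workflow tools for this; the step names and placeholder jobs are assumptions.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("dw_control")


def run_pipeline(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns a dict mapping each step name to "ok", "failed" or "skipped".
    """
    status = {name: "skipped" for name, _ in steps}
    for name, step in steps:
        log.info("starting %s", name)
        try:
            step()
        except Exception:
            # Record the failure and do not run any later (dependent) steps.
            log.exception("%s failed; halting downstream steps", name)
            status[name] = "failed"
            break
        status[name] = "ok"
        log.info("finished %s", name)
    return status


if __name__ == "__main__":
    steps = [
        ("extract", lambda: None),      # placeholders for real ETL jobs
        ("transform", lambda: None),
        ("load", lambda: None),
    ]
    print(run_pipeline(steps))
```

Stopping at the first failure keeps a failed transformation from loading partial or bad data into DW storage.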

Summing up
• Data warehouse building blocks, or components, are: source data, data staging, data storage, information delivery, metadata, and management and control.
• In a data warehouse, metadata is especially significant because it acts as the glue holding all the components together and serves as a roadmap for the end-users.

Doubts?

Case study 1

As a senior analyst on the DW project of a large retail chain, you are responsible for improving the data visualization of the output results. Make a list of recommendations.

Parallel processing

The performance of a DW may be improved by parallel processing, given appropriate hardware and software options.

Parallel processing options:
• Symmetric multiprocessing (SMP)
• Massively parallel processing (MPP)
• Clusters
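The idea behind parallel processing can be sketched in Python: split the rows into partitions, aggregate each partition in a separate process, then merge the partial results. On a single multi-core machine this loosely resembles symmetric multiprocessing; MPP systems and clusters apply the same partition, aggregate and merge pattern across nodes. The data and function names are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def partition(rows, n):
    """Split rows round-robin into n partitions (one per worker)."""
    return [rows[i::n] for i in range(n)]


def partial_totals(rows):
    """Aggregate one partition: total sales amount per region."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def merge_totals(partials):
    """Combine the per-partition results into the final answer."""
    result = Counter()
    for part in partials:
        result.update(part)      # Counter.update adds counts
    return result


def parallel_totals(rows, workers=4):
    """Aggregate each partition in a separate worker process, then merge."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge_totals(pool.map(partial_totals, partition(rows, workers)))


if __name__ == "__main__":
    sales = [("East", 10), ("West", 5), ("East", 7), ("North", 3)] * 1000
    print(parallel_totals(sales))
```

Because sums can be computed per partition and then added together, the merged result equals the single-process result; aggregates without this property (a median, for example) need a different merge step.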

DW with ERP packages
