data warehouse


Upload: shwetabh-jaiswal

Post on 12-Jul-2015


Category:

Technology



TRANSCRIPT

Page 1: Data warehouse

DATA WAREHOUSE

Page 2: Data warehouse

Data Warehouse

• Pool of data to support decision making.

• Structured to be available in ready-to-use form

• Subject oriented

• Integrated

• Time-variant

• Nonvolatile

• Additional characteristics like

1. Web based

2. Relational/multidimensional

3. Client/server

4. Real time

5. Include metadata
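
The time-variant and nonvolatile properties can be illustrated with a minimal sketch. It assumes an append-only store in which every load carries a snapshot date and existing rows are never updated. The class `SnapshotStore`, its methods, and the sample customer data are hypothetical names used only for illustration.

```python
from datetime import date

class SnapshotStore:
    """Append-only store: rows are added with a snapshot date and never changed."""

    def __init__(self):
        self._rows = []  # each row is (snapshot_date, customer_id, city)

    def load(self, snapshot_date, customer_id, city):
        # Nonvolatile: a load only appends, it never overwrites history.
        self._rows.append((snapshot_date, customer_id, city))

    def as_of(self, customer_id, when):
        # Time-variant: answer with the latest snapshot on or before `when`.
        matches = [r for r in self._rows if r[1] == customer_id and r[0] <= when]
        if not matches:
            return None
        return max(matches, key=lambda r: r[0])[2]

store = SnapshotStore()
store.load(date(2015, 1, 1), 7, "Pune")
store.load(date(2015, 6, 1), 7, "Delhi")
city_in_march = store.as_of(7, date(2015, 3, 1))
city_in_july = store.as_of(7, date(2015, 7, 1))
```

Because old snapshots stay in place, a question about March still returns the March value after later loads.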

Page 3: Data warehouse

Types of Data Warehouse

Data Mart

• Dependent

– Created from warehouse

– Replicated functional subset of warehouse

• Independent

– Scaled down, less expensive version of data warehouse

– Designed for a department or SBU

– Organization may have multiple data marts

– Difficult to integrate
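
A dependent data mart can be sketched as a copy of one department's rows taken from the warehouse. This is a minimal illustration using Python's built-in `sqlite3` module. The table names `warehouse_sales` and `mart_sales`, the columns, and the function `build_dependent_mart` are assumptions, not part of the original slides.

```python
import sqlite3

def build_dependent_mart(conn, department):
    """Fill mart_sales with the warehouse rows for a single department."""
    conn.execute("DROP TABLE IF EXISTS mart_sales")
    conn.execute("CREATE TABLE mart_sales (sale_date TEXT, product TEXT, amount REAL)")
    conn.execute(
        "INSERT INTO mart_sales "
        "SELECT sale_date, product, amount FROM warehouse_sales WHERE department = ?",
        (department,),
    )
    conn.commit()

# Hypothetical warehouse content, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(department TEXT, sale_date TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "2015-07-01", "Widget", 100.0),
        ("finance", "2015-07-01", "Gadget", 250.0),
        ("marketing", "2015-07-02", "Gadget", 60.0),
    ],
)

build_dependent_mart(conn, "marketing")
mart_rows = conn.execute(
    "SELECT product, amount FROM mart_sales ORDER BY sale_date"
).fetchall()
```

An independent data mart would instead be loaded straight from source systems, which is one reason several of them are hard to integrate later.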

Page 4: Data warehouse

• Operational Data Stores (ODS): Provide a fairly recent form of customer information file (CIF)

• Enterprise Data Warehouses (EDW): Used across the enterprise for decision support

• Metadata: Describe the structure and meaning of data, contributing to their effective use.

Page 5: Data warehouse

Data warehousing process overview

Major components

• Data sources

• Data extraction

• Data loading

• Comprehensive database

• Metadata

• Middleware tools

Page 6: Data warehouse
Page 7: Data warehouse

Data Warehousing Architectures • May have one or more tiers

– Determined by warehouse, data acquisition (back end), and client (front end)

• One tier, where all run on same platform, is rare

• Two tier usually combines DSS engine (client) with warehouse

– More economical

• Three tier separates these functional parts

Page 8: Data warehouse
Page 9: Data warehouse

Architecture considerations

• Which DBMS to use?

• Parallel processing

• Partitioning

• Which data migration tools should be used?

• Which tools will support data retrieval and analysis?

Page 10: Data warehouse

Alternative Architectures for data warehousing

Page 11: Data warehouse

Architecture Selection Factors

• Information interdependence

• Senior management info needs

• Urgency for a DW

• Nature of end user tasks

• Constraints on resources

• Strategic view

• Compatibility with existing systems

• Ability of in-house IT staff

• Technical and political factors

Page 12: Data warehouse

Enterprise Data Warehouse

Page 13: Data warehouse

Data Integration, Extraction And Load process

1. Data Integration

Comprises three major processes

• Data access: Ability to access and extract data from any data source

• Data federation: Integration of business views across multiple data stores

• Change capture: Based on the identification, capture, and delivery of the changes made to enterprise data sources.
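
Change capture can be sketched by comparing two snapshots of a source keyed by record id. This is only one simple way to identify changes, shown under that assumption. The function `capture_changes` and the sample records are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots (dicts keyed by record id) and classify the changes."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserted, updated, deleted

# Hypothetical snapshots of a customer source taken on two consecutive days.
yesterday = {1: "Alice, Pune", 2: "Bob, Delhi", 3: "Carol, Mumbai"}
today = {1: "Alice, Pune", 2: "Bob, Chennai", 4: "Dev, Jaipur"}

inserted, updated, deleted = capture_changes(yesterday, today)
```

Only the three changed records would need to be delivered downstream, rather than the full source.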

Page 14: Data warehouse

2. Extraction, Transformation, and Load (ETL)

• Is an integral component in any data-centric project.

• ETL consists of:

Extraction: Reading data from all relevant sources

Transformation: Converting the extracted data into a form that can be placed in the data warehouse or another database

Load: Inserting the data into the data warehouse.
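
The three ETL steps can be sketched as three small functions. This is a minimal illustration, assuming two in-memory sources and an in-memory SQLite database standing in for the warehouse. The source lists, the table `fact_sales`, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical source records; real sources would be packaged applications,
# legacy systems, or other internal applications.
CRM_ROWS = [{"customer": " Alice ", "amount": "120.50"},
            {"customer": "BOB", "amount": "80"}]
LEGACY_ROWS = [{"customer": "carol", "amount": "n/a"},
               {"customer": "alice", "amount": "30"}]

def extract():
    """Extraction: read every record from all relevant sources."""
    for source, rows in (("crm", CRM_ROWS), ("legacy", LEGACY_ROWS)):
        for row in rows:
            yield source, row

def transform(records):
    """Transformation: normalise names, convert amounts, drop unusable rows."""
    for source, row in records:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleanse: skip rows whose amount cannot be parsed
        yield source, row["customer"].strip().title(), amount

def load(conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(source TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM fact_sales GROUP BY customer")
)
```

After the load, the two spellings of "Alice" from different sources are consolidated under one customer name, and the unparseable legacy row is left out.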

Page 15: Data warehouse

ETL Process

[Figure: ETL process. Data from packaged applications, legacy systems, and other internal applications pass through the Extract, Transform, Cleanse, and Load steps, using a transient data source, into the data warehouse and data mart.]

Page 16: Data warehouse

Benefits of Data Warehouse

• Allows extensive analysis in numerous ways.

• A consolidated view of corporate data.

• Better and more timely information.

• Enhanced system performance.

• Simplification of data access.

• Enhanced business knowledge, improved customer service and satisfaction, and easier decision making.

Page 17: Data warehouse

Assignment

• Data warehousing vendors?

• Data warehousing case study found on the internet.

Page 18: Data warehouse

Data Warehouse development Approaches

The Inmon Model: The EDW Approach

• Emphasizes top-down development

• Employing established database development methodologies and tools

The Kimball Model: The Data Mart Approach

• Plan big, build small

• Subject oriented or department oriented

• Focus on the requests of a specific department.

Page 19: Data warehouse
Page 20: Data warehouse

Data Warehouse Structure(The Star Schema)

Page 21: Data warehouse

Star Schema

• Most important means of implementation of dimensional analysis

• Central fact table surrounded by dimension tables

• Grain – highest level of detail that is supported.

• Drill down – probing beyond a summarised value
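
A star schema and a drill-down query can be sketched with Python's built-in `sqlite3` module. This is an illustrative example only. The tables `fact_sales`, `dim_date`, and `dim_product`, their columns, and the sample rows are assumptions. The grain chosen here is one fact row per product per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Central fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_date VALUES (1, 2015, 1), (2, 2015, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (1, 1, 10, 100.0), (1, 2, 5, 75.0), (2, 1, 4, 40.0);
""")

# Summarised value: total revenue per year.
by_year = conn.execute("""
    SELECT d.year, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year
""").fetchall()

# Drill down: the same measure one level deeper, per month.
by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
""").fetchall()
```

Drilling down cannot go below the grain: with monthly fact rows, a per-day breakdown is not available from this schema.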

Page 22: Data warehouse

DW – Implementation Issues

• Establishment of service-level agreements and data-refresh requirements.

• Identification of data sources and their governance policies.

• Data quality planning & model designing.

• ETL tool selection.

• Relational database software and platform selection.

• Data transport and data conversion.

• Reconciliation process

• End-user support

Page 23: Data warehouse

Issues in implementation of data warehouse

• Starting with the wrong sponsorship chain.

• Setting expectations that you cannot meet and frustrating executives at the moment of truth.

• Engaging in politically naive behavior.

• Loading the warehouse with information just because it is available.

• Believing that data warehousing database design is the same as transactional database design.

Continued…

Page 24: Data warehouse

• Choosing a data warehouse manager who is technology oriented rather than user oriented

• Focusing on traditional internal record-oriented data and ignoring the value of external data of text, image, and perhaps, sound and video.

• Delivering data with overlapping and confusing definitions.

• Believing promises of performance, capacity, and scalability.

• Believing that your problems are over when the data warehouse is up and running.

Page 25: Data warehouse

Risks in Data Warehouse Projects

• No mission or objective

• Quality of source data unknown

• Skills not in place

• Inadequate budget

• Lack of supporting software

• Source data not understood

• Weak sponsor

• Users not computer literate

• Geographically distributed environment

• Unrealistic user expectations

• Architectural and design risks

• Scope creep and changing requirements

• Vendors out of control

• Multiple platforms

• Key people leaving project

• Loss of the sponsor

• Too much new technology

• Having to fix an operational system

• Team geography and language culture

Page 26: Data warehouse

Massive Data Warehouse And Scalability

• Data warehouse needs scalability.

• Good scalability means that queries and other data access functions grow (ideally) linearly with the size of the warehouse.

• Specialized methods have been developed to create scalable data warehouses.

• Scalability is difficult to achieve when managing hundreds of terabytes.

Page 27: Data warehouse

Issues pertaining to scalability

• The amount of data in warehouse.

• How quickly the warehouse is expected to grow.

• The number of concurrent users.

• The complexity of user queries.

Page 28: Data warehouse

Real-Time Data warehousing

• Also known as active data warehousing.

• Process of loading and providing data via the data warehouse as they become available.

• Evolved from the EDW (Enterprise Data Warehousing) concept.

• Allows information-based decision making at one's fingertips.

• Positively affects almost all aspects of customer service, supply chain management (SCM), and logistics.

Page 29: Data warehouse

Comparison between Traditional And Active Data Warehousing Environment

Traditional Data Warehouse Environment

• Strategic decisions only

• Results sometimes hard to measure

• Moderate user concurrency

• Highly restrictive reporting used to confirm or check existing processes and patterns.

• Power users, knowledge workers, internal users.

Active Data Warehouse Environment

• Strategic and tactical decisions

• Results measured with operations

• High number of users accessing simultaneously

• Flexible ad hoc reporting, as well as machine-assisted modeling to discover new hypotheses.

• Operational staff, call centers, external users.

Page 30: Data warehouse

Data Warehouse Administration

• Due to huge size, data warehouse requires strong monitoring.

• A data warehouse administrator (DWA) should possess the following qualities:

1. Should be familiar with high-performance software, hardware, and networking technologies.

2. Should be familiar with decision-making processes.

3. Should keep the existing requirements and capabilities of the data warehouse stable.

4. Must possess excellent communication skills.

Page 31: Data warehouse

Data Warehouse Security issues

• Security and privacy of information are a significant concern.

• Companies must create effective and flexible security procedures.

• Effective security in a data warehouse focuses on:

1. Establishing effective corporate and security policies and procedures.

2. Implementing logical security procedures and techniques to restrict access.

3. Limiting physical access to the data center environment.

4. Establishing an effective internal control review process with an emphasis on security and privacy.
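
One possible form of logical security is column-level restriction by role. The sketch below is an assumption about how such a rule could look, not a method described in the slides. The roles, the column names, and the function `restrict` are hypothetical.

```python
# Hypothetical mapping from role to the columns that role may read.
ROLE_COLUMNS = {
    "analyst": {"region", "revenue"},
    "support": {"region", "customer_name"},
}

def restrict(row, role):
    """Return only the columns of `row` that the given role may see."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in row.items() if key in allowed}

row = {"customer_name": "Alice", "region": "West", "revenue": 150.5}
analyst_view = restrict(row, "analyst")
guest_view = restrict(row, "guest")
```

Defaulting unknown roles to an empty column set means access has to be granted explicitly.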