DATA WAREHOUSE

Upload: shwetabh-jaiswal

Posted on 29-Jul-2015


Data Warehouse
• A pool of data to support decision making.
• Structured so that the data is available in ready-to-use form.
• Subject oriented
• Integrated
• Time-variant
• Nonvolatile
• Additional characteristics:
1. Web based
2. Relational/multidimensional
3. Client/server
4. Real time
5. Includes metadata
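Two of the characteristics above, time-variant and nonvolatile, can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory list standing in for the warehouse: each load appends a dated snapshot instead of overwriting earlier values, so history can be queried as of any date. The names `load_snapshot` and `balance_as_of` are illustrative only.

```python
from datetime import date
from typing import Optional

# Hypothetical stand-in for a warehouse table. Rows are only ever appended
# (nonvolatile); each row carries the date it describes (time-variant).
warehouse: list = []


def load_snapshot(as_of: date, customer_id: int, balance: float) -> None:
    """Append a new time-stamped fact rather than updating an existing row."""
    warehouse.append({"as_of": as_of, "customer_id": customer_id, "balance": balance})


def balance_as_of(customer_id: int, when: date) -> Optional[float]:
    """Return the latest balance recorded on or before `when`, or None if none exists."""
    rows = [r for r in warehouse
            if r["customer_id"] == customer_id and r["as_of"] <= when]
    if not rows:
        return None
    return max(rows, key=lambda r: r["as_of"])["balance"]


load_snapshot(date(2015, 1, 31), 42, 100.0)
load_snapshot(date(2015, 2, 28), 42, 250.0)
print(balance_as_of(42, date(2015, 2, 1)))  # 100.0
print(balance_as_of(42, date(2015, 3, 1)))  # 250.0
```

Because the January value is never overwritten, a question such as "what was the balance at the start of February?" still has an answer after the February load.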

Types of Data Warehouse

• Data Mart
– Dependent: created from the warehouse and replicated; a functional subset of the warehouse.
– Independent: a scaled-down, less expensive version of a data warehouse, designed for a department or strategic business unit (SBU). An organization may have multiple independent data marts, which are difficult to integrate.

• Operational Data Stores (ODS): provide a fairly recent form of customer information file (CIF).

• Enterprise Data Warehouses (EDW): used across the enterprise for decision support.

• Metadata: describes the structure and meaning of the data, contributing to its effective use.
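A dependent data mart, as described above, is a replicated functional subset of the warehouse. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `dw_sales` warehouse table and copies one department's rows into a separate mart table; the table names and the `build_dependent_mart` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise warehouse table covering several departments.
conn.execute("CREATE TABLE dw_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [("Marketing", "A", 120.0), ("Finance", "B", 80.0), ("Marketing", "C", 40.0)],
)


def build_dependent_mart(conn: sqlite3.Connection, dept: str) -> str:
    """Replicate one department's subset of the warehouse into its own table.

    `dept` is used in a table name, so it must come from a trusted list."""
    table = f"mart_{dept.lower()}"  # hypothetical naming convention
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (product TEXT, amount REAL)")
    # The mart is fed only from the warehouse, never directly from sources.
    conn.execute(
        f"INSERT INTO {table} SELECT product, amount FROM dw_sales WHERE dept = ?",
        (dept,),
    )
    return table


mart = build_dependent_mart(conn, "Marketing")
print(conn.execute(f"SELECT COUNT(*), SUM(amount) FROM {mart}").fetchone())  # (2, 160.0)
```

Since the mart is rebuilt from the warehouse, it inherits the warehouse's definitions; an independent mart would instead be loaded from its own sources, which is why it is harder to integrate later.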

Data warehousing process overview

Major components
• Data sources
• Data extraction
• Data loading
• Comprehensive database
• Metadata
• Middleware tools

Data Warehousing Architectures
• May have one or more tiers
– Determined by the warehouse, data acquisition (back end), and client (front end)
• One-tier architecture, where all components run on the same platform, is rare
• Two-tier architecture usually combines the DSS engine (client) with the warehouse
– More economical
• Three-tier architecture separates these functional parts

Architecture considerations

• Which DBMS to use?
• Parallel processing
• Partitioning
• Which data migration tools should be used?
• Which tools should be used for data retrieval and analysis?
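Partitioning, one of the considerations listed above, splits a large fact table into smaller pieces so that a query can skip partitions it does not need. This Python sketch assumes simple range partitioning by calendar month over hypothetical sales rows; real DBMSs implement partitioning and partition pruning natively, and `partition_by_month` is only an illustration of the idea.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows):
    """Range-partition fact rows into one bucket per (year, month)."""
    parts = defaultdict(list)
    for row in rows:
        parts[(row["day"].year, row["day"].month)].append(row)
    return parts


def total_for_month(parts, year, month):
    """Partition pruning: scan only the bucket for the requested month."""
    return sum(row["amount"] for row in parts.get((year, month), []))


sales = [
    {"day": date(2015, 6, 3), "amount": 10.0},
    {"day": date(2015, 6, 20), "amount": 5.0},
    {"day": date(2015, 7, 1), "amount": 7.0},
]
partitions = partition_by_month(sales)
print(sorted(partitions))                    # [(2015, 6), (2015, 7)]
print(total_for_month(partitions, 2015, 6))  # 15.0
```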

Alternative Architectures for data warehousing

Architecture Selection Factors

• Information interdependence
• Senior management's information needs
• Urgency of the need for a DW
• Nature of end-user tasks
• Constraints on resources
• Strategic view
• Compatibility with existing systems
• Ability of in-house IT staff
• Technical and political factors

Enterprise Data Warehouse

Data Integration, Extraction And Load process

1. Data Integration
Comprises three major processes:
• Data access: the ability to access and extract data from any data source.
• Data federation: the integration of business views across multiple data stores.
• Change capture: the identification, capture, and delivery of the changes made to enterprise data sources.
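Change capture can be implemented in several ways, for example by reading database transaction logs or by comparing snapshots. The Python sketch below takes the simplest route, snapshot comparison, assuming each source table can be read as a dictionary keyed by primary key; `capture_changes` is a hypothetical helper, not a standard API.

```python
def capture_changes(before: dict, after: dict) -> dict:
    """Diff two keyed snapshots of a source table into inserts, updates and deletes."""
    inserts = {k: v for k, v in after.items() if k not in before}
    deletes = {k: v for k, v in before.items() if k not in after}
    updates = {k: v for k, v in after.items() if k in before and before[k] != v}
    return {"insert": inserts, "update": updates, "delete": deletes}


# Hypothetical customer snapshots: key -> (name, city)
yesterday = {1: ("Ann", "Delhi"), 2: ("Bob", "Pune"), 3: ("Cy", "Agra")}
today = {1: ("Ann", "Delhi"), 2: ("Bob", "Mumbai"), 4: ("Di", "Goa")}

changes = capture_changes(yesterday, today)
print(changes)
# {'insert': {4: ('Di', 'Goa')}, 'update': {2: ('Bob', 'Mumbai')}, 'delete': {3: ('Cy', 'Agra')}}
```

Only the three changed rows need to be delivered to the warehouse; the unchanged customer 1 is skipped.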

2. Extraction, Transformation and Load (ETL)
• Is an integral component of any data-centric project.
• ETL consists of:
– Extraction: reading data from all relevant sources.
– Transformation: converting the extracted data into a form that can be placed in the data warehouse or another database.
– Load: inserting the data into the data warehouse.

ETL Process

Figure: packaged applications, legacy systems, and other internal applications feed a transient data source; the data then passes through Extract, Transform, Cleanse, and Load into the data warehouse and data marts.
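The extract, transform, cleanse, and load steps can be sketched end to end in Python. The two in-memory sources below are hypothetical stand-ins for a legacy system (pipe-delimited text) and a packaged application (records as dictionaries), and an in-memory `sqlite3` database plays the warehouse; the cleansing rule (drop rows with an unparseable amount) is an assumption chosen for brevity.

```python
import sqlite3

# Hypothetical raw extracts from two different source systems.
LEGACY_LINES = ["1001|ann smith|  250.50 ", "1002|BOB ROY|n/a"]
PACKAGED_RECORDS = [{"id": "1003", "name": "Cy Lee", "amount": "99"}]


def extract() -> list:
    """Extract: pull raw records from every source into one common shape."""
    records = []
    for line in LEGACY_LINES:
        cid, name, amount = line.split("|")
        records.append({"id": cid, "name": name, "amount": amount})
    records.extend(PACKAGED_RECORDS)
    return records


def transform(rec: dict) -> dict:
    """Transform: convert types and standardize formats for the warehouse."""
    try:
        amount = float(rec["amount"])  # float() tolerates surrounding whitespace
    except ValueError:
        amount = None  # flagged here, removed in the cleanse step
    return {"id": int(rec["id"]), "name": rec["name"].strip().title(), "amount": amount}


def cleanse(records: list) -> list:
    """Cleanse: drop records that fail a basic quality rule (missing amount)."""
    return [r for r in records if r["amount"] is not None]


def load(conn: sqlite3.Connection, records: list) -> None:
    """Load: insert the cleaned records into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (:id, :name, :amount)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, cleanse([transform(r) for r in extract()]))
print(conn.execute("SELECT * FROM fact_sales ORDER BY id").fetchall())
# [(1001, 'Ann Smith', 250.5), (1003, 'Cy Lee', 99.0)]
```

Keeping each step as a separate function mirrors the figure and makes it easy to swap in a different source or quality rule without touching the load step.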

Benefits of Data Warehouse
• Allows extensive analysis in numerous ways.
• Provides a consolidated view of corporate data.
• Delivers better and more timely information.
• Enhances system performance.
• Simplifies data access.
• Enhances business knowledge, improves customer service and satisfaction, and facilitates decision making.

Assignment

• Data warehousing vendors?
• A data warehousing case study found on the internet.

Data Warehouse development Approaches

The Inmon Model: The EDW Approach
• Emphasizes top-down development
• Employs established database development methodologies and tools

The Kimball Model: The Data Mart Approach
• Plan big, build small
• Subject oriented or department oriented
• Focuses on the requests of a specific department

Data Warehouse Structure (The Star Schema)
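A star schema places a central fact table of numeric measures in the middle, with a foreign key to each surrounding dimension table. A minimal sketch using Python's built-in `sqlite3` module follows; all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used for filtering and grouping.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units       INTEGER,
    revenue     REAL
);

INSERT INTO dim_date    VALUES (1, 2015, 6), (2, 2015, 7);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (100, 'Delhi'), (101, 'Mumbai');
INSERT INTO fact_sales  VALUES
    (1, 10, 100, 2, 1000.0),
    (2, 10, 101, 1,  500.0),
    (2, 11, 100, 3,  300.0);
""")

# Typical star join: group by dimension attributes, aggregate fact measures.
query = """
SELECT p.category, d.month, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d    ON d.date_key    = f.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
"""
for row in conn.execute(query):
    print(row)
# ('Electronics', 6, 1000.0)
# ('Electronics', 7, 500.0)
# ('Furniture', 7, 300.0)
```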

Successful Implementation of Data warehouse

• Establishment of service-level agreements and data-refresh requirements.

• Identification of data sources and their governance policies.

• Data quality planning and model design.

• ETL tool selection.

• Relational database software and platform selection.

• Data transport and data conversion.

• Reconciliation process.

• End-user support.
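The reconciliation process listed above checks that what arrived in the warehouse matches what left the sources. One common approach, sketched here in Python under the assumption that both sides can be read as lists of dictionaries, compares row counts, key sets, and a control total; the `reconcile` function and its field names are hypothetical.

```python
def reconcile(source_rows, target_rows, key="id", measure="amount", tol=1e-6):
    """Compare row counts, key sets, and a control total between source and warehouse."""
    src_keys = {r[key] for r in source_rows}
    tgt_keys = {r[key] for r in target_rows}
    src_total = sum(r[measure] for r in source_rows)
    tgt_total = sum(r[measure] for r in target_rows)
    return {
        "count_match": len(source_rows) == len(target_rows),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
        "total_match": abs(src_total - tgt_total) <= tol,
    }


source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}, {"id": 3, "amount": 4.5}]
target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
report = reconcile(source, target)
print(report)
# {'count_match': False, 'missing_in_target': [3], 'unexpected_in_target': [], 'total_match': False}
```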

Issues in implementation of data warehouse

• Starting with the wrong sponsorship chain.
• Setting expectations that you cannot meet and frustrating executives at the moment of truth.
• Engaging in politically naive behavior.
• Loading the warehouse with information just because it is available.
• Believing that data warehousing database design is the same as transactional database design.
• Choosing a data warehouse manager who is technology oriented rather than user oriented.
• Focusing on traditional internal record-oriented data and ignoring the value of external data such as text, images, and perhaps sound and video.
• Delivering data with overlapping and confusing definitions.
• Believing promises of performance, capacity, and scalability.
• Believing that your problems are over when the data warehouse is up and running.

Risks in Data Warehouse Projects

• No mission or objective

• Quality of source data unknown

• Skills not in place

• Inadequate budget

• Lack of supporting software

• Source data not understood

• Weak sponsor

• Users not computer literate

• Geographically distributed environment

• Unrealistic user expectations

• Architectural and design risks

• Scope creep and changing requirements

• Vendors out of control

• Multiple platforms

• Key people leaving the project

• Loss of the sponsor

• Too much new technology

• Having to fix an operational system

• Team geography and language culture

Massive Data Warehouse And Scalability

• A data warehouse needs scalability.
• Good scalability means that queries and other data access functions grow (ideally) linearly with the size of the warehouse.

• Specialized methods have been developed to create scalable data warehouses.

• Scalability is difficult to achieve when managing hundreds of terabytes.

Issues pertaining to scalability

• The amount of data in the warehouse.
• How quickly the warehouse is expected to grow.
• The number of concurrent users.
• The complexity of user queries.
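How quickly the warehouse is expected to grow can be estimated with compound-growth arithmetic: size after n years = current size × (1 + g)^n for an annual growth rate g. The Python sketch below uses that formula to find when a projected size would exceed a given capacity; the 2 TB starting size, 50% growth rate, and 20 TB capacity are made-up illustrative numbers, not figures from the slides.

```python
def projected_size(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: size_n = size_0 * (1 + g) ** n."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(capacity_tb: float, current_tb: float, annual_growth: float) -> int:
    """Smallest whole number of years after which the projected size exceeds capacity."""
    if annual_growth <= 0 or current_tb <= 0:
        raise ValueError("current size and growth rate must be positive")
    years = 0
    while projected_size(current_tb, annual_growth, years) <= capacity_tb:
        years += 1
    return years


print(projected_size(2.0, 0.5, 3))       # 6.75
print(years_until_full(20.0, 2.0, 0.5))  # 6
```

With these assumed numbers the warehouse outgrows a 20 TB platform in year 6 (about 22.8 TB), which is the kind of estimate that feeds hardware and partitioning decisions.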

Real-Time Data warehousing

• Also known as active data warehousing.
• The process of loading and providing data via the data warehouse as the data become available.
• Evolved from the EDW (Enterprise Data Warehousing) concept.
• Allows information-based decision making at one's fingertips.
• Positively affects almost all aspects of customer service, supply chain management (SCM), and logistics.
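Active data warehousing replaces the nightly batch window with frequent small loads. A minimal Python sketch of a trickle-feed (micro-batch) loader follows: events are buffered and flushed when either a batch-size or a waiting-time threshold is reached. The class name, the thresholds, and the list standing in for the warehouse are assumptions; timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MicroBatchLoader:
    """Buffer incoming events and flush them to the warehouse in small, frequent batches."""
    max_batch: int = 3          # flush once this many events are buffered
    max_wait_s: float = 5.0     # or once the oldest buffered event is this old
    warehouse: list = field(default_factory=list)  # stand-in for a warehouse table
    _buffer: list = field(default_factory=list, init=False)
    _first_ts: Optional[float] = field(default=None, init=False)

    def on_event(self, event: dict, now: float) -> None:
        """Accept one event. The age check runs only when an event arrives;
        a production loader would also flush on a timer."""
        if not self._buffer:
            self._first_ts = now
        self._buffer.append(event)
        if len(self._buffer) >= self.max_batch or now - self._first_ts >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        self.warehouse.extend(self._buffer)  # stand-in for a bulk INSERT
        self._buffer.clear()
        self._first_ts = None


loader = MicroBatchLoader()
for t in (0.0, 1.0, 2.0):                 # three quick events: size threshold triggers a flush
    loader.on_event({"ts": t}, now=t)
print(len(loader.warehouse))              # 3
loader.on_event({"ts": 10.0}, now=10.0)
loader.on_event({"ts": 16.0}, now=16.0)   # oldest buffered event is 6 s old: flush
print(len(loader.warehouse))              # 5
```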

Comparison between Traditional And Active Data Warehousing Environment

Traditional Data Warehouse Environment

• Strategic decisions only
• Results sometimes hard to measure
• Moderate user concurrency
• Highly restrictive reporting used to confirm or check existing processes and patterns
• Power users, knowledge workers, internal users

Active Data Warehouse Environment

• Strategic and tactical decisions
• Results measured with operations
• High number of users accessing simultaneously
• Flexible ad hoc reporting, as well as machine-assisted modeling to discover new hypotheses
• Operational staff, call centers, external users

Data Warehouse Administration

• Due to its huge size, a data warehouse requires strong monitoring.

• A data warehouse administrator (DWA) should meet the following requirements:

1. Should be familiar with high-performance software, hardware, and networking technologies.
2. Should be familiar with the decision-making process.
3. Should keep the existing requirements and capabilities of the data warehouse stable while providing flexibility for rapid improvements.
4. Must possess excellent communication skills.

Data Warehouse Security issues

• Security and privacy of information are a significant concern.

• Companies must create effective and flexible security procedures.

• Effective security in a data warehouse focuses on:
1. Establishing effective corporate and security policies and procedures.
2. Implementing logical security procedures and techniques to restrict access.
3. Limiting physical access to the data center environment.
4. Establishing an effective internal control review process with an emphasis on security and privacy.
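Item 2 above, logical security to restrict access, is often implemented as role-based access control, sometimes combined with row-level filtering. A deny-by-default Python sketch follows; the roles, tables, and region filter are hypothetical, and a real warehouse would normally enforce such rules in the DBMS itself (grants, views, row-level policies) rather than in application code.

```python
# Hypothetical policy table: which warehouse tables each role may read,
# and an optional region filter applied row by row (None = all rows).
POLICIES = {
    "analyst": {"tables": {"fact_sales", "dim_product"}, "region": None},
    "call_center": {"tables": {"fact_sales"}, "region": "North"},
}


def authorized_rows(role: str, table: str, rows: list) -> list:
    """Return only the rows `role` may see from `table`; deny by default."""
    policy = POLICIES.get(role)
    if policy is None or table not in policy["tables"]:
        raise PermissionError(f"role {role!r} may not read {table!r}")
    region = policy["region"]
    return [r for r in rows if region is None or r["region"] == region]


sales = [{"id": 1, "region": "North"}, {"id": 2, "region": "South"}]
print(authorized_rows("analyst", "fact_sales", sales))      # both rows
print(authorized_rows("call_center", "fact_sales", sales))  # [{'id': 1, 'region': 'North'}]
```

Unknown roles and unlisted tables both raise `PermissionError`, so a missing policy entry fails closed rather than exposing data.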