dwh life cycle

Data Warehouse Life Cycle

Upload: yogita-sarang

Post on 13-Jan-2015


TRANSCRIPT

Page 1: DWH life cycle

Data Warehouse Life Cycle

Page 2: DWH life cycle

Data Warehouse Defined

"A data warehouse is a collection of corporate information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations."

Page 3: DWH life cycle

Characteristics of a DW

- Subject-oriented data
  - collects all data for a subject, from different sources
- Read-only requests
  - loaded during off-hours, read-only during day hours
- Interactive features, ad-hoc query
  - flexible design to handle spontaneous user queries
- Pre-aggregated data
  - to improve runtime performance
- Highly denormalized data structures
  - dimension tables with redundant columns

Page 4: DWH life cycle

Components of a Data Warehouse

- Source Systems
- Data Staging Area
  - Storage: flat files, RDBMS
  - Processing
  - No user query services
- DWH Servers
  - Data Mart 1: dimensional, conforms to DW bus
  - Data Mart 2
- End User Data Access
  - Query tools
  - Report writers
  - Mining tools

Page 5: DWH life cycle

Data Modeling

Page 6: DWH life cycle

Data Modeling

WHAT IS A DATA MODEL?

A data model is an abstraction of some aspect of the real world (system).

WHY A DATA MODEL?

- Helps to visualise the business
- A model is a means of communication.
- Models help elicit and document requirements.
- Models reduce the cost of change.
- The model is the essence of the DW architecture, based on which the DW will be implemented.

Page 7: DWH life cycle

What do we want to do with the data?

The model depends on what kind of data analysis we want to do.

Different data analysis techniques

- Query and reporting
  - Display query results
- Multidimensional analysis
  - Analyse data content by looking at it from different perspectives
- Data mining
  - Discover patterns and clustering attributes in data

Page 8: DWH life cycle

Impact of Data Analysis Techniques on DM

Query and reporting

- Normalized data model
- Select associated data elements
- Summarize and group by category
- Present results
- Direct table scan
- An ER model, normalized or denormalized, is appropriate

Page 9: DWH life cycle

Impact of Data Analysis Techniques on DM

Multidimensional analysis

- Fast and easy access to data
- Any number of analysis dimensions in any combination
- ER will mean many joins
- Dimensional model appropriate

Page 10: DWH life cycle

Levels of modeling

- Conceptual modeling
  - Describe data requirements from a business point of view, without technical details
- Logical modeling
  - Refine conceptual models
  - Data structure oriented, platform independent
- Physical modeling
  - Detailed specification of what is physically implemented using specific technology

Page 11: DWH life cycle

Conceptual Model

- A conceptual model shows data through business eyes.
- All entities which have business meaning.
- Important relationships
- Few significant attributes in the entities.
- Few identifiers or candidate keys.

Page 12: DWH life cycle

Logical Model

- Replaces many-to-many relationships with associative entities.
- Defines a full population of entity attributes.
- May use non-physical entities for domains and sub-types.
- Establishes entity identifiers.
- Has no specifics for any RDBMS or configuration.

Page 13: DWH life cycle

Physical Model

A physical data model may include

- Referential integrity
- Indexes
- Views
- Alternate keys and other constraints
- Tablespaces and physical storage objects.

Page 14: DWH life cycle

What needs to be modeled during a data warehouse project

- STAGING AREA: YES! (maybe multiple data models are required)
- ODS: YES!
- DATAWAREHOUSE/DATAMART: YES!

Page 15: DWH life cycle

Data Modeling - Techniques

Modeling techniques

- E-R Modeling
- Dimensional Modeling

Page 16: DWH life cycle

Implementation and modeling styles

Modeling versus implementation

- Modeling: describe what should be built to non-technical folks
- Implementation: describe what is actually built to technical folks

Page 17: DWH life cycle

Implementation and modeling styles

- Relational modeling
  - Use for implementation
  - Difficult to understand by non-technical folks
- Dimensional modeling
  - Use for modeling during analysis and design phases
  - Can be implemented using other modeling styles, e.g. object-oriented, relational

Page 18: DWH life cycle

Limitations of E-R Modeling

- Poor performance
- Tend to be very complex and difficult to navigate.

Page 19: DWH life cycle

Dimensional Modeling

- Dimensional modeling uses three basic concepts: measures, facts, dimensions.
- Is powerful in representing the requirements of the business user in the context of database tables.
- Focuses on numeric data, such as values, counts, weights, balances and occurrences.

Page 20: DWH life cycle

Dimensional modeling

Must identify

- Business process to be supported
- Grain (level of detail)
- Dimensions
- Facts

Page 21: DWH life cycle

Conventions used in Dimensional modeling

- Facts
- Measures (Variables)
- Dimensions
  - Dimension members
  - Dimension hierarchies

Page 22: DWH life cycle

Facts

- A fact is a collection of related data items, consisting of measures and context data.
- Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process.
- Facts are measured, "continuously valued", rapidly changing information. Can be calculated and/or derived.

Page 23: DWH life cycle

Fact Table

A table that is used to store business information (measures) that can be used in mathematical equations.

- Quantities
- Percentages
- Prices

Page 24: DWH life cycle

Dimensions

- A dimension is a collection of members or units of the same type of views.
- Dimensions determine the contextual background for the facts.
- Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how

Page 25: DWH life cycle

Dimension Table

Table used to store qualitative data about fact records

- Who
- What
- When
- Where
- Why

Page 26: DWH life cycle

Dimension data should be

- verbose, descriptive
- complete
- free of misspellings and impossible values
- indexed
- equally available
- documented (metadata to explain origin, interpretation of each attribute)

Page 27: DWH life cycle

Dimensional model

Visualise a dimensional model as a CUBE (a hypercube, because there can be more than 3 dimensions).

Operations for OLAP

- Drill Down: higher level of detail
- Roll Up: summarized level of data (the navigation path is determined by hierarchies within dimensions)
- Slice: cuts through the cube. Users can focus on specific perspectives
- Dice: rotates the cube to another perspective (change the dimension)
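The roll-up and slice operations listed above can be sketched on a tiny in-memory cube. This is only an illustration under stated assumptions: the fact rows, the column names and the helper functions `aggregate` and `slice_cube` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, quarter, region, product, sales)
FACTS = [
    ("Jan", "Q1", "North", "Tea", 100),
    ("Feb", "Q1", "North", "Tea", 120),
    ("Jan", "Q1", "South", "Coffee", 80),
    ("Apr", "Q2", "North", "Coffee", 90),
    ("May", "Q2", "South", "Tea", 60),
]
COLUMNS = ("month", "quarter", "region", "product")


def aggregate(rows, by):
    """Sum the sales measure grouped by the given dimension columns."""
    positions = [COLUMNS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(rows, column, value):
    """Slice: keep only the rows where one dimension equals one member."""
    position = COLUMNS.index(column)
    return [row for row in rows if row[position] == value]


# Detailed (drilled-down) view: sales per month and region.
monthly = aggregate(FACTS, ["month", "region"])
# Rolled-up view: the same data summarized one level up the time hierarchy.
quarterly = aggregate(FACTS, ["quarter", "region"])
# Slice on region = North, then summarize by quarter.
north_by_quarter = aggregate(slice_cube(FACTS, "region", "North"), ["quarter"])
```

Moving from `quarterly` back to `monthly` is the drill-down direction; both views are computed from the same fact rows, only the grouping level changes.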

Page 28: DWH life cycle

Drill down … Roll up

Page 29: DWH life cycle

Slice and Dice

Page 30: DWH life cycle

Dimensions

- Collection of members or units of the same type of views.
- Determine the contextual background for the facts.
- The parameters over which we want to perform OLAP (e.g. Time, Location/region, Customers)

Member

- A distinct name to determine a data item's position (e.g. Time - Month, quarter)

Hierarchy

- Arrange members into hierarchies or levels

Page 31: DWH life cycle

Hierarchies

Allow for the 'rollup' of data to more summarized levels.

Time

- day
- month
- quarter
- year
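As a rough sketch of the day, month, quarter, year hierarchy above, the following maps each calendar day to its member at every level and rolls daily values up to a chosen level. The function names, the label formats and the sample figures are assumptions made for illustration.

```python
from datetime import date


def hierarchy_levels(d):
    """Return the member of each level of the time hierarchy for day d."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


def roll_up(daily_values, level):
    """Summarize daily values at the requested hierarchy level."""
    totals = {}
    for day, amount in daily_values.items():
        member = hierarchy_levels(day)[level]
        totals[member] = totals.get(member, 0) + amount
    return totals


# Hypothetical daily sales figures.
daily_sales = {
    date(2015, 1, 13): 10,
    date(2015, 1, 14): 20,
    date(2015, 2, 1): 5,
    date(2015, 4, 2): 7,
}

by_month = roll_up(daily_sales, "month")
by_quarter = roll_up(daily_sales, "quarter")
by_year = roll_up(daily_sales, "year")
```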

Page 32: DWH life cycle

Hierarchies

Page 33: DWH life cycle

Aggregates

- Aggregate tables are pre-stored summarized tables, created at a higher (coarser) level of granularity across any or all of the dimensions.
- If the existing granularity is day-wise sales, then creating a separate month-wise sales table is an example of an aggregate table.
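The day-wise to month-wise example can be sketched with Python's built-in `sqlite3` module. The table names `sales_daily` and `sales_monthly`, the columns and the sample rows are hypothetical; a real warehouse would build the aggregate in its own RDBMS during the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base fact table at day granularity (hypothetical layout).
conn.execute(
    "CREATE TABLE sales_daily (sale_date TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_daily VALUES (?, ?, ?)",
    [
        ("2015-01-13", "Tea", 10),
        ("2015-01-14", "Tea", 20),
        ("2015-01-14", "Coffee", 5),
        ("2015-02-01", "Tea", 7),
    ],
)

# Aggregate table: the same measure pre-summarized at month granularity.
conn.execute(
    """
    CREATE TABLE sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product,
           SUM(amount) AS amount
    FROM sales_daily
    GROUP BY substr(sale_date, 1, 7), product
    """
)

monthly_rows = conn.execute(
    "SELECT sale_month, product, amount FROM sales_monthly "
    "ORDER BY sale_month, product"
).fetchall()
```

Queries that only need monthly totals can then read `sales_monthly` instead of scanning and grouping the daily rows each time.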

Page 34: DWH life cycle

Aggregates

- The use of such aggregates is the single most effective tool the data warehouse designer has to improve query performance.
- Usage of aggregates can increase the performance of queries by several times.

Page 35: DWH life cycle

Measures

- A measure is a numeric attribute of a fact, representing the performance or behaviour of the business relative to dimensions.
- The actual numbers are called variables, e.g. sales in money, sales volume, quantity supplied, supply cost, transaction amount.
- A measure is determined by combinations of the members of the dimensions and is located on facts.

Page 36: DWH life cycle

The Cube

Page 37: DWH life cycle

Types of Facts

- Additive
  - Able to add the facts along all the dimensions
  - Discrete numerical measures, e.g. retail sales in $
- Semi Additive
  - Snapshot, taken at a point in time
  - Measures of intensity
  - Not additive along the time dimension, e.g. account balance, inventory balance
  - Added and divided by the number of time periods to get a time-average
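A small sketch of why a balance is semi-additive: it can be summed across accounts for one point in time, but across time it is averaged rather than summed. The account names and month-end balances below are made up for illustration.

```python
# Hypothetical month-end balances: balances[account][month]
balances = {
    "acct_A": {"Jan": 100, "Feb": 150, "Mar": 200},
    "acct_B": {"Jan": 50, "Feb": 50, "Mar": 80},
}


def total_for_month(month):
    """Additive across the account dimension: sum balances for one month."""
    return sum(by_month[month] for by_month in balances.values())


def time_average(account):
    """Not additive across time: add the snapshots and divide by the
    number of time periods to get a time-average."""
    by_month = balances[account]
    return sum(by_month.values()) / len(by_month)


jan_total = total_for_month("Jan")
avg_a = time_average("acct_A")
avg_b = time_average("acct_B")
```

Summing acct_A over the three months would give 450, a number that matches no balance the account ever held, which is why the time dimension is treated differently.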

Page 38: DWH life cycle

Types of Facts

- Non Additive
  - Numeric measures that cannot be added across any dimensions
  - Intensity measure averaged across all dimensions, e.g. room temperature
  - Textual facts - AVOID THEM

Page 39: DWH life cycle

Common structures for Data Marts: Denormalize!

Star

- Single fact table surrounded by denormalized dimension tables
- The fact table primary key is the composite of the foreign keys (primary keys of dimension tables)
- Fact table contains transaction type information.
- Many star schemas in a data mart
- Easily understood by end users, more disk storage required
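A minimal star schema along these lines might look as follows, again using `sqlite3` so the sketch is runnable. All table names, columns and rows are hypothetical; the point is only the shape: one fact table whose primary key is the composite of the dimension keys, joined to denormalized dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized dimension: category is stored redundantly on every product row.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE dim_store (
        store_key  INTEGER PRIMARY KEY,
        store_name TEXT,
        region     TEXT
    );
    -- Fact table: primary key is the composite of the dimension foreign keys.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        quantity    INTEGER,
        amount      REAL,
        PRIMARY KEY (product_key, store_key)
    );
    INSERT INTO dim_product VALUES (1, 'Green tea', 'Beverages');
    INSERT INTO dim_product VALUES (2, 'Biscuits', 'Snacks');
    INSERT INTO dim_store VALUES (1, 'Store A', 'North');
    INSERT INTO dim_store VALUES (2, 'Store B', 'South');
    INSERT INTO fact_sales VALUES (1, 1, 3, 30.0);
    INSERT INTO fact_sales VALUES (1, 2, 1, 10.0);
    INSERT INTO fact_sales VALUES (2, 1, 4, 8.0);
    """
)

# A typical star join: one join per dimension, then group by dimension attributes.
sales_by_category_region = conn.execute(
    """
    SELECT p.category, s.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store   AS s ON s.store_key = f.store_key
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
    """
).fetchall()
```

In a snowflake variant, `category` and `region` would move into their own tables, which saves some storage at the price of extra joins in every query like this one.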

Page 40: DWH life cycle

Example of Star Schema

Page 41: DWH life cycle

Common structures for Data Marts: Denormalize!

Snowflake

- Single fact table surrounded by normalized dimension tables
- Normalizes dimension table to save data storage space.
- When dimensions become very very large
- Less intuitive, slower performance due to joins

May want to use both approaches, especially if supporting multiple end-user tools.

Page 42: DWH life cycle

Example of Snowflake schema

Page 43: DWH life cycle

Snowflake - Disadvantages

- Normalization of dimension makes it difficult for user to understand
- Decreases the query performance because it involves more joins
- Dimension tables are normally smaller than fact tables - space may not be a major issue to warrant snowflaking

Page 44: DWH life cycle

Keys …

- Primary Keys
  - uniquely identify a record
- Foreign Keys
  - primary key of another table referred here
- Surrogate Keys
  - system-generated key for dimensions
  - key on its own has no meaning
  - integer key, less space

Page 45: DWH life cycle

More Keys …

- Smart Keys
  - primary key out of various attributes of dimension
  - AVOID THEM!
  - Join to Fact table should be on single surrogate key
- Production Keys
  - DO NOT USE production-defined attributes
  - Business may reuse/change them - DW cannot!
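One simple way to picture surrogate keys is a lookup that hands out meaningless integers for production keys, so that fact rows never store the production key itself. The class below is a hypothetical sketch, not a prescribed mechanism; real ETL tools and databases usually provide sequences or identity columns for this.

```python
class SurrogateKeyGenerator:
    """Assign system-generated integer keys to production (source) keys."""

    def __init__(self):
        self._next_key = 1
        self._by_production_key = {}

    def key_for(self, production_key):
        """Return the existing surrogate key, or allocate the next integer."""
        if production_key not in self._by_production_key:
            self._by_production_key[production_key] = self._next_key
            self._next_key += 1
        return self._by_production_key[production_key]


# Hypothetical production keys coming from a source system.
customer_keys = SurrogateKeyGenerator()
first = customer_keys.key_for("CUST-001")
second = customer_keys.key_for("CUST-002")
repeat = customer_keys.key_for("CUST-001")
```

Because the integer carries no meaning, the business can later change the production code without forcing a change to the fact rows; only the mapping inside the dimension needs attention.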

Page 46: DWH life cycle

Basic Dimensional Modeling Techniques

- Slowly changing Dimensions
- Rapidly changing Small Dimensions
- Large Dimensions
- Rapidly changing Large Dimensions
- Degenerate Dimensions
- Junk Dimensions

Page 47: DWH life cycle

Slowly Changing Dimensions

A dimension is considered a Slowly Changing Dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.

Page 48: DWH life cycle

The Time Dimension

- Time_key
- day_of_week
- day_number_in_month
- day_number_overall
- week_number_in_year
- month
- quarter
- fiscal_period
- holiday_flag
- weekday_flag
- last_day_in_month_flag
- season
- event

Page 49: DWH life cycle

Time Dimension

- A dedicated Time dimension is required because SQL date semantics and functions cannot supply several important attributes required for analytical purposes.
- Attributes like weekday and weekend flags, fiscal period, holidays and season are not available from standard SQL date functions; fiscal periods, holidays and seasons in particular are business-specific.

Page 50: DWH life cycle

Time Dimension

- Moreover, SQL date stamps occupy more space than an integer key, which increases the size of the fact table.
- Joins on such SQL-generated date stamps are costly and decrease query speed significantly.

Page 51: DWH life cycle

Time Dimension

- The day of week (Monday, ...) is useful to create reports comparing, for example, Monday sales to Friday sales.
- The day number in month is useful for comparing measures for the same day in each month.
- The last day in month flag is useful for performing payday analysis.

Page 52: DWH life cycle

Time Dimension

- The holiday flag and season attributes are useful for holiday vs. non-holiday analysis and seasonal business analysis.
- The event attribute is needed to record special days like strike days, etc.
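To make the attribute list concrete, here is a sketch that builds one time dimension row per calendar day. Several things are assumptions for illustration only: the holiday set, the first day covered, using the overall day number as the integer key, and ISO week numbering. Fiscal period, season and event are left out because they depend on business rules the slides do not specify.

```python
import calendar
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
HOLIDAYS = {date(2015, 1, 26)}  # hypothetical holiday calendar
FIRST_DAY = date(2015, 1, 1)    # hypothetical first day covered by the dimension


def time_dimension_row(d):
    """Build the time dimension attributes for calendar day d."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    day_number_overall = (d - FIRST_DAY).days + 1
    return {
        "time_key": day_number_overall,  # integer key (an assumption)
        "day_of_week": DAY_NAMES[d.weekday()],
        "day_number_in_month": d.day,
        "day_number_overall": day_number_overall,
        "week_number_in_year": d.isocalendar()[1],  # ISO week, one possible convention
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "holiday_flag": d in HOLIDAYS,
        "weekday_flag": d.weekday() < 5,
        "last_day_in_month_flag": d.day == days_in_month,
    }


month_end = time_dimension_row(date(2015, 1, 31))
holiday = time_dimension_row(date(2015, 1, 26))
```

The rows are generated once and stored, so reports can filter on `holiday_flag` or `last_day_in_month_flag` with a plain join instead of recomputing them in every query.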

Page 53: DWH life cycle

ETVL Overview

Page 54: DWH life cycle

Introduction

[Figure: Source System 1, Source System 2, Source System 3 --> ETVL --> Staging Area --> ETVL --> Data warehouse]

ETVL: Extraction, Transformation, Validation, Load

Page 55: DWH life cycle

Extraction

- Source Systems (multiple source systems)
  - Flat files, Excel, Legacy Systems, RDBMS etc.
- Frequency of Extraction
- Staging Area (If any? How many?)
- Most Transformations from Source to Staging
- Cleansing and Data Quality
  - Data integrity, de-duplication, completeness, correctness
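The cleansing checks named above (completeness and de-duplication) could be sketched as a single pass over extracted records that separates clean rows from rejected ones. The function name, the record layout and the rejection reasons are hypothetical.

```python
def cleanse(records, required_fields, key_field):
    """Split extracted records into clean rows and rejected rows.

    A record is rejected when a required field is missing or empty
    (completeness) or when its key was already seen (de-duplication).
    """
    seen_keys = set()
    clean, rejected = [], []
    for record in records:
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record[key_field] in seen_keys:
            rejected.append((record, "duplicate key"))
        else:
            seen_keys.add(record[key_field])
            clean.append(record)
    return clean, rejected


# Hypothetical rows extracted from a source system.
extracted = [
    {"id": 1, "name": "Asha"},
    {"id": 1, "name": "Asha"},   # duplicate of the first row
    {"id": 2, "name": ""},       # incomplete: empty name
    {"id": 3, "name": "Ravi"},
]
clean_rows, rejected_rows = cleanse(extracted, ["id", "name"], "id")
```

Keeping the rejected rows together with a reason, rather than dropping them silently, gives the error log something to report.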

Page 56: DWH life cycle

Transformation

- Usage of tools
  - Reusability of Transformations
  - Reusability of Mappings
- Different tools
  - Informatica
  - Warehouse Builder
  - ETI
  - Sagent
  - PL/SQL scripts

Page 57: DWH life cycle

Loading

- Loading Frequency
- Optimized Loading
  - Indexing
  - Partitioning
- Aggregation
  - Sum
  - Average
  - Max
- Update Strategy
- Error Handling
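One possible update strategy is "update if the row exists, otherwise insert", with failing rows written to an error log instead of aborting the load. The sketch below uses `sqlite3`; the table `customer_snapshot`, its columns and the sample rows are hypothetical, and a real load would normally run in bulk with the warehouse's own loader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_snapshot ("
    "customer_id TEXT PRIMARY KEY, city TEXT NOT NULL)"
)


def load(rows):
    """Update existing rows, insert new ones, and log rows that fail."""
    error_log = []
    for customer_id, city in rows:
        try:
            cursor = conn.execute(
                "UPDATE customer_snapshot SET city = ? WHERE customer_id = ?",
                (city, customer_id),
            )
            if cursor.rowcount == 0:  # no existing row, so insert a new one
                conn.execute(
                    "INSERT INTO customer_snapshot VALUES (?, ?)",
                    (customer_id, city),
                )
        except sqlite3.IntegrityError as exc:
            # Error handling: record the rejected row and keep loading.
            error_log.append((customer_id, str(exc)))
    conn.commit()
    return error_log


first_errors = load([("C1", "Pune"), ("C2", "Mumbai")])
second_errors = load([("C1", "Delhi"), ("C3", None)])  # C3 violates NOT NULL
final_rows = conn.execute(
    "SELECT customer_id, city FROM customer_snapshot ORDER BY customer_id"
).fetchall()
```

After the second run, C1 has been updated in place, C2 is untouched, and C3 appears only in the error log.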

Page 58: DWH life cycle

Synopsis

- Flat files, Excel, Legacy Systems, RDBMS etc.
- Implement Business Rules
- ODBC Connectivity
- Scheduling the ETVL
- Frequency of Extraction
- Staging Area
- Most Transformations from Source to Staging

Page 59: DWH life cycle

Synopsis

- Cleansing and Data Quality
  - Data integrity, de-duplication, completeness, correctness
- Rejected Records
- Exception Handling and Error Log
- Optimized Loading
- Re-usability
- Aggregation of data
- Update Strategy

Page 60: DWH life cycle

STAGING AREA - Some Clarity

Staging Area

- Optional
- To cleanse the source data
- Accepts data from different sources
- Data model is required at staging area
- Multiple data models may be required for parking different sources and for transformed data to be pushed out to warehouse

Page 61: DWH life cycle

ODS - Some Clarity

Operational Data Store

- Optional
- Granular, detailed level data
- May feed warehouse (e.g. when warehouse is aggregated)
- Usually a relational model
- May keep data for a smaller time period than warehouse

Page 62: DWH life cycle

A look at different DW architectures

[Figure: operational data, external data, load manager, warehouse manager, query manager, detailed information, summary information, meta data, OLAP]

Page 63: DWH life cycle

Data Warehouse Architecture - 2

Page 64: DWH life cycle

Data Warehouse Architecture - 3

Page 65: DWH life cycle

Data Warehouse Architecture - 4

Page 66: DWH life cycle

DW Architecture

Data Warehouse/data mart

Architecture choices depend on

- Current infrastructure
- Business environment
- Desired management and control structure
- Resources
- Commitment …

Page 67: DWH life cycle

DW Architecture

Architecture choices determine

- Where will DW reside?
  - Centrally / locally / distributed
- Where will it be managed from?
  - Centrally / independently

3 choices

- Global
- Independent
- Interconnected
- (or) a combination of these three

Page 68: DWH life cycle

DW Architecture

Global Architecture

- Related to scope of data access and storage
- Does not mean centralized
- Can be physically centralized or distributed
- Enterprise view of data
- Time-consuming & costly to implement

Page 69: DWH life cycle

Global Architecture

Page 70: DWH life cycle

DW Architecture

Independent Architecture

- Stand-alone
- Controlled by a department
- Minimal integration
- No global view
- Very fast to implement

Page 71: DWH life cycle

DW Architecture

Interconnected Architecture

- Distributed
- Integrated and interconnected
- Gives a global view of enterprise
- More complexity
  - Who manages / controls data
  - Another tier in architecture to share common data between multiple data marts
  - Have a data sharing schema across data marts

Page 72: DWH life cycle

Independent & Interconnected Architecture

Page 73: DWH life cycle

Types of Data Warehouse

- Enterprise Data Warehouse
- Data Mart

[Figure: Enterprise Data Warehouse and three data marts]

Page 74: DWH life cycle

Enterprise data warehouse

- Contains data drawn from multiple operational systems
- Supports time-series and trend analysis across different business areas
- Can be used as a transient storage area to clean all data and ensure consistency
- Can be used to populate data marts
- Can be used for everyday and strategic decision making

Page 75: DWH life cycle

Data Mart

- Logical subset of enterprise data warehouse
- Organized around a single business process
- Based on granular data
- May or may not contain aggregates
- Object of analytical processing by the end user.
- Less expensive and much smaller than a full blown corporate data warehouse.

Page 76: DWH life cycle

Distributed and Centralized Data warehouses

- DW sitting on a monolithic machine - unrealistic
- Separate machines, different OS, different DB systems - reality
- Solution
  - Share a uniform architecture to allow them to be fused coherently

Page 77: DWH life cycle

Classical Architectures

- Physical data warehouse
  - Data warehouse --> data marts
  - Data marts --> data warehouse
  - Parallel data warehouse and data marts

Page 78: DWH life cycle

Physical data warehouse: Data warehouse --> data marts

[Figure: source data (external data, operational data), staging area, data warehouse, data marts]

Page 79: DWH life cycle

Physical data warehouse: Data marts --> data warehouse

[Figure: source data (external data, operational data), staging area, data marts, data warehouse]

Page 80: DWH life cycle

Physical Data Warehouse: Parallel Data Warehouse and Data Marts

[Figure: source data (external data, operational data), staging area, data warehouse and data marts in parallel]

Page 81: DWH life cycle

DW Implementation Approaches

- Top Down
- Bottom-up
- Combination of both
- Choices depend on:
  - current infrastructure
  - resources
  - architecture
  - ROI
  - implementation speed

Page 82: DWH life cycle

Top Down Implementation

Page 83: DWH life cycle

Bottom Up Implementation

Page 84: DWH life cycle

DW Implementation Approaches

Top Down

- More planning and design initially
- Involve people from different work-groups, departments
- Data marts may be built later from Global DW
- Overall data model to be decided up-front

Bottom Up

- Can plan initially without waiting for global infrastructure
- Built incrementally
- Can be built before or in parallel with Global DW
- Less complexity in design

Page 85: DWH life cycle

DW Implementation Approaches

Top Down

- Consistent data definition and enforcement of business rules across enterprise
- High cost, lengthy process, time consuming
- Works well when there is a centralized IS department responsible for all H/W and resources

Bottom Up

- Data redundancy and inconsistency between data marts may occur
- Integration requires great planning
- Less cost of H/W and other resources
- Faster pay-back

Page 86: DWH life cycle