Data Warehouse Databases

Upload: giri-saranu

Posted on 16-Nov-2014


Page 1: Data Warehouse Databases

Data Warehouse Databases

Page 2: Data Warehouse Databases

Objectives

► At the end of this lesson, you will know:
   - Characteristics of warehouse databases
   - Types of warehouse databases
   - The strengths and limitations of each type
   - Examples of warehouse databases
   - Recommendations on databases for the data warehouse

Page 3: Data Warehouse Databases

Databases for Data Warehouse - Characteristics

► Designed for analytical, decision support (DSS) tasks
► Typically historic, non-volatile data
► Subject-oriented, integrated, detail and summary data. Data marts often include summarized data
► Small number of users, complex queries
► Large tables, frequent multi-table joins. Requirement for high, sustained data flow from large fact tables
► Updates are additive, with periodic data refresh
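As an illustration of this workload (a large fact table joined to dimension tables, with updates arriving as periodic additive loads), here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), the helper periodic_refresh and the data are all hypothetical; a real warehouse would run on a server RDBMS at far larger scale.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales  (date_id INTEGER REFERENCES dim_date,
                          product_id INTEGER REFERENCES dim_product,
                          amount REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Books"), (2, "Music")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2014, 10), (2, 2014, 11)])

def periodic_refresh(rows):
    """Additive refresh: new facts are appended; existing rows are never updated."""
    with conn:  # commits the whole batch as one transaction
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

periodic_refresh([(1, 1, 10.0), (1, 2, 5.0)])  # October load
periodic_refresh([(2, 1, 20.0), (2, 2, 7.5)])  # November load

# Typical analytical query: join the fact table to its dimensions, then aggregate.
QUERY = """
SELECT d.year, d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, d.month, p.category
ORDER BY d.year, d.month, p.category
"""
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

Each refresh only inserts rows, which keeps the history intact and matches the non-volatile, additive nature of warehouse data.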

Page 4: Data Warehouse Databases

Critical Features

► Scalability and Portability
   - Scalable to multiple concurrent users and terabytes of data
   - Support for SMP (symmetric multiprocessing), MPP (massively parallel processing), NUMA (non-uniform memory access) and clusters
► Query Performance
   - High performance for transactions and queries
   - Extensions to SQL to enhance performance
   - DSS-specific indexing and join methods
► Parallelism
   - Should perform operations in parallel, utilizing multiple processors on the machine
   - Parallel everything - query, index, load

Page 5: Data Warehouse Databases

Critical Features

► Data Management
   - Support for data partitioning methods (range, hash)
   - High throughput for bulk loads and selects
► Administration
   - Easy-to-use GUI administration tool
   - Availability of cost-based optimization
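Cost-based optimization means the optimizer estimates the cost of alternative access plans from table statistics and picks the cheapest. The toy sketch below compares a full table scan with an index scan; the cost formulas, the TableStats fields and the random_io_factor constant are simplified assumptions, not any vendor's actual cost model.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    rows: int           # number of rows in the table
    rows_per_page: int  # rows stored in one disk page

def full_scan_cost(stats: TableStats) -> float:
    # Read every page once, sequentially.
    return stats.rows / stats.rows_per_page

def index_scan_cost(stats: TableStats, selectivity: float, random_io_factor: float = 4.0) -> float:
    # Assume one random page read per matching row, each costing
    # random_io_factor sequential page reads (hypothetical constant).
    return stats.rows * selectivity * random_io_factor

def choose_plan(stats: TableStats, selectivity: float) -> str:
    # Pick whichever plan has the lower estimated cost.
    full = full_scan_cost(stats)
    index = index_scan_cost(stats, selectivity)
    return "index scan" if index < full else "full scan"

fact = TableStats(rows=10_000_000, rows_per_page=100)
print(choose_plan(fact, selectivity=0.0001))  # very few matching rows
print(choose_plan(fact, selectivity=0.10))    # a large share of the table
```

The point of the sketch is the decision rule: a highly selective predicate favors the index, while a query touching a large fraction of a fact table is cheaper as a sequential scan, which is common in warehouse workloads.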

Page 6: Data Warehouse Databases

Warehouse Databases - Types

► Relational
   - The central warehouse is usually relational because of the potentially large size of the data warehouse
► Multidimensional
   - Faster response to analytical queries and OLAP computations, but they have size limitations
► Hybrid Architecture
   - Uses a relational component to support large databases and a multidimensional component for fast response to analytical queries

Page 7: Data Warehouse Databases

Relational Database for Warehouse Strengths

► Scalable to multiple terabytes, portable among many platforms
► High-speed query processing using SMP, MPP and clustered multiprocessors
► Detailed and aggregated data stored in the same database

Page 8: Data Warehouse Databases

Relational Database for Warehouse Strengths

► Enhanced for data warehouse - data replication, parallelization, query optimization, bitmapped indexes, cost-based optimizer, SQL extensions for OLAP
► Supports open system standards like SQL and ODBC
► Supported by a large number of third-party vendors

Page 9: Data Warehouse Databases

Relational Database for Warehouse Limitations

► Originally optimized for OLTP
► Slower than multidimensional databases (MDBs) for OLAP calculations and ad hoc analysis of data in multiple dimensions
► Limitations of SQL - cannot perform:
   - what-ifs
   - rankings
   - cross-dimensional calculations
   - variances
   - multi-level hierarchical rollups

Page 10: Data Warehouse Databases

Relational Database for Warehouse Limitations

► Sophisticated techniques required to overcome SQL limitations
► Star and snowflake schemas require complex administration
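One such technique is to compute multi-level hierarchical rollups outside the SQL engine. Later SQL standards added GROUP BY ROLLUP/CUBE and window functions such as RANK(), but where those are unavailable, a rollup can be computed in application code as sketched below. The region > country > city hierarchy, the data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, country, city, sales)
detail = [
    ("Europe", "France", "Paris", 120.0),
    ("Europe", "France", "Lyon", 30.0),
    ("Europe", "Germany", "Berlin", 90.0),
    ("Asia", "Japan", "Tokyo", 200.0),
]

def hierarchical_rollup(rows):
    """Return totals at every level of the hierarchy, keyed by path prefix.

    () is the grand total; (region,), (region, country) and
    (region, country, city) are the subtotals at each lower level.
    """
    totals = defaultdict(float)
    for *path, amount in rows:
        # Add the amount to every ancestor prefix, including the empty one.
        for depth in range(len(path) + 1):
            totals[tuple(path[:depth])] += amount
    return dict(totals)

rollup = hierarchical_rollup(detail)
print(rollup[()])                    # grand total
print(rollup[("Europe",)])           # region subtotal
print(rollup[("Europe", "France")])  # country subtotal
```

A single pass over the detail rows produces every level at once, which is what a multi-level rollup needs and what a plain GROUP BY on one level does not give.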

Page 11: Data Warehouse Databases

RDBMS for Warehouse - Selection

► Scalability to support VLDBs and a large number of concurrent users performing complex analyses
► Support for advanced parallel processing and partitioning techniques
► Integration with ETL tools
► Integration with MDDBs

Page 12: Data Warehouse Databases

RDBMS for Warehouse - Selection

► Integration with data access and analysis tools
► Integration with local and central metadata
► Star schema and multidimensional extensions to SQL to support OLAP calculations, variances, etc.
► Portability, security, data integrity, backup/restore

Page 13: Data Warehouse Databases

Very Large Databases - VLDBs

► A data warehouse is 5 to 50 times the size of an OLTP database because of:
   - Historical content
   - Summarization and aggregations
   - Special requirements of MDDBs, data movement and cleansing tools, etc.
► The average data warehouse grows 6-7 times in 18 months.
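These two rules of thumb translate into simple arithmetic. Assuming compound growth, growing 6-7 times in 18 months corresponds to roughly 10.5% to 11.4% growth per month. The sketch below computes that rate and the 5x-50x size range; the function names and the 100 GB OLTP size in the usage lines are hypothetical.

```python
def warehouse_size_range(oltp_size_gb, low=5, high=50):
    """Warehouse size range from the 5x-50x rule of thumb."""
    return oltp_size_gb * low, oltp_size_gb * high

def monthly_growth_rate(factor, months):
    """Compound monthly rate that multiplies the size by `factor` over `months` months."""
    return factor ** (1 / months) - 1

print(warehouse_size_range(100))             # hypothetical 100 GB OLTP database
print(round(monthly_growth_rate(6, 18), 3))  # about 10.5% per month
print(round(monthly_growth_rate(7, 18), 3))  # about 11.4% per month
```

For a 100 GB OLTP system this gives a 500 GB to 5,000 GB warehouse, which then keeps growing by roughly a tenth every month.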

Page 14: Data Warehouse Databases

Partitioning and Parallelism

► Parallelism
   - The DBMS should carry out backups, loads, etc. in parallel
   - Queries with UNIONs, GROUP BYs, etc. can be processed in parallel
► Partitioning
   - Break up tables into chunks for backups and loads
   - Crashes normally affect only one partition

Key approaches for VLDBs
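Parallel GROUP BY processing over partitions can be sketched as: aggregate each partition independently, then merge the partial results. This sketch uses threads from concurrent.futures to stay short; because of CPython's global interpreter lock, threads will not give true CPU parallelism here, whereas a parallel DBMS spreads the work across processors or nodes. The partitions, column names and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows (region, amount), already split into partitions.
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4), ("east", 2)],
]

def partial_group_by(rows):
    """Local GROUP BY region, SUM(amount) over a single partition."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals

def parallel_group_by(parts, workers=3):
    # Aggregate every partition independently, one task per partition...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_group_by, parts))
    # ...then merge the partial sums into the final result.
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts
    return dict(merged)

print(parallel_group_by(partitions))
```

Because SUM can be computed per partition and then combined, each partition's work is independent, which is also why a failure or reload can be confined to one partition.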

Page 15: Data Warehouse Databases

Types of Partitioning

► Range Partitioning
   - Based on a range of values
► Round Robin Partitioning
   - Rows assigned to partitions in turn, in the order they are inserted
► Hash Partitioning
   - Data split using a hash key
► Expression Partitioning
   - Data split using an expression similar to a WHERE clause
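The four schemes can be expressed as functions that map a row (or its key) to a partition. This is a minimal sketch: the function and class names, the boundary values and the rule set are hypothetical, and zlib.crc32 stands in for whatever hash function a given DBMS actually uses. The range version treats each boundary as an exclusive upper bound, similar to the VALUES LESS THAN style.

```python
import bisect
import zlib

def range_partition(value, boundaries):
    """Range: partition i holds boundaries[i-1] <= value < boundaries[i]."""
    return bisect.bisect_right(boundaries, value)

class RoundRobin:
    """Round robin: each new row goes to the next partition in turn."""
    def __init__(self, n_partitions):
        self.n_partitions = n_partitions
        self._inserted = 0

    def assign(self):
        partition = self._inserted % self.n_partitions
        self._inserted += 1
        return partition

def hash_partition(key, n_partitions):
    """Hash: crc32 is used because, unlike hash() on str, it is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions

def expression_partition(row, rules):
    """Expression: the first predicate (like a WHERE clause) that matches wins."""
    for partition, predicate in rules:
        if predicate(row):
            return partition
    return "default"  # hypothetical catch-all partition

# Usage with hypothetical data
boundaries = [20140101, 20150101]  # YYYYMMDD dates: before 2014, 2014, 2015 onward
print(range_partition(20141116, boundaries))
rr = RoundRobin(3)
print([rr.assign() for _ in range(5)])
print(hash_partition("customer-42", 4))
rules = [("east", lambda r: r["state"] in ("NY", "MA")),
         ("west", lambda r: r["state"] == "CA")]
print(expression_partition({"state": "CA"}, rules))
```

Range and expression partitioning let queries skip partitions that cannot match, while round robin and hash spread rows evenly, which helps parallel loads and scans.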

Page 16: Data Warehouse Databases

Query Performance

► A data warehouse DBMS should support advanced indexing and join techniques:
   - Bitmapped indexes
   - Star joins
   - Hash joins
► A data warehouse DBMS should also derive maximum advantage from parallelism
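To show why bitmapped indexes suit low-cardinality warehouse columns, the sketch below keeps one bitmap (a Python int) per distinct column value, so a filter on several columns becomes a bitwise AND. It also sketches a basic hash join: build a hash table on one input, then probe it with the other. Table contents and function names are hypothetical; production engines compress bitmaps and can spill hash tables to disk.

```python
from collections import defaultdict

# Hypothetical fact rows; row position i is bit i in every bitmap.
rows = [
    {"id": 0, "region": "east", "status": "open"},
    {"id": 1, "region": "west", "status": "closed"},
    {"id": 2, "region": "east", "status": "closed"},
    {"id": 3, "region": "north", "status": "open"},
]

def build_bitmap_index(table, column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps = defaultdict(int)
    for i, row in enumerate(table):
        bitmaps[row[column]] |= 1 << i
    return dict(bitmaps)

def rows_from_bitmap(bitmap):
    """Decode set bits back into row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_idx = build_bitmap_index(rows, "region")
status_idx = build_bitmap_index(rows, "status")

# WHERE region = 'east' AND status = 'closed' becomes one bitwise AND.
match = region_idx["east"] & status_idx["closed"]
print(rows_from_bitmap(match))

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash join: hash the build input (normally the smaller), then probe it."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)
    # Each probe row is combined with every build row sharing its key.
    return [{**b, **p} for p in probe_rows for b in table.get(p[probe_key], [])]

# Hypothetical dimension table used as the build input.
regions = [{"region": "east", "manager": "Ann"}, {"region": "west", "manager": "Bo"}]
joined = hash_join(regions, rows, "region", "region")
print(len(joined))
```

Bitmaps stay small when a column has few distinct values, and combining several of them is far cheaper than intersecting row-id lists from B-tree indexes.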

Page 17: Data Warehouse Databases

Examples of RDBMS for Warehouse

► Oracle 8.x
► IBM DB2 UDB
► Informix Dynamic Server with Extended Parallel Option, Universal Data Option
► Informix Red Brick Warehouse
► Sybase Adaptive Server, IQ
► NCR Teradata

Page 18: Data Warehouse Databases

Multidimensional Databases

[Figure: architecture diagram with components labeled Source Databases; Extract, Transform, Load; Metadata Repository; Data Modeling Tool; Warehouse Admin Tool; RDBMS; Local Metadata; Data Access Tools]

Page 19: Data Warehouse Databases

Components of MDDB

► Source data - may be accessed from an RDBMS or directly from source databases
► Multidimensional database server - contains base data, pre-calculated results stored in a multidimensional array, and an indexing structure to access the data
► Metadata - local data definitions and semantics in business terms
► Multidimensional viewer - end-user data access tool
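To make "pre-calculated results stored in a multidimensional array" concrete, here is a minimal two-dimensional sketch in plain Python. Each axis gets an extra ALL slot, every total is computed once at load time, and the position dictionaries play the role of the indexing structure. The Cube2D class and its data are hypothetical, not the design of any particular MDDB product.

```python
class Cube2D:
    """Dense 2-D array with one extra "ALL" slot per axis for pre-computed totals."""

    def __init__(self, dim1, dim2, facts):
        # Indexing structure: member name -> array position ("ALL" is the last slot).
        self.pos1 = {m: i for i, m in enumerate(list(dim1) + ["ALL"])}
        self.pos2 = {m: j for j, m in enumerate(list(dim2) + ["ALL"])}
        all1, all2 = len(dim1), len(dim2)
        self.cells = [[0.0] * (all2 + 1) for _ in range(all1 + 1)]
        for (m1, m2), value in facts.items():
            i, j = self.pos1[m1], self.pos2[m2]
            # Pre-calculate every aggregate once, at load time.
            self.cells[i][j] += value        # base cell
            self.cells[i][all2] += value     # row total
            self.cells[all1][j] += value     # column total
            self.cells[all1][all2] += value  # grand total

    def get(self, member1, member2):
        # Query time is a direct array lookup; no aggregation is performed.
        return self.cells[self.pos1[member1]][self.pos2[member2]]

sales = {("Books", "Oct"): 10.0, ("Books", "Nov"): 20.0,
         ("Music", "Oct"): 5.0, ("Music", "Nov"): 7.5}
cube = Cube2D(["Books", "Music"], ["Oct", "Nov"], sales)
print(cube.get("Books", "Nov"), cube.get("Books", "ALL"), cube.get("ALL", "ALL"))
```

The trade-off is visible even at this scale: queries become constant-time lookups, but all the aggregation work and storage is paid for up front.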

Page 20: Data Warehouse Databases

MDDB for Warehouse - Strengths

► High performance, sophisticated multidimensional calculations
   - Aggregations
   - Matrix calculations
   - Proprietary extended features - row-level calculations

Page 21: Data Warehouse Databases

MDDB for Warehouse - Strengths

► Optimized for OLAP. Appropriate for DSS:
   - Support for complex, cross-dimensional calculations
   - OLAP-aware functions
   - Drill-down for
      ► iterative queries
      ► trend analysis
      ► what-if analysis
      ► rapid pivoting from dimension to dimension
► Simple database administration
► Efficient data storage: less space than RDBMS
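The OLAP operations listed above (drill-down, pivoting, what-if analysis) can be sketched over a small in-memory cube. The dimension names, data and function names below are hypothetical; an MDDB would answer these from its pre-computed arrays instead of re-summing the facts each time.

```python
from collections import defaultdict

# Hypothetical facts keyed by (year, quarter, region)
facts = {
    (2013, "Q1", "east"): 100.0, (2013, "Q2", "east"): 120.0,
    (2013, "Q1", "west"): 80.0,  (2014, "Q1", "east"): 130.0,
    (2014, "Q1", "west"): 90.0,  (2014, "Q2", "west"): 110.0,
}
DIMS = ("year", "quarter", "region")

def aggregate(data, by):
    """Sum the measure, grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in by]
    out = defaultdict(float)
    for key, value in data.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def drill_down(data, year):
    """Go from one year's total down to its quarters."""
    return aggregate({k: v for k, v in data.items() if k[0] == year}, ["quarter"])

def pivot(data, rows, cols):
    """2-D view; swapping `rows` and `cols` pivots from one dimension to another."""
    table = defaultdict(dict)
    for (r, c), v in aggregate(data, [rows, cols]).items():
        table[r][c] = v
    return dict(table)

def what_if(data, region, pct):
    """Scenario: change every value for one region by pct percent."""
    return {k: v * (1 + pct / 100) if k[2] == region else v for k, v in data.items()}

print(aggregate(facts, ["year"]))
print(drill_down(facts, 2014))
print(pivot(facts, "region", "year"))
print(sum(what_if(facts, "west", 50).values()))
```

Each operation is just a different grouping or filter over the same cells, which is why a multidimensional engine can switch between them interactively.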

Page 22: Data Warehouse Databases

MDDB for Warehouse - Limitations

► Requires a proprietary database solution
► Limits on the size of raw data. The size limit is due to the time required to pre-compute large multidimensional arrays.
► Many MDBs cannot load data incrementally
► Some MDBs require pre-computation of all data
► Static, pre-computed dimensions and calculations
► Performance degradation as database size increases
► Lack of a standard MDB model or access method
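The size limit follows from simple arithmetic: a dense cube has one cell per combination of members, so its size is the product of the dimension cardinalities, and storing one extra ALL member per dimension for pre-computed totals raises each factor by one. The sketch below assumes single-level hierarchies (deeper hierarchies add more aggregate members per dimension); the function names and the five 100-member dimensions are hypothetical.

```python
from math import prod

def cube_cells(cardinalities, with_aggregates=True):
    """Cells in a dense cube; one extra "ALL" member per dimension holds the totals."""
    return prod(n + 1 if with_aggregates else n for n in cardinalities)

def density(n_facts, cardinalities):
    """Fraction of base cells that actually contain data."""
    return n_facts / cube_cells(cardinalities, with_aggregates=False)

dims = [100] * 5  # hypothetical: five dimensions with 100 members each
print(cube_cells(dims, with_aggregates=False))  # base cells
print(cube_cells(dims))                         # base cells plus totals
print(density(1_000_000, dims))                 # share of cells holding a fact
```

Five modest dimensions already give 10^10 base cells, and a million loaded facts would fill only 0.01% of them, so pre-computing and storing the full array quickly becomes impractical as dimensions are added.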

Page 23: Data Warehouse Databases

Examples of MDDBs

► Essbase - Arbor Software
► SAS Multidimensional Database Server
► Commander Decision - Comshare
► Acumate ES - Kenan Systems
► Pilot Decision Support Suite - Cognizant
► FOCUS/Fusion - Information Builders

Page 24: Data Warehouse Databases

Hybrid Architecture

► Combination of RDBMS and MDB controlled by an OLAP server
   - RDBMS used for detailed data stored in large databases
   - MDB used for fast, read/write OLAP analysis and calculations
   - OLAP server routes queries to either the RDBMS or the MDB
   - Result set from the RDBMS may be processed on the fly in the server
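The OLAP server's routing role can be sketched as follows. The rule used here (summary queries whose dimensions are all held in the MDB go to the MDB; everything else goes to the RDBMS) is a hypothetical simplification, as are the Query and HybridOlapServer names; real hybrid products apply their own routing logic.

```python
from dataclasses import dataclass

@dataclass
class Query:
    level: str         # "detail" or "summary" (hypothetical classification)
    dimensions: tuple  # dimensions the query groups by

class HybridOlapServer:
    """Toy router between a multidimensional store and a relational store."""

    def __init__(self, mdb_dimensions):
        self.mdb_dimensions = set(mdb_dimensions)

    def route(self, q: Query) -> str:
        # Summary queries fully covered by the MDB's dimensions go to the MDB.
        if q.level == "summary" and set(q.dimensions) <= self.mdb_dimensions:
            return "MDB"
        # Detail data, or dimensions the MDB does not hold, go to the RDBMS.
        return "RDBMS"

    def post_process(self, rdbms_rows, key_index, value_index):
        """On-the-fly aggregation of an RDBMS result set inside the server."""
        totals = {}
        for row in rdbms_rows:
            key = row[key_index]
            totals[key] = totals.get(key, 0) + row[value_index]
        return totals

server = HybridOlapServer(["year", "region", "product"])
print(server.route(Query("summary", ("year", "region"))))
print(server.route(Query("detail", ("order_id",))))
print(server.post_process([("east", 1, 10), ("east", 2, 5), ("west", 3, 4)], 0, 2))
```

The design keeps large detail data in the relational store while fast, pre-aggregated views live in the MDB, with the server deciding per query which one to use.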

Page 25: Data Warehouse Databases

Hybrid OLAP Architecture

[Figure: hybrid OLAP architecture diagram with components labeled Source Databases; Extract, Transform, Load; Metadata Repository; Data Modeling Tool; Warehouse Admin Tool; RDBMS; MDB; OLAP Server; Data Access Tools]

Page 26: Data Warehouse Databases

Examples of Hybrid Products

► Microsoft OLAP Server (Plato)
► Oracle Express with ROLAP Option
► Holos from Seagate Software
► GentiaDB from Gentia Software
► Whitelight OLAP from Sybase (reseller)

Page 27: Data Warehouse Databases

Recommendations

► Use RDBMS, MDB, and hybrid databases to meet the specialized requirements of groups of end users
► Use RDBMS and ROLAP tools to provide multidimensional views of large target databases
► Use RDBMS features like parallelization and bitmapped indexes for acceptable performance
► Use MDB for high-performance analysis of moderate-size databases
► Use a hybrid approach for applications that require access to detail data and fast OLAP computations


Page 28: Data Warehouse Databases

Questions