unit – ii introduction to data warehousing by mrs. chetana
TRANSCRIPT
![Page 1: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/1.jpg)
UNIT – IIUNIT – IIIntroduction to Introduction to
Data WarehousingData Warehousing
By Mrs. Chetana
![Page 2: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/2.jpg)
UNIT - II
• Data Warehouse and OLAP Technology for Data Mining Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation
• Data Cube Computation:
Efficient Methods for simple Data Cube Computation (Full Cube, Iceberg Cube, Closed Cube and Shell Cube), Discovery Driven exploration of data cubes
By Mrs. Chetana
![Page 3: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/3.jpg)
Why Data Warehouse?Why Data Warehouse?
“Necessity is the mother of invention”
An operational data store (Database) stores data for a specific application.
It feeds the data warehouse a stream of desired raw data.Data store is generally subject oriented, volatile, current
commonly focused on customers, products, orders, policies, claims, etc…
By Mrs. Chetana
![Page 4: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/4.jpg)
Scenario 1Scenario 1
ABC Pvt Ltd is a company with branches at Mumbai, Hyderabad, Chennai and Bangalore.
The Sales Manager wants quarterly sales report. Each branch has a separate operational system.
By Mrs. Chetana
![Page 5: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/5.jpg)
Scenario 1 : ABC Pvt Ltd.Scenario 1 : ABC Pvt Ltd.
Mumbai
Hyderabad
Chennai
Banglore
SalesManager
Sales per item type per branchfor first quarter.
By Mrs. Chetana
![Page 6: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/6.jpg)
Solution 1:ABC Pvt LtdSolution 1:ABC Pvt Ltd..
Extract sales information from each database.Store the information in a common repository
at a single site.
By Mrs. Chetana
![Page 7: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/7.jpg)
Solution 1:ABC Pvt Ltd.Solution 1:ABC Pvt Ltd.
Mumbai
Delhi
Chennai
Bangalore
DataWarehouse
SalesManager
Query &Analysis tools
Report
By Mrs. Chetana
![Page 8: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/8.jpg)
Scenario 2Scenario 2
One Stop Shopping Super Market has hugeoperational database.
Whenever Executives wants some report the OLTP(database)system becomes slow and data entry operators have to wait for some time.
By Mrs. Chetana
![Page 9: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/9.jpg)
Scenario 2 : One Stop ShoppingScenario 2 : One Stop Shopping
OperationalDatabase
Data Entry Operator
Data Entry Operator
ManagementWait
Report
By Mrs. Chetana
![Page 10: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/10.jpg)
Solution 2Solution 2
Extract data needed for analysis from operational database.Store it in warehouse.Refresh warehouse at regular interval so that it contains up
to date information for analysis.Warehouse will contain data with historical perspective.
By Mrs. Chetana
![Page 11: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/11.jpg)
Solution 2Solution 2
Operationaldatabase
DataWarehouse
Extractdata
Data EntryOperator
Data EntryOperator
Manager
Report
Transaction
By Mrs. Chetana
![Page 12: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/12.jpg)
Scenario 3Scenario 3
Cakes & Cookies is a small, new company.
President of the company wants his company to grow. He needs information so that he can make correct decisions.
By Mrs. Chetana
![Page 13: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/13.jpg)
Solution 3Solution 3
Improve the quality of data before loading it into the warehouse.
Perform data cleaning and transformation before loading the data.
Use query analysis tools to support adhoc queries.
By Mrs. Chetana
![Page 14: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/14.jpg)
Solution 3Solution 3
Query and
Analysistool
President
Expansion
Improvement
sales
time
DataWarehouse
By Mrs. Chetana
![Page 15: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/15.jpg)
Database Vs Data WarehouseDatabase Vs Data WarehouseDatabases : to record Data warehouse : to respond to analysis questions that
are critical for a business. All data warehouses are databases, not all databases
are data warehouses.
By Mrs. Chetana
![Page 16: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/16.jpg)
OLTP System - Online Transaction Processing (Operational System)OLAP System - Online Analytical Processing (Data Warehouse)
Source of dataOLTP: Operational data; OLTPs are the original source of the data.OLAP: Consolidation data; OLAP data comes from the various OLTP Databases
Purpose of dataOLTP: To control and run fundamental business tasksOLAP: To help with planning, problem solving, and decision support
What the dataOLTP: Reveals a snapshot of ongoing business processesOLAP: Multi-dimensional views of various kinds of business activities
Inserts and UpdatesOLTP: Short and fast inserts and updates initiated by end usersOLAP: Periodic long-running batch jobs refresh the data
OLTP Vs OLAP
By Mrs. Chetana
![Page 17: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/17.jpg)
QueriesOLTP: Relatively standardized and simple queries Returning relatively few recordsOLAP: Often complex queries involving aggregations
Processing SpeedOLTP: Typically very fastOLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
Space RequirementsOLTP: Can be relatively small if historical data is archivedOLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
Database DesignOLTP: Highly normalized with many tablesOLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas
Backup and RecoveryOLTP: Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liabilityOLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
By Mrs. Chetana
![Page 18: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/18.jpg)
Operational Database ( OLAP )An operational database is the database that is accessed and updated on a continual basis and usually handles the daily transactions for a business. Operational databases use an OLTP approach and are designed to be write-optimized.
OLTPOn-Line Transaction Processing (OLTP) is the process in which systems facilitate and manage data entry and retrieval on a frequent basis. The transaction is almost immediately processed and is the main strategy of operational databases. An example of an OLTP system would be an ATM machine. The benefits of using OLTP is that it is fast and efficient and simplifies the process of accessing data. (Business Intelligence).
By Mrs. Chetana
![Page 19: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/19.jpg)
Operational v/s Information SystemOperational v/s Information System
Features Operational InformationCharacteristics Operational processing Informational processing
Orientation Transaction, customer-oriented
Analysis, market-oriented
User Clerk, DBA, database professional
Knowledge workers
Function Day to day operation Decision support
Data contents Current Historical
View Detailed, flat relational Summarized, multidimensional
DB design Application oriented, ER based
Subject oriented, star/snowflakes
Unit of work Short , simple transaction
Complex query
Access Read/write Mostly readBy Mrs. Chetana
![Page 20: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/20.jpg)
Operational v/s Information SystemOperational v/s Information System
Features Operational Information
Focus Data in Information out
Number of records accessed
tens millions
Number of users thousands hundreds
DB size 100MB to GB 100 GB to TB
Priority High performance, high availability
High flexibility, end-user autonomy
Metric Transaction throughput
Query throughput
By Mrs. Chetana
![Page 21: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/21.jpg)
What is Data Warehouse??What is Data Warehouse??
By Mrs. Chetana
![Page 22: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/22.jpg)
A Data Warehouse is not an individual repository product.
It is a decision support systems and a knowledge-based applications architecture.
It is an environment that supports both everyday tactical decision making and long-term business strategizes.
The Data Warehouse environment positions a business to utilize an enterprise-wide data store to link information from diverse sources and make the information accessible for a variety of user purposes, most notably, Strategic Analysis.
By Mrs. Chetana
![Page 23: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/23.jpg)
What is Data Warehouse?What is Data Warehouse?
• A data warehouse is a collection of integrated databases designed to support a DSS.
• By Inmon: A data warehouse is-subject-oriented,-integrated,-time-variant,-nonvolatile
collection of data in support of management’s decision making process.
By Mrs. Chetana
![Page 24: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/24.jpg)
RalphRalph KimballKimball
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
By Mrs. Chetana
![Page 25: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/25.jpg)
Cont..Cont..
Where is it used?
It is used for evaluating future strategy.
It needs a successful technician:◦ Flexible.◦ Team player.◦ Good balance of business and technical
understanding.
By Mrs. Chetana
![Page 26: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/26.jpg)
Subject-orientedSubject-orientedData are organized based on how the users refer to them.
Eg. sales, product, customer.It focuses on modeling and analysis of data for decision
makers.Excludes data not useful in decision support process.
By Mrs. Chetana
![Page 27: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/27.jpg)
IntegrationIntegrationData Warehouse is constructed by integrating multiple
heterogeneous sources.Data Preprocessing are applied to ensure all
inconsistencies regarding naming convention and value representations are removed.
RDBMS
LegacySystem
DataWarehouse
Flat File Data ProcessingData Transformation
By Mrs. Chetana
![Page 28: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/28.jpg)
ExamplesExamples
In terms of data.◦ encoding structures.
◦ Measurement ofattributes.
◦ physical attribute. of data
◦ naming conventions.
◦ Date type format
remarks
By Mrs. Chetana
![Page 29: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/29.jpg)
Time-variantTime-variantA data warehouse is a time-varient database which
supports business needs of end users in comparing and analyzing the business with different time periods. This is also known as Time series analysis.
Provides information from historical perspective
e.g. past 5-10 years.
Every key structure contains either implicitly or explicitly an element of time.
By Mrs. Chetana
![Page 30: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/30.jpg)
NonvolatileNonvolatile
Data are stored in read-only format and do not change over time.
Data warehouse requires two operations in data accessing◦ Initial loading of data◦ Access of data
load access
By Mrs. Chetana
![Page 31: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/31.jpg)
History & EvolutionHistory & Evolution 60’s: Batch Reports
◦ Hard to find and analyze information◦ Inflexible and expensive, reprogram every new request◦ Data collection, database creation
70’s: Database Management System◦ Still inflexible, not integrated with desktop tools,
80’s: Advanced Database systems, Desktop Data Access and Analysis Tools
◦ OLTP , Query tools, Spreadsheets, GUIs◦ Easier to use, but only access operational databases
1990s—2000s: ◦ Data mining and data warehousing, multimedia databases, and Web databases◦ OLAP Engines and Tools
By Mrs. Chetana
![Page 32: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/32.jpg)
Data Warehouse ArchitectureData Warehouse Architecture
By Mrs. Chetana
![Page 33: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/33.jpg)
Operational source systemOperational source systemCharacteristics:Source data often in OLTP (Online Transaction
Processing) systems, also called TPS (Transaction Processing Systems)
High level of performance and availabilityBusy with the normal operations of the
organizationDifferent types of data format and data structures in
different OLTP systems AND DIFFERENT SEMANTICS
By Mrs. Chetana
![Page 34: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/34.jpg)
Operational SystemsOperational Systems Operational Systems
◦ Run the business in real time.
◦ Optimized to handle large numbers of simple read/write transactions.
◦ Optimized for fast response to predefined transactions.
◦ Used by people who deal with customers, products.
Database systems have been used traditionally for OLTP.
◦ Online Transaction Processing
◦ Clerical data processing tasks
◦ Detailed, up to date data
◦ Structured repetitive tasks
Examples of Operational Data◦ Customer Files
◦ Account Balance, Call Record
◦ Point of Sale Data, Production Record
By Mrs. Chetana
![Page 35: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/35.jpg)
Data staging AreaData staging AreaOften the most complex part in the architecture, and
involves...◦ Extraction (E)
◦ Transformation (T)
◦ Load (L)
◦ indexing
ETL-tools can be usedeg. Informatica – PowerCenter
IBM – Infosphere Information Server
Oracle – Data Integrator
Scripts for extraction, transformation and load are
implemented
By Mrs. Chetana
![Page 36: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/36.jpg)
ExtractionExtraction
Reading and understanding the source data and copying the data needed for the data warehouse into staging area for further manipulation,
i.e. transformation
By Mrs. Chetana
![Page 37: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/37.jpg)
TransformationTransformationTransforms the data in accordance with the business rules and standards that have been established
Involves:data conversion/transformation:
rules to convert to a common data format and common terms/semantics
data cleaning/cleansing:
– data scrubbing
use domain-specific knowledge to check the data
– data auditing
discover suspicious pattern, discover violation of stated rulescombining data from multiple sourcesdata aggregation
By Mrs. Chetana
![Page 38: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/38.jpg)
Metadata Repository Metadata Repository
The metadata repository contains the schema of the warehouse
It is similar to data dictionary- the information is kept about the logical data structures, the info about files and addresses, the info about indexes, and so on.
By Mrs. Chetana
![Page 39: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/39.jpg)
Data Presentation AreaData Presentation Area
What is OLAP?Dimensional modelingData MartsROLAP/MOLAP servers
By Mrs. Chetana
![Page 40: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/40.jpg)
Physical storagePhysical storage::
Due to the huge amount of data, the warehouse may be distributed onto multiple servers.
In such a case the metadata is usually replicated on each server.
Most of the data warehouse solutions employ relational database management systems. Multidimensional db mgmt systems are employed in some solutions.
Data extracted from the data warehouse storage aggregated in many ways and the summary is kept in the multidimensional databases.
By Mrs. Chetana
![Page 41: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/41.jpg)
OLAP serversOLAP servers◦ To support and operate on multi-dimensional data structures
These servers receive OLAP queries or data mining requests from front-end tools and process them by accessing data in the physical storage.
The architect of the data warehouse may decide on the number of processing units/servers depending on the typical load of the system.
By Mrs. Chetana
![Page 42: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/42.jpg)
Front-end toolsFront-end tools
This is the interface between the analyst and the data warehouse.
Clients
◦Query and reporting tools◦Analysis tools◦Data mining tools
The analyst should be able to pose OlAP queries or data mining tasks and visualize the results /reports in various textual and graphical ways.
By Mrs. Chetana
![Page 43: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/43.jpg)
Data warehouse Design Data warehouse Design ApproachesApproaches
Top-Down: Starts with Overall design and planning . Useful when technology is mature & well known. The business problems are clear & well understood. Advantages:
- planned & strategic nature
- Emphasizes high-level business needs
- minimizes integration problem Disadvantages:
- Requires an increase costs
- Lacks of flexibility
- Difficult to define the boundaries
- May not be suitable unless the client needs
cross-functional reporting
By Mrs. Chetana
![Page 44: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/44.jpg)
Bottom-Up Starts with experiments & prototypes. Useful in early stage of business modeling & technology development. Starts with lower cost & helps to evaluate benefits of technology
Advantages:
- rapid implementation &
opportunistic application
- Lo cost Disadvantages:
- Problematic when integrating
Data marts
- Design may be out-of-date before delivery
- Requires reengineering for each increment
- Solutions may be rejected by the next line of
business to be involved
- Overall benefit to the business may be minimized By Mrs. Chetana
![Page 45: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/45.jpg)
Combined ApproachCombined Approach
An organization can exploit the planned and strategic nature of the top-down approach
While retaining the rapid implementation and opportunistic application of the bottom-up approach
By Mrs. Chetana
![Page 46: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/46.jpg)
3- tier Architecture of Data Warehouse
By Mrs. Chetana
![Page 47: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/47.jpg)
Three Data Warehouse ModelsThree Data Warehouse ModelsEnterprise warehouse
◦ collects all of the information about subjects spanning the entire organization
Data Mart
◦ a subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart
Virtual warehouse
◦ A set of views over operational databases
◦ Only some of the possible summary views may be materialized
By Mrs. Chetana
![Page 48: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/48.jpg)
Data Warehouse Development: A Recommended Approach
Define a high-level corporate data model
Data Mart
Data Mart
Distributed Data Marts
Multi-Tier Data Warehouse
Enterprise Data Warehouse
Model refinementModel refinement
1
2 2
3
4
By Mrs. Chetana
![Page 49: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/49.jpg)
Incremental & Evolutionary ApproachIncremental & Evolutionary Approach
High-level Corporate Data Model: Defined in 1 or 2 months, provides a corporate-wide consistent,
integrated view of data for various usage.
Data Marts & Enterprise data warehouse:
Can be implemented parallel , on the above corporate data model.
Distributed data marts:constructed to integrate different data marts via hub server.
Multitier data warehouse:
sole custodian of all warehouse data.
By Mrs. Chetana
![Page 50: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/50.jpg)
Data MartData Mart
It is lower-cost, scaled down version of the DW.
Data Mart offer a targeted and less costly method of gaining the advantages associated with data warehousing and can be scaled up to a full DW environment over time.
example, the marketing data mart may contain data related to items, customers, and sales. Data marts are confined to subjects.
By Mrs. Chetana
![Page 51: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/51.jpg)
By Mrs. Chetana
![Page 52: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/52.jpg)
Meta DataMeta DataMetadata is “data about data”
With the emergence of the data warehouse as a decision support structure, the metadata are considered as much a resource as the business data they describe.
Metadata are abstractions -- they are high level data that provide concise descriptions of lower-level data.
By Mrs. Chetana
![Page 53: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/53.jpg)
For example, a line in a sales database may contain:
4056 KJ596 223.45
This is mostly meaningless until we consult the metadata that tells us it was store number 4056, product KJ596 and sales of $223.45
The metadata are essential ingredients in the transformation of raw data into knowledge.
They are the “keys” that allow us to handle the raw data.
By Mrs. Chetana
![Page 54: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/54.jpg)
Data Warehouse SchemaData Warehouse SchemaModeling data warehouses: dimensions & measures
◦ Star schema: A fact table in the middle connected to a set
of dimension tables
◦ Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
◦ Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation
By Mrs. Chetana
![Page 55: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/55.jpg)
Dimensions & MeasuresDimensions & Measures
Dimensions: perspectives or entities with respect to which an organization wants to keep records .◦ Each dimension will may have a table associated with it called as
“dimension table”
◦ The dimension table contains facts i.e measures and key to each related dimension tables.
◦ Specified by users or experts, or automatically generated and adjusted based on data distributions.
Fact: considered as numerical measures.◦ Quantitative to analyze relationships between dimensions.
By Mrs. Chetana
![Page 56: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/56.jpg)
time_keydayday_of_the_weekmonthquarteryear
time
location_keystreetcityprovince_or_streetcountry
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
Measures
item_keyitem_namebrandtypesupplier_type
item
branch_keybranch_namebranch_type
branch
Time_dimensionItem _dimension
Branch_dimension
Location_dimension
Example of Star Schema
By Mrs. Chetana
![Page 57: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/57.jpg)
Star SchemaStar Schema
By Mrs. Chetana
![Page 58: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/58.jpg)
Example of Star SchemaExample of Star Schema
Store Key
Product Key
Period Key
Units
Price
Store Dimension Time Dimension
Product Dimension
Sales Fact Table
Store Key
Store Name
City
State
Region
Period Key
Year
Quarter
Month
Product Key
Product Desc
By Mrs. Chetana
![Page 59: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/59.jpg)
time_keydayday_of_the_weekmonthquarteryear
time
location_keystreetcity_key
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
Measures
item_keyitem_namebrandtypesupplier_key
item
branch_keybranch_namebranch_type
branch
supplier_keysupplier_type
supplier
city_keycityprovince_or_streetcountry
city
normalized
Time_dimensionItem _dimension
Branch_dimensionLocation_dimension
Example of Snowflake
By Mrs. Chetana
![Page 60: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/60.jpg)
SnowFlake SchemaSnowFlake SchemaVariant of star schema model.A single, large and central fact table and one or more tables for
each dimension.Used when a dimensional table becomes very bigDimension tables are more normalized
i.e. split dimension table data into additional tablesAdvantages:
Redundancy is reduced and therefore, it becomes easy to maintain and save storage space.
Drawbacks:
Increases number of joins hence time consuming query processing
Complex queries results in slow report generation
By Mrs. Chetana
![Page 61: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/61.jpg)
Example of SnowFlake SchemaExample of SnowFlake Schema
Store Key
Product Key
Period Key
Units
Price
Time Dimension
Product Dimension
Sales Fact Table
Store Key
Store Name
City Key
Period Key
Year
Quarter
Month
Product Key
Product Desc
City Key
City
State
Region
City Dimension
Store Dimension
By Mrs. Chetana
![Page 62: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/62.jpg)
By Mrs. Chetana
![Page 63: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/63.jpg)
time_keydayday_of_the_weekmonthquarteryear
time
location_keystreetcityprovince_or_streetcountry
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
Measures
item_keyitem_namebrandtypesupplier_type
item
branch_keybranch_namebranch_type
branch
Shipping Fact Table
time_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shipped
shipper_keyshipper_namelocation_keyshipper_type
shipper
Example of Fact Constellation
By Mrs. Chetana
![Page 64: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/64.jpg)
Fact ConstellationFact Constellation
Multiple fact tables share dimension tables.This schema is viewed as collection of stars hence
called galaxy schema or fact constellation.Sophisticated application requires such schema.
By Mrs. Chetana
![Page 65: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/65.jpg)
Example of Fact ConstellationExample of Fact Constellation
Store Key
Product Key
Period Key
Units
Price
Store Dimension
Product Dimension
SalesFact Table
Store Key
Store Name
City
State
Region
Product Key
Product Desc
Shipper Key
Store Key
Product Key
Period Key
Units
Price
ShippingFact Table
By Mrs. Chetana
![Page 66: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/66.jpg)
Software Engineering point of viewSoftware Engineering point of view
The design and construction of a DWH may consist of the following steps:
◦Planning◦Requirement study◦Problem Analysis◦Warehouse design◦Data integration and testing◦Finally Deployment of data warehouse
By Mrs. Chetana
![Page 67: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/67.jpg)
General Data warehouse design ProcessGeneral Data warehouse design Process
1. Choose a Business Process to modelData warehouse Model : For Complex object collections
Data Mart Model: For focused Process on one kind of business
1. Choose the grain of the business processFundamental , atomic level of data to be represented in fact table
eg. In bank Individual transactions, individual daily snapshot
1. Choose the dimensions that will apply to each fact table record
2. Choose measures that will populate each fact table record (ex: dollars_sold or units_sold)
By Mrs. Chetana
![Page 68: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/68.jpg)
Multi-dimensional Data ModelMulti-dimensional Data ModelData warehouse and OLAP tools are multi dimensional data model.This model views data in the form of data Cube model.A data cube allows data to be modeled & viewed in multiple
dimensions. It is defined by dimensions & facts.
Dimensions: perspectives or entities with respect to which an organization wants to keep records .
◦ Each dimension will may have a table associated with it called as “dimension table”
◦ Specified by users or experts, or automatically generated and adjusted based on data distributions.
Fact: considered as numerical measures.
◦ Quantitative to analyze relationships between dimensions.
By Mrs. Chetana
![Page 69: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/69.jpg)
From Tables and Spreadsheets to Data CubesFrom Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model which views
data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and facts.
◦ Dimension tables, such as item (item_name, brand, type), or time(day,
week, month, quarter, year), branch, and location.
◦ Fact table contains measures (such as dollars_sold) and keys to each of
the related dimension tables
Viewing things in 4-D becomes tricky
Data cubes are n-dimensional & do not confine data to 3-D
By Mrs. Chetana
![Page 70: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/70.jpg)
4- D data cube representation for sales data4- D data cube representation for sales data
By Mrs. Chetana
![Page 71: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/71.jpg)
Data Models in Data WarehousesData Models in Data Warehouses
In OLTP systems, data are stored in 2D matrixes.
Data warehouses are subject-oriented
◦ Profits, Sales …
◦ Data need to be reorganized to better reflect the subjects.
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions.
By Mrs. Chetana
![Page 72: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/72.jpg)
BASICSBASICS Dimensional Model is conceptually represented using Schemas that are
described by basic elements, such as facts, measures, attributes, dimensions, hierarchies and granularity.
Facts are a focus of interest about a particular subject. Facts are used to store
business information(measures such as dollars sold) on which detailed
analysis is carried out.
Dimensions determine the contextual background for the facts and provide
the context of the measures such as item (item name, brand), product, location
or time(day, week, month, quarter, year).
Measures are continuous values that describe the fact from different points of
view (e.g., each sale is measured by its revenue)
By Mrs. Chetana
![Page 73: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/73.jpg)
Hierarchies are made up of discrete dimension attributes linked by one to one relationships and determine how facts may be aggregated and selected significantly for the decision-making process.
Granularity is related to fact table and which represents the lowest level of information that will be stored in fact table.
It is based on
(i) which dimensions will be included
(ii)the hierarchy of each dimension the information will be kept
By Mrs. Chetana
![Page 74: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/74.jpg)
Schema DefinitionSchema Definition
Multidimensional schema is defined using Data Mining Query Language (DMQL). The two primitives, cube definition and dimension definition, can be used for defining the data warehouses and data marts.
Syntax for Cube Definition
define cube < cube_name > [ < dimension-list > }: < measure_list >
Syntax for Dimension Definition
define dimension < dimension_name > as (attribute_or_dimension_list > )
By Mrs. Chetana
![Page 75: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/75.jpg)
Star Schema DefinitionStar Schema Definition The star schema that we have discussed can be defined using Data Mining
Query Language (DMQL) as follows:
define cube sales_star [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
Explanation:
define cube statement defines a data cube called sale_star which corresponds to central sales fact table.
define dimension statement used to define four dimensions.
By Mrs. Chetana
![Page 76: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/76.jpg)
Snowflake Schema Snowflake Schema DefinitionDefinition
define cube sales_snowflake [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier (supplier key, supplier type))
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city (city key, city, province or state, country))
Explanation:
item & location dimension tables are normalized.
supplier & city are two sales_snoflake tables created.
By Mrs. Chetana
![Page 77: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/77.jpg)
Fact Constellation Schema DefinitionFact Constellation Schema Definitiondefine cube sales [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
define cube shipping [time, item, shipper, from location, to location]:
dollars cost = sum(cost in dollars), units shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper key, shipper name, location as location in cube sales, shipper type)
define dimension from location as location in cube sales
define dimension to location as location in cube sales
By Mrs. Chetana
![Page 78: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/78.jpg)
From Tables to Data From Tables to Data CubesCubesID Product Country Date Sales
1 TV US 1Qtr 100
2 PC Canada 4Qtr 500
3 CAR US 2Qtr 30
4 PC UK 3Qtr 200
5 CAR UK 1Qtr 20
6 CAR UK 2Qtr 15
7 TV Canada 4Qtr 80
By Mrs. Chetana
![Page 79: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/79.jpg)
Lattice of Cuboid ( Data Cube)Lattice of Cuboid ( Data Cube)A data cube is referred as cuboidCuboid can be generated for each subset of given
dimensions.In data warehousing literature, an n-D base cube is
called a base cuboid. The top most 0-D cuboid, which holds the highest-level
of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube, each shows
different level of summarization or group by.
By Mrs. Chetana
![Page 80: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/80.jpg)
Cube: A Lattice of CuboidsCube: A Lattice of Cuboids
all
time item location supplier
time,item time,location
time,supplier
item,location
item,supplier
location,supplier
time,item,location
time,item,supplier
time,location,supplier
item,location,supplier
time, item, location, supplier
0-D cuboid
1-D cuboids
2-D cuboids
3-D cuboids
4-D cuboid
APEX ( Highest level of Summarization)
Base ( Lowest level of Summarization)
By Mrs. Chetana
![Page 81: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/81.jpg)
A Sample Data CubeA Sample Data Cube
Date
Produ
ct
Cou
ntr
y
sum
sum TV
VCRPC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
sum
Total annual salesof TV in U.S.A.
By Mrs. Chetana
![Page 82: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/82.jpg)
MeasuresMeasuresA data cube measure is a numerical function that can be
evaluated at each point in the data cube space.
1. Distributive: If it can be computed in a distributed manner using count(), sum(), min(), max() distributed aggregate functions.
1. Algebraic: if it can be computed by an algebraic functions with M arguments using avg(), min_N(), max_N(), sum()/count() algebraic aggregate functions.
By Mrs. Chetana
![Page 83: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/83.jpg)
Holistic : if there is no constant bound on the storage size needed to describe a sub aggregate.
Uses median(), mode(), rank() holistic aggregate functions.
By Mrs. Chetana
![Page 84: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/84.jpg)
Concept HierarchiesConcept Hierarchies
A Concept Hierarchy defines a sequence of mappings from a set of low-level concepts to high-level
Consider a concept hierarchy for the dimension “Location”
By Mrs. Chetana
![Page 85: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/85.jpg)
By Mrs. Chetana
![Page 86: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/86.jpg)
Concept HierarchiesConcept HierarchiesMany concept hierarchies are implicit within
the database system
By Mrs. Chetana
![Page 87: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/87.jpg)
Concept HierarchiesConcept HierarchiesConcept hierarchies may also be defined by grouping
values for a given dimension or attribute, resulting in a set-grouping hierarchy
By Mrs. Chetana
![Page 88: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/88.jpg)
Online Analysis Online Analysis Processing(OLAP)Processing(OLAP)
It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Data Warehouse
Time
Product
Reg
ion
By Mrs. Chetana
![Page 89: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/89.jpg)
OLAP Server Architectures Relational OLAP (ROLAP)
◦ Use relational DBMS to store and manage warehouse data.
◦ ROLAP tools access the data in a relational database and generate SQL queries
to calculate information at the appropriate level as required.
◦ Handles Large amount of data.
◦ Limited by SQL functionalities
By Mrs. Chetana
![Page 90: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/90.jpg)
MOLAPMOLAP Multidimensional OLAP (MOLAP)
◦ uses array-based multidimensional storage engines provides natural indexing
◦ Fast query performance due to optimized storage and indexing
◦ Automated computation of higher level aggregates of the data
◦ Can perform complex calculations
◦ the storage utilization may be low if the dataset is sparse
◦ uses two levels of data storage representation to handle dense and sparse
datasets.
◦ Limited in the amount of data it can handle
By Mrs. Chetana
![Page 91: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/91.jpg)
HOLAPHOLAP Hybrid OLAP (HOLAP)
◦ Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers
higher scalability of ROLAP and faster computation of MOLAP.
◦ Stores large data volumes of detailed information. The aggregations are
stored separately in MOLAP store.
By Mrs. Chetana
![Page 92: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/92.jpg)
OLAP CubeOLAP Cube
City Product Time Units Dollars
All All All 113 251.26
Mumbai All All 64 146.07
Mumbai White Bread All 38 98.49
Mumbai Wheat Bread All 13 32.24
Mumbai Wheat Bread Qtr1 3 7.44
Mumbai Wheat Bread March 3 7.44
By Mrs. Chetana
![Page 93: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/93.jpg)
OLAP OperationsOLAP OperationsDrill down (roll down): reverse of roll-up
from higher level summary to lower level summary or detailed data, or introducing new dimensions
Time
Regio
n
Product
Category e.g Electrical Appliance
Sub Category e.g Kitchen
Product e.g Toaster
By Mrs. Chetana
![Page 94: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/94.jpg)
drill Down on time(from quarters to month)
By Mrs. Chetana
![Page 95: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/95.jpg)
OLAP OperationsOLAP OperationsRoll up (drill-up): summarize data
by climbing up hierarchy or by dimension reductionOwner may be interested in further aggregation along some dimension
Time
Regio
n
ProductCategory e.g Electrical Appliance
Sub Category e.g Kitchen
Product e.g Toaster
By Mrs. Chetana
![Page 96: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/96.jpg)
roll_up on location (from cities to countries)
By Mrs. Chetana
![Page 97: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/97.jpg)
OLAP OperationsOLAP OperationsSlice and dice:
user specify interest in some particular values of specific dimensions.project and select
Time
Regio
n
ProductProduct=Toaster
Time
Regio
n
By Mrs. Chetana
![Page 98: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/98.jpg)
Slice for time = “Q1”Slice for time = “Q1”
By Mrs. Chetana
![Page 99: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/99.jpg)
Dice for (location=“Toronto” or “Vancouver”)Dice for (location=“Toronto” or “Vancouver”)and (time=“Q1” or “Q2”) and (item=“Mobile” or “Modem”)and (time=“Q1” or “Q2”) and (item=“Mobile” or “Modem”)
By Mrs. Chetana
![Page 100: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/100.jpg)
By Mrs. Chetana
![Page 101: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/101.jpg)
OLAP OperationsOLAP Operations
Pivot (rotate):reorient the cube, visualization, 3D to series of 2D planes.
Time
Regio
n
Product
Region
Tim
e
Product
By Mrs. Chetana
![Page 102: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/102.jpg)
By Mrs. Chetana
![Page 103: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/103.jpg)
Other OLAP OperationsOther OLAP OperationsDrill across:
◦ Accesses more than one fact table that is linked by common dimensions.
◦ Combines cubes that share one or more dimensions.
Drill through: ◦ Drill down to the bottom level of a data cube down to its back-
end relational tables.
Cross-tab :◦ Spreadsheet style row/column aggregates.
By Mrs. Chetana
![Page 104: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/104.jpg)
By Mrs. Chetana
![Page 105: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/105.jpg)
Typical Data warehouse Architecture
By Mrs. Chetana
![Page 106: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/106.jpg)
Data Warehouse Data Warehouse ImplementationImplementationData warehouses contain huge volumes of data.OLAP servers demand that queries should be answered
in seconds. So, data warehouse should need highly efficient cube
computation techniques, access methods and query processing techniques.
The core of multidimensional data analysis is the efficient computation of aggregations across many sets of dimensions.
In SQL aggregations are referred to as group-by’s.Each group-by can be represented as cuboid.Set of group-by’s forms a lattice of a cuboids defining a
data cube. By Mrs. Chetana
![Page 107: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/107.jpg)
The compute cube Operator and the Curse of Dimensionality
The compute cube operator computes aggregates over all subsets of the dimensions specified in the operation.
Requires an excessive storage space, especially for large number of dimensions.
A data cube is a lattice of cuboids.Suppose that we create a data cube for AllElectronics sales
that contains the following: city, item year, and sales_in_dollars.
Efficient Data Cube Computation
Compute the sum of sales, grouping by city and item.Compute the sum of sales, grouping by city.Compute the sum of sales, grouping by item.
By Mrs. Chetana
![Page 108: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/108.jpg)
Lattice of cuboid, making up 3-D data cube.
By Mrs. Chetana
![Page 109: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/109.jpg)
Efficient Data Cube ComputationWhat is the total number of cuboids, or group-by’s, that
can be computed for this data cube?
Three attributes: city, item, year (dimensions) sales_in_dollars (measure)The total number of cuboids or group-by’s computed for
this cube are 23=8.Group-by’s:
{(city,item,year), (city, item), (city, year), (item, year), (city), (item), (year),()}. () : group-by is empty i.e. the dimensions are not grouped.
Base cuboid contains all three dimensions (most specific) Apex cuboid is empty (least specific)
By Mrs. Chetana
![Page 110: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/110.jpg)
Cube OperationCube definition and computation in DMQL:
define cube sales_cube[ city, item, year]: sum (sales_in_dollars)
compute cube sales_cube
Transform it into a SQL-like language
(with a new operator cube by, introduced by Gray et al.’96)
SELECT item, city, year, SUM (amount)
FROM SALES
CUBE BY item, city, year(item)(city)
()
(year)
(city, item) (city, year) (item, year)
(city, item, year)
By Mrs. Chetana
![Page 111: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/111.jpg)
Efficient Data Cube ComputationOn-line analytical processing may need to access different cuboids
for different queries.So we have to compute all or at least some of the cuboids in the
data cube in advance.Pre-computation leads to fast response time and avoids some
redundant computation.A major challenge related to pre-computation would be storage
space if all the cuboids in the data cube are computed, especially when the cube has many dimensions.
The storage requirements are even more excessive when many of the dimensions have associated concept hierarchies, each with multiple levels.
This problem is referred to as the Curse of Dimensionality.
By Mrs. Chetana
![Page 112: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/112.jpg)
Efficient Data Cube Efficient Data Cube ComputationData cube can be viewed as a lattice of cuboids
◦ The bottom-most cuboid is the base cuboid
◦ The top-most cuboid (apex) contains only one cell
How many cuboids in an n-dimensional cube with L levels?Li is number of levels associated with dimension i.
Eg. For time dimension hierarchy is “day<month<quarter<year”
Li = 4 & All is Virtual top level i.e Total Level = (4+1) = 5
& Dimensions = 10
No of cuboids generated = 510 = 9.8x106
T=∏i=1
n
(Li+1)
By Mrs. Chetana
![Page 113: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/113.jpg)
Partial Materialization: Selected Computation of Cuboids
There are three choices for data cube materialization given a base cuboid.
No Materialization – Do not pre-compute any of the non-base cuboids
Full Materialization – Pre-compute all of the cuboids. Need huge memory space.
Partial Materialization - Identify the subsets of cuboids or subcubes to materialize. Exploit the materialized cuboids or subcubes during query
processing. Efficiently update the materialized cuboids or subcubes during
load and refresh.
Selection of which cuboids to materialize
Based on size, sharing, access frequency, etc.
By Mrs. Chetana
![Page 114: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/114.jpg)
Iceberg CubeIceberg Cube
Computing only the cuboid cells whose count
or other aggregates satisfying the condition like
HAVING COUNT(*) >= minsup
It is called an Iceberg-Cube because it contains only some of the cells of the full cube, like the tip of an iceberg.
Motivation Only a small portion of cube cells may be “above the water’’
in a sparse cube Only calculate “interesting” cells — data above certain
threshold Avoid explosive growth of the cube
By Mrs. Chetana
![Page 115: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/115.jpg)
Iceberge Cube Query with three dimensions Iceberge Cube Query with three dimensions
SELECT A,B,C, Count(*),SUM(X)
FROM TableName CUBE BY A,B,C HAVING COUNT(*)>=minsup
By Mrs. Chetana
![Page 116: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/116.jpg)
Indexing OLAP Data: Bitmap Index Index on a particular columnEach value in the column has a bit vector : bit-op is fastThe length of the bit vector : # of records in the base tableThe i-th bit is set if the i-th row of the base table has the value for the
indexed columnnot suitable for high cardinality domains It reduces join, aggregation & comparison operations
Cust Region TypeC1 Asia RetailC2 Europe DealerC3 Asia DealerC4 America RetailC5 Europe Dealer
RecID Retail Dealer1 1 02 0 13 0 14 1 05 0 1
RecID Asia Europe America1 1 0 02 0 1 03 1 0 04 0 0 15 0 1 0
Base table Index on Region Index on Type
By Mrs. Chetana
![Page 117: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/117.jpg)
Indexing OLAP Data: Join IndicesJoin index: JI(R-id, S-id)
where R (R-id, …) S (S-id, …)Traditional indices map the values to a list of record
ids◦ It materializes relational join in JI file and
speeds up relational join
In data warehouses, join index relates the values of the dimensions of a star schema to rows in the fact table.◦ E.g. fact table: Sales and two dimensions item and
location.
◦ A join index on location maintains for each distinct location a list of R-IDs of the tuples recording the Sales in the locations.
◦ Join indices can span multiple dimensions
By Mrs. Chetana
![Page 118: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/118.jpg)
Efficient Processing OLAP Queries
The purpose of materializing cuboids and constructing OLAP index
structures is to speed up the query processing in data cubes.
Given materialized views, query processing should proceed as follows:
Determine which operations should be performed on the available
cuboids
◦ Transform drill, roll, etc. into corresponding SQL and/or OLAP
operations, e.g. Slice & dice = selection + projection
Determine which materialized cuboid(s) the relevant operations should
be applied.
By Mrs. Chetana
![Page 119: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/119.jpg)
Efficient Processing OLAP QueriesSuppose that the query to be processed is on
{brand, province_or_state} with the selection constant “year = 2004”,
Dimension hierarchies as follows:
Time: “day<month<quarter<year” Item: “item_name<brand<type”
Location: “street<city<state<country”
and there are 4 materialized cuboids available:
1) { year, item_name, city }
2) {year, brand, country}
3) {year, brand, province_or_state}
4) {item_name, province_or_state} where year = 2004
Which of the cuboids should be selected to process the query?By Mrs. Chetana
![Page 120: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/120.jpg)
Solution:Solution:Cuboid 2 cant be used because country is a more general
concept than state.
Cuboids 1,3,4 can be used to process query:1. they have the same set or superset of dimensions in query
2. the selection clause in the query can imply the selection in the cuboid
3. the abstraction levels for the item and location dimensions in these cuboids are at the finer level than brand and state, respectively.
By Mrs. Chetana
![Page 121: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/121.jpg)
How would the cost of each cuboid compare if used to process the query??Cuboid 1 would cost most because both item_name and
city are at lower level than brand and state concepts specified in query
Cuboid 3 will be smaller than cuboid 4;◦ If not many year values associated with items in the cube
◦ But there are mant item_names for each brand
Cuboid 4 may be better choice; If efficient indices are available for cuboid 4.
By Mrs. Chetana
![Page 122: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/122.jpg)
Data Warehouse UsageData Warehouse Usage Three kinds of data warehouse applications
◦ Information processing
supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
◦ Analytical processing
Operates on historical data in both summarized and detailed forms.
multidimensional data analysis of data warehouse data
supports basic OLAP operations, slice-dice, drill-down, roll- up, and pivoting
◦ Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization toolsBy Mrs. Chetana
![Page 123: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/123.jpg)
Data Warehouse UsageData Warehouse UsageDo OLAP systems perform data mining? Are OLAP systems actually data mining systems?
OLAP is a data summarization/ aggregation tool that helps simplify data analysis.
Data mining allows the automated discovery of implicit patterns and interesting knowledge hidden in large amounts of data.
Data mining covers much broader spectrum than simple OLAP operations because it performs not only data summary and comparison but also association, classification, prediction, clustering, time-series analysis, and other data analysis tasks.
By Mrs. Chetana
![Page 124: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/124.jpg)
From On-Line Analytical Processing (OLAP) to On Line Analytical Mining (OLAM)
Why online analytical mining?
◦ High quality of data in data warehouses DW contains integrated, consistent, cleaned data
◦ Available information processing structure surrounding data warehouses ODBC, OLEDB, Web accessing, service facilities, reporting
and OLAP tools
◦ OLAP-based exploratory data analysis Mining with drilling, dicing, pivoting, etc.
◦ On-line selection of data mining functions Integration and swapping of multiple mining functions,
algorithms, and tasks
By Mrs. Chetana
![Page 125: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/125.jpg)
August 1, 2015 Data Mining: Concepts and Techniques 125
An Integrated OLAM and OLAP Architecture
Data Warehouse
Meta Data
MDDB
OLAMEngine
OLAPEngine
User GUI API
Data Cube API
Database API
Data cleaning
Data integration
Layer3
OLAP/OLAM
Layer2
MDDB
Layer1
Data Repository
Layer4
User Interface
Filtering&Integration Filtering
Databases
Mining query Mining result
By Mrs. Chetana
![Page 126: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/126.jpg)
Summary of UNIT- IISummary of UNIT- IIWhy data warehousing?
A multi-dimensional model of a data warehouse
◦ Star schema, snowflake schema, fact constellations
◦ A data cube consists of dimensions & measures
OLAP operations: drilling, rolling, slicing, dicing and pivoting
Data warehouse architecture
OLAP servers: ROLAP, MOLAP, HOLAP
Efficient computation of data cubes
◦ Partial vs. full vs. no materialization
◦ Indexing OALP data: Bitmap index and join index
◦ OLAP query processing
From OLAP to OLAM (on-line analytical mining)
By Mrs. Chetana
![Page 127: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/127.jpg)
Building Data WarehouseBuilding Data Warehouse
Data SelectionData Preprocessing
◦Fill missing values◦Remove inconsistency
Data Transformation & IntegrationData Loading Data in warehouse is stored in form of fact
tables and dimension tables.
By Mrs. Chetana
![Page 128: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/128.jpg)
Case StudyCase Study
Afco Foods & Beverages is a new company which produces dairy,bread and meat products with production unit located at Baroda.
There products are sold in North,North West and Western region of India.
They have sales units at Mumbai, Pune , Ahemdabad ,Delhi and Baroda.
The President of the company wants sales information.
By Mrs. Chetana
![Page 129: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/129.jpg)
Sales InformationSales Information
January February March April
14 41 33 25
Report: The number of units sold.
113
Report: The number of units sold over time
By Mrs. Chetana
![Page 130: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/130.jpg)
Sales InformationSales Information
Jan Feb Mar Apr
Wheat Bread 6 17
Cheese 6 16 6 8
Swiss Rolls 8 25 21
Report : The number of items sold for each product withtime
Product
Tim
eBy Mrs. Chetana
![Page 131: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/131.jpg)
Sales InformationSales Information
Jan Feb Mar Apr
Mumbai Wheat Bread 3 10
Cheese 3 16 6
Swiss Rolls 4 16 6
Pune Wheat Bread 3 7
Cheese 3 8
Swiss Rolls 4 9 15
Report: The number of items sold in each City for each product with time
Product
Tim
e
City
By Mrs. Chetana
![Page 132: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/132.jpg)
Sales InformationSales Information
Report: The number of items sold and income in each region for each product with time.
Jan Feb Mar Apr
Rs U Rs U Rs U Rs U
Mumbai Wheat Bread 7.44 3 24.80 10
Cheese 7.95 3 42.40 16 15.90 6
Swiss Rolls 7.32 4 29.98 16 10.98 6
Pune Wheat Bread 7.44 3 17.36 7
Cheese 7.95 3 21.20 8
Swiss Rolls 7.32 4 16.47 9 27.45 15
By Mrs. Chetana
![Page 133: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/133.jpg)
Sales Measures & DimensionsSales Measures & DimensionsMeasure – Units sold, Amount.Dimensions – Product,Time,Region.
By Mrs. Chetana
![Page 134: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/134.jpg)
Sales Data Warehouse ModelSales Data Warehouse Model
City Product Month Units Rupees
Mumbai Wheat Bread January 3 7.95
Mumbai Cheese January 4 7.32
Pune Wheat Bread January 3 7.95
Pune Cheese January 4 7.32
Mumbai Swiss Rolls February 16 42.40
Fact Table
By Mrs. Chetana
![Page 135: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/135.jpg)
Sales Data Warehouse ModelSales Data Warehouse Model
City_ID Prod_ID Month Units Rupees
1 589 1/1/1998 3 7.95
1 1218 1/1/1998 4 7.32
2 589 1/1/1998 3 7.95
2 1218 1/1/1998 4 7.32
1 589 2/1/1998 16 42.40
By Mrs. Chetana
![Page 136: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/136.jpg)
Sales Data Warehouse ModelSales Data Warehouse Model
Prod_ID Product_Name Product_Category_ID
589 Wheat Bread 1
590 White Bread 1
288 Coconut Cookies 2
Product Dimension Tables
Product_Category_Id Product_Category
1 Bread
2 Cookies
By Mrs. Chetana
![Page 137: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/137.jpg)
Sales Data Warehouse ModelSales Data Warehouse Model
City_ID City Region Country
1 Mumbai West India
2 Pune NorthWest India
Region Dimension Table
By Mrs. Chetana
![Page 138: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/138.jpg)
Sales Data Warehouse ModelSales Data Warehouse Model
Sales Fact
Region
ProductProduct
Category
Time
By Mrs. Chetana
![Page 139: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/139.jpg)
PresentationPresentation
Time
Regio
n
Product
Report
ReportingTool
By Mrs. Chetana
![Page 140: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/140.jpg)
Data Warehousing includesData Warehousing includes
Build Data WarehouseOnline analysis processing(OLAP).Presentation.
RDBMS
Flat File
Presentation
Cleaning ,Selection &Integration
Warehouse & OLAP serverClient
By Mrs. Chetana
![Page 141: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/141.jpg)
Need for Data WarehousingNeed for Data Warehousing
Industry has huge amount of operational data
Knowledge worker wants to turn this data into useful information.
This information is used by them to support strategic decision making .
By Mrs. Chetana
![Page 142: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/142.jpg)
Need for Data Warehousing Need for Data Warehousing (contd..)(contd..)
It is a platform for consolidated historical data for analysis.
It stores data of good quality so that knowledge worker can make correct decisions.
By Mrs. Chetana
![Page 143: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/143.jpg)
Need for Data Warehousing Need for Data Warehousing (contd..)(contd..)
From business perspective-it is latest marketing weapon-helps to keep customers by learning more about their needs .-valuable tool in today’s competitive fast evolving world.
By Mrs. Chetana
![Page 144: UNIT – II Introduction to Data Warehousing By Mrs. Chetana](https://reader038.vdocument.in/reader038/viewer/2022110101/56649e925503460f94b97eaf/html5/thumbnails/144.jpg)
Data Warehousing ToolsData Warehousing Tools
Data Warehouse◦SQL Server 2000 DTS◦Oracle 8i Warehouse Builder
OLAP tools◦SQL Server Analysis Services◦Oracle Express Server
Reporting tools◦MS Excel Pivot Chart◦VB Applications
By Mrs. Chetana