Data warehouse architecture
DESCRIPTION
These are the slides I made for a class presentation. They are now public.
TRANSCRIPT
DATA WAREHOUSE AND ITS ARCHITECTURE
PRESENTED BY: DEEPAK CHAURASIA
M.TECH (DELHI COLLEGE OF ENGINEERING)
The issues I'll focus on:
• What is a data warehouse?
• Architecture of a data warehouse
• OLAP servers, their types, and how they work
• Data marts
What is a data warehouse all about?
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
• Subject-oriented -> a database and a data warehouse are two different things, so they take different approaches to storing data.
• Integrated -> data is brought into a common format.
• Time-varying -> historical data; data is associated with time.
• Non-volatile -> data is stored in a form that is not deleted or updated.
Subject-oriented?
[Figure: application orientation vs. subject orientation.
Operational database (application-oriented): Business (order processing, stock management, billing); Bank (savings account, loan account, current account).
Data warehouse (subject-oriented): Business (sales); Bank (account).]
Explanation
As we can see in both the business and the bank examples, operational databases store data application-wise. For every operational application of the organization there is an associated store that holds that application's data. These stores are called databases.
In the organization's data warehouse, by contrast, data is stored subject-wise, where a subject is one of the most important aspects of the organization: for a bank the account is important; for a business, sales.
Integrated?
• Data in the DW comes from several operational systems.
• Different datasets in these operational systems have different file formats.
• Example: data for the subject Account comes from 3 different data sources (as shown in the figure).
[Figure: in the operational environment, the savings, current, and loan sources all feed the subject Account.]
o So there can be variations, such as:
1. Naming conventions and field formats could differ. Example: a savings account number could be 8 bytes long, but only 6 bytes for checking accounts.
2. The total number of attributes for data items could differ. Example: a savings account can have 5 attributes while a checking account can have 7.
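The kind of variation listed above can be sketched in code. This is a minimal Python illustration, not the author's method: the field names (`acct_no`, `account_number`, `bal`) and the choice of an 8-character common width are assumptions made up for the example.

```python
# Hypothetical source records: each operational system has its own field names and widths.
savings_rows = [{"acct_no": "12345678", "balance": 1500.0}]   # 8-character account number
checking_rows = [{"account_number": "654321", "bal": 200.0}]  # 6-character account number

def to_common(row, source):
    """Map one source record onto a common Account format (assumed layout)."""
    if source == "savings":
        number, balance = row["acct_no"], row["balance"]
    elif source == "checking":
        number, balance = row["account_number"], row["bal"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "account_no": number.zfill(8),  # pad shorter numbers to one fixed width
        "account_type": source,
        "balance": float(balance),
    }

# Integrate both sources into a single list for the subject "account".
accounts = [to_common(r, "savings") for r in savings_rows]
accounts += [to_common(r, "checking") for r in checking_rows]
```

After this step every record has the same three attributes and the same account-number width, whichever system it came from.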
Time-variant?
An operational database stores only current data, but the data warehouse stores present as well as past data in order to fulfil its purpose.
Data is stored as a series of snapshots, each representing a period of time.
Data is tagged with some element of time: creation date, as-of date, etc.
Data is available on-line for long periods of time (for example, five or more years) for trend analysis and forecasting.
Non-volatile?
Data from operational systems is moved into the DW at specific intervals (this process is called refreshing).
Business transactions do not update data in the data warehouse.
Data in the data warehouse is not deleted.
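A minimal sketch of the snapshot and non-volatile ideas together, in Python. The `Warehouse` class, its method names, and the sample dates are hypothetical and only illustrate the principle: each refresh appends a dated snapshot, and nothing is updated or deleted afterwards.

```python
from datetime import date

class Warehouse:
    """Append-only store: rows are added at refresh time and never changed."""

    def __init__(self):
        self._rows = []  # each stored row carries the as-of date of its snapshot

    def refresh(self, as_of, operational_rows):
        """Copy the current operational rows in as a new dated snapshot."""
        for row in operational_rows:
            self._rows.append({**row, "as_of": as_of})

    def history(self, account_no):
        """Return every stored snapshot of one account, oldest first."""
        rows = [r for r in self._rows if r["account_no"] == account_no]
        return sorted(rows, key=lambda r: r["as_of"])

dw = Warehouse()
dw.refresh(date(2023, 1, 31), [{"account_no": "A1", "balance": 100.0}])
# The operational system later overwrites the balance; the warehouse keeps both values.
dw.refresh(date(2023, 2, 28), [{"account_no": "A1", "balance": 250.0}])
balances = [r["balance"] for r in dw.history("A1")]
```

The class deliberately offers no update or delete method, which is one simple way to express "non-volatile" in code.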
The 3-tier architecture of a data warehouse
• When all the components of a system are combined to form the complete system, the style in which they are combined is known as the architecture of the system (e.g. the architecture of a school building).
• In a data warehouse the components are:
1. Data acquisition
2. Data storage
3. Data processing
4. Data delivery
Layers (e.g. the OSI reference model in computer networks) mean the system is made of logically separated components; tiers mean the system is made of physically separated components.
The various possible architectures when dealing with a database:
• 1-tier: here the database (in the form of files) is stored on the client computer itself.
• 2-tier: here the database server is at a remote location, and the client machine and the database are connected via a network.
• 3-tier: here an application server sits between the client machine and the database server. It runs mainly on the server side, does the processing, and returns results to the client machine.
Conclusions
[Figure: the number of tiers compared against security, maintainability, number of users, speed, and cost.]
The architecture of the data warehouse
[Figure: three-tier data warehouse architecture.
Information sources: operational DBs and external sources, feeding Tier 1 through extract, transform, load.
Tier 1 (data tier): data warehouse server, holding the data warehouse and data marts; it serves Tier 2.
Tier 2 (logic tier): OLAP servers (MOLAP, ROLAP); they serve Tier 3.
Tier 3 (presentation tier): clients for OLAP, query/reporting, and data mining.]
The bottommost level:
• Operational databases: the application-specific databases used to store the organization's day-to-day transactional data.
• External sources: the databases used to store all important external information.
Database vs. data warehouse
OLTP (on-line transaction processing)
• Major task of a traditional relational DBMS.
• Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
• Major task of a data warehouse system.
• Data analysis and decision making.
• Forecasting, monitoring of business.
How is the warehouse loaded?
This is done using back-end tools, which are described next.
• Data extraction: get data from multiple, heterogeneous, and external sources.
• Data cleaning: correct values.
• Data transformation: convert from one format to another (e.g. pounds to kg, date of birth to age).
• Load: summarized tables are loaded into the data warehouse.
• Refresh: propagate the updates from the data sources to the warehouse.
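The back-end steps above can be sketched as a small pipeline. This is only an illustration in Python under stated assumptions: the sources are in-memory lists, the field names (`weight_lb`, `dob`) are invented, cleaning is reduced to dropping rows with missing values, and the summarization before loading is left out.

```python
from datetime import date

def extract(sources):
    """Gather rows from several (here: in-memory) sources into one list."""
    return [row for source in sources for row in source]

def clean(rows):
    """Drop rows with missing values (one simple cleaning rule, assumed here)."""
    return [r for r in rows if r.get("weight_lb") is not None and r.get("dob") is not None]

def transform(rows, today):
    """Convert pounds to kilograms and date of birth to age in whole years."""
    out = []
    for r in rows:
        dob = r["dob"]
        # Subtract one year if this year's birthday has not happened yet.
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        weight_kg = round(r["weight_lb"] * 0.45359237, 2)
        out.append({"name": r["name"], "weight_kg": weight_kg, "age": age})
    return out

def load(warehouse, rows):
    """Append the transformed rows to the warehouse table."""
    warehouse.extend(rows)

source_a = [{"name": "Ann", "weight_lb": 150.0, "dob": date(1990, 6, 15)}]
source_b = [{"name": "Bob", "weight_lb": None, "dob": date(1985, 1, 1)}]  # dirty row

warehouse = []
load(warehouse, transform(clean(extract([source_a, source_b])), today=date(2024, 1, 1)))
```

A refresh, in this sketch, would simply mean running the same pipeline again on the rows that changed in the sources since the last run.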
Tier 1: data warehouse
This is the data warehouse itself, loaded with the information used for strategy making.
This tier also contains the data marts.
Tier 2
This tier consists of OLAP servers, which are used for processing. The following issues are also handled here:
• Security of data (the user is not allowed to communicate directly with the database).
• Business logic (here you can decide what kind of information is shown for a particular kind of query).
• Translation (the user's high-level queries are converted into low-level SQL queries).
• Intermediate calculations (this removes the burden from the user interface and the database).
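As a rough illustration of the security and translation roles of this tier, here is a minimal Python sketch. The function name `translate`, the whitelists, and the table and column names are assumptions for the example; a real OLAP server does far more than this.

```python
# Columns the middle tier is willing to expose (hypothetical whitelists).
ALLOWED_DIMENSIONS = {"date", "storeId", "prodId"}
ALLOWED_MEASURES = {"amt"}

def translate(measure, dimension):
    """Turn a high-level request ("total <measure> by <dimension>") into SQL."""
    if measure not in ALLOWED_MEASURES:
        raise PermissionError(f"measure not allowed: {measure}")
    if dimension not in ALLOWED_DIMENSIONS:
        raise PermissionError(f"dimension not allowed: {dimension}")
    # Both names were checked against the whitelists above, so they are safe to interpolate.
    return f"SELECT {dimension}, SUM({measure}) FROM sale GROUP BY {dimension}"

sql = translate("amt", "date")
```

The client never writes SQL itself: it only names a measure and a dimension, and the middle tier decides whether the request is allowed and how it is expressed to the database.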
OLAP server
• ROLAP server: choose this if space is important for you.
• MOLAP server: choose this if time is important for you.
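HOW DOES ROLAP WORK?
[Figure: the user's request goes from the desktop client (multidimensional view) to the ROLAP server, which sends a complex query to the RDBMS server over the data warehouse; the results travel back the same way. The data cube is created dynamically (on the fly).]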
ROLAP DETAILS
Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs multidimensional analysis of data stored in a relational database rather than in a multidimensional database.
In a three-tiered architecture, the user submits a request for multidimensional analysis and the ROLAP engine converts the request to SQL for submission to the relational database. Then the operation is performed in reverse: the engine converts the resulting data from SQL to a multidimensional format (on the fly) before it is returned to the client for viewing.
QUERY: add up the total sale amount by day.
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

ans
date  sum
1     81
2     48
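The query from the slide can be tried out with a small relational database. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the RDBMS server (an assumption for illustration; the slides do not name a product) and loads the six rows of the sale table.

```python
import sqlite3

# An in-memory SQLite database stands in for the RDBMS behind the ROLAP server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12),
        ("p2", "s1", 1, 11),
        ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8),
        ("p1", "s1", 2, 44),
        ("p1", "s2", 2, 4),
    ],
)

# The aggregate is computed at query time, from the detailed rows.
rows = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()
conn.close()
# rows == [(1, 81), (2, 48)]
```

The `ORDER BY` clause is added only to make the row order predictable; the totals match the ans table on the slide.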
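HOW DOES MOLAP WORK?
[Figure: the MOLAP server creates and stores summary data cubes in a multidimensional database, built from the data warehouse on the RDBMS server. The user's request goes from the desktop client (multidimensional view) to the MOLAP server, and the results come back from the stored cubes.]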
MOLAP
POINTS ABOUT MOLAP:
• Here a multidimensional database is used to fetch data when the user submits an analytical query.
• Facts (fact table) are stored in multi-dimensional arrays.
• Dimensions (dimension tables) are used to index the arrays.
• One of the major distinctions between a MOLAP and a ROLAP tool is that the data is pre-summarized, pre-calculated, and stored in an optimized format in a multidimensional cube instead of in a relational database, in accordance with the client's reporting requirements.
• MOLAP is more optimized for fast query performance and retrieval of summarized information.
• There are certain limitations to implementing a MOLAP system. A primary weakness is that a MOLAP tool is less scalable than a ROLAP tool, as it can handle only a limited amount of data.
• Pre-calculating or pre-consolidating transactional data improves speed.
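The points above (facts in arrays, dimensions as indexes, totals computed ahead of time) can be sketched in a few lines of Python. Plain nested lists stand in for the multidimensional database, and the four sample facts and all names are illustrative assumptions.

```python
# Fact rows: (product, store, amount). Sample data for illustration.
facts = [("p1", "s1", 12), ("p2", "s1", 11), ("p1", "s3", 50), ("p2", "s2", 8)]

# Dimension members, each mapped to an array position.
products = ["p1", "p2"]
stores = ["s1", "s2", "s3"]
prod_index = {p: i for i, p in enumerate(products)}
store_index = {s: j for j, s in enumerate(stores)}

# Build the cube once, ahead of any query: a dense 2-D array, None for empty cells.
cube = [[None] * len(stores) for _ in products]
for prod, store, amt in facts:
    i, j = prod_index[prod], store_index[store]
    cube[i][j] = amt if cube[i][j] is None else cube[i][j] + amt

# Pre-summarize: totals per product and per store are computed once and stored.
total_by_product = [sum(v for v in row if v is not None) for row in cube]
total_by_store = [
    sum(row[j] for row in cube if row[j] is not None) for j in range(len(stores))
]

def lookup(prod, store):
    """Answer a cell query by direct array indexing, with no scan of the facts."""
    return cube[prod_index[prod]][store_index[store]]
```

A query such as `lookup("p1", "s3")` or `total_by_store[store_index["s1"]]` is then a plain index operation. The cost is the storage for the dense array, including its empty cells, which is one way to see why this approach scales less well than keeping only the relational rows.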
The MOLAP Cube

dimensions = 2
Fact table view:
sale
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8
Multi-dimensional cube:
      s1   s2   s3
p1    12        50
p2    11   8

dimensions = 3
Fact table view:
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Multi-dimensional cube:
day 1
      s1   s2   s3
p1    12        50
p2    11   8
day 2
      s1   s2   s3
p1    44   4
p2
Add up total sale amount by day
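A short Python sketch of the three-dimensional case: the six fact rows are placed into a day x product x store array, and the "total by day" question is answered by collapsing two of the dimensions. Nested lists stand in for the cube, and 0 is used for empty cells so they can be summed; both are choices made for this example.

```python
# Fact rows: (product, store, day, amount), as in the 3-dimensional slide.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
days = [1, 2]
products = ["p1", "p2"]
stores = ["s1", "s2", "s3"]

# cube[d][p][s] holds the amount for one (day, product, store) cell; 0 means no sale.
cube = [[[0] * len(stores) for _ in products] for _ in days]
for prod, store, day, amt in facts:
    cube[days.index(day)][products.index(prod)][stores.index(store)] += amt

# Roll up: collapse the product and store dimensions, keeping only the day dimension.
total_by_day = {day: sum(sum(row) for row in cube[d]) for d, day in enumerate(days)}
# total_by_day == {1: 81, 2: 48}
```

Each `cube[d]` is one of the per-day tables shown on the slide, and the roll-up gives the same totals as the SQL `GROUP BY date` query.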
The total sale of computers in the year 2008 at the location Asia is 200 units.
The total sale of books in the year 2008 at the location Europe is 200.
Hybrid OLAP (HOLAP)
HOLAP = Hybrid OLAP:
• Best of both worlds
• Detailed data is stored in the RDBMS
• Aggregated data is stored in the MDBMS
• Users access the data via MOLAP tools
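The split between detailed and aggregated data can be sketched as follows. This is a simplified Python illustration, not a description of any real HOLAP product: an in-memory SQLite database stands in for the RDBMS, a plain dict stands in for the MDBMS, and the function names and sample rows are hypothetical.

```python
import sqlite3

# Detailed rows live in a relational store (SQLite stands in for the RDBMS).
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
rdbms.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
    ],
)

# Aggregates are computed once and kept separately (a dict stands in for the MDBMS).
mdbms = dict(rdbms.execute("SELECT date, SUM(amt) FROM sale GROUP BY date"))

def total_for_day(day):
    """Summary question: answered from the pre-aggregated store."""
    return mdbms[day]

def detail_for_day(day):
    """Drill-down question: answered from the relational store."""
    cur = rdbms.execute(
        "SELECT prodId, storeId, amt FROM sale WHERE date = ? ORDER BY prodId, storeId",
        (day,),
    )
    return cur.fetchall()
```

Here `total_for_day(1)` never touches the detailed rows, while `detail_for_day(2)` reads them through SQL, which mirrors the idea of keeping summaries in one store and detail in the other.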
Data Flow in HOLAP
[Figure: data flow in HOLAP. Components: client with a multidimensional viewer and a relational viewer; MDBMS server holding multi-dimensional data; RDBMS server holding user data, metadata, and derived data. Connections: multi-dimensional access, SQL-Read, SQL reach-through.]
Front end tools
[Figure: front-end outputs (query results, reports, graphs, pie charts, bar charts) delivered to a computer or a mobile phone.]
Data mart
THANK YOU