Basic Concepts of On-Line Analytical Processing

Upload: carina

Post on 23-Mar-2016

Page 1: Basic concepts of On-Line  Analytical processing

Basic concepts of On-Line Analytical processing

DT211 /4

What is OLAP

• OLAP stands for "On-Line Analytical Processing", in contrast to OLTP ("On-Line Transaction Processing").

• OLAP describes a class of technologies that are designed for live, ad hoc data access and analysis.

• OLTP generally relies solely on relational databases, whereas OLAP has become synonymous with multidimensional views of business data supported by multidimensional databases.

• Relational databases were never intended to provide data synthesis, analysis and consolidation functionality.

What is OLAP

• OLTP databases are optimised for transaction updating; OLAP applications, however, are used by managers and analysts for a higher-level, aggregate view of the data, so they are designed for analysis.

• Many problems that people try to solve using relational databases (e.g. summaries) are handled much more efficiently by an OLAP server than by an RDBMS.

Key OLAP Features

Although OLAP applications are found in widely divergent functional areas, as illustrated in the table opposite, they all have the following key features:

1. Multidimensional views of data (MD databases via a star schema)

2. Support for complex calculations

3. Time intelligence

A star schema for credit card purchases

[Figure: a central fact table linked to four dimension tables]

• Fact table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount (sample amounts: 14.50, 8.25, 22.40)

• Cardholder dimension: Cardholder Key, Name, Gender, Income Range (e.g. 1 John Doe, 2 Sara Smith; Male, Female; 50 - 70,000, 70 - 90,000)

• Purchase dimension: Purchase Key, Category (1 Supermarket, 2 Travel & Entertainment, 3 Auto & Vehicle, 4 Retail, 5 Restaurant, 6 Miscellaneous)

• Location dimension: Location Key, Street, City, State, Region (e.g. 10, 425 Church St, Charleston, SC, 3)

• Time dimension: Time Key, Day, Month, Quarter, Year (e.g. Jan, 2002)

Star Schema: basis of MD view
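To make the star schema concrete, here is a minimal Python sketch, assuming each dimension table is a dictionary keyed by its surrogate key. The attribute values come from the figure, but which keys are combined in each fact row is hypothetical, as are the names `Purchase` and `denormalise`. Resolving every foreign key of a fact row against its dimension table produces the flat record that a multidimensional view is built from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    """One row of the fact table: foreign keys plus the Amount measure."""
    cardholder_key: int
    purchase_key: int
    location_key: int
    time_key: int
    amount: float


# Dimension tables: surrogate key -> descriptive attributes (sample values)
cardholder_dim = {1: {"name": "John Doe"}, 2: {"name": "Sara Smith"}}
purchase_dim = {1: {"category": "Supermarket"},
                2: {"category": "Travel & Entertainment"}}
location_dim = {10: {"street": "425 Church St", "city": "Charleston",
                     "state": "SC", "region": 3}}
time_dim = {1: {"month": "Jan", "year": 2002}}

# Hypothetical fact rows: the key combinations are made up for this sketch
fact_table = [Purchase(1, 2, 10, 1, 14.50), Purchase(2, 1, 10, 1, 22.40)]


def denormalise(row: Purchase) -> dict:
    """Resolve each foreign key against its dimension table (a star join)."""
    return {**cardholder_dim[row.cardholder_key],
            **purchase_dim[row.purchase_key],
            **location_dim[row.location_key],
            **time_dim[row.time_key],
            "amount": row.amount}


rows = [denormalise(r) for r in fact_table]
```

Keeping descriptive attributes out of the fact table keeps the fact rows narrow. The join cost is paid only when a view is built.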

• Example of a three-dimensional query:

• What are the total amount and number of purchases for vehicles in Region 2 in December?

Multidimensional cube for credit card purchases

[Figure: cube with axes Month (Jan. to Dec.), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle) and Region (One to Four). Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, Count = 110, Amount = 6,720]

Multidimensional view as a cube: it can also be represented as a 4-column table
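The three-dimensional query above can be sketched in Python by storing the cube as a dictionary keyed by (month, category, region). Each cell holds the two measures shown in the figure, count and amount. The purchase rows are hypothetical and deliberately tiny, so the result does not reproduce the figure's 110 / 6,720.

```python
from collections import defaultdict

# Hypothetical denormalised purchases: (month, category, region, amount)
purchases = [
    ("Dec", "Vehicle", "Two", 40.00),
    ("Dec", "Vehicle", "Two", 60.00),
    ("Dec", "Travel", "Two", 25.00),
    ("Nov", "Vehicle", "Two", 15.00),
    ("Dec", "Vehicle", "One", 30.00),
]


def build_cube(rows):
    """Aggregate rows into cells keyed by (month, category, region).

    Each cell holds the two measures from the figure: count and amount.
    """
    cube = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for month, category, region, amount in rows:
        cell = cube[(month, category, region)]
        cell["count"] += 1
        cell["amount"] += amount
    return dict(cube)


cube = build_cube(purchases)
# Three-dimensional query: vehicles, Region Two, December
answer = cube[("Dec", "Vehicle", "Two")]   # {'count': 2, 'amount': 100.0}
```

Once the cube is built, the query is a single key lookup rather than a scan over every purchase.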

Why Multidimensional Data

• Queries requiring only a single number to be retrieved need not use multidimensional databases.

• If a query involves retrieving and aggregating multiple numbers from a large database, response times can become intolerable, as relational databases can scan only a few hundred records per second.

• However, multidimensional databases can add up 10,000 or more numbers in rows and columns per second.

• Thus, for such queries, multidimensional databases have an enormous performance advantage.

Multi-dimensional Operations

• Slice – a single-dimension operation

• Dice – a multidimensional operation

• Roll-up – a higher level of generalization

• Drill-down – a greater level of detail

• Rotation – view data from a new perspective
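The five operations can be sketched in Python on a small cube stored as a dictionary keyed by (month, category, region). The helper names `slice_`, `dice`, `roll_up` and `rotate`, the cell values and the month-to-quarter mapping are all hypothetical illustrations. They are not any particular OLAP server's API.

```python
from collections import defaultdict

# Hypothetical cube: (month, category, region) -> total amount
cube = {
    ("Jan", "Vehicle", "One"): 100.0,
    ("Mar", "Vehicle", "One"): 30.0,
    ("Jan", "Retail", "Two"): 50.0,
    ("Feb", "Vehicle", "Two"): 70.0,
    ("Apr", "Retail", "One"): 20.0,
}


def slice_(cube, dim, member):
    """Slice: fix ONE dimension to a single member and drop that axis."""
    return {k[:dim] + k[dim + 1:]: v for k, v in cube.items() if k[dim] == member}


def dice(cube, allowed):
    """Dice: restrict SEVERAL dimensions to subsets of their members."""
    return {k: v for k, v in cube.items()
            if all(k[d] in members for d, members in allowed.items())}


def roll_up(cube, dim, parent_of):
    """Roll-up: replace members of `dim` by their parent and re-aggregate."""
    out = defaultdict(float)
    for k, v in cube.items():
        out[k[:dim] + (parent_of[k[dim]],) + k[dim + 1:]] += v
    return dict(out)


def rotate(cube, order):
    """Rotation (pivot): the same cells, presented with the axes reordered."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


quarter_of = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
by_quarter = roll_up(cube, 0, quarter_of)   # month level -> quarter level
# Drill-down is the reverse move: from by_quarter back to the month-level cube.
vehicles = slice_(cube, 1, "Vehicle")       # month x region for vehicles only
jan_feb_two = dice(cube, {0: {"Jan", "Feb"}, 2: {"Two"}})
region_first = rotate(cube, (2, 1, 0))      # region, category, month
```

Drill-down needs no separate function here because the finer cube is still available. A real server would instead fetch the finer level on demand.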

Simple Hierarchies: Roll up

• With hierarchical dimensions, the database knows not to combine members of the dimension that are at different levels of the hierarchy; aggregating up from one level to the next is referred to as roll-up.

• It allows the user to view query results at any level, e.g. at street level, city level, state level and region level (refer to the star schema example above).

• Such hierarchies also facilitate drill-down to successive levels of detail: state level, city level, street level.
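Roll-up along the location hierarchy of the star schema (street, city, state, region) can be sketched as follows. Each fact is mapped to exactly one member of the chosen level, so members of different levels never end up in the same total. Only the 425 Church St row comes from the slide; the other locations, the fact rows and the function `total_by` are hypothetical.

```python
from collections import defaultdict

# Location hierarchy, lowest level first (as in the star schema example)
LEVELS = ("street", "city", "state", "region")

# Hypothetical location dimension; only key 10 appears on the slide
locations = {
    10: {"street": "425 Church St", "city": "Charleston", "state": "SC", "region": 3},
    11: {"street": "12 King St", "city": "Charleston", "state": "SC", "region": 3},
    12: {"street": "9 Main St", "city": "Columbia", "state": "SC", "region": 3},
}
# Hypothetical fact rows: (location_key, amount)
facts = [(10, 14.50), (11, 8.25), (12, 22.40), (10, 5.00)]


def total_by(level):
    """Aggregate amounts at one level of the hierarchy.

    Every fact maps to exactly one member of `level`, so members of
    different levels are never mixed in one total.
    """
    totals = defaultdict(float)
    for loc_key, amount in facts:
        totals[locations[loc_key][level]] += amount
    return dict(totals)


# Roll-up walks up LEVELS; drill-down walks back down.
views = {level: total_by(level) for level in LEVELS}
```

This sketch keys streets by name alone, which assumes street names are unique across cities. A fuller model would key each level by its full path.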

Multiple hierarchies: roll up

• With multiple hierarchies, product sales can roll up by region, by type, by brand name and so forth. Without this capability, an extra dimension would have to be created for each.

• Another use of multiple hierarchies is for geographical dimensions e.g.:
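For the product-sales case in the first bullet, a minimal Python sketch shows one product dimension carrying two hierarchies, type and brand. Sales roll up along either one without a second product dimension. The products, sales figures and the function `roll_up` are hypothetical.

```python
from collections import defaultdict

# One product dimension carrying two hierarchies (hypothetical members)
products = {
    "P1": {"type": "Snacks", "brand": "Acme"},
    "P2": {"type": "Snacks", "brand": "Zenith"},
    "P3": {"type": "Drinks", "brand": "Acme"},
}
# Hypothetical sales facts: (product, amount)
sales = [("P1", 100.0), ("P2", 40.0), ("P3", 60.0), ("P1", 10.0)]


def roll_up(attribute):
    """Roll product sales up along whichever hierarchy `attribute` names."""
    totals = defaultdict(float)
    for product, amount in sales:
        totals[products[product][attribute]] += amount
    return dict(totals)


by_type = roll_up("type")    # {'Snacks': 150.0, 'Drinks': 60.0}
by_brand = roll_up("brand")  # {'Acme': 170.0, 'Zenith': 40.0}
```

Adding a further hierarchy, such as a supplier, means adding one more attribute to each product, not one more dimension to the cube.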

Drill down to core database

• Most organisations now use relational databases as the standard for their data warehouses.

• Often there is no need to replicate all the data in the relational database into an MD database for OLAP.

• Summary-level data can be kept in the MD database and detailed data in the relational database.

Drilling to relational data

• Retrieving a single number from an MD database takes the same time as retrieving it from a relational database.

• Thus it would be futile to store data on individual customers in an MD database; for summarised data, however, an MD database is superior.

• Ideally, therefore, you should be able to drill down through the MD database into the relational database.

• Such an approach is useful because most of the data volume resides at the detailed level and so will not hinder queries at the higher levels.
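The split described above (summaries in the MD store, detail in the relational database, with drill-through between them) can be sketched using Python's built-in `sqlite3` module as a stand-in for the relational warehouse and a plain dictionary as a stand-in for the MD store. The table layout, the rows and the function `drill_through` are hypothetical.

```python
import sqlite3

# Detailed data stays in a relational table (in-memory SQLite for the sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?, ?)",
    [("John Doe", "Two", "Dec", 40.0),
     ("Sara Smith", "Two", "Dec", 60.0),
     ("John Doe", "One", "Dec", 30.0)],
)

# Summary level held in the "MD store": a dict of pre-aggregated cells
summary = {
    (region, month): total
    for region, month, total in conn.execute(
        "SELECT region, month, SUM(amount) FROM purchase GROUP BY region, month")
}


def drill_through(region, month):
    """Drill from a summary cell down to the relational detail rows behind it."""
    return conn.execute(
        "SELECT customer, amount FROM purchase WHERE region = ? AND month = ? "
        "ORDER BY customer", (region, month)).fetchall()


cell_total = summary[("Two", "Dec")]      # answered from the summary store
detail = drill_through("Two", "Dec")      # answered from the relational table
```

Only the small summary is held in the MD structure. Customer-level rows are read from the relational side when a user actually drills down to them.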

Support for complex calculations

• Important computational features of OLAP servers include:

– Independently dimensioned variables (IDV)

– Statistical calculations

– Consolidation speed

– Vector Arithmetic

OLAP calculations: Variables

• Variables are numeric measures (facts) such as sales, cost and price; dimensions include region, customer type, product, etc. These correspond to the fact table and the dimension tables respectively.

• OLAP servers can treat variables as a special dimension, so one can select only the relevant dimensions for each variable (independently dimensioned variables, IDV). See next slide.

• OLAP servers must provide a range of powerful computational and statistical methods, such as those required by sales forecasting: regression analysis, projection, correlation, etc.

• They can also incorporate various rules for consolidation.
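Independently dimensioned variables can be sketched as measures that each declare their own dimension list, so no cells are reserved for dimensions that do not apply. In this hypothetical example, Sales varies by region, product and month, while Price is assumed to vary only by product and month. The class `Variable` and its `put`/`get` methods are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A measure declared with only the dimensions that are relevant to it."""
    name: str
    dimensions: tuple
    cells: dict = field(default_factory=dict)

    def put(self, value, **coords):
        # Reject coordinates for dimensions this variable was not declared with
        if set(coords) != set(self.dimensions):
            raise ValueError(f"{self.name} is dimensioned by {self.dimensions}")
        self.cells[tuple(coords[d] for d in self.dimensions)] = value

    def get(self, **coords):
        return self.cells.get(tuple(coords[d] for d in self.dimensions))


# Hypothetical: Sales varies by region; Price is assumed not to
sales = Variable("Sales", ("region", "product", "month"))
price = Variable("Price", ("product", "month"))

sales.put(120.0, region="East", product="Nuts", month="Jan")
price.put(2.5, product="Nuts", month="Jan")
```

If Price shared Sales' dimensions, the same price would have to be repeated for every region. With IDV it is stored once per product and month.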

Star schema for property sales of DreamHome

Vector Arithmetic

• Data held in 2-D arrays (matrices) can be manipulated more easily than data stored in a relational table.

• Thus a 2-D plane of actual figures can easily be subtracted from a plane of budget figures to give a plane of variances.

• Such arithmetic allows entire planes of the database to be combined quickly.
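The budget-minus-actual example can be sketched with two equally shaped planes stored as nested lists. The rows and columns are assumed to be products and months, and the figures are hypothetical. One element-wise operation yields the whole variance plane. No row-by-row join is needed, as it would be with two relational tables.

```python
# Hypothetical planes: rows = products, columns = months
budget = [[100.0, 120.0, 110.0],
          [80.0, 90.0, 95.0]]
actual = [[90.0, 125.0, 100.0],
          [85.0, 70.0, 95.0]]


def subtract_planes(a, b):
    """Element-wise a - b over two equally shaped 2-D planes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("planes must have the same shape")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# As on the slide: the actual plane is subtracted from the budget plane
variance = subtract_planes(budget, actual)
# [[10.0, -5.0, 10.0], [-5.0, 20.0, 0.0]]
```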

Time Series Data Types

• Users want to look at trends in all aspects of their business, e.g. sales trends, market trends, etc.

• A series of numbers representing a particular variable over time is called a time series, e.g. 52 weekly sales numbers form a time series.

• Using a time-series data type allows you to store an entire string of numbers representing daily, weekly or monthly data.

• Thus an OLAP server that supports a time-series data type allows one to store historical data without having to specify a separate dimension for time.

• Unlike other dimensions, time has special attributes and rules.

Time-series data type

• Time series always have a particular periodicity.

• Time-series data must include rules to convert one periodicity to another.

• In the absence of a time-series data type, a new dimension must be declared and labelled explicitly.

• A time-series data cell contains a great deal of information compared with a single cell or even a full record.

Time-Series Data types

• Consider the following example for a time-series data type of sales.

• Start date = 1/1/2000

• Periodicity = Daily, business days only

• Conversion = Summation

• Long description = Variable=Sales, Product=Nuts, Region=East

• Data type = Numeric, single precision

• Sparsity = Non-sparse

• Calendar = 4-4-5 fiscal year

• Data points = 708, 800, 821, 743, 779, 856, 878, 902, 799, ...
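A minimal Python sketch of this example cell stores only a start date, a periodicity rule and the values. The dates are implied rather than held in a time dimension. The class `TimeSeries` is hypothetical, and it assumes that "business days only" means skipping Saturdays and Sundays. Holidays and the 4-4-5 fiscal calendar are not modelled, and only the nine data points listed on the slide are used. The summation conversion rule turns the daily series into weekly totals grouped by ISO week.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TimeSeries:
    """Time-series cell: start date + periodicity stand in for a time dimension."""
    start: date
    points: list             # values only; dates are implied
    conversion: str = "sum"  # rule for changing periodicity

    def dates(self):
        """Implied dates for daily, business-days-only data (weekends skipped)."""
        d, out = self.start, []
        while len(out) < len(self.points):
            if d.weekday() < 5:  # Monday=0 ... Friday=4
                out.append(d)
            d += timedelta(days=1)
        return out

    def to_weekly(self):
        """Convert daily -> weekly by summation, grouping on ISO (year, week)."""
        if self.conversion != "sum":
            raise NotImplementedError(self.conversion)
        weekly = {}
        for d, value in zip(self.dates(), self.points):
            iso = d.isocalendar()
            key = (iso[0], iso[1])
            weekly[key] = weekly.get(key, 0) + value
        return weekly


nuts_east = TimeSeries(date(2000, 1, 1),
                       [708, 800, 821, 743, 779, 856, 878, 902, 799])
weekly = nuts_east.to_weekly()   # {(2000, 1): 3851, (2000, 2): 3435}
```

Because 1/1/2000 fell on a Saturday, the first implied business day is Monday 3 January 2000. The second week is partial because the listed data stop after nine points.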

Time-series data types

• Start date is the date of the first data point.

• Periodicity can be daily, weekly, etc., with calendar years, fiscal periods, business weeks, etc. being understood.

• Data type can be single precision, double precision, text strings or dates.

• Sparse storage is used where the same number is repeated over and over again, e.g. a price. Defining such a series as sparse causes the database to store only the dates on which the price changed and the corresponding new values.

• Data points can store very long time series, e.g. 10 years of daily data.
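The sparse option described above can be sketched as a series that records only change dates and new values. Looking up any day then returns the most recent change on or before it. The class `SparseSeries` and the price data are hypothetical. The sketch also assumes that points are appended in date order.

```python
from bisect import bisect_right
from datetime import date


class SparseSeries:
    """Store only the dates on which the value changed, with the new values."""

    def __init__(self):
        self.change_dates = []   # ascending, because appends arrive in date order
        self.values = []

    def append(self, day, value):
        # Record a point only when it differs from the value currently in force
        if self.values and self.values[-1] == value:
            return
        self.change_dates.append(day)
        self.values.append(value)

    def value_on(self, day):
        """Value in force on `day`: the latest change on or before it."""
        i = bisect_right(self.change_dates, day)
        if i == 0:
            raise KeyError(f"no data on or before {day}")
        return self.values[i - 1]


# Hypothetical daily price of one product over ten days
price = SparseSeries()
for d, p in enumerate([2.50] * 6 + [2.75] * 4, start=1):
    price.append(date(2000, 1, d), p)
```

Ten daily points collapse to two stored changes, and `value_on` still answers a query for any of the ten days.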

Sparse Data

• When fewer than 10% of the cells contain data, the database is said to be sparsely populated, or sparse.

• Sparsity can also occur if many cells contain the same number, e.g. the price of a product every day.

• This situation can also be represented by storing the number once, along with the number of days for which it is repeated.

• Whereas a relational database would fill up with duplicate data, an OLAP server that understands sparse data can skip over zeros, missing data and duplicate data.
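Both ideas on this slide fit in a short Python sketch. The first is storing a repeated number once together with its repeat count, i.e. run-length encoding. The second is the 10% rule of thumb for calling a set of cells sparse. The function names are hypothetical, and the sketch assumes that `None` marks an empty cell.

```python
def run_length_encode(values):
    """Store each repeated number once, with how many times it repeats."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the full series."""
    return [v for v, n in runs for _ in range(n)]


def is_sparse(cells, threshold=0.10):
    """Rule of thumb from the slide: sparse when fewer than 10% of cells hold data.

    In this sketch `None` marks an empty cell.
    """
    filled = sum(1 for c in cells if c is not None)
    return filled / len(cells) < threshold


daily_price = [2.50] * 6 + [2.75] * 4
runs = run_length_encode(daily_price)   # [(2.5, 6), (2.75, 4)]
```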

Conclusion

• In essence OLAP technology is a fast, flexible data summarisation and analysis tool.

• The data analysis requires the ability to summarise data in many ways and view trends.

• It should have three main characteristics: MD views, the ability to perform complex calculations, and time intelligence.

Question

• Business decisions require the delivery of critical information in a timely, suitable format. Explain, using appropriate examples, how OLAP can facilitate the business decision-making process.