Post on 04-Jan-2016

Decision Support and Data Warehouse

Jingyi Lu

Outline

– Decision Support System

– OLAP vs. OLTP

– What is a Data Warehouse?

– Dimensional Modeling

– Extract, Transform, and Load (ETL)

Decision Support System

Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions.

– What were the sales volumes by region and product category for the last year?

– Which orders should we fill to maximize revenues?

– Will a 10% discount increase sales volume sufficiently?

Decision Support Systems

Created to facilitate the decision making process

So much information that it is difficult to extract it all from a traditional database

Need for a more comprehensive data storage facility

Data Warehouse

Decision Support Systems

Extract Information from data to use as the basis for decision making

– Used at all levels of the organization

– Tailored to specific business areas

– Ad hoc queries to retrieve and display information

– Combines historical operation data with business activities

OLAP vs. OLTP

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions. → Operational database

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. → Data warehouse
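The contrast can be made concrete with a small sketch. It assumes a hypothetical `orders` table held in an in-memory SQLite database (the table, columns, and rows are invented for illustration): the OLTP side issues short single-row transactions and a primary-key lookup, while the OLAP side runs one aggregating query over all rows.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, category TEXT, amount REAL)"
)

# OLTP-style work: many short transactions, each touching one row.
new_orders = [
    (1, "East", "Books", 20.0),
    (2, "East", "Toys", 35.0),
    (3, "West", "Books", 15.0),
    (4, "West", "Books", 25.0),
]
for order in new_orders:
    with conn:  # each insert is committed as its own short transaction
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", order)

# OLTP-style read: point lookup through the primary key.
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style read: one complex query that scans and aggregates many rows.
sales_by_region_category = conn.execute(
    "SELECT region, category, SUM(amount) "
    "FROM orders GROUP BY region, category "
    "ORDER BY region, category"
).fetchall()
```

In practice the aggregating query would run against the data warehouse rather than the operational database; a single database is used here only to keep the sketch short.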

OLAP vs. OLTP

| | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day to day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated |
| usage | repetitive | ad-hoc |
| access | read/write, index/hash on prim. key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB-GB | 100 GB-TB |
| metric | transaction throughput | query throughput, response |

What is a Data Warehouse

The repository for the DSS is the DATA WAREHOUSE

Definition: Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making.

Integrated

The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization

– Multiple sources

– Diverse sources

– Diverse formats

Subject-Oriented

Data is arranged and optimized to provide answers to questions from diverse functional areas

Data is organized and summarized by topic

– Sales / Marketing / Finance / Distribution / etc.

Time-Variant

The Data Warehouse represents the flow of data through time

Can contain projected data from statistical models

Data is periodically uploaded, then time-dependent data is recomputed

Nonvolatile

Once data is entered it is NEVER removed

– Represents the company’s entire history

– Near-term history is continually added to it

– Always growing

– Must support terabyte databases and multiprocessors

Read-Only database for data analysis and query processing

Dimensional Modeling

Dimension

– A dimension is a data element that categorizes each item in a data set into non-overlapping regions

Facts

– A fact is a value or measurement that represents a fact about the managed entity or system

– Typically numeric values that can be aggregated

Dimensional Modeling

Database is a set of facts (points) in a multidimensional space

Fact tables

– Contain business facts or measures, plus foreign keys that refer to primary keys in the dimension tables

Dimension tables

– Each dimension table has a set of attributes, e.g., Day, Month, Year of Date

– Attributes of a dimension may be related by a partial order

– Hierarchy: e.g., Day → Month → Year (each level rolls up into the next)
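A rough sketch of rolling a measure up along the date hierarchy, assuming a handful of invented daily sales figures. The `roll_up` helper and the `daily_sales` data are hypothetical and not part of the original slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily facts: one numeric measure per day.
daily_sales = {
    date(2015, 1, 5): 100.0,
    date(2015, 1, 20): 50.0,
    date(2015, 2, 3): 70.0,
    date(2016, 1, 4): 30.0,
}


def roll_up(facts, level):
    """Sum daily facts at the 'day', 'month', or 'year' level of the hierarchy."""
    key_for = {
        "day": lambda d: (d.year, d.month, d.day),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: (d.year,),
    }[level]
    totals = defaultdict(float)
    for day, amount in facts.items():
        totals[key_for(day)] += amount  # facts are additive, so they can be summed
    return dict(totals)


by_month = roll_up(daily_sales, "month")
by_year = roll_up(daily_sales, "year")
```

Each coarser level groups the keys of the finer one, which is why the same facts can be summarized by day, month, or year without storing them three times.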

Example of Star Schema
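As one possible illustration of a star schema, the sketch below builds a central fact table surrounded by three dimension tables in an in-memory SQLite database. All table names, column names, and rows are assumptions made for this example; they are not taken from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, descriptive attributes only.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: numeric measures plus a foreign key into each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER,
    revenue     REAL
);

INSERT INTO dim_date    VALUES (1, 5, 1, 2015), (2, 6, 1, 2015);
INSERT INTO dim_product VALUES (1, 'Novel', 'Books'), (2, 'Puzzle', 'Toys');
INSERT INTO dim_store   VALUES (1, 'Boston', 'East'), (2, 'Seattle', 'West');
INSERT INTO fact_sales  VALUES (1, 1, 1, 2, 20.0), (1, 2, 1, 1, 35.0), (2, 1, 2, 4, 40.0);
""")

# A typical star join: the fact table joined directly to each dimension it needs.
revenue_by_region_category = conn.execute("""
    SELECT s.region, p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store   AS s ON s.store_key = f.store_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
```

Every dimension is one join away from the fact table, which keeps analytical queries simple at the cost of some redundancy inside the dimension tables (for example, `category` repeated per product).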

Example of Snowflake Schema
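A snowflake schema normalizes the dimension tables further, so a dimension may point to its own sub-dimension. The sketch below is a minimal, assumed example in which the product dimension refers to a separate category table; the names and rows are illustrative and do not come from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: category is stored once instead of once per product.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);

-- Dimension: refers to its sub-dimension through a foreign key.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

-- Fact table: unchanged in shape, still keyed by the product dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);

INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_product  VALUES (1, 'Novel', 1), (2, 'Atlas', 1), (3, 'Puzzle', 2);
INSERT INTO fact_sales   VALUES (1, 20.0), (2, 15.0), (3, 35.0), (1, 10.0);
""")

# Reaching the category now takes two joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
```

The trade-off is less redundancy in the dimensions in exchange for longer join paths in queries.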

ETL

Extraction, Transformation, Loading (ETL)

– To get data out of the source and load it into the data warehouse; simply a process of copying data from one database to another

Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database

Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets; such data also requires extraction, transformation, and loading

When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation
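The three steps can be sketched as three small functions. This is a minimal illustration under stated assumptions, not a production pipeline: the source `orders` table, the warehouse `fact_orders` table, and the transformation rules (splitting the date, converting cents to currency units) are all hypothetical.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows out of the (hypothetical) OLTP source."""
    return source.execute(
        "SELECT order_id, order_date, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: reshape source rows to match the warehouse schema."""
    shaped = []
    for order_id, order_date, amount_cents in rows:
        year, month, day = (int(part) for part in order_date.split("-"))
        shaped.append((order_id, year, month, day, amount_cents / 100.0))
    return shaped


def load(warehouse, rows):
    """Load: write the transformed rows into the warehouse in one transaction."""
    with warehouse:
        warehouse.executemany(
            "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)", rows
        )


# In-memory databases stand in for the real source and warehouse systems.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount_cents INTEGER);
INSERT INTO orders VALUES (1, '2015-01-05', 2000), (2, '2015-02-03', 3550);
""")
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders ("
    "order_id INTEGER, year INTEGER, month INTEGER, day INTEGER, amount REAL)"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
```

Keeping the steps as separate functions reflects the point that ETL is a process: each step could be reimplemented with a different tool without changing the overall flow.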

ETL

ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development effort and requires the skills of business analysts, database designers, and application developers

It is not a one-time event, as new data is added to the data warehouse periodically (monthly, daily, hourly)

Because ETL is an integral, ongoing, and recurring part of a data warehouse, it should be:

– Automated

– Well documented

– Easily changeable

ETL Staging Database

ETL operations should be performed on a relational database server separate from the source databases and the data warehouse database

Creates a logical and physical separation between the source systems and the data warehouse

Minimizes the impact of the intense periodic ETL activity on source and data warehouse databases
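A hedged sketch of the staging idea, with three separate in-memory SQLite databases standing in for the source system, the staging server, and the data warehouse. The table names, the sample rows, and the cleaning rules (trimming, upper-casing, de-duplicating) are assumptions chosen for illustration.

```python
import sqlite3

# Three separate databases: source, staging, and warehouse.
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
INSERT INTO customers VALUES (1, ' alice '), (2, 'BOB'), (2, 'BOB');
""")
staging.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT)")
warehouse.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)"
)

# 1. Touch the source only once, with a plain bulk read, and park the raw rows in staging.
raw_rows = source.execute("SELECT id, name FROM customers").fetchall()
with staging:
    staging.executemany("INSERT INTO stg_customers VALUES (?, ?)", raw_rows)

# 2. Do the expensive cleaning inside staging, away from source and warehouse.
with staging:
    staging.execute("UPDATE stg_customers SET name = UPPER(TRIM(name))")
clean_rows = staging.execute(
    "SELECT DISTINCT id, name FROM stg_customers ORDER BY id"
).fetchall()

# 3. Load only the finished rows into the warehouse.
with warehouse:
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?)", clean_rows)

dim_rows = warehouse.execute(
    "SELECT customer_key, name FROM dim_customer ORDER BY customer_key"
).fetchall()
```

The source sees one read and the warehouse sees one insert batch; the intermediate updates and de-duplication run entirely in the staging database.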
