Data Warehousing (The Need, Importance & the Big Picture) Naveed Iqbal, Assistant Professor NUCES, Islamabad Campus (Lecture Slides Week # 1)

NUCES, Islamabad Campus Data Warehousing - Fall 2012 2

Why this Course?

The world is changing (in fact, it has changed).

Either change or be left behind.

Missing opportunities or going in the wrong direction has prevented us from growing.

What is the right direction?

Harnessing data in the knowledge-driven economy.

Doing what cannot be automated, or is difficult to automate.

The Need of the Time

Drowning in data AND/BUT starving for information.

Knowledge is power BUT Intelligence is absolute/super power.

The Need of the Time

Data → Information → Knowledge → Intelligence → POWER ($/£)

Evolution of Information Systems

Business Intelligence

Visualization

Data Warehousing – the big picture

[Figure: multi-tier data warehouse architecture.
Tier 0, data sources: operational databases, semistructured sources, www data and archived data, fed in through Extract, Transform, Load (ETL).
Tier 1, data warehouse server: data warehouse, data marts and metadata.
Tier 2, OLAP servers: MOLAP and ROLAP.
Tier 3, clients: query/reporting, analysis and data mining tools, used by IT users and business users.]

Approach of the Course

Develop an understanding of the underlying RDBMS concepts.

Apply these concepts to VLDB / DSS environments and understand where and why they break down.

Expose the differences between RDBMS and Data Warehouse in the context of VLDB.

Provide the basics of DSS tools such as OLAP and Data Mining, and demonstrate their applications.

Demonstrate the application of DSS concepts and the limitations of OLTP concepts through lab exercises.

Summary of the Course

Introduction & Background

Extract-Transform-Load (ETL)

Normalization & De-Normalization

Dimensional Modeling

Online Analytical Processing (OLAP)

Data Quality Management (DQM)

Need for Speed (Parallelism, Join and Indexing Techniques)

DWH Implementation Steps

Complete Implementation Case Study

Lab and Tool Usage

Books

Reference Books

Golfarelli & Rizzi, Data Warehouse Design – Modern Principles and Methodologies, McGraw-Hill

W. H. Inmon, Building the Data Warehouse, John Wiley & Sons Inc., NY

R. Kimball, The Data Warehouse Toolkit, John Wiley & Sons Inc., NY

A. Abdullah, “Data Warehousing for Beginners: Concepts & Issues”.

Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons Inc., NY

. . .

Course Execution Plan

Lecturing / Discussions

Lab Work + Tutorials

Assignments / Case Studies

Projects

Marks Breakup:

Mid-I: 12% Quizzes: 6%

Mid-II: 13% Assignments/Case Study: 9%

Final*: 40% Projects*: 20%

* Mandatory (Missing means F)

Code of Conduct

Regularity: Attendance criteria as per university policy

Punctuality: No entry after 5 minutes from class start time (N/A for habitual latecomers)

Discipline: ABSOLUTELY NO COMPROMISE

Positive Attitude

High Level of Class Participation

No Plagiarism, Cheating …

No Change in Deadlines

No Usage of Mobile / Other Devices

Scenario 1

ABC Pvt Ltd is a company with branches at Karachi, Quetta, Peshawar and Lahore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Scenario 1: ABC Pvt Ltd.

[Figure: the separate Karachi, Quetta, Peshawar and Lahore systems, and the Sales Manager asking for sales per item type per branch for the first quarter.]

Solution 1: ABC Pvt Ltd.

Extract sales information from each database.

Store the information in a common repository at a single site.
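
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each branch system is simulated by an in-memory SQLite database, the `sales` schema, the helper names `make_branch_db` and `quarterly_report`, and all sample figures are hypothetical and do not come from the slides.

```python
import sqlite3

def make_branch_db(rows):
    """Simulate one branch's operational system (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

# Made-up sample rows; the slides give no actual figures for ABC Pvt Ltd.
branches = {
    "Karachi": make_branch_db([
        ("Dairy", "2012-01-15", 120.0),
        ("Dairy", "2012-02-03", 80.0),
        ("Bread", "2012-04-10", 50.0),   # second quarter, excluded below
    ]),
    "Quetta": make_branch_db([("Bread", "2012-03-20", 60.0)]),
    "Peshawar": make_branch_db([("Dairy", "2012-02-11", 200.0)]),
    "Lahore": make_branch_db([
        ("Bread", "2012-01-09", 90.0),
        ("Bread", "2012-03-30", 70.0),
    ]),
}

# The common repository at a single site, with the branch as an extra column.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
)

# Extract from each branch database and store in the repository.
for branch, db in branches.items():
    rows = db.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [(branch, *row) for row in rows],
    )

def quarterly_report(conn, start, end):
    """Total sales per branch and item type for start <= sale_date < end."""
    return conn.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (start, end),
    ).fetchall()

report = quarterly_report(warehouse, "2012-01-01", "2012-04-01")
for line in report:
    print(line)
```

Once the data sits in one place, the Sales Manager's question becomes a single query instead of four separate extracts that have to be merged by hand.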

Solution 1: ABC Pvt Ltd.

[Figure: the Karachi, Quetta, Peshawar and Lahore systems feed a data warehouse; the Sales Manager gets the report through query and analysis tools.]

Scenario 2

One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.

Scenario 2: One Stop Shopping

[Figure: two data entry operators and management work against the same operational database; the operators wait while the management report runs.]

Solution 2

Extract the data needed for analysis from the operational database.

Store it in the warehouse.

Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.

The warehouse will contain data with a historical perspective.
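
A minimal sketch of such a periodic refresh is given below. It assumes, purely for illustration, that operational rows carry an increasing integer `id` and are only ever inserted (never updated), so the warehouse can use the highest `id` it already holds as a high-water mark. The table names and the `refresh` helper are hypothetical.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")       # stands in for the operational database
warehouse = sqlite3.connect(":memory:")  # separate store used only for reporting

oltp.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, amount REAL)")
warehouse.execute(
    "CREATE TABLE sales_history (id INTEGER PRIMARY KEY, item TEXT, amount REAL)"
)

def refresh(oltp_conn, wh_conn):
    """Copy rows added since the last refresh; return how many were loaded."""
    (last_id,) = wh_conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history"
    ).fetchone()
    new_rows = oltp_conn.execute(
        "SELECT id, item, amount FROM sales WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    wh_conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    wh_conn.commit()
    return len(new_rows)

# First interval: two transactions are entered, then the warehouse is refreshed.
oltp.executemany(
    "INSERT INTO sales (item, amount) VALUES (?, ?)",
    [("milk", 2.5), ("bread", 1.0)],
)
first_load = refresh(oltp, warehouse)

# Second interval: one new transaction arrives and the OLTP side purges an old row.
oltp.execute("INSERT INTO sales (item, amount) VALUES (?, ?)", ("eggs", 3.0))
oltp.execute("DELETE FROM sales WHERE item = 'milk'")
second_load = refresh(oltp, warehouse)

oltp_count = oltp.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
history_count = warehouse.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(first_load, second_load, oltp_count, history_count)
```

Reports now run against `warehouse`, so they no longer compete with data entry, and the purged row is still available there, which is the historical perspective the slide refers to.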

Solution 2

[Figure: data entry operators run transactions against the operational database; data is extracted into the data warehouse, from which the manager gets the report.]

Scenario 3

Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.

Solution 3

Improve the quality of data before loading it into the warehouse.

Perform data cleaning and transformation before loading the data.

Use query analysis tools to support ad hoc queries.
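
The cleaning and transformation step might look like the sketch below. The raw rows, the accepted date formats and the rules themselves (trim and normalise text, parse dates, reject rows with a bad date or missing units, drop exact duplicates) are all assumptions chosen for illustration; a real warehouse would derive its rules from the source systems. Treating identical records as duplicates is one of those assumptions, since two genuine identical sales could exist.

```python
from datetime import datetime

# Hypothetical raw extract with typical quality problems.
RAW_ROWS = [
    {"product": " chocolate cake ", "city": "KARACHI", "date": "2012-01-05", "units": "3"},
    {"product": "Chocolate Cake", "city": "karachi", "date": "05/01/2012", "units": "3"},  # same sale, other format
    {"product": "Cookies", "city": "Lahore", "date": "2012-01-07", "units": ""},           # missing units
    {"product": "Cookies", "city": "lahore ", "date": "2012-02-30", "units": "4"},         # impossible date
    {"product": "cookies", "city": "Lahore", "date": "2012-04-02", "units": "8"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def clean(rows):
    """Split rows into cleaned records ready for loading and rejected rows."""
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        date = parse_date(row["date"])
        units = row["units"].strip()
        if date is None or not units.isdigit():
            rejected.append(row)       # keep for manual inspection
            continue
        record = (
            row["product"].strip().title(),
            row["city"].strip().title(),
            date.isoformat(),
            int(units),
        )
        if record in seen:             # exact duplicate after normalisation
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned, rejected

cleaned, rejected = clean(RAW_ROWS)
for record in cleaned:
    print(record)
print(len(rejected), "rows rejected")
```

Rejected rows are returned rather than silently dropped, so that data quality problems can be traced back to their source.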

Solution 3

[Figure: the President uses a query and analysis tool on the data warehouse, including a chart of sales over time, to decide on expansion and improvement.]

Case Study

AFCO Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Gujranwala.

Their products are sold in all regions of Pakistan.

They have sales units at the provincial headquarters.

The President of the company wants sales information.

Sales Information

Report: The number of units sold.

113

Report: The number of units sold over time

January  February  March  April
14       41        33     25

Sales Information

Report: The number of items sold for each product over time

Product        Jan  Feb  Mar  Apr
Wheat Bread      -    -    6   17
Cheese           6   16    6    8
Swiss Rolls      8   25   21    -

Sales Information

Report: The number of items sold in each city for each product over time

City     Product        Jan  Feb  Mar  Apr
Karachi  Wheat Bread      -    -    3   10
         Cheese           3   16    6    -
         Swiss Rolls      4   16    6    -
Lahore   Wheat Bread      -    -    3    7
         Cheese           3    -    -    8
         Swiss Rolls      4    9   15    -

Sales Information

Report: The number of items sold (U) and income (Rs) in each region for each product over time.

                         Jan         Feb         Mar         Apr
Region   Product          Rs   U      Rs   U      Rs   U      Rs   U
Karachi  Wheat Bread       -   -       -   -    7.44   3   24.80  10
         Cheese         7.95   3   42.40  16   15.90   6       -   -
         Swiss Rolls    7.32   4   29.98  16   10.98   6       -   -
Lahore   Wheat Bread       -   -       -   -    7.44   3   17.36   7
         Cheese         7.95   3       -   -       -   -   21.20   8
         Swiss Rolls    7.32   4   16.47   9   27.45  15       -   -
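
The four reports are the same sales facts viewed at different levels of detail. The sketch below rolls the most detailed unit counts back up to the coarser reports. The month attached to each value is an inference: it is the only placement consistent with the monthly totals (14, 41, 33, 25) and the per-product rows of the earlier reports. The `roll_up` helper is hypothetical and only illustrates the idea.

```python
from collections import defaultdict

# (city, product, month, units) from the city-level report; month placement
# is inferred from the totals in the coarser reports.
FACTS = [
    ("Karachi", "Wheat Bread", "Mar", 3), ("Karachi", "Wheat Bread", "Apr", 10),
    ("Karachi", "Cheese", "Jan", 3), ("Karachi", "Cheese", "Feb", 16),
    ("Karachi", "Cheese", "Mar", 6),
    ("Karachi", "Swiss Rolls", "Jan", 4), ("Karachi", "Swiss Rolls", "Feb", 16),
    ("Karachi", "Swiss Rolls", "Mar", 6),
    ("Lahore", "Wheat Bread", "Mar", 3), ("Lahore", "Wheat Bread", "Apr", 7),
    ("Lahore", "Cheese", "Jan", 3), ("Lahore", "Cheese", "Apr", 8),
    ("Lahore", "Swiss Rolls", "Jan", 4), ("Lahore", "Swiss Rolls", "Feb", 9),
    ("Lahore", "Swiss Rolls", "Mar", 15),
]

DIMENSIONS = {"city": 0, "product": 1, "month": 2}

def roll_up(facts, *dims):
    """Sum units grouped by the named dimensions; no dimensions gives the grand total."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

total_units = roll_up(FACTS)[()]                        # first report
by_month = roll_up(FACTS, "month")                      # second report
by_product_month = roll_up(FACTS, "product", "month")   # third report

print(total_units)
print(by_month)
print(by_product_month[("Swiss Rolls", "Mar")])
```

Adding a dimension to the grouping drills down; removing one rolls up. OLAP tools offer the same operations interactively over the warehouse.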

Data Warehousing includes

Building Data Warehouse

Online Analysis/Analytical Processing (OLAP)

Presentation

[Figure: sources (RDBMS, flat file) pass through cleaning, selection and integration into the warehouse and OLAP server, which serves presentation on the client.]