![Page 1: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/1.jpg)
Introduction to Data Warehouse and Data Mining
MIS 2502 Data Analytics
![Page 2: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/2.jpg)
Importance of data
What do organizations do with data?
Transaction Processing
– (E)commerce: Amazon.com; PNC Bank
– B2B systems: Supply Chain Management
– Web Search
Decision Making
– Financial reporting
– Inventory management
– Budget allocation
– Customer Relationship Management
– Target Marketing
– Product Design and Promotions
– Fraud Detection
![Page 3: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/3.jpg)
Unnormalized data set
| Patient ID | Name | Address | DOB | Doctor | Appt Date | Location | DX (Diagnosis) |
|---|---|---|---|---|---|---|---|
| 111111 | Cindy Marselis | 2320 Edge Hill Road | 1/11/64 | Armstrong | 9/1/09 11:00 AM | Alter 2011 | Herniated Disc, Flu |
| 111111 | Cindy Marselis | 9331 Rising Sun Avenue | 1/11/64 | Morningstar | 9/1/09 11:00 AM | Alter 2011 | Herniated Disc |
| 111111 | Cindy Marselis | 2320 Edge Hill Road | 1/11/64 | Allen | 11/1/09 10:00 AM | Alter 2012 | Psoriasis |
| 222222 | Kathryn Marselis | 2320 Edge Hill Road | 11/3/04 | Dershaw | 8/1/09 11:00 AM | Speakman 105 | Well baby check |
| 111111 | Cindy Schwartz | 9331 Rising Sun Avenue | 1/11/64 | Armstrong | 8/11/09 3:00 PM | Alter 105 | Psoriasis, Herniated Disc |
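To see why this table causes trouble, here is a minimal Python sketch that splits each row into a patient part, an appointment part, and one row per diagnosis. It assumes the merged DX cells hold separate diagnoses, and `appt_no` is a hypothetical surrogate key added only for illustration. Collecting the patient details by Patient ID shows that 111111 is stored under three different name/address combinations, the kind of inconsistency a normalized design is meant to prevent.

```python
from collections import defaultdict

# Rows transcribed from the unnormalized table; merged DX cells become lists (assumption).
rows = [
    ("111111", "Cindy Marselis", "2320 Edge Hill Road", "1/11/64", "Armstrong", "9/1/09 11:00 AM", "Alter 2011", ["Herniated Disc", "Flu"]),
    ("111111", "Cindy Marselis", "9331 Rising Sun Avenue", "1/11/64", "Morningstar", "9/1/09 11:00 AM", "Alter 2011", ["Herniated Disc"]),
    ("111111", "Cindy Marselis", "2320 Edge Hill Road", "1/11/64", "Allen", "11/1/09 10:00 AM", "Alter 2012", ["Psoriasis"]),
    ("222222", "Kathryn Marselis", "2320 Edge Hill Road", "11/3/04", "Dershaw", "8/1/09 11:00 AM", "Speakman 105", ["Well baby check"]),
    ("111111", "Cindy Schwartz", "9331 Rising Sun Avenue", "1/11/64", "Armstrong", "8/11/09 3:00 PM", "Alter 105", ["Psoriasis", "Herniated Disc"]),
]

# Patient facts should depend only on the patient ID; record every version that appears.
patient_versions = defaultdict(set)
# Appointments keep only a reference to the patient instead of copying patient details.
appointments = []
# One row per (appointment, diagnosis) instead of several diagnoses packed into one cell.
diagnoses = []

for appt_no, (pid, name, address, dob, doctor, appt, location, dx_list) in enumerate(rows, start=1):
    patient_versions[pid].add((name, address, dob))
    appointments.append((appt_no, pid, doctor, appt, location))
    for dx in dx_list:
        diagnoses.append((appt_no, dx))

for pid, versions in sorted(patient_versions.items()):
    if len(versions) > 1:
        print(f"Patient {pid} is stored with {len(versions)} different name/address versions")
```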
![Page 4: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/4.jpg)
Normalized db - before
![Page 5: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/5.jpg)
Normalized db - after
![Page 6: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/6.jpg)
Decision Making with Databases
Databases are used for transaction processing
Data from transaction processing is used for tactical decision making
– The database provides basic reporting functions
But…
![Page 7: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/7.jpg)
The Need for Data Analysis
Different managers require different data, and that data may come from other parts of the organization or from outside it
External and internal forces require tactical and strategic decisions
– Search for competitive advantage
– Business environments are dynamic
– Decision-making cycle time is reduced
![Page 8: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/8.jpg)
Some Questions Analysts Need to Answer
Sales analysis:
– What are the sales by quarter and geography?
– How do sales compare in two different stores in the same state?
Profitability analysis:
– Which is the most profitable store in California (CA)?
– Which product lines are the highest revenue producers this year?
– Which products and product lines are the most profitable this quarter?
Sales force analysis:
– Which salesperson is the best revenue producer this year?
– Does salesperson X meet their sales target this quarter?
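Most of these questions reduce to "sum a measure, grouped by one or more dimensions". A minimal Python sketch, using hypothetical sales records (the stores, states, salespeople, and amounts are invented for illustration), answers three of them. True profitability would also need cost data, which this sketch does not have, so it ranks stores by revenue instead.

```python
from collections import defaultdict

# Hypothetical transaction-level sales: (store, state, quarter, salesperson, amount)
sales = [
    ("Store 1", "CA", "Q1", "Ana", 200.0),
    ("Store 2", "CA", "Q1", "Ben", 150.0),
    ("Store 1", "CA", "Q2", "Ana", 250.0),
    ("Store 3", "PA", "Q1", "Cho", 300.0),
    ("Store 2", "CA", "Q2", "Ben", 100.0),
]

def total_by(records, key):
    """Sum the amount field, grouped by whatever key(record) returns."""
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec[4]
    return dict(totals)

# Sales by quarter and geography.
by_quarter_state = total_by(sales, lambda r: (r[2], r[1]))

# Top CA store by revenue (a stand-in for profitability without cost data).
ca_stores = total_by([r for r in sales if r[1] == "CA"], lambda r: r[0])
best_ca_store = max(ca_stores, key=ca_stores.get)

# Best revenue-producing salesperson.
by_salesperson = total_by(sales, lambda r: r[3])
top_salesperson = max(by_salesperson, key=by_salesperson.get)

print(by_quarter_state, best_ca_store, top_salesperson)
```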
![Page 9: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/9.jpg)
From transaction processing to supporting decision making
![Page 10: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/10.jpg)
Operational vs. Decision Support Data
Operational data
– Relational, normalized database
– Optimized to support transactions
– Real-time updates
Decision support system (DSS) data
– Snapshot of operational data
– Summarized
– Large amounts of data
Data analyst viewpoint
– Timespan
– Granularity
– Dimensionality
![Page 11: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/11.jpg)
Data Warehouse
Integrated
– Centralized
– Holds data retrieved from the entire organization
Subject-Oriented
– Optimized to give answers to diverse questions
– Used by all functional areas
Time-Variant
– Flow of data through time
– Projected data
Non-Volatile
– Data never removed
– Always growing
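The time-variant and non-volatile properties can be illustrated with a minimal Python sketch: the operational store overwrites a value in place, while the warehouse only appends dated snapshots and never updates or deletes a row. The customer IDs, balances, and snapshot dates are hypothetical.

```python
from datetime import date

# Hypothetical operational table: current balances, overwritten in place.
operational = {"C1": 100.0, "C2": 250.0}

# Warehouse rows are only ever appended, each stamped with its snapshot date.
warehouse = []

def load_snapshot(source, snapshot_date, target):
    """Append one dated snapshot; existing warehouse rows are never changed."""
    for customer_id, balance in sorted(source.items()):
        target.append((snapshot_date, customer_id, balance))

load_snapshot(operational, date(2009, 9, 1), warehouse)
operational["C1"] = 175.0  # the operational system loses the old value here
load_snapshot(operational, date(2009, 10, 1), warehouse)

# The warehouse still holds C1's history across time.
history_c1 = [(d, bal) for d, cid, bal in warehouse if cid == "C1"]
print(history_c1)
```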
![Page 12: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/12.jpg)
Data Warehouse
Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
Data mart – contains a subset of data warehouse information
![Page 13: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/13.jpg)
ETL – Extraction, Transformation, Load
Extract: data from source systems
Transform: cleanse data for consistency and output exceptions
– Apply business rules
– Select certain columns to load (not-null records)
– Translate coded values (1, M, male = 0)
– Derive new calculated values (sale_amount = qty * unit_price)
– Join data from multiple sources (lookup, merge)
– Aggregate (roll up/summarize data)
– Transpose/pivot (turn columns into rows)
– Validate data
Load: data into the repository
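The ETL steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the source rows, the `STORE_STATE` lookup, and the female codes in `GENDER_CODE` are all hypothetical, and the "warehouse" is just a Python list. Rows with nulls or unknown codes are routed to an exceptions list rather than loaded.

```python
from collections import defaultdict

# Hypothetical source data for the extract step; None marks a missing value.
SOURCE_SALES = [
    {"sale_id": 1, "store_id": "S1", "gender": "M", "qty": 2, "unit_price": 5.0},
    {"sale_id": 2, "store_id": "S2", "gender": "female", "qty": 1, "unit_price": 20.0},
    {"sale_id": 3, "store_id": "S1", "gender": "1", "qty": 3, "unit_price": 4.0},
    {"sale_id": 4, "store_id": None, "gender": "F", "qty": 1, "unit_price": 9.0},
]
STORE_STATE = {"S1": "PA", "S2": "NJ"}  # hypothetical lookup table from a second source
# Several source encodings mapped to one warehouse code (female codes are an assumption).
GENDER_CODE = {"1": 0, "M": 0, "male": 0, "2": 1, "F": 1, "female": 1}

def extract():
    return [dict(r) for r in SOURCE_SALES]

def transform(rows):
    clean, exceptions = [], []
    for r in rows:
        # Validation: nulls or unknown codes go to the exceptions output.
        if r["store_id"] not in STORE_STATE or r["gender"] not in GENDER_CODE:
            exceptions.append(r)
            continue
        clean.append({
            "sale_id": r["sale_id"],
            "state": STORE_STATE[r["store_id"]],        # join via lookup
            "gender": GENDER_CODE[r["gender"]],         # translate coded values
            "sale_amount": r["qty"] * r["unit_price"],  # derived value
        })
    return clean, exceptions

def load(rows, target):
    target.extend(rows)

def aggregate_by_state(rows):
    """Roll up sale_amount to one total per state."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["state"]] += r["sale_amount"]
    return dict(totals)

warehouse = []
clean_rows, exceptions = transform(extract())
load(clean_rows, warehouse)
print(aggregate_by_state(warehouse))
```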
![Page 14: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/14.jpg)
Data in a Data Warehouse
Data for a data warehouse is obtained from a variety of databases
– e.g., customer database, transaction database, accounts database
Data in a data warehouse is multidimensional
![Page 15: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/15.jpg)
Multidimensional Analysis
Cube – common term for the representation of multidimensional information
![Page 16: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/16.jpg)
Star Schema
Data-modeling technique
– Maps multidimensional decision support data into a relational database
– Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB
Four components:
– Facts
– Dimensions
– Attributes
– Attribute hierarchies
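An attribute hierarchy lets an analyst roll up from a detailed level to a summary level, or drill back down. A minimal Python sketch, assuming a hypothetical location dimension with a city → state hierarchy and sales amounts as the fact:

```python
from collections import defaultdict

# Hypothetical location dimension: the city attribute rolls up to the state attribute.
CITY_TO_STATE = {"Philadelphia": "PA", "Pittsburgh": "PA", "Newark": "NJ"}

# Hypothetical facts: (city, sales amount)
facts = [("Philadelphia", 100.0), ("Pittsburgh", 50.0), ("Newark", 70.0), ("Philadelphia", 30.0)]

def sales_by_city(rows):
    """Aggregate facts at the lowest level of the hierarchy."""
    totals = defaultdict(float)
    for city, amount in rows:
        totals[city] += amount
    return dict(totals)

def roll_up_to_state(city_totals):
    """Move one level up the hierarchy: city -> state."""
    totals = defaultdict(float)
    for city, amount in city_totals.items():
        totals[CITY_TO_STATE[city]] += amount
    return dict(totals)

def drill_down(city_totals, state):
    """Move back down: show the cities that make up one state's total."""
    return {c: v for c, v in city_totals.items() if CITY_TO_STATE[c] == state}

city_totals = sales_by_city(facts)
print(roll_up_to_state(city_totals), drill_down(city_totals, "PA"))
```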
![Page 17: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/17.jpg)
Simple Star Schema
Figure 13.12
![Page 18: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/18.jpg)
Slice and Dice View of Sales
Figure 13.14
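A cube can be pictured as a mapping from one value per dimension to a measure. In the minimal Python sketch below (product, region, and quarter values are hypothetical), slicing fixes one dimension to a single value, leaving a 2-D view, while dicing keeps a chosen subset of values on every dimension, leaving a smaller sub-cube.

```python
# Hypothetical sales cube: (product, region, quarter) -> sales amount
cube = {
    ("Widget", "East", "Q1"): 100,
    ("Widget", "West", "Q1"): 80,
    ("Gadget", "East", "Q1"): 60,
    ("Widget", "East", "Q2"): 120,
    ("Gadget", "West", "Q2"): 90,
}

def slice_cube(cube, quarter):
    """Fix the quarter dimension; the result is a (product, region) view."""
    return {(p, r): v for (p, r, q), v in cube.items() if q == quarter}

def dice_cube(cube, products, regions, quarters):
    """Keep only the chosen values on each dimension (a sub-cube)."""
    return {
        (p, r, q): v
        for (p, r, q), v in cube.items()
        if p in products and r in regions and q in quarters
    }

print(slice_cube(cube, "Q1"))
print(dice_cube(cube, {"Widget"}, {"East"}, {"Q1", "Q2"}))
```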
![Page 19: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/19.jpg)
Star Schema Representation
Facts and dimensions are represented by physical tables in the data warehouse DB
– The fact table is related to each dimension table (M:1)
– Fact and dimension tables are related by foreign keys
– Subject to the primary/foreign key constraints
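The M:1 link and the key constraints can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch with hypothetical table and column names (`product_dim`, `sales_fact`); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and other database systems configure this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE sales_fact ("
    " product_id INTEGER NOT NULL REFERENCES product_dim(product_id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO product_dim VALUES (1, 'Widget')")
# Many fact rows may point at the same dimension row (the M:1 relationship).
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10.0), (1, 15.0)])

try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 5.0)")  # product 99 does not exist
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

facts_for_widget = conn.execute(
    "SELECT COUNT(*) FROM sales_fact WHERE product_id = 1"
).fetchone()[0]
print(fk_rejected, facts_for_widget)
```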
![Page 20: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/20.jpg)
Star Schema for Sales
Sales fact table and its four dimensions: location, time, product, and customer. This allows sales to be aggregated by time, geographic location, product, and customer.
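A minimal sketch of this sales star schema using Python's `sqlite3` module: one fact table with a foreign key to each of the four dimensions, and a "star join" query that aggregates sales by state and quarter. All table names, columns, and values here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, product_line TEXT);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE sales_fact (
    location_id INTEGER REFERENCES location_dim(location_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    product_id  INTEGER REFERENCES product_dim(product_id),
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    amount      REAL
);
INSERT INTO location_dim VALUES (1, 'PA'), (2, 'NJ');
INSERT INTO time_dim     VALUES (1, 2009, 'Q1'), (2, 2009, 'Q2');
INSERT INTO product_dim  VALUES (1, 'Books'), (2, 'Music');
INSERT INTO customer_dim VALUES (1, 'Retail'), (2, 'Business');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 100.0),
    (1, 2, 2, 2, 50.0),
    (2, 1, 1, 2, 70.0),
    (2, 2, 1, 1, 30.0);
""")

# Star join: the fact table joins out to the dimensions it is grouped by.
rows = conn.execute("""
    SELECT l.state, t.quarter, SUM(f.amount)
    FROM sales_fact f
    JOIN location_dim l ON f.location_id = l.location_id
    JOIN time_dim t     ON f.time_id = t.time_id
    GROUP BY l.state, t.quarter
    ORDER BY l.state, t.quarter
""").fetchall()
print(rows)
```

Grouping by `product_dim.product_line` or `customer_dim.segment` instead works the same way: swap the joined dimension and the `GROUP BY` columns.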
![Page 21: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/21.jpg)
Data Warehouse to Data Marts
Given the large size of a data warehouse, organizations create data marts
– Subject-oriented data
– Subset of data in a data warehouse
– Used for focused decision-making
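A data mart can be thought of as a filtered projection of the warehouse: only the rows for one subject area and only the columns that area needs. A minimal Python sketch, assuming hypothetical warehouse rows tagged with a `dept` field and a hypothetical marketing column list:

```python
# Hypothetical enterprise-wide warehouse rows from several functional areas.
warehouse = [
    {"dept": "marketing", "campaign": "Spring", "region": "East", "responses": 120, "cost": 500.0},
    {"dept": "finance", "account": "A-1", "region": "East", "balance": 9000.0},
    {"dept": "marketing", "campaign": "Fall", "region": "West", "responses": 80, "cost": 400.0},
]

# Columns the (hypothetical) marketing team needs for its focused analysis.
MART_COLUMNS = ("campaign", "region", "responses", "cost")

def build_data_mart(rows, subject, columns):
    """Keep only one subject's rows, projected onto the chosen columns."""
    return [{c: r[c] for c in columns} for r in rows if r["dept"] == subject]

marketing_mart = build_data_mart(warehouse, "marketing", MART_COLUMNS)
print(marketing_mart)
```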
![Page 22: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/22.jpg)
Online Analytical Processing (OLAP)
Advanced data analysis environment
– Supports decision making, business modeling, and operations research activities
Characteristics of OLAP
– Use multidimensional data analysis techniques
– Provide advanced database support
– Provide easy-to-use end-user interfaces
– Support client/server architecture
![Page 23: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/23.jpg)
OLAP Client/Server Architecture
Figure 13.6
![Page 24: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/24.jpg)
Data Mining
Seeks to discover patterns or relationships within the data
Data mining tools automatically search data for patterns and relationships
Data mining tools
– Analyze data
– Uncover problems or opportunities
– Form computer models based on findings
– Predict business behavior with models
– Require minimal end-user intervention
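One common way data-mining tools search for relationships is association-rule mining ("customers who buy A also buy B"). The slides do not name a specific technique; this is a minimal Python sketch of that one approach. The shopping baskets and the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are hypothetical. Support is the fraction of baskets containing both items, and confidence for A → B is count(A and B) / count(A).

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def support(count):
    """Fraction of all baskets that contain the itemset."""
    return count / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction also containing the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

MIN_SUPPORT, MIN_CONFIDENCE = 0.6, 0.8  # assumed thresholds
rules = []
for (a, b), n in pair_counts.items():
    if support(n) < MIN_SUPPORT:
        continue
    for x, y in ((a, b), (b, a)):
        c = confidence(x, y)
        if c >= MIN_CONFIDENCE:
            rules.append((x, y, round(c, 2)))

print(rules)
```

With these thresholds only one rule survives: every basket with beer also has diapers, while the other frequent pairs reach only 0.75 confidence.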
![Page 25: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/25.jpg)
What Are Data-Mining Tools?
![Page 26: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/26.jpg)
Data Mining Process
![Page 27: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/27.jpg)
AB113 - Information Technology
Business Intelligence
![Page 28: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/28.jpg)
MS SQL 2008 Architecture
Dimensional Model/Star Schema
Relational Model
![Page 29: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/29.jpg)
Back Room: data prepared from many sources
Front Room: information presented
![Page 30: 1 Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d2d5503460f94a04ac3/html5/thumbnails/30.jpg)
Multidimensional Analysis and Data Mining
What are the differences between databases and data warehouses/data marts?
Data mining – the process of analyzing data to extract information not offered by the raw data alone
– To perform data mining, users need data-mining tools
» Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
Business intelligence – taking data from multiple sources and turning it into useful, easy-to-understand information to support decision-making efforts for various kinds of people.