Chapter 8
Newer Database Topics
Based on G. Post, DBMS: Designing & Building Business Applications
University of Manitoba, Asper School of Business
3500 DBMS, Bob Travica
Updated 2010
DB SYSTEMS
OLAP & Data Warehouse
Online Transaction Processing (OLTP): Querying databases with 3NF tables
Operations data
Predefined reports
Online Analytical Processing (OLAP); data warehousing; data mining. Usually denormalized data.
Periodic transfers
Interactive data analysis
Flat files
MIS 3500
Warehousing Goals
• Integrate data from different sources to get a larger picture of business
• Data aggregations (summaries on different dimensions)
• Ad hoc queries (support non-routine decision making)
• Statistical analysis (test hypotheses on relationships between pieces of data)
• Discover new relationships (data mining)
Extraction, Transformation, and Transportation
Data warehouse: All data must be consistent.
Customers
Convert “Client” to “Customer”
Apply standard product numbers
Convert currencies
Fix region codes
Transaction data from diverse systems.
• Preparations performed on data: Extract, Transform, Transport
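The transform step listed above can be sketched in a few lines of Python. This is only an illustration: the field names, exchange rates, and region-code mapping are hypothetical and do not come from the slides.

```python
# All field names, rates, and code mappings below are hypothetical examples.
RATES_TO_USD = {"USD": 1.0, "CAD": 0.75}           # assumed exchange rates
REGION_FIXES = {"N.E.": "NE", "north-east": "NE"}  # assumed region-code cleanup


def transform(record):
    """Return a copy of one source record in the warehouse's standard form."""
    row = dict(record)
    # Convert "Client" to "Customer": unify the field name across systems.
    if "Client" in row:
        row["Customer"] = row.pop("Client")
    # Convert currencies: store every amount in one currency.
    currency = row.pop("Currency", "USD")
    row["AmountUSD"] = round(row.pop("Amount") * RATES_TO_USD[currency], 2)
    # Fix region codes: map known variants to the standard code.
    row["Region"] = REGION_FIXES.get(row["Region"], row["Region"])
    return row


# "Extracted" rows from two systems that disagree on names and formats.
extracted = [
    {"Client": "Acme", "Amount": 200.0, "Currency": "CAD", "Region": "N.E."},
    {"Customer": "Bolt", "Amount": 50.0, "Currency": "USD", "Region": "NE"},
]
warehouse_rows = [transform(r) for r in extracted]
```

After the transform, both rows share the same field names, currency, and region codes, which is the consistency the warehouse requires.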
Three-Dimensional View of Data: Cube
Cube axes: Sale Date, Customer Location, Category
Similar ideas are used in crosstab queries and pivot tables.
Data Hierarchy
Levels: Year > Quarter > Month > Week > Day
• Roll-up: To get higher-level totals
• Drill-down: To get lower-level details
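Roll-up and drill-down amount to re-totaling the same detail rows at different levels of the hierarchy. A minimal Python sketch, assuming made-up day-level sales (the Week level is left out for brevity):

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level sales: (sale date, amount).
daily_sales = [
    (date(2010, 1, 5), 100.0),
    (date(2010, 1, 20), 50.0),
    (date(2010, 2, 3), 25.0),
    (date(2010, 4, 1), 10.0),
]


def roll_up(rows, level):
    """Total the amounts at one level of the Year > Quarter > Month hierarchy."""
    keys = {
        "year": lambda d: (d.year,),
        "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
        "month": lambda d: (d.year, d.month),
    }
    totals = defaultdict(float)
    for day, amount in rows:
        totals[keys[level](day)] += amount
    return dict(totals)


by_month = roll_up(daily_sales, "month")      # lower-level detail
by_quarter = roll_up(daily_sales, "quarter")  # rolled up one level
by_year = roll_up(daily_sales, "year")        # rolled up to the top level
```

Moving from `by_year` to `by_quarter` to `by_month` is a drill-down; the reverse direction is a roll-up.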
Star Design
Fact Table: Sale (SaleDate, SalePrice, Quantity)
Measures: Amount = SalePrice * Quantity
Dimension Tables: SaleDate, ProductCategory, CustomerLocation
Amounts broken down by product category, period, and customer location.
Hierarchical: Dimension tables can link only via the fact table.
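A star design can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration: the table names follow the slide, but the key columns (CategoryID, LocationID), the City column, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables (names follow the slide; columns assumed).
    CREATE TABLE ProductCategory (CategoryID INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE CustomerLocation (LocationID INTEGER PRIMARY KEY, City TEXT);
    -- Fact table: one row per sale, with a key into each dimension.
    CREATE TABLE Sale (
        SaleDate TEXT, CategoryID INTEGER, LocationID INTEGER,
        SalePrice REAL, Quantity INTEGER
    );
    INSERT INTO ProductCategory VALUES (1, 'Bird'), (2, 'Cat');
    INSERT INTO CustomerLocation VALUES (1, 'Winnipeg'), (2, 'Brandon');
    INSERT INTO Sale VALUES
        ('2010-01-05', 1, 1, 45.0, 2),
        ('2010-01-09', 2, 1, 30.0, 1),
        ('2010-02-11', 1, 2, 45.0, 1);
""")

# The measure Amount = SalePrice * Quantity, broken down by two dimensions.
rows = conn.execute("""
    SELECT p.Category, c.City, SUM(s.SalePrice * s.Quantity) AS Amount
    FROM Sale AS s
    JOIN ProductCategory AS p ON p.CategoryID = s.CategoryID
    JOIN CustomerLocation AS c ON c.LocationID = s.LocationID
    GROUP BY p.Category, c.City
    ORDER BY p.Category, c.City
""").fetchall()
```

Every join passes through the fact table Sale; the two dimension tables never join each other directly.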
Snowflake Design
OLAPItems (SaleID, ItemID, Quantity, SalePrice, Amount)
Merchandise (ItemID, Description, QuantityOnHand, ListPrice, Category)
Sale (SaleID, SaleDate, EmployeeID, CustomerID, SalesTax)
Customer (CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID)
City (CityID, ZipCode, City, State)
Network-like design: Dimension tables can link directly.
Excel Pivot Table Reports
Can place data in rows or columns. By grouping months, can instantly get quarterly or monthly totals.
LastName   EmployeeID  Data                  Quarter 1  Quarter 2  Quarter 3  Quarter 4  Grand Total
Carpenter  8           Sum of Animal          1,668.91     606.97     426.39       7.20     2,709.47
                       Sum of Merchandise       324.90      78.30      99.00     128.70       630.90
Eaton      6           Sum of Animal            522.37                341.85     562.50     1,426.72
                       Sum of Merchandise        30.60                 54.90     107.10       192.60
Farris     7           Sum of Animal          5,043.36   1,059.70                796.47     6,899.53
                       Sum of Merchandise       826.92     188.10                306.00     1,321.02
Gibson     2           Sum of Animal          4,983.51   1,549.83              2,556.10     9,089.44
                       Sum of Merchandise       668.25     238.50                450.90     1,357.65
Hopkins    4           Sum of Animal          3,747.96   1,194.88     372.65     128.41     5,443.90
                       Sum of Merchandise       476.91     252.90     121.50       7.20       858.51
James      5           Sum of Animal          3,282.77   2,373.08     437.88     150.11     6,243.84
                       Sum of Merchandise       505.89     693.45      99.00      99.00     1,397.34
O'Connor   9           Sum of Animal          2,643.69     180.91     510.12                3,334.72
                       Sum of Merchandise       263.70      83.70      55.80                  403.20
Reasoner   3           Sum of Animal          4,577.43     625.74     589.68   2,500.24     8,293.09
                       Sum of Merchandise       762.30      89.10     116.80     396.90     1,365.10
Reeves     1           Sum of Animal          1,120.93                                      1,120.93
                       Sum of Merchandise       263.88                                        263.88
Shields    10          Sum of Animal          1,008.76                162.15                1,170.91
                       Sum of Merchandise        62.10                 22.50                   84.60
Total                  Sum of Animal         28,599.69   7,591.11   2,840.72   6,701.03    45,732.55
Total                  Sum of Merchandise     4,185.45   1,624.05     569.50   1,495.80     7,874.80
CUBE Option (SQL 99)
SELECT Category, Month, Sum, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm
FROM …
GROUP BY CUBE (Category, Month...)
GROUPING returns 1 when the column has been aggregated away in that row, so (null) marks a total rather than a missing value.
Category  Month   Amount    Gc  Gm
Bird      1       135.00    0   0
Bird      2       45.00     0   0
…
Bird      (null)  32.00     0   0
Bird      (null)  607.50    0   1
Cat       1       396.00    0   0
Cat       2       113.85    0   0
…
Cat       (null)  1293.30   0   1
(null)    1       1358.8    1   0
(null)    2       1508.94   1   0
(null)    3       2362.68   1   0
…
(null)    (null)  8451.79   1   1
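To see what CUBE computes, the grouping can be emulated in plain Python: total the detail rows once for every subset of the dimensions. This is a sketch with invented rows, not the DBMS's actual algorithm; the flags follow the standard GROUPING convention (1 means the column was aggregated away).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (Category, Month, Amount).
sales = [
    ("Bird", 1, 135.0),
    ("Bird", 2, 45.0),
    ("Cat", 1, 396.0),
]
DIMENSIONS = ("Category", "Month")


def cube(rows):
    """Emulate GROUP BY CUBE: totals for every subset of the dimensions."""
    totals = defaultdict(float)
    n = len(DIMENSIONS)
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            for row in rows:
                # Dimensions not in `kept` are aggregated away (shown as None).
                key = tuple(row[i] if i in kept else None for i in range(n))
                # GROUPING flag: 1 when the column was aggregated away.
                flags = tuple(0 if i in kept else 1 for i in range(n))
                totals[key + flags] += row[n]
    return dict(totals)


# Keys are (Category, Month, Gc, Gm); values are the summed Amount.
result = cube(sales)
```

With two dimensions the cube holds four groupings: detail rows, category totals, month totals, and the grand total.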
GROUPING SETS: Hiding Details
SELECT Category, Month, Sum
FROM …
GROUP BY GROUPING SETS ( ROLLUP (Category), ROLLUP (Month), ( ) )
Category  Month   Amount
Bird      (null)  607.50
Cat       (null)  1293.30
…
(null)    1       1358.8
(null)    2       1508.94
(null)    3       2362.68
…
(null)    (null)  8451.79
SQL RANK Functions
SELECT Employee, SalesValue,
  RANK() OVER (ORDER BY SalesValue DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
FROM Sales
ORDER BY SalesValue DESC, Employee;
Employee  SalesValue  rank  dense
Jones     18,000      1     1
Black     16,000      2     2
Smith     16,000      2     2
White     14,000      4     3
RANK skips numbers after a tie; DENSE_RANK does not skip numbers.
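The difference between the two functions is easy to check by hand. The following Python sketch reproduces the ranking logic on the four rows from the slide; it illustrates the semantics only and is not how a DBMS implements window functions.

```python
# Sales figures taken from the slide's result table.
sales = [("Jones", 18000), ("Smith", 16000), ("Black", 16000), ("White", 14000)]


def rank_rows(rows):
    """Return (employee, value, rank, dense_rank), ordered by value descending."""
    # Same ordering as ORDER BY SalesValue DESC, Employee.
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    ranked = []
    rank = dense = 0
    previous = None
    for position, (employee, value) in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # RANK jumps to the row position after ties
            dense += 1        # DENSE_RANK advances by exactly one
            previous = value
        ranked.append((employee, value, rank, dense))
    return ranked


ranked = rank_rows(sales)
```

The tied rows share rank 2; the next row gets rank 4 but dense rank 3.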
• Therefore, advances in SQL motivate DBMS vendors to support OLAP and data warehousing.
Data Mining
• Goal: To discover unknown relationships in the data that can be used to make better decisions.
• Exploratory analysis.
• A bottom-up approach that scans the data to find relationships.
• Some statistical routines, but they are not sufficient:
  • Statistics relies on averages
  • Sometimes the important data lies in more detailed pairs
• Supervised by developer vs. unsupervised (self-organizing artificial neural networks)
Common Techniques
1. Classification/Prediction
2. Association Rules/Market Basket Analysis
3. Clustering
1. Classification (Prediction)
Purpose: “Classify” things that are causes and those that are effects.
Examples:
• Which borrowers/loans are most likely to be successful?
• Which customers are most likely to want a new item?
• Which companies are likely to file bankruptcy?
• Which workers are likely to quit in the next six months?
• Which startup companies are likely to succeed?
• Which tax returns are fraudulent?
Classification Process
• Clearly identify the outcome/dependent variable.
• Identify potential variables that might affect the outcome.
• Use sample data to test and validate the model.
• Regression/correlation analysis, decision tables and trees, etc.
Income  Credit History  Job Stability  Credit Success
50000   Good            Good           Yes
75000   Mixed           Bad            No
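The process can be sketched with a one-variable decision table: learn the most common outcome for each predictor value from sample data, then validate on rows held back from training. Only the first two training rows come from the slide; every other record here is invented, and a real model would need far more data.

```python
# Hypothetical loan records: (income, credit_history, job_stability, success).
# The first two rows mirror the slide; the rest are invented for illustration.
train = [
    (50000, "Good", "Good", "Yes"),
    (75000, "Mixed", "Bad", "No"),
    (40000, "Good", "Bad", "Yes"),
    (90000, "Bad", "Good", "No"),
]
holdout = [
    (60000, "Good", "Good", "Yes"),
    (30000, "Mixed", "Good", "No"),
]


def build_decision_table(rows, column):
    """Map each value of one predictor to its most common outcome."""
    counts = {}
    for row in rows:
        outcomes = counts.setdefault(row[column], {})
        outcomes[row[-1]] = outcomes.get(row[-1], 0) + 1
    return {value: max(o, key=o.get) for value, o in counts.items()}


def accuracy(table, rows, column, default="No"):
    """Share of rows whose outcome the table predicts correctly."""
    hits = sum(table.get(row[column], default) == row[-1] for row in rows)
    return hits / len(rows)


table = build_decision_table(train, column=1)  # predictor: credit history
score = accuracy(table, holdout, column=1)     # validate on unseen rows
```

Here the outcome variable is Credit Success and the single predictor is Credit History; decision trees extend the same idea to several predictors.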
2. Association/Market Basket
Purpose: Determine what events or items go together/co-occur.
Examples: What items are customers likely to buy together?
(Business use: Consider putting the two together to
increase cross-selling.)
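Co-occurrence is usually measured with support (how often a pair appears together) and confidence (how often the second item appears given the first). A minimal Python sketch, assuming a handful of invented baskets:

```python
from itertools import combinations

# Hypothetical shopping baskets; each set is one sale.
baskets = [
    {"burger", "fries", "cola"},
    {"burger", "fries"},
    {"burger", "cola"},
    {"salad", "cola"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = {}
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, if_item, then_item):
    """Of the baskets containing if_item, the share that also has then_item."""
    with_if = [b for b in baskets if if_item in b]
    return sum(then_item in b for b in with_if) / len(with_if)


support = pair_support(baskets)
burger_to_fries = confidence(baskets, "burger", "fries")
```

Pairs with both high support and high confidence are the candidates for cross-selling.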
Association Challenges
• If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
• Some relationships are obvious: burger and fries.
• Some relationships are puzzling/meaningless: a hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?
3. Cluster Analysis
Purpose: Determine groups of people or other entities.
Examples:
• Are there groups of customers? (If so, we could target them; market segmentation)
• Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.)
• Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.)
Small intracluster distance
Large intercluster distance
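The two distance criteria can be checked numerically. The sketch below assumes two already-formed clusters of invented customer points and compares the average distance inside each cluster with the distance between the cluster centers; it evaluates a grouping rather than finding one.

```python
from itertools import combinations
from math import dist

# Hypothetical customers as (age, yearly spend in $1000s), already grouped.
clusters = {
    "young_low": [(22, 1.0), (25, 1.5), (24, 1.2)],
    "older_high": [(55, 9.0), (58, 8.5), (60, 9.5)],
}


def centroid(points):
    """Mean point of a cluster."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def mean_intracluster_distance(points):
    """Average distance between all pairs of points within one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


intra = {name: mean_intracluster_distance(p) for name, p in clusters.items()}
inter = dist(centroid(clusters["young_low"]), centroid(clusters["older_high"]))
```

A good clustering shows every value in `intra` well below `inter`, as it does for these points.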