databases and decision support systems css263 lecture 14
Post on 19-Dec-2015
![Page 1: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/1.jpg)
Databases and Decision Support Systems
CSS263 Lecture 14
![Page 2: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/2.jpg)
LECTURE PLAN
What is a Decision Support System?
What is a Data Warehouse?
Different uses for a Data Warehouse
Problems of Data Warehousing
What is OLAP?
What is Data Mining?
Data Mining Operations
Data Mining Pitfalls!
![Page 3: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/3.jpg)
COMPARISON OF OLTP AND DSS
ON-LINE TRANSACTION PROCESSING
Updates to operational data
Stores detailed data
Repetitive processing
Predictable pattern of usage
Transaction driven
Application oriented
Supports day-to-day decisions
Usually small changes
Generally a large number of transactions
Serves many operational users
![Page 4: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/4.jpg)
COMPARISON OF OLTP AND DSS
DECISION SUPPORT SYSTEMS
Analysis of historical data
Ad-hoc fairly complex (read-only) queries
Stores detailed, lightly, and highly summarised data
Low to medium level of transaction throughput
Analysis driven
Subject oriented
Supports strategic decisions
Serves a few ‘managerial’ users
Fast-response time required
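The contrast between the two workloads can be sketched with a small in-memory database. This is only an illustration: the `sales` table, its columns, and the figures are invented, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

# Hypothetical 'sales' table; names and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, city TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (city, year, amount) VALUES (?, ?, ?)",
    [("London", 1996, 120.0), ("London", 1997, 150.0),
     ("Glasgow", 1996, 80.0), ("Glasgow", 1997, 95.0)],
)

# OLTP-style work: a small, predictable change to one detailed row.
conn.execute("UPDATE sales SET amount = 155.0 WHERE id = 2")

# DSS-style work: a read-only query that summarises historical data.
totals_by_city = dict(
    conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city").fetchall()
)
print(totals_by_city)
```

The update touches a single record, while the summary query reads every row to answer one analytical question.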
![Page 5: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/5.jpg)
DECISION SUPPORT SYSTEMS
SEEING DATA AS INFORMATION
• Operational Information
Users of data at the operational level of the business are concerned with data at its highest level of detail, e.g. particular accounts, invoices, delivery dates, etc…
• Tactical Information
Users of data at the tactical level are more interested in aggregated historical data to assist in planning decisions.
• Strategic Information
Users of data at the strategic level are concerned with using highly summarised data to give an overview of operations.
![Page 6: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/6.jpg)
DATA WAREHOUSING
![Page 7: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/7.jpg)
DATA WAREHOUSING
WHAT IS A DATA WAREHOUSE?
DEFINITION:
‘A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process’ [Inmon, 1993].
SUBJECT-ORIENTED:
The warehouse is organized around the major subjects of an enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and order processing).
![Page 8: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/8.jpg)
DATA WAREHOUSING
WHAT IS A DATA WAREHOUSE?
DEFINITION:
‘A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process’ [Inmon, 1993].
INTEGRATED DATA:
The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. Such data must be made consistent to present a unified view of the data to the users.
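As a rough sketch of what making data consistent can involve, the example below maps different source encodings of the same attribute onto one warehouse-wide code. The encodings, field names, and rows are assumptions made up for illustration; real source systems will differ.

```python
# Hypothetical source encodings of the same attribute; real systems will differ.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

def unify_gender(raw):
    """Map a source-specific gender code onto one warehouse-wide code."""
    return GENDER_CODES.get(str(raw).strip().lower(), "UNKNOWN")

# Two invented source systems that encode gender differently.
london_rows = [{"customer": "A. Smith", "gender": "male"}]
glasgow_rows = [{"customer": "B. Jones", "gender": 0}]

# The unified view uses a single encoding regardless of the source.
unified = [
    {"customer": row["customer"], "gender": unify_gender(row["gender"])}
    for row in london_rows + glasgow_rows
]
print(unified)
```

Values that cannot be mapped are flagged as `UNKNOWN` rather than silently guessed, so they can be reviewed later.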
![Page 9: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/9.jpg)
DATA WAREHOUSING
WHAT IS A DATA WAREHOUSE?
DEFINITION:
‘A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process’ [Inmon, 1993].
TIME VARIANT:
Data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that the data is held, the association of time with all data, and the fact that data represents a series of historical snapshots.
![Page 10: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/10.jpg)
DATA WAREHOUSING
WHAT IS A DATA WAREHOUSE?
DEFINITION:
‘A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process’ [Inmon, 1993].
NON-VOLATILE:
Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.
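A minimal sketch of the time-variant and non-volatile properties, assuming an append-only list stands in for the warehouse and that each refresh stamps its rows with a snapshot date. The product, stock levels, and dates are invented.

```python
from datetime import date

warehouse = []  # append-only list of snapshot rows (stand-in for a warehouse table)

def refresh(warehouse, operational_rows, snapshot_date):
    """Add the current operational state as a new dated snapshot.

    Existing rows are never overwritten, so history is preserved.
    """
    for row in operational_rows:
        warehouse.append({**row, "snapshot_date": snapshot_date})

# Two periodic refreshes with invented stock levels.
refresh(warehouse, [{"product": "Socks", "stock": 40}], date(1997, 1, 31))
refresh(warehouse, [{"product": "Socks", "stock": 25}], date(1997, 2, 28))

# Every snapshot is still available for historical analysis.
history = [(r["snapshot_date"], r["stock"]) for r in warehouse if r["product"] == "Socks"]
print(history)
```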
![Page 11: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/11.jpg)
DATA WAREHOUSING
THE USE OF A DATA WAREHOUSE
[Figure: source databases (Inventory database, Personnel database, Newcastle sales DB, London sales DB, Glasgow sales DB) feeding the Data Warehouse.]
STEP 1: Load the Data Warehouse
STEP 2: Question the Data Warehouse
STEP 3: Do something with what you learn from the Data Warehouse
DECISIONS and ACTIONS!
![Page 12: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/12.jpg)
DATA WAREHOUSING
CREATING A DATA WAREHOUSE
• Data Collection
There need to be extraction routines to gather data from the various operational data sources that interface with the Data Warehouse.
• Data Cleaning & Transformation
Data must be checked for validity and accuracy, and differences in syntax and semantics must be resolved.
![Page 13: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/13.jpg)
DATA WAREHOUSING
CREATING A DATA WAREHOUSE
• Data Loading
Data must be loaded into the Data Warehouse after carrying out appropriate summarisation and aggregation. Often this will be done using parallelism (as it could take weeks to serially load a terabyte of data!).
• Data Refresh
Updates to base data (operational data) must periodically be propagated to the Data Warehouse.
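The collection, cleaning, loading, and refresh steps can be sketched as a small pipeline. This is a toy under stated assumptions: the source names, row layout, validity rule (non-negative amounts), and the per-city summary are all invented, and everything runs serially in memory.

```python
from collections import defaultdict

def extract(sources):
    """Data collection: gather raw rows from every operational source."""
    for name, rows in sources.items():
        for row in rows:
            yield {**row, "source": name}

def clean(rows):
    """Cleaning and transformation: drop invalid rows, unify city spelling."""
    for row in rows:
        if row.get("amount") is None or row["amount"] < 0:
            continue  # assumed validity rule: amount must be present and non-negative
        yield {**row, "city": row["city"].strip().title()}

def load(warehouse, rows):
    """Loading: keep the detailed rows and maintain a summary by city."""
    for row in rows:
        warehouse["detail"].append(row)
        warehouse["by_city"][row["city"]] += row["amount"]

# Hypothetical operational sources with inconsistent and invalid data.
sources = {
    "london_sales": [{"city": " london", "amount": 100.0}, {"city": "LONDON", "amount": -5.0}],
    "glasgow_sales": [{"city": "glasgow ", "amount": 60.0}, {"city": "Glasgow", "amount": None}],
}

warehouse = {"detail": [], "by_city": defaultdict(float)}
load(warehouse, clean(extract(sources)))  # initial load

# Data refresh: a later batch of operational changes goes through the same steps.
load(warehouse, clean(extract({"london_sales": [{"city": "London", "amount": 20.0}]})))
print(dict(warehouse["by_city"]))
```

A real load would usually be parallelised and run against a DBMS; the sketch only shows the order of the steps.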
![Page 14: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/14.jpg)
DATA WAREHOUSING
CREATING A DATA WAREHOUSE
• Data Storage
Appropriate storage structures must exist to allow the Data Warehouse to support fast access for search and analysis of differing data types (text, graphic, picture, …).
![Page 15: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/15.jpg)
DATA WAREHOUSING
ARCHITECTURE OF A DATA WAREHOUSE
[Figure: warehouse architecture. Components shown: operational data sources 1, 2, …, n; Warehouse Manager; DBMS holding Meta Data, Detailed Data, Lightly Summarized Data, Highly Summarized Data, and Archive/Backup data; end-user tools (reporting, query, A/P development and EIS tools; OLAP tools; data mining tools).]
![Page 16: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/16.jpg)
DATA WAREHOUSING
DATA WAREHOUSE INFORMATION FLOWS
• INFLOW - Processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
• UPFLOW - Processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
• DOWNFLOW - Processes associated with archiving and backing-up/recovery of data in the warehouse.
• OUTFLOW - Processes associated with making the data available to the end-users.
• METAFLOW - Processes associated with the management of the metadata.
![Page 17: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/17.jpg)
DATA WAREHOUSING
DATA FLOWS IN A DATA WAREHOUSE
![Page 18: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/18.jpg)
DATA WAREHOUSING
DATA WAREHOUSE DBMS REQUIREMENTS
• Load performance
• Load processing
• Data quality management
• Query performance
• Terabyte scalability
• Mass user scalability
• Networked data warehouse
• Warehouse administration
• Integrated dimensional analysis
• Advanced query functionality
![Page 19: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/19.jpg)
DATA WAREHOUSING
PROBLEMS
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long duration projects
• Complexity of integration
![Page 20: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/20.jpg)
ON-LINE ANALYTICAL PROCESSING (OLAP)
![Page 21: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/21.jpg)
OLAP
WHAT IS OLAP?
DEFINITION:
‘OLAP applications and tools are those that are designed to ask ad hoc, complex queries of large multidimensional collections of data. It is for this reason that OLAP is often mentioned in the context of Data Warehouses’.
![Page 22: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/22.jpg)
OLAP
TYPICAL OLAP QUESTIONS
• Which type of property sells for prices above the average selling price for properties in the main cities of Great Britain and how does this correlate to demographic data?
• What are the three most popular areas in each city for renting property in 1997 and how does this compare with the figures for the previous two years?
• What is the current monthly revenue for property sales at each branch office, compared with rolling 12-monthly prior figures?
• What is the relationship between the total annual revenue generated by each branch office and the total number of sales staff assigned to each branch office?
![Page 23: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/23.jpg)
OLAP
CODD’S RULES
• Multi-dimensional conceptual view
• Transparency
• Accessibility
• Consistent reporting performance
• Client-server architecture
• Generic dimensionality
• Dynamic sparse matrix handling
• Multi-user support
• Unrestricted cross-dimensional operations
• Intuitive data manipulation
• Flexible reporting
• Unlimited dimensions and aggregation levels
![Page 24: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/24.jpg)
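OLAP
MULTIDIMENSIONAL DATA MODEL
Example: Three dimensions – Product, Sales Area, and Season
[Figure: data cube. Product: Socks, Jumpers, T-Shirts, Shorts, Pyjamas. Sales Area: London, Glasgow, Newcastle. Season: Spring, Summer, Autumn, Winter. Cell values on the visible face, as extracted row by row: 10 50 10 10; 0 0 1 2; 80 80 80 80; 0 25 20 15; 0 0 0 0.]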
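A three-dimensional model like this can be sketched as a mapping from (product, sales area, season) coordinates to a measure. The cell values below are invented for illustration and are not the figures from the slide; the `total` helper is a hypothetical name.

```python
# Hypothetical sales figures keyed by (product, sales area, season).
cube = {
    ("Socks", "London", "Spring"): 10,
    ("Socks", "London", "Summer"): 50,
    ("Socks", "Glasgow", "Spring"): 5,
    ("Shorts", "London", "Summer"): 25,
    ("Shorts", "Newcastle", "Summer"): 15,
}
DIMENSIONS = ("product", "area", "season")

def total(cube, **fixed):
    """Sum all cells whose coordinates match the fixed dimension values."""
    result = 0
    for coords, value in cube.items():
        cell = dict(zip(DIMENSIONS, coords))
        if all(cell[dim] == wanted for dim, wanted in fixed.items()):
            result += value
    return result

print(total(cube, product="Socks"))                # slice on one dimension
print(total(cube, area="London", season="Summer")) # fix two dimensions
print(total(cube))                                 # grand total over the whole cube
```

Fixing a value on one or more dimensions and aggregating over the rest is the basic operation behind most OLAP queries.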
![Page 25: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/25.jpg)
OLAP
TYPICAL OLAP OPERATIONS
Drill Down / Drill Up (moving between levels of detail):
• Total Sales
• Total Sales per city
• Total Sales per city per store
• Total Sales per city per store per month
Drill Across (a different breakdown at the same level):
• Total Sales
• Total Sales per city
• Total Sales per city by category
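Drilling down and up can be sketched as grouping the same facts by more or fewer levels of a hierarchy. The city, store, and month values are invented, and `summarise` is a hypothetical helper rather than part of any OLAP tool.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, store, month, amount).
sales = [
    ("London", "Store A", "Jan", 100),
    ("London", "Store A", "Feb", 120),
    ("London", "Store B", "Jan", 80),
    ("Glasgow", "Store C", "Jan", 60),
]
LEVELS = ("city", "store", "month")  # hierarchy from coarse to fine

def summarise(sales, depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *keys, amount in sales:
        totals[tuple(keys[:depth])] += amount
    return dict(totals)

print(summarise(sales, 0))  # total sales
print(summarise(sales, 1))  # drill down: per city
print(summarise(sales, 2))  # drill down again: per city per store
```

Increasing `depth` drills down to finer detail; decreasing it drills back up to a coarser summary.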
![Page 26: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/26.jpg)
OLAP
TYPICAL ARCHITECTURE FOR MOLAP TOOLS
![Page 27: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/27.jpg)
OLAP
TYPICAL ARCHITECTURE FOR ROLAP TOOLS
![Page 28: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/28.jpg)
OLAP
RELATIONAL STAR SCHEMA
SALES (Fact Table): id, timeid, locid, amount, cost
TIMES (Dimension Table): timeid, date, week, month, quarter, year, …
PRODUCTS (Dimension Table): id, pname, cat, desc, price, …
LOCATIONS (Dimension Table): locid, city, state, country, …
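A star schema along these lines can be sketched in SQLite. Only a subset of the listed columns is used, the rows are invented, and the assumption that the fact table's `id` column refers to the product is mine rather than something the slide states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (subset of the columns listed on the slide).
CREATE TABLE times     (timeid INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE products  (id INTEGER PRIMARY KEY, pname TEXT, cat TEXT);
CREATE TABLE locations (locid INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: one row per sale, pointing at each dimension.
CREATE TABLE sales (
    id     INTEGER REFERENCES products(id),   -- assumed to be the product key
    timeid INTEGER REFERENCES times(timeid),
    locid  INTEGER REFERENCES locations(locid),
    amount REAL,
    cost   REAL
);

-- Invented sample rows.
INSERT INTO times VALUES (1, 'Jan', 1997), (2, 'Feb', 1997);
INSERT INTO products VALUES (1, 'Socks', 'Clothing');
INSERT INTO locations VALUES (1, 'London', 'UK'), (2, 'Glasgow', 'UK');
INSERT INTO sales VALUES (1, 1, 1, 100.0, 60.0), (1, 2, 1, 120.0, 70.0), (1, 1, 2, 80.0, 50.0);
""")

# A typical star join: facts joined to two dimensions, then aggregated.
rows = conn.execute("""
    SELECT l.city, t.year, SUM(s.amount)
    FROM sales s
    JOIN locations l ON l.locid = s.locid
    JOIN times t ON t.timeid = s.timeid
    GROUP BY l.city, t.year
    ORDER BY l.city
""").fetchall()
print(rows)
```

The fact table stays narrow and numeric, while descriptive attributes live in the dimension tables and are joined in only when a query needs them.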
![Page 29: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/29.jpg)
DATA MINING
![Page 30: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/30.jpg)
DATA MINING
WHAT IS DATA MINING
DEFINITION:
‘A set of techniques used in an automated approach to exhaustively explore and bring to the surface complex relationships in very large datasets’
[DBMS DATA WAREHOUSE SUPPLEMENT – AUG 1996]
![Page 31: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/31.jpg)
DATA MINING
WHAT IS DATA MINING
SHORT DEFINITION:
‘Spot hidden gold in large collections of data’
IMPORTANT – Data Mining tools extract NEW information from data. This information is then used to guide business decisions about future activities.
![Page 32: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/32.jpg)
DATA MINING
DATA MINING APPLICATIONS
RETAIL/MARKETING:
• Identifying buying patterns of customers
• Finding associations among customers’ demographic characteristics
• Predicting response to mailing campaigns
• Market basket analysis
BANKING:
• Detecting patterns of fraudulent credit card use
• Identifying loyal customers
• Predicting customers likely to change their credit card affiliations
• Determining credit card spending by customer groups
![Page 33: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/33.jpg)
DATA MINING
DATA MINING APPLICATIONS
INSURANCE:
• Claims analysis
• Predicting which customers will buy new policies
MEDICINE:
• Characterising patient behaviour to predict surgery visits
• Identifying successful medical therapies for different illnesses
![Page 34: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/34.jpg)
DATA MINING
DATA MINING QUESTIONS
DISCOVERY-ORIENTED (LINK ANALYSIS):
“What are the factors that determine sales of Product X?”
PREDICTIVE MODELLING:
“How much profit will this customer generate?”
“Where is the best place to build a new road?”
![Page 35: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/35.jpg)
DATA MINING
DATA MINING OPERATIONS
DESCRIPTIVE OPERATIONS
ASSOCIATION RULES
Descriptive model that discovers rules that relate separate classes of data items together. For example, ‘people who buy beer also buy crisps 50% of the time’.
SEQUENCING RULES
Descriptive model that discovers sequence correlations in time-sequenced data. For example, ‘People who have purchased a VCR are 300% more likely to purchase a camcorder in the time period 2-4 months after the VCR was purchased’.
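An association rule such as ‘people who buy beer also buy crisps 50% of the time’ is usually described by its support and confidence. The sketch below computes both for invented baskets chosen so that the confidence comes out at 50%; the function names are illustrative only.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "bread"},
    {"beer"},
    {"crisps", "milk"},
]

def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    return sum(consequent <= b for b in with_antecedent) / len(with_antecedent)

print(support(baskets, {"beer", "crisps"}))       # how common the pair is overall
print(confidence(baskets, {"beer"}, {"crisps"}))  # 'beer => crisps' rule strength
```

Here four baskets contain beer and two of those also contain crisps, which gives the 50% figure for the rule.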
![Page 36: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/36.jpg)
DATA MINING
DATA MINING OPERATIONS
PREDICTIVE OPERATIONS
CLASSIFICATION
Predict class membership. For example, income within one of three categorical values: ‘Low’, ‘Middle’, or ‘High’.
REGRESSION
Predict a specific value. For example, income will be a certain amount.
![Page 37: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/37.jpg)
DATA MINING
CLASSIFICATION AND REGRESSION
This is the largest area where data mining is currently applied!
All techniques generate a predictive model based on historical data; building the model in this way is known as ‘Data Training’. The model then predicts the outcome of new cases.
The data necessary to build a predictive model therefore has to be composed of cases where the outcome is known and included.
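The difference between predicting a class and predicting a value, and the need for training cases with known outcomes, can be sketched as follows. The training data is invented, and the two techniques (a least-squares line for regression, nearest neighbour for classification) are simple stand-ins chosen for brevity, not methods named in the lecture.

```python
# Hypothetical training cases with known outcomes: (age, income in £1000s, income band).
history = [(20, 15.0, "Low"), (30, 25.0, "Middle"), (40, 35.0, "Middle"), (50, 45.0, "High")]

def fit_line(cases):
    """Regression training: least-squares fit of income = slope * age + intercept."""
    n = len(cases)
    mean_x = sum(age for age, _, _ in cases) / n
    mean_y = sum(income for _, income, _ in cases) / n
    slope = (sum((age - mean_x) * (income - mean_y) for age, income, _ in cases)
             / sum((age - mean_x) ** 2 for age, _, _ in cases))
    return slope, mean_y - slope * mean_x

def predict_income(model, age):
    """Regression: predict a specific value for a new case."""
    slope, intercept = model
    return slope * age + intercept

def classify(cases, age):
    """Classification (1-nearest neighbour): copy the band of the closest known case."""
    nearest = min(cases, key=lambda case: abs(case[0] - age))
    return nearest[2]

model = fit_line(history)
print(predict_income(model, 35))  # a number
print(classify(history, 22))      # one of 'Low', 'Middle', 'High'
```

Both predictions depend entirely on the historical cases: without known incomes and bands there would be nothing to fit or compare against.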
![Page 38: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/38.jpg)
DATA MINING
CLASSIFICATION AND REGRESSION
EXAMPLE:
‘It may be found that if a Bank’s customer is aged between 18 and 24, and their average account balance is between £0.00 and £200.00, then they are highly likely to default on a loan.’
This rule will then be applied to predict whether it would be wise to authorise a bank loan for a particular customer.
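Once discovered, a rule like this is straightforward to apply to a new case. The function below simply encodes the example rule as stated; the function name and the treatment of the boundaries as inclusive are assumptions.

```python
def likely_to_default(age, average_balance):
    """Apply the example rule: aged 18 to 24 with an average balance of £0.00 to £200.00."""
    return 18 <= age <= 24 and 0.00 <= average_balance <= 200.00

print(likely_to_default(21, 150.00))  # inside both ranges
print(likely_to_default(30, 150.00))  # outside the age range
```

In practice such a flag would inform a loan decision rather than make it on its own.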
![Page 39: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/39.jpg)
DATA MINING TECHNIQUES
![Page 40: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/40.jpg)
DATA MINING
CLASSIFICATION AND REGRESSION
DECISION TREES
![Page 41: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/41.jpg)
DATA MINING
CLASSIFICATION AND REGRESSION
NEURAL NETWORKS
![Page 42: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/42.jpg)
DATA MINING
CLASSIFICATION
NAÏVE-BAYES
This technique limits its inputs to categorical data, and it is applicable only to classification. Simplicity and speed make this an ideal exploratory tool.
The technique is based on a simple concept: conditional probabilities derived from observed frequencies in the training data.
![Page 43: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/43.jpg)
DATA MINING
CLASSIFICATION
NAÏVE-BAYES - EXAMPLE
Try to predict customer turnover based on the following facts:
• 75% of customers who had monthly bills of between £300 and £400 have left.
• 68% of customers who had made more than four calls to customer service have left.
This technique will predict that a customer who has an average monthly bill of £380, and who has made three calls to customer services has a high likelihood of leaving soon.
Therefore, they should be contacted and offered a discount!
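The idea of deriving conditional probabilities from observed frequencies can be sketched with a tiny categorical Naïve-Bayes classifier. The training cases and category names are invented and do not reproduce the percentages above; the add-one smoothing is an extra assumption of this sketch, not something the slides mention.

```python
from collections import Counter, defaultdict

# Hypothetical training cases: (bill band, calls band) with a known outcome.
training = [
    (("high_bill", "many_calls"), "left"),
    (("high_bill", "few_calls"), "left"),
    (("high_bill", "many_calls"), "left"),
    (("low_bill", "few_calls"), "stayed"),
    (("low_bill", "few_calls"), "stayed"),
    (("high_bill", "few_calls"), "stayed"),
    (("low_bill", "many_calls"), "left"),
    (("low_bill", "few_calls"), "stayed"),
]

def train(cases):
    """Count how often each outcome and each (feature value, outcome) pair occurs."""
    class_counts = Counter(outcome for _, outcome in cases)
    feature_counts = defaultdict(Counter)  # (position, value) -> outcome counts
    for features, outcome in cases:
        for position, value in enumerate(features):
            feature_counts[(position, value)][outcome] += 1
    return class_counts, feature_counts

def predict(model, features):
    """Score each outcome as P(outcome) * product of P(feature value | outcome)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for outcome, count in class_counts.items():
        score = count / total  # prior from observed frequency
        for position, value in enumerate(features):
            # Conditional frequency with add-one smoothing; each feature has two values here.
            score *= (feature_counts[(position, value)][outcome] + 1) / (count + 2)
        scores[outcome] = score
    return max(scores, key=scores.get), scores

model = train(training)
print(predict(model, ("high_bill", "many_calls")))
print(predict(model, ("low_bill", "few_calls")))
```

The ‘naïve’ part is the assumption that the features are independent given the outcome, which is what allows the per-feature probabilities to be multiplied together.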
![Page 44: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/44.jpg)
DATA MINING PITFALLS!
![Page 45: Databases and Decision Support Systems CSS263 Lecture 14](https://reader035.vdocument.in/reader035/viewer/2022062313/56649d375503460f94a108a5/html5/thumbnails/45.jpg)
DATA MINING
CORRELATIONS AND CAUSALITY
Data mining tools find correlations, not causes, and the rules and predictions that come out of data mining tools are based on correlation only.
EXAMPLE:
Rule: "Customers who purchase pasta are three times more likely to purchase cheese than customers who don’t buy pasta"
Therefore:
Does buying pasta cause people to buy cheese?
Does buying cheese cause people to buy pasta?
Or is it the sudden popularity of a book called ‘You Can Lose Five Pounds a Week Eating Pasta With Cheese!’ ?
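A ‘three times more likely’ rule of this kind is just a ratio of two observed purchase rates. The sketch below computes it for invented baskets constructed to give a ratio of three; nothing in the calculation says anything about which purchase, if either, causes the other.

```python
# Hypothetical baskets built so that the ratio comes out at three.
baskets = (
    [{"pasta", "cheese"}] * 3
    + [{"pasta"}] * 1
    + [{"cheese"}] * 2
    + [{"bread"}] * 6
)

def cheese_rate(baskets, has_pasta):
    """Fraction of baskets with (or without) pasta that also contain cheese."""
    group = [b for b in baskets if ("pasta" in b) == has_pasta]
    return sum("cheese" in b for b in group) / len(group)

with_pasta = cheese_rate(baskets, True)      # 3 of 4 pasta baskets contain cheese
without_pasta = cheese_rate(baskets, False)  # 2 of 8 non-pasta baskets contain cheese
ratio = with_pasta / without_pasta
print(with_pasta, without_pasta, ratio)
```

The same ratio would appear whether pasta drives cheese sales, cheese drives pasta sales, or some outside factor drives both, which is why the rule alone cannot settle the question.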