data mining – techniques and applications
04/10/23 1
DATA MINING – TECHNIQUES AND APPLICATIONS
Charlie Chough
CS157B
Spring 2006
TOPICS
What is Data Mining?
How does Data Mining work?
What are the applications for Data Mining?
What are the issues surrounding Data Mining?
What Is Data Mining?
Data Mining is the extraction of hidden predictive information from large databases.
Data Mining can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
What Is Data Mining?
The Evolution of Data Mining
Evolutionary Step | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery
How Does Data Mining Work?
3 Phase Approach
1) Exploration
2) Model Building and Validation
3) Deployment
How Does Data Mining Work?
Exploration
  Data Preparation
    Cleaning Data
    Data Transformation
    Feature Selection
  Exploratory Data Analysis
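The preparation steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the column names (`age`, `income`, `store_id`), and the helper functions are hypothetical, missing values are filled with the column mean, and a feature is dropped only when it never varies. Real projects usually apply richer rules at each step.

```python
from statistics import mean, pstdev

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000.0, "store_id": 7},
    {"age": None, "income": 61000.0, "store_id": 7},
    {"age": 45, "income": None, "store_id": 7},
    {"age": 29, "income": 48000.0, "store_id": 7},
]

def clean(records):
    """Cleaning: fill each missing value with the mean of its column."""
    columns = list(records[0])
    means = {c: mean([r[c] for r in records if r[c] is not None]) for c in columns}
    return [{c: (r[c] if r[c] is not None else means[c]) for c in columns}
            for r in records]

def select_features(records):
    """Feature selection: keep only columns whose values actually vary."""
    columns = list(records[0])
    keep = [c for c in columns if pstdev([r[c] for r in records]) > 0]
    return [{c: r[c] for c in keep} for r in records]

def standardize(records):
    """Transformation: rescale each column to zero mean and unit variance."""
    columns = list(records[0])
    stats = {c: (mean([r[c] for r in records]), pstdev([r[c] for r in records]))
             for c in columns}
    return [{c: ((r[c] - stats[c][0]) / stats[c][1] if stats[c][1] > 0 else 0.0)
             for c in columns}
            for r in records]

cleaned = clean(raw)
selected = select_features(cleaned)   # store_id is constant, so it is dropped
prepared = standardize(selected)
```

Feature selection runs before standardization here so that constant columns never reach the division by the standard deviation.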
How Does Data Mining Work?
Model Building and Validation
Techniques
  Decision Trees
  Clustering
  Association Rules
How Does Data Mining Work?
Model Building and Validation
Decision Trees
  Tree-shaped structures that represent sets of decisions.
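To make the idea of a tree-shaped set of decisions concrete, here is a minimal sketch of how such a tree could be stored and used to classify a record. The tree itself is hand-written and hypothetical (the attributes `income` and `has_collateral` and the decisions `approve` and `reject` are invented for illustration); a data mining tool would instead learn the tree from training data.

```python
# Hypothetical tree: each internal node tests one attribute,
# each leaf holds a decision.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"decision": "approve"},
        "low": {
            "attribute": "has_collateral",
            "branches": {
                True: {"decision": "approve"},
                False: {"decision": "reject"},
            },
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch that
    matches the record's value for each tested attribute."""
    while "decision" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["decision"]

applicant = {"income": "low", "has_collateral": False}
decision = classify(tree, applicant)
```

Each path from the root to a leaf reads as one rule, for example "income is low and there is no collateral, so reject".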
How Does Data Mining Work?
Model Building and Validation
Hierarchical Clustering
  Clusters are discovered successively using previously established clusters.
Partitional Clustering
  All clusters are discovered at once.
How Does Data Mining Work?
Model Building and Validation
Hierarchical Clustering
  Agglomerative Clustering (bottom-up)
    Each element starts as its own cluster, and clusters are merged into successively larger clusters.
  Divisive Clustering (top-down)
    Begins with the entire data set and breaks it into successively smaller clusters.
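A small sketch of the agglomerative approach may help. It assumes one-dimensional points and single linkage (the distance between two clusters is the distance between their closest members); both choices, along with the function name `agglomerative` and the sample data, are assumptions made for illustration and are not taken from the slides.

```python
def agglomerative(points, target_clusters):
    """Bottom-up clustering of 1-D points using single linkage."""
    clusters = [[p] for p in points]   # every element starts as its own cluster
    while len(clusters) > max(target_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
```

Stopping at a target number of clusters is one possible cutoff; recording every merge instead would give the full hierarchy.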
How Does Data Mining Work?
Model Building and Validation
Partitional Clustering
  K-means clustering
  QT Clustering
  Fuzzy C-means Clustering
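Of the partitional methods listed, K-means is the simplest to sketch. The version below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. It is a minimal sketch on two-dimensional points; seeding the centroids with the first `k` points is an assumption made to keep the example deterministic (practical implementations usually seed randomly), and the sample data is hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20):
    centroids = list(points[:k])   # assumption: seed with the first k points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group.
        new_centroids = []
        for c, g in enumerate(groups):
            if g:
                new_centroids.append((sum(x for x, _ in g) / len(g),
                                      sum(y for _, y in g) / len(g)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster in place
        if new_centroids == centroids:
            break   # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, groups

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, groups = kmeans(points, 2)
```

On this data the two groups settle on the points near the origin and the points near (8, 8), with all clusters found in the same pass rather than one merge at a time.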
How Does Data Mining Work?
Model Building and Validation
Association Rules
  Association Rules describe a correlation of events.
  Support
  Confidence
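Support and confidence are the two usual measures of an association rule. For a rule "X implies Y", support is the fraction of all transactions that contain both X and Y, and confidence is the fraction of the transactions containing X that also contain Y. The sketch below computes both on a small, made-up list of transactions; the item names and function names are illustrative only.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return (count(antecedent | consequent, transactions)
            / count(antecedent, transactions))

rule_support = support({"milk"}, {"bread"}, transactions)        # 3 of 5 transactions
rule_confidence = confidence({"milk"}, {"bread"}, transactions)  # 3 of 4 milk transactions
```

A rule is usually reported only when both measures clear chosen thresholds: low support means the rule covers too few transactions to matter, and low confidence means it is often wrong.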
How Does Data Mining Work?
Deployment
  Select the best model from the previous phase and apply it to new data in order to generate predictions or estimates of the expected outcome.
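The deployment step can be sketched as picking the candidate with the best validation score and then running it on records it has not seen. Everything in the example is hypothetical: the two rule-based models, the field names `visits` and `cart_value`, the validation set, and the choice of accuracy as the selection criterion.

```python
# Hypothetical candidate models from the model-building phase.
# Each one maps a customer record to a predicted label.
def model_a(record):
    return "buy" if record["visits"] >= 3 else "skip"

def model_b(record):
    return "buy" if record["cart_value"] > 50 else "skip"

# Hypothetical labelled validation data: (record, true label).
validation = [
    ({"visits": 5, "cart_value": 20}, "buy"),
    ({"visits": 1, "cart_value": 80}, "skip"),
    ({"visits": 4, "cart_value": 60}, "buy"),
    ({"visits": 2, "cart_value": 10}, "skip"),
]

def accuracy(model, labelled):
    """Fraction of labelled records the model predicts correctly."""
    return sum(model(x) == y for x, y in labelled) / len(labelled)

candidates = {"model_a": model_a, "model_b": model_b}

# Select the best model from the previous phase...
best_name = max(candidates, key=lambda name: accuracy(candidates[name], validation))
best_model = candidates[best_name]

# ...and apply it to new, unlabelled data to generate predictions.
new_data = [{"visits": 6, "cart_value": 15}, {"visits": 0, "cart_value": 90}]
predictions = [best_model(record) for record in new_data]
```

The validation data is kept separate from the new data so that the score used for selection reflects records the models were not tuned on.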
Applications for Data Mining?
Retail Market Basket Analysis
Business Intelligence
Medicine
Law Enforcement
Applications for Data Mining?
Retail Market Basket Analysis
  Online retailers that suggest other products based on what other customers have purchased
  Merchandising based on what items customers purchase together
    Milk and bread
    Diapers and beer
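One simple way to find items that are purchased together is to count how often each pair of items shares a basket, then suggest the most frequent partners of a given item. The sketch below does this with the standard library; the baskets and the helper `also_bought` are hypothetical, and raw co-occurrence counts stand in for the more careful measures a retailer would likely use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history, one list of items per basket.
baskets = [
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread", "diapers"],
]

# Count how many baskets contain each unordered pair of items.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        pair_counts[(a, b)] += 1

def also_bought(item, top=2):
    """Items most often found in the same basket as the given item."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top)]

suggestion = also_bought("beer", top=1)
```

Sorting each basket before pairing keeps `("beer", "diapers")` and `("diapers", "beer")` from being counted as different pairs.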
Applications for Data Mining?
Business Intelligence
  Business Intelligence tools allow businesses to gather, store, access and analyze corporate data to aid in the decision-making process.
  Customer Profiling
  Inventory and Distribution Analysis
  Market Research and Segmentation
Applications for Data Mining?
Medicine
  Data mining can be used to find combinations of prescription drugs that can have harmful interactions or side effects.
Applications for Data Mining?
Law Enforcement
  Law enforcement agencies are using data mining to help identify terrorists.
Issues Surrounding Data Mining
Privacy Concerns
Data Dredging
Issues Surrounding Data Mining
Privacy Concerns
  Multi-state Anti-Terrorism Information Exchange (MATRIX)
  Massive collection of non-publicly available, personal data managed by a private Florida company.
Issues Surrounding Data Mining
Privacy Concerns
  Government agencies failed to properly implement privacy rules for data mining.
  Lapses by the Dept. of Agriculture, FBI, IRS, Small Business Administration and State Department increased the risk of data exposure.
Issues Surrounding Data Mining
Data Dredging
  The practice of imposing patterns on data where none exist.
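A short simulation shows how easily this happens. The sketch below generates a target and many candidate features that are all pure random noise, then reports the strongest correlation found among them. The sizes (20 rows, 200 features), the fixed seed, and the hand-written `pearson` helper are assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

random.seed(0)                    # fixed seed so the sketch is repeatable
n_rows, n_features = 20, 200

# Target and features are independent noise: there is no real pattern.
target = [random.gauss(0, 1) for _ in range(n_rows)]
features = [[random.gauss(0, 1) for _ in range(n_rows)]
            for _ in range(n_features)]

correlations = [abs(pearson(f, target)) for f in features]
best = max(correlations)
typical = sum(correlations) / len(correlations)
print(f"strongest |r| = {best:.2f}, average |r| = {typical:.2f}")
```

Because so many features are tested against so few rows, the strongest correlation tends to look far more convincing than the average one, even though every feature is unrelated to the target. Checking a discovered pattern against data held out from the search is the usual safeguard.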
Conclusions
Data Mining is a powerful tool with real-world applications
But... Data Mining must be used carefully
References
Silberschatz, Korth, Sudarshan. 2006. Database System Concepts, 5th Ed. New York, NY: McGraw Hill.
Wikipedia.com. 2006. (http://en.wikipedia.org/wiki/Data_mining)
Thearling.com. 2006. (http://www.thearling.com)
Small Business Computing.com. 2006. (http://sbc.webopedia.com/TERM/B/Business_Intelligence.html)