Data Mining
Ahmet Fahri Kılıç (b101316014)
Concepts
• Evolution Step
• Why data mining?
• What is data mining?
• Data Mining Process
• A Simple Data Mining Example
• Conclusions
Evolution Step
• 1960s, Data Collection: "What was my total revenue in the last five years?"
• 1980s, Data Access: "What were unit sales in New England last March?"
• 1990s, Data Warehousing: "What were unit sales in New England last March? Drill down to Boston."
• Today, Data Mining: "What's likely to happen to Boston unit sales next month? Why?"
Why Data Mining?
Evolution of database technology:
• Collecting large amounts of data: primitive file processing
• Storing and querying data efficiently: DBMS
• New challenge: with huge amounts of data, how can we analyze and understand them? Data mining
Why data mining?
• Large amounts of data (many attributes and/or instances)
• We need to make decisions from them
• As the size of the data increases, it becomes harder to extract information from it
• The goal is not to automate the decision itself; rather, the process of extracting information must be automatic or semi-automatic
What Is Data Mining?
Data mining: an information activity that extracts facts, useful information, and patterns from data in large databases.
In short, the process of discovering “knowledge/patterns” in data.
What is Data Mining?
“Data Mining can be described as the non-trivial process of extracting previously unknown, interesting, potentially useful and ultimately understandable knowledge from huge datasets.”
-- Usama Fayyad
Knowledge Discovery
Knowledge Discovery in Databases (KDD): “The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
KDD process (iterative and interactive):
• Identifying the problem
• Preparing the data
• Building the model (data mining)
• Using and monitoring the model
The KDD Process (diagram)
Data → [Selection] → Target data → [Preprocessing] → Preprocessed data → [Transformation] → Transformed data → [Data mining] → Patterns → [Interpretation/evaluation] → Knowledge
Data Mining Process Preprocessing
• Data cleaning: handles noisy, erroneous, missing, and irrelevant data
• Data integration: multiple, heterogeneous data sources are integrated into one
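A minimal sketch of both preprocessing steps, assuming records are plain Python dicts. The two sources, the hypothetical record "999", and the cleaning policy (trim and uppercase text, drop exact duplicates, drop any record with a missing value) are illustrative assumptions, not rules from the presentation; real cleaning rules depend on the data.

```python
# Hypothetical raw sources keyed by customerID (field names borrowed from the
# mailing example). Record "999" is made up to show an incomplete record.
profiles = [
    {"customerID": "045", "gender": "F", "state": "CA"},
    {"customerID": "678", "gender": " m ", "state": "TX"},  # noisy formatting
    {"customerID": "256", "gender": "F", "state": "MA"},
    {"customerID": "256", "gender": "F", "state": "MA"},    # duplicate row
    {"customerID": "999", "gender": None, "state": None},   # missing values
]
responses = [
    {"customerID": "045", "response": "Y"},
    {"customerID": "678", "response": "N"},
    {"customerID": "256", "response": "Y"},
    {"customerID": "999", "response": "Y"},
]


def clean(rows):
    """Data cleaning under an assumed policy: drop records with missing
    values, trim and uppercase text fields, and skip exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if any(v is None for v in row.values()):
            continue  # missing value: discard the whole record
        fixed = {k: v.strip().upper() for k, v in row.items()}
        key = tuple(sorted(fixed.items()))
        if key not in seen:  # exact duplicate of an earlier record
            seen.add(key)
            out.append(fixed)
    return out


def integrate(left, right, key="customerID"):
    """Data integration: inner-join two sources on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


dataset = integrate(clean(profiles), clean(responses))
print(dataset)
```

The result holds one merged record each for customers 045, 678 and 256; the duplicate row and the incomplete record are gone.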
Data Mining Process 1. Data Selection
• Identifying the data to be mined
• Choosing proper input attributes and output information
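As a small illustration of choosing input attributes and output information, the sketch below splits the example mailing records into inputs and a target. Leaving out `customerID` (an identifier) and `city` is an assumed selection made for illustration, and the helper name `select` is hypothetical.

```python
# Records from the first table of the mailing example.
records = [
    {"customerID": "045", "gender": "F", "birthdate": "01/08/69",
     "city": "Benicia", "state": "CA", "response": "Y"},
    {"customerID": "678", "gender": "M", "birthdate": "07/13/65",
     "city": "Dallas", "state": "TX", "response": "N"},
    {"customerID": "256", "gender": "F", "birthdate": "10/21/72",
     "city": "Boston", "state": "MA", "response": "Y"},
]

# Assumed choice: customerID is only an identifier and city is dropped,
# so the inputs are gender, birthdate and state; "response" is the output.
INPUTS = ["gender", "birthdate", "state"]
OUTPUT = "response"


def select(rows, inputs, output):
    """Split each record into (input attribute values, output value)."""
    X = [[row[a] for a in inputs] for row in rows]
    y = [row[output] for row in rows]
    return X, y


X, y = select(records, INPUTS, OUTPUT)
print(X)
print(y)
```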
Data Mining Process 2. Data Transformation
• Converting data into an appropriate format
• Organizing and converting data in desired ways
• Reducing the dimensionality of the data
• Normalizing data
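Two of the transformations listed above, converting data into a usable format and normalizing it, can be sketched on the example birthdates. This assumes every customer was born in the 1900s and uses min-max scaling, (v - min) / (max - min), as one common normalization; the helper names are hypothetical.

```python
def birth_year(mmddyy):
    """Convert an MM/DD/YY string to a four-digit year.

    Assumption: every customer was born in the 1900s. (datetime's %y
    directive would map '65' to 2065, so the century is fixed here.)
    """
    return 1900 + int(mmddyy.split("/")[2])


def min_max(values):
    """Rescale numbers linearly into [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


birthdates = ["01/08/69", "07/13/65", "10/21/72"]
years = [birth_year(d) for d in birthdates]  # [1969, 1965, 1972]
scaled = min_max(years)                      # 1965 -> 0.0, 1972 -> 1.0
# Assumed encoding of a categorical attribute as a number: F = 1, M = 0.
gender_code = [1 if g == "F" else 0 for g in ["F", "M", "F"]]
print(years, scaled, gender_code)
```

Min-max scaling keeps attributes with large numeric ranges (such as years) from dominating distance-based methods like nearest neighbor.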
Data Mining Process 3. Data Mining
Implementing techniques to extract patterns of interest.
Data mining algorithms:
• Decision trees
• Neural networks
• Rule induction
• Nearest neighbor
• Genetic algorithms
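Of the algorithms listed, nearest neighbor is the shortest to sketch. The version below is a plain 1-nearest-neighbor classifier with Euclidean distance, trained on the three example customers whose response is known and applied to the three whose response is unknown. The features (birth year min-max scaled over the training range, gender coded F = 1, M = 0) are an assumed encoding, and three training records are far too few for the predictions to mean anything beyond illustration.

```python
import math


def nearest_neighbor(train_X, train_y, query):
    """1-nearest-neighbor: return the label of the closest training point
    under Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: math.dist(train_X[i], query))
    return train_y[best]


# Example customers: (birth year, gender) -> response.
train = [((1969, "F"), "Y"), ((1965, "M"), "N"), ((1972, "F"), "Y")]
new = {"024": (1948, "M"), "098": (1962, "M"), "781": (1976, "F")}

lo = min(yr for (yr, _g), _r in train)
hi = max(yr for (yr, _g), _r in train)


def features(year, gender):
    # Assumed encoding: year scaled with the training range, F = 1.0, M = 0.0.
    return ((year - lo) / (hi - lo), 1.0 if gender == "F" else 0.0)


train_X = [features(*x) for x, _ in train]
train_y = [label for _, label in train]
predictions = {cid: nearest_neighbor(train_X, train_y, features(*x))
               for cid, x in new.items()}
print(predictions)  # {'024': 'N', '098': 'N', '781': 'Y'}
```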
Data Mining Process 4. Pattern Evaluation
The mined patterns are tested and assessed to understand the synthesized knowledge and its range of validity.
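One common way to test a mined model is leave-one-out evaluation: hold out each labeled record, predict it from the others, and count how often the prediction is right. The sketch assumes a 1-nearest-neighbor model and an assumed encoding of the three labeled example customers (birth year min-max scaled over 1965 to 1972, gender F = 1, M = 0). Hidden-data evaluation of this kind is one technique among several, not the presentation's prescribed method.

```python
import math


def nearest_label(train, query):
    """Label of the training point closest to `query` (Euclidean distance)."""
    _x, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def leave_one_out_accuracy(data):
    """Hold out each record in turn, predict it from the rest, and return
    the fraction of correct predictions."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += nearest_label(rest, x) == y
    return correct / len(data)


# Labeled example customers 045, 678, 256 as (features, response).
data = [((4 / 7, 1.0), "Y"), ((0.0, 0.0), "N"), ((1.0, 1.0), "Y")]
accuracy = leave_one_out_accuracy(data)
print(round(accuracy, 3))  # 0.667
```

Here the only non-responder (678) is misclassified when held out, which shows how evaluation exposes the limited range of validity of a pattern mined from very little data.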
Data Mining Process 5. Knowledge presentation
Presenting the results to the decision-maker
A Simple Data Mining Example
Imagine a company that solicits business primarily by mailing advertisements:
• This company has compiled a store of data containing information about the customers receiving these ads, and the response history.
• This database could then be mined to discover trends, patterns, or systematic relationships that reveal some identifying characteristics of customers who responded favorably.
• With this knowledge, future advertisement mailings could be directed only to new customers with these characteristics.
Customers whose response to past mailings is known:
customerID | gender | birthdate | city | state | response
045 | F | 01/08/69 | Benicia | CA | Y
678 | M | 07/13/65 | Dallas | TX | N
256 | F | 10/21/72 | Boston | MA | Y

Customers whose response is not yet known:
customerID | gender | birthdate | city | state | response
024 | M | 10/23/48 | Bethesda | MD | ?
098 | M | 4/21/62 | Lincoln | NE | ?
781 | F | 12/21/76 | Tucson | AZ | ?

Data mining → identifying characteristics of responsive customers
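A toy version of "identifying characteristics of responsive customers" is to compute the response rate for each value of an attribute in the first table, then use it to choose whom to mail from the second table. The attribute (`gender`), the 0.5 threshold, and the helper name `response_rate` are hypothetical choices made for illustration.

```python
from collections import defaultdict

# First table: customers whose response is known.
mailed = [
    {"customerID": "045", "gender": "F", "state": "CA", "response": "Y"},
    {"customerID": "678", "gender": "M", "state": "TX", "response": "N"},
    {"customerID": "256", "gender": "F", "state": "MA", "response": "Y"},
]
# Second table: customers with no recorded response yet.
unseen = [
    {"customerID": "024", "gender": "M", "state": "MD"},
    {"customerID": "098", "gender": "M", "state": "NE"},
    {"customerID": "781", "gender": "F", "state": "AZ"},
]


def response_rate(rows, attribute):
    """Fraction of 'Y' responses for each value of one attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [responders, total]
    for row in rows:
        counts[row[attribute]][0] += row["response"] == "Y"
        counts[row[attribute]][1] += 1
    return {value: yes / total for value, (yes, total) in counts.items()}


rates = response_rate(mailed, "gender")
print(rates)  # {'F': 1.0, 'M': 0.0}

# Assumed decision rule: mail only customers whose attribute value had a
# past response rate of at least 0.5.
targets = [c["customerID"] for c in unseen if rates.get(c["gender"], 0) >= 0.5]
print(targets)  # ['781']
```

With only three records, almost any attribute looks perfectly predictive (every state appears exactly once, for instance), so a pattern like this would need to be checked on more data before guiding real mailings.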
Conclusions
• In the real world, we need information to make decisions.
• We can gain information from data in databases through the data mining process.
• Data mining is also useful for forecasting future outcomes from data in databases.
• The key to successful data mining is having good-quality data.
Questions?