DATA MINING 1

Upload: mervin-rogers

Post on 01-Jan-2016


Page 1: DATA MINING 1. 2 Data Mining Extracting or “mining” knowledge from large amounts of data Data mining is the process of autonomously retrieving useful

DATA MINING



Data Mining: extracting or “mining” knowledge from large amounts of data

Data mining is the process of autonomously retrieving useful information or knowledge from large data stores or sets.

Data mining is a technique for searching large-scale databases for patterns. It is used mainly to find previously unknown correlations between variables.
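To make "finding previously unknown correlations between variables" concrete, here is a minimal sketch that scans every pair of numeric columns and reports the strongly correlated ones. The column names, the toy values, and the 0.8 threshold are illustrative assumptions, not part of the slides, and Pearson correlation is just one possible measure.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def strongest_correlations(table, threshold=0.8):
    """Return (column_a, column_b, r) for every pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    return sorted(found, key=lambda item: -abs(item[2]))


# Hypothetical toy table: column name -> values (one entry per record).
toy = {
    "age": [25, 32, 47, 51, 62],
    "income": [30, 42, 60, 65, 80],
    "visits": [5, 1, 4, 2, 3],
}

result = strongest_correlations(toy)
for a, b, r in result:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

On a real database the pairwise scan grows quadratically with the number of columns, which is one reason the later slides stress efficiency and scalability.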


Data Mining Motivation

Changes in the business environment:

Customers are becoming more demanding

Markets are saturated

Databases today are huge:

More than 1,000,000 entities/records/rows

From 10 to 10,000 fields/attributes/variables

Gigabytes and terabytes

Databases are growing at an unprecedented rate

Decisions must be made rapidly

Decisions must be made with maximum knowledge


Data Mining: Confluence of Multiple Disciplines

Data mining sits at the confluence of database technology, statistics, machine learning, A.I., algorithms, visualization, and other disciplines.


Visualization

The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.

Statistics

In data mining, statistics is used for classifying and grouping things.

Machine learning

The ability of a machine to improve its performance based on previous results.

Artificial Intelligence

The branch of computer science that deals with writing computer programs that can solve problems creatively.

Algorithm

A precise rule (or set of rules) specifying how to solve some problem.


Why Not Traditional Data Analysis?

Tremendous amount of data

High complexity of data


Knowledge Discovery in Databases (KDD)

KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Data mining is the step in the KDD process that consists of applying particular data mining algorithms.


Data Mining (cont.)

Data mining is one step of the Knowledge Discovery in Databases (KDD) process:

Data Warehousing

Data Selection

Data Preprocessing

Data Transformation

Data Mining

Interpretation/Evaluation

Data mining is sometimes referred to as KDD; DM and KDD tend to be used as synonyms.


Steps of a KDD Process

Learning the application domain: relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing

Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation

Choosing functions of data mining: summarization, classification, regression, association, clustering


Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge
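The KDD steps listed above can be sketched as a small pipeline of functions. This is only a toy illustration under stated assumptions: the records, the field names, the "high"/"low" spend threshold of 100, and the minimum pattern count are all hypothetical, and the "mining" step is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records; one has a missing amount (noise/incomplete data).
raw_records = [
    {"customer": "c1", "region": "north", "amount": "120"},
    {"customer": "c2", "region": "north", "amount": ""},
    {"customer": "c3", "region": "south", "amount": "80"},
    {"customer": "c4", "region": "north", "amount": "200"},
    {"customer": "c5", "region": "south", "amount": "95"},
]


def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Cleaning and preprocessing: drop records with a missing amount."""
    return [r for r in records if r["amount"] != ""]


def transform(records):
    """Reduction and transformation: discretize the amount into two levels."""
    return [
        {"region": r["region"],
         "spend": "high" if float(r["amount"]) >= 100 else "low"}
        for r in records
    ]


def mine(records):
    """Data mining: count co-occurring (region, spend) patterns."""
    return Counter((r["region"], r["spend"]) for r in records)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


selected = select(raw_records, ["region", "amount"])
patterns = evaluate(mine(transform(clean(selected))))
print(patterns)
```

Keeping each step as a separate function mirrors the slide's point that data mining proper is only one stage; most of the pipeline is selection, cleaning, and transformation.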


DATA MINING EVALUATION


[Figure: KDD flow from Databases through Data Integration, Data Warehouse, Selection, Task-relevant Data, and Data Mining to Pattern Evaluation.]


Data Mining: On What Kind of Data?

Relational databases

Data warehouses

Transactional databases

Advanced DB and information repositories:

Object-oriented and object-relational databases

Spatial databases

Text databases and multimedia databases

WWW


Data Mining Applications:

Banking: loan/credit card approval; predict good customers based on old customers

Targeted marketing: identify likely responders to promotions

Fraud detection: telecommunications, financial transactions; from an online stream of events, identify fraudulent events


Data Mining Applications:

Medicine: disease outcome, effectiveness of treatments; analyze patient disease history to find relationships between diseases

Molecular/Pharmaceutical: identify new drugs

Scientific data analysis: identify new galaxies by searching for sub clusters

Web site/store design and promotion: find affinity of visitor to pages and


Financial Industry, Banks, Businesses, E-commerce

Stock and investment analysis

Identify loyal customers vs. risky customers

Predict customer spending

Risk management

Sales forecasting


Data Mining in CRM: Customer Life Cycle

Customer life cycle: the stages in the relationship between a customer and a business

Key stages in the customer life cycle:

Prospects: people who are not yet customers but are in the target market

Responders: prospects who show an interest in a product or service

Active Customers: people who are currently using the product or service

Former Customers: may be “bad” customers who did not pay their bills or who incurred high costs


Data Mining in CRM

DM helps to:

Determine the behavior surrounding a particular lifecycle event

Find other people in similar life stages and determine which customers are following similar behavior patterns


Data Mining in CRM (cont.)

[Figure labels: Data Warehouse; Data Mining; Campaign Management; Customer Profile; Customer Life Cycle Info.]


Data Mining Techniques

[Figure: taxonomy of data mining techniques, split into Descriptive and Predictive branches, with nodes Clustering, Association, Classification, Regression, Sequential Analysis, Decision Tree, Rule Induction, Neural Networks, and Nearest Neighbor Classification.]


Predictive DM


Predictive data mining produces a model of the system described by the given data set.

It builds models in order to estimate unknown values of interest.

Example:

Given a customer’s characteristics, a model predicts how much the customer will spend on the next catalog order.
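As a rough illustration of the catalog-order example, the sketch below fits a straight line by ordinary least squares and uses it to estimate an unknown value. The single feature (number of past orders) and the spend figures are made-up assumptions; a real model would use many customer characteristics.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: past orders per customer and spend on the next order.
past_orders = [1, 2, 3, 4]
next_spend = [20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line(past_orders, next_spend)


def predict(orders):
    """Estimate next-order spend for a customer with this many past orders."""
    return slope * orders + intercept


print(predict(5))
```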


Descriptive DM

Descriptive data mining produces new, nontrivial information based on the available data set.

Descriptive DM is used to learn about and understand the data.

Example:

Identify and describe groups of customers with common buying behavior.


Classification

Classification is the process of sub-dividing a data set with regard to a number of specific outcomes.

Example:

Given old data about customers and payments, predict a new applicant’s loan eligibility.


Decision Trees

hair    eyes    class
brown   blue    A
brown   brown   B
red     blue    A
dark    blue    B
dark    blue    B
brown   blue    A
dark    brown   B
brown   brown   B

Tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels


Decision Trees: Learned Predictive Rules

hair = dark: class B

hair = red: class A

hair = brown: test eyes

eyes = blue: class A

eyes = brown: class B
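The tree for the hair/eyes table can be reproduced with a small entropy-based learner. The sketch below is an ID3-style illustration, which is an assumption on my part since the slides do not name the algorithm used; the function and variable names are also hypothetical.

```python
from collections import Counter
from math import log2

# The hair/eyes/class table from the Decision Trees slide.
ROWS = [
    ("brown", "blue", "A"), ("brown", "brown", "B"),
    ("red", "blue", "A"), ("dark", "blue", "B"),
    ("dark", "blue", "B"), ("brown", "blue", "A"),
    ("dark", "brown", "B"), ("brown", "brown", "B"),
]
ATTRS = {"hair": 0, "eyes": 1}  # attribute name -> column index


def entropy(rows):
    """Shannon entropy of the class labels (last column) in rows."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def split(rows, idx):
    """Group rows by the value found in column idx."""
    groups = {}
    for r in rows:
        groups.setdefault(r[idx], []).append(r)
    return groups


def build(rows, attrs):
    """Return a class label (leaf) or (attribute, {value: subtree})."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]

    def remainder(name):
        # Weighted entropy left after splitting on this attribute.
        groups = split(rows, attrs[name]).values()
        return sum(len(g) / len(rows) * entropy(g) for g in groups)

    best = min(attrs, key=remainder)  # lowest remainder = highest gain
    rest = {k: v for k, v in attrs.items() if k != best}
    branches = split(rows, attrs[best])
    return (best, {value: build(g, rest) for value, g in branches.items()})


def classify(tree, hair, eyes):
    """Walk the tree from the root until a leaf label is reached."""
    sample = {"hair": hair, "eyes": eyes}
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches[sample[name]]
    return tree


tree = build(ROWS, ATTRS)
print(tree)
```

On this table the learner splits on hair first and tests eyes only for brown hair, matching the rules on the slide. Note that `classify` raises a `KeyError` for an attribute value never seen in training, which a fuller implementation would need to handle.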


Rule induction

In rule induction, the observed actions are given and the rule behind them has to be discovered.

The extraction of useful if-then rules from data based on statistical significance.

Rule induction is an area of machine learning in which formal rules are extracted from a set of observations.

Examples

Do not discount two items that are frequently bought together; discount one to pull the other.

Send camcorder offer to VCR purchasers 2-3 months after VCR purchase.
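A rule such as "VCR purchasers also buy camcorders" can be extracted from purchase records by counting. The sketch below uses support and confidence thresholds as a simple stand-in for the statistical significance test mentioned above; the baskets and both thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"vcr", "camcorder"},
    {"vcr", "tape"},
    {"vcr", "camcorder", "tape"},
    {"tv"},
    {"vcr", "camcorder"},
]


def induce_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules 'if lhs then rhs'."""
    items = sorted(set().union(*baskets))
    n = len(baskets)
    rules = []
    for lhs, rhs in permutations(items, 2):
        lhs_count = sum(1 for b in baskets if lhs in b)
        both = sum(1 for b in baskets if lhs in b and rhs in b)
        support = both / n            # share of all baskets with both items
        confidence = both / lhs_count  # share of lhs baskets that also hold rhs
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


rules = induce_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"if {lhs} then {rhs} "
          f"(support {support:.2f}, confidence {confidence:.2f})")
```

This only considers single-item rules and ignores timing, so it cannot express the "2-3 months after purchase" part of the camcorder example; that would require sequential analysis over dated transactions.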


NEURAL NETWORK

Set of nodes connected by directed weighted edges

Useful for learning from complex data, as in handwriting, speech, and image recognition.

Neural networks have broad applicability to real world business problems and have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:


NEAREST NEIGHBOUR METHOD

The nearest neighbor algorithm in pattern recognition is a method for classifying phenomena based upon observable features.

Define proximity between instances, find the neighbors of a new instance, and assign it the majority class.

The nearest neighbor algorithm is a heuristic: it is not guaranteed to produce the correct result in every case.
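The three steps (define proximity, find neighbors, assign the majority class) can be sketched as a k-nearest-neighbor classifier. Euclidean distance, k = 3, and the toy training points are assumptions chosen for illustration; `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist


def knn_classify(train, point, k=3):
    """Label point by majority vote among its k nearest training instances.

    train is a list of (features, label) pairs; proximity is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labelled instances with two observable features each.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify(train, (1.1, 1.0)))
print(knn_classify(train, (5.1, 5.0)))
```

The result depends heavily on the distance measure and on k, which is why the method is only a heuristic: a point near the boundary between classes can be mislabelled.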


Clustering

The art of finding groups in data.

Objective: gather items from a database into sets according to (unknown) common characteristics.

Example: group existing customers based on time series of payment history, such that similar customers fall in the same cluster.

Key requirement: Need a good measure of similarity between instances.
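One common way to find such groups is k-means, sketched below with Euclidean distance as the similarity measure. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the two payment-history features (average days late, missed payments), the toy values, and the choice of starting centroids.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical customers: (average days late, number of missed payments).
customers = [(1, 0), (2, 0), (1, 1), (30, 4), (28, 5), (35, 3)]

clusters, centroids = kmeans(customers, [customers[0], customers[3]])
for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

Because both features feed a single Euclidean distance, their scales matter: a feature with a larger range dominates the grouping unless the data is normalized first, which is the practical side of the similarity requirement above.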


Major issues in data mining


Mining different kinds of knowledge in databases.

Expression and visualization of data mining results.

Handling noise and incomplete data.

Pattern evaluation: the interestingness problem.

Efficiency and scalability of data mining algorithms.

Handling relational and complex types of data.