Part II: Tools for Knowledge Discovery. Knowledge Discovery in Databases, Chapter 5

Post on 20-Dec-2015

Page 1: Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5

Part II

Tools for Knowledge Discovery

Page 2

Knowledge Discovery in Databases

Chapter 5

Page 3

5.1 A KDD Process Model

Page 4

Figure 5.1 A seven-step KDD process model

Step 1: Goal Identification (output: Defined Goals)

Step 2: Create Target Data (sources: Data Warehouse, Transactional Database, Flat File; output: Target Data)

Step 3: Data Preprocessing (output: Cleansed Data)

Step 4: Data Transformation (output: Transformed Data)

Step 5: Data Mining (output: Data Model)

Step 6: Interpretation & Evaluation

Step 7: Taking Action

Page 5

Figure 5.2 Applying the scientific method to data mining

The Scientific Method: Define the Problem; Formulate a Hypothesis; Perform an Experiment; Draw Conclusions; Verify Conclusions

A KDD Process Model: Identify the Goal; Create Target Data, Data Preprocessing, Data Transformation, Data Mining (grouped by a brace in the figure); Interpretation / Evaluation; Take Action

Page 6

Step 1: Goal Identification

• Define the Problem.

• Choose a Data Mining Tool.

• Estimate Project Cost.

• Estimate Project Completion Time.

• Address Legal Issues.

• Develop a Maintenance Plan.

Page 7

Step 2: Creating a Target Dataset

Page 8

Figure 5.3 The Acme credit card database

Page 9

Step 3: Data Preprocessing

• Noisy Data

• Missing Data

Page 10

Noisy Data

• Locate Duplicate Records.

• Locate Incorrect Attribute Values.

• Smooth Data.
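The three noisy-data tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the valid range for a value, and the choice of bin-mean smoothing are not from the slides, and a real project would pick the checks that suit its data.

```python
from statistics import mean

def find_duplicates(records):
    """Return the indices of records that exactly repeat an earlier record."""
    seen = set()
    duplicates = []
    for i, rec in enumerate(records):
        key = tuple(rec)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates

def find_out_of_range(values, low, high):
    """Return the indices of values outside an assumed valid range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into consecutive bins of bin_size,
    and replace every value by the mean of its bin (one common smoothing choice)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        smoothed.extend([mean(chunk)] * len(chunk))
    return smoothed

# Hypothetical sample data, for illustration only.
records = [("Ann", 34), ("Bob", 51), ("Ann", 34)]
ages = [34, 51, -3, 240]
print(find_duplicates(records))           # index of the repeated record
print(find_out_of_range(ages, 0, 120))    # indices of implausible ages
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24], 3))
```

Exact-match duplicate detection misses near-duplicates such as misspelled names; those need a similarity measure rather than set membership.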

Page 11

Preprocessing Missing Data

• Discard Records With Missing Values.

• Replace Missing Real-valued Items With the Class Mean.

• Replace Missing Values With Values Found Within Highly Similar Instances.
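The first two options above can be sketched as follows. The representation is an assumption made for illustration: each instance is a dictionary, a missing value is stored as `None`, and the attribute names are hypothetical. The third option (copying values from highly similar instances) needs a similarity measure and is not shown.

```python
from statistics import mean

def discard_incomplete(instances):
    """Option 1: keep only the instances that have no missing (None) values."""
    return [row for row in instances if None not in row.values()]

def fill_with_class_mean(instances, attribute, class_attribute):
    """Option 2: replace a missing real-valued item with the mean of that
    attribute over the instances that belong to the same class."""
    by_class = {}
    for row in instances:
        if row[attribute] is not None:
            by_class.setdefault(row[class_attribute], []).append(row[attribute])
    class_means = {c: mean(vals) for c, vals in by_class.items()}

    filled = []
    for row in instances:
        new_row = dict(row)  # copy, so the input list is left unchanged
        if new_row[attribute] is None and new_row[class_attribute] in class_means:
            new_row[attribute] = class_means[new_row[class_attribute]]
        filled.append(new_row)
    return filled

# Hypothetical instances, for illustration only.
rows = [
    {"age": 30, "accept": "yes"},
    {"age": 40, "accept": "yes"},
    {"age": None, "accept": "yes"},
    {"age": 60, "accept": "no"},
]
print(len(discard_incomplete(rows)))
print(fill_with_class_mean(rows, "age", "accept")[2])
```

Discarding is simplest but shrinks the dataset; class-mean replacement keeps every instance at the cost of pulling the filled values toward the class average.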

Page 12

Processing Missing Data While Learning

• Ignore Missing Values.

• Treat Missing Values As Equal Compares.

• Treat Missing Values As Unequal Compares.
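To make the three policies concrete, here is a minimal sketch of a similarity score between two instances with categorical attributes. The scoring rule (fraction of attribute comparisons that match) and the function name are assumptions chosen for illustration, not the measure used by any particular tool.

```python
def similarity(a, b, missing="ignore"):
    """Fraction of attribute comparisons that match.

    missing="ignore":  skip any attribute where either value is missing (None)
    missing="equal":   count a comparison involving a missing value as a match
    missing="unequal": count a comparison involving a missing value as a mismatch
    """
    if missing not in ("ignore", "equal", "unequal"):
        raise ValueError("missing must be 'ignore', 'equal' or 'unequal'")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            if missing == "ignore":
                continue
            compared += 1
            if missing == "equal":
                matches += 1
        else:
            compared += 1
            if x == y:
                matches += 1
    return matches / compared if compared else 0.0

# Hypothetical instances, for illustration only; None marks a missing value.
first = ("male", None, "yes", "40-50K")
second = ("male", "no", "no", "40-50K")
for policy in ("ignore", "equal", "unequal"):
    print(policy, similarity(first, second, policy))
```

On this pair the "equal" policy gives the highest score and "unequal" the lowest, which shows how the choice of policy can change which instances a learner treats as similar.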

Page 13

Step 4: Data Transformation

• Data Normalization

• Data Type Conversion

• Attribute and Instance Selection

Page 14

Data Normalization

• Decimal Scaling

• Min-Max Normalization

• Normalization using Z-scores

• Logarithmic Normalization
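The four normalization methods listed above can be sketched with the Python standard library. The function names are illustrative, and two choices are assumptions: the z-score uses the population standard deviation, and the logarithm defaults to base 2.

```python
import math
from statistics import mean, pstdev

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    largest = max(abs(v) for v in values)
    k = 0
    while largest / (10 ** k) >= 1:
        k += 1
    return [v / (10 ** k) for v in values]

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map the observed [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_scores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]

def log_normalize(values, base=2):
    """Replace each value by its logarithm; every value must be positive."""
    return [math.log(v, base) for v in values]

# Hypothetical attribute values, for illustration only.
incomes = [100, 250, 900]
print(decimal_scaling(incomes))
print(min_max(incomes))
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
print(log_normalize([1, 8, 64]))
```

Min-max and decimal scaling preserve the shape of the distribution, z-scores express each value as a distance from the mean, and the logarithm compresses large values, which helps when a few values dominate the range.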

Page 15

Attribute and Instance Selection

• Eliminating Attributes

• Creating Attributes

• Instance Selection

Page 16

Table 5.1 • An Initial Population for Genetic Attribute Selection

Population  Income  Magazine   Watch      Credit Card
Element     Range   Promotion  Promotion  Insurance    Sex  Age

1           1       0          0          1            1    1
2           0       0          0          1            0    1
3           0       0          0          0            1    1
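A short sketch can show how a population like the one in Table 5.1 might be read, assuming that a 1 marks an attribute as selected and a 0 marks it as eliminated. The single-point crossover helper is a standard genetic operator added here for illustration; the slide itself does not say which operators or fitness function are used.

```python
ATTRIBUTES = [
    "Income Range", "Magazine Promotion", "Watch Promotion",
    "Credit Card Insurance", "Sex", "Age",
]

# The three population elements of Table 5.1, one bit per attribute.
POPULATION = {
    1: [1, 0, 0, 1, 1, 1],
    2: [0, 0, 0, 1, 0, 1],
    3: [0, 0, 0, 0, 1, 1],
}

def selected_attributes(bits):
    """Names of the attributes whose bit is 1 (assumed to mean 'selected')."""
    return [name for name, bit in zip(ATTRIBUTES, bits) if bit == 1]

def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two bit strings after position `point` (illustrative operator)."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

for element, bits in POPULATION.items():
    print(element, selected_attributes(bits))

child_a, child_b = single_point_crossover(POPULATION[1], POPULATION[2], 3)
print(child_a, child_b)
```

In a full genetic search each element would be scored by building a model from its selected attributes; that fitness step is deliberately left out of this sketch.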

Page 17

Step 5: Data Mining

1. Choose training and test data.

2. Designate a set of input attributes.

3. If learning is supervised, choose one or more output attributes.

4. Select learning parameter values.

5. Invoke the data mining tool.
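The five steps above can be walked through in a small, self-contained sketch. Everything here is a stand-in chosen for illustration: the "tool" is a simple k-nearest-neighbor vote over categorical attributes, the learning parameter is `k`, and the toy dataset and attribute names are hypothetical. It is not the tool used in the chapter.

```python
import random
from collections import Counter

def knn_predict(train, inputs, output, instance, k):
    """Hypothetical stand-in for the data mining tool: majority vote among the
    k training rows that agree with `instance` on the most input attributes."""
    ranked = sorted(
        train,
        key=lambda row: sum(row[a] == instance[a] for a in inputs),
        reverse=True,
    )
    votes = Counter(row[output] for row in ranked[:k])
    return votes.most_common(1)[0][0]

def run_session(data, inputs, output, k=1, test_fraction=0.2, seed=0):
    """Run one supervised mining session and return test set accuracy."""
    if len(data) < 2:
        raise ValueError("need at least two instances to split")
    # Step 1: choose training and test data (random split with a fixed seed).
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    test, train = rows[:n_test], rows[n_test:]
    # Steps 2-4 arrive as arguments: inputs, output, and the parameter k.
    # Step 5: invoke the tool on every test instance.
    correct = sum(
        knn_predict(train, inputs, output, row, k) == row[output] for row in test
    )
    return correct / len(test)

# Toy data: the output attribute simply mirrors the "promo" attribute.
data = [
    {"promo": p, "sex": s, "buys": p}
    for p in ("yes", "no") for s in ("m", "f") for _ in range(4)
]
print(run_session(data, inputs=["promo", "sex"], output="buys", k=1))
```

Keeping the split, the attribute choices, and the parameter values as explicit arguments makes it easy to rerun the same session with one setting changed, which is how attribute and parameter evaluation experiments are usually organized.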

Page 18

Step 6: Interpretation and Evaluation

• Statistical analysis.

• Heuristic analysis.

• Experimental analysis.

• Human analysis.

Page 19

Step 7: Taking Action

• Create a report.

• Relocate retail items.

• Mail promotional information.

• Detect fraud.

• Fund new research.

Page 20

5.9 The CRISP-DM Process Model

1. Business understanding

2. Data understanding

3. Data preparation

4. Modeling

5. Evaluation

6. Deployment

Page 21

5.10 Experimenting with ESX

Page 22

A Four-Step Model for Knowledge Discovery

1. Identify the goal.

2. Prepare the data.

3. Apply data mining.

4. Interpret and evaluate the results.

Page 23

Experiment 1: Attribute Evaluation

*Applying the Four-Step Process Model to the Credit Screening Dataset*

Page 24

Table 5.2 • A Confusion Matrix for Credit Card Screening

          Computed  Computed
          Accept    Reject

Accept    115       38
Reject    35        152
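As a quick check on how to read Table 5.2, the sketch below computes overall accuracy and error rate from the matrix. It assumes the usual layout, with rows as actual classes and columns as computed classes, so that correct classifications lie on the diagonal; the function name is illustrative.

```python
def accuracy(matrix):
    """Share of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Table 5.2: rows are actual Accept / Reject, columns are computed Accept / Reject.
table_5_2 = [
    [115, 38],
    [35, 152],
]

acc = accuracy(table_5_2)
print(f"accuracy = {acc:.3f}, error rate = {1 - acc:.3f}")
# prints: accuracy = 0.785, error rate = 0.215
```

Under that reading, 267 of the 340 instances are classified correctly, which is about 78.5%.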

Page 25

Table 5.3 • Test Set Results for a Most Typical Training Model

          Computed  Computed
          Accept    Reject

Accept    98        55
Reject    25        162
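Assuming rows are actual classes and columns are computed classes, the test set accuracy implied by Table 5.3 works out as follows:

```latex
\text{accuracy}
  = \frac{98 + 162}{98 + 55 + 25 + 162}
  = \frac{260}{340}
  \approx 0.765
```

Most of the errors are the 55 actual Accept instances that the model rejects.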

Page 26

Experiment 2: Parameter Evaluation

*Applying the Four-Step Process Model to the Satellite Image Dataset*

Page 27

Figure 5.4 Satellite image data