
Posted on 22-Dec-2015


An Excel-based Data Mining Tool

Chapter 4


4.1 The iData Analyzer

Figure: Architecture of the iData Analyzer. Components shown: Data, PreProcessor, Interface, Heuristic Agent, Neural Networks, Large Dataset, ESX, Mining Technique, Generate Rules, Rules, RuleMaker, Report Generator, Excel Sheets, and Explanation, linked by Yes/No decision branches.

4.2 ESX: A Multipurpose Tool for Data Mining


ESX

• Supports supervised learning and unsupervised clustering

• Does not make statistical assumptions

• Deals with missing attribute values

• Can be applied to both categorical and numerical data

• Points out inconsistencies and unusual values


• For supervised classification, ESX can determine which instances and attributes are best able to classify new instances.

• For unsupervised clustering, ESX uses a globally optimizing evaluation function that favors the best overall clustering of the instances.

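Figure: ESX concept hierarchy with three levels. Root Level: a single Root node. Concept Level: concepts C1, C2, . . ., Cn. Instance Level: instances I11, I12, . . ., I1j under C1; I21, I22, . . ., I2k under C2; . . .; In1, In2, . . ., Inl under Cn.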
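The root, concept and instance levels of the ESX hierarchy map naturally onto a small tree data structure. The sketch below illustrates this under our own assumptions. The `Root` and `Concept` classes and the `add_instance` method are hypothetical names; they are not part of iDA or ESX.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Concept level: one class or cluster (e.g. C1) and the instances under it."""
    name: str
    instances: list = field(default_factory=list)


@dataclass
class Root:
    """Root level: holds every concept in the hierarchy (hypothetical structure)."""
    concepts: list = field(default_factory=list)

    def add_instance(self, concept_name, instance):
        """Attach an instance to the named concept, creating the concept if needed."""
        for concept in self.concepts:
            if concept.name == concept_name:
                concept.instances.append(instance)
                return concept
        concept = Concept(concept_name, [instance])
        self.concepts.append(concept)
        return concept

    def size(self):
        """Total number of instance-level nodes."""
        return sum(len(c.instances) for c in self.concepts)


root = Root()
root.add_instance("C1", "I11")
root.add_instance("C1", "I12")
root.add_instance("C2", "I21")
print([c.name for c in root.concepts], root.size())  # prints ['C1', 'C2'] 3
```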

4.3 iDAV Format for Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

Income   Magazine   Watch      Life Insurance  Credit Card
Range    Promotion  Promotion  Promotion       Insurance    Sex     Age
C        C          C          C               C            C       R
I        I          I          I               I            I       I
40–50K   Yes        No         No              No           Male    45
30–40K   Yes        Yes        Yes             No           Female  40
40–50K   No         No         No              No           Male    42
30–40K   Yes        Yes        Yes             Yes          Male    43
50–60K   Yes        No         Yes             No           Female  38
20–30K   No         No         No              No           Female  55
30–40K   Yes        No         Yes             Yes          Male    35
20–30K   No         Yes        No              No           Male    27
30–40K   Yes        No         No              No           Male    43
30–40K   Yes        Yes        Yes             No           Female  41
40–50K   No         Yes        Yes             No           Female  43
20–30K   No         Yes        Yes             No           Male    29
50–60K   Yes        Yes        Yes             No           Female  39
40–50K   No         Yes        No              No           Male    55
20–30K   No         No         Yes             Yes          Female  19
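In the iDAV layout of Table 4.1, the first row holds attribute names, the second a type code and the third a usage code. Each row after that is one instance. The sketch below reads such a sheet exported as CSV. It makes three assumptions:

- C marks a categorical attribute and R a real-valued one.
- Empty cells are treated as missing values.
- The CSV text is a hypothetical export of the first three instances.

`parse_idav` is a hypothetical helper, not an iDA function.

```python
import csv
import io

# Hypothetical CSV export of the first three instances of Table 4.1.
IDAV_TEXT = """Income Range,Magazine Promotion,Watch Promotion,Life Insurance Promotion,Credit Card Insurance,Sex,Age
C,C,C,C,C,C,R
I,I,I,I,I,I,I
40–50K,Yes,No,No,No,Male,45
30–40K,Yes,Yes,Yes,No,Female,40
40–50K,No,No,No,No,Male,42
"""


def parse_idav(text):
    """Split an iDAV-style sheet into types, usage codes, and typed instance records."""
    rows = list(csv.reader(io.StringIO(text)))
    names, types, usage = rows[0], rows[1], rows[2]
    if not (len(names) == len(types) == len(usage)):
        raise ValueError("name, type, and usage rows must have equal length")
    records = []
    for row in rows[3:]:
        if not row:
            continue  # skip blank lines
        record = {}
        for name, kind, value in zip(names, types, row):
            if value == "":
                record[name] = None          # empty cell -> missing value
            elif kind == "R":
                record[name] = float(value)  # assumed: R = real-valued
            else:
                record[name] = value         # assumed: C = categorical
        records.append(record)
    return dict(zip(names, types)), dict(zip(names, usage)), records


types, usage, records = parse_idav(IDAV_TEXT)
print(types["Age"], usage["Age"], records[0]["Age"], records[1]["Sex"])
```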

Table 4.2 • Values for Attribute Usage

Character  Usage
I          The attribute is used as an input attribute.
U          The attribute is not used.
D          The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O          The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
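The usage codes in Table 4.2 amount to a small validation rule: inputs, display-only attributes and ignored attributes can be any number, but supervised ESX sessions need exactly one categorical output. The sketch below checks that rule. `split_by_usage` and the sample attribute assignments are hypothetical, chosen only for illustration.

```python
VALID_USAGE = {"I", "U", "D", "O"}


def split_by_usage(types, usage, supervised):
    """Group attribute names by their Table 4.2 usage code.

    `types` maps name -> "C" or "R"; `usage` maps name -> usage code.
    """
    for name, code in usage.items():
        if code not in VALID_USAGE:
            raise ValueError(f"unknown usage code {code!r} for {name!r}")
    inputs = [n for n, c in usage.items() if c == "I"]
    display_only = [n for n, c in usage.items() if c == "D"]  # summarized in reports only
    outputs = [n for n, c in usage.items() if c == "O"]
    # "U" attributes are ignored entirely.
    if supervised and (len(outputs) != 1 or types[outputs[0]] != "C"):
        raise ValueError("supervised ESX needs exactly one categorical output attribute")
    return inputs, display_only, outputs


# Hypothetical usage assignment for a supervised session.
types = {"Income Range": "C", "Life Insurance Promotion": "C", "Sex": "C",
         "Credit Card Insurance": "C", "Age": "R"}
usage = {"Income Range": "D", "Life Insurance Promotion": "O", "Sex": "I",
         "Credit Card Insurance": "U", "Age": "I"}
print(split_by_usage(types, usage, supervised=True))
```

Choosing the real-valued Age as the output instead would raise a `ValueError`, mirroring the rule that the output attribute must be categorical.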

4.4 A Five-Step Approach for Unsupervised Clustering

Step 1: Enter the Data to be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Individual Class Results

Step 5: Visualize Individual Class Rules

Step 1: Enter the Data to be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results

• Class Resemblance Scores

• Domain Resemblance Score

  – Attributes, instances, no model

• Domain Predictability
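Two averages are involved here:

- A class resemblance score is the average similarity of the instances within a class.
- The domain resemblance score is the same average taken over all instances in the domain.

A class whose resemblance score exceeds the domain score groups instances that are more alike than the data as a whole. The sketch below assumes a simple-matching similarity, the share of attributes whose values agree. This is a stand-in, not ESX's own measure, and the two small clusters are hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Simple-matching similarity (an assumption): share of attributes that agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def resemblance(instances):
    """Average pairwise similarity of a group of instances."""
    pairs = list(combinations(instances, 2))
    if not pairs:
        return 1.0  # design choice: a single instance trivially resembles itself
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clustering of four instances on (Magazine, Watch, Sex).
clusters = {
    "A": [("Yes", "No", "Male"), ("Yes", "No", "Male")],
    "B": [("No", "Yes", "Female"), ("No", "Yes", "Male")],
}
class_scores = {name: resemblance(members) for name, members in clusters.items()}
domain_score = resemblance([inst for members in clusters.values() for inst in members])
print(class_scores, round(domain_score, 3))
```

Both class scores (1.0 and 2/3) exceed the domain score (7/18), which is the pattern one hopes to see for well-formed clusters.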

Step 4: Read and Interpret Individual Class Results

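• Class Predictability is a within-class measure: the predictability of a categorical attribute value for a class is the fraction of the class's instances that have that value.

• Class Predictiveness is a between-class measure: the predictiveness of a value for a class is the probability that an instance belongs to the class given that it has that value.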
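Predictability and predictiveness differ only in which group forms the denominator: the class members, or every instance with the value. The sketch below computes both for Sex = Female. It uses the 15 instances of Table 4.1 and treats Life Insurance Promotion as the class purely for illustration. The function names are hypothetical.

```python
# (class label, Sex) for the 15 instances of Table 4.1, using
# Life Insurance Promotion as the class for illustration.
PAIRS = [
    ("No", "Male"), ("Yes", "Female"), ("No", "Male"), ("Yes", "Male"),
    ("Yes", "Female"), ("No", "Female"), ("Yes", "Male"), ("No", "Male"),
    ("No", "Male"), ("Yes", "Female"), ("Yes", "Female"), ("Yes", "Male"),
    ("Yes", "Female"), ("No", "Male"), ("Yes", "Female"),
]
ROWS = [{"class": c, "Sex": s} for c, s in PAIRS]


def class_predictability(rows, cls, attr, value):
    """Within-class: fraction of the instances in `cls` whose `attr` equals `value`."""
    members = [r for r in rows if r["class"] == cls]
    return sum(r[attr] == value for r in members) / len(members)


def class_predictiveness(rows, cls, attr, value):
    """Between-class: fraction of instances with `attr` == `value` that belong to `cls`."""
    having = [r for r in rows if r[attr] == value]
    return sum(r["class"] == cls for r in having) / len(having)


print(round(class_predictability(ROWS, "Yes", "Sex", "Female"), 3))  # 6 of 9 -> 0.667
print(round(class_predictiveness(ROWS, "Yes", "Sex", "Female"), 3))  # 6 of 7 -> 0.857
```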

Step 5: Visualize Individual Class Rules


4.5 A Six-Step Approach for Supervised Learning

Step 1: Choose an Output Attribute

Step 2: Perform the Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Test Set Results

Step 5: Read and Interpret Class Results

Step 6: Visualize and Interpret Class Rules

Step 4: Read and Interpret Test Set Results
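Test set results for a categorical output attribute are commonly summarized in two ways: classification accuracy, and a confusion matrix that counts each (actual, predicted) pair. The sketch below computes both for a handful of hypothetical predictions. It is a generic illustration, not the layout of iDA's test set report, and `evaluate` and `print_matrix` are hypothetical helpers.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return (accuracy, confusion matrix) for paired class labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual), matrix


def print_matrix(matrix, labels):
    """Rows are actual classes, columns are predicted classes."""
    print("actual\\pred " + " ".join(f"{p:>5}" for p in labels))
    for a in labels:
        print(f"{a:>11} " + " ".join(f"{matrix[(a, p)]:>5}" for p in labels))


# Hypothetical test set outcomes for a Yes/No output attribute.
actual = ["Yes", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
acc, matrix = evaluate(actual, predicted)
print(f"accuracy = {acc:.2f}")
print_matrix(matrix, ["Yes", "No"])
```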


4.6 Techniques for Generating Rules

1. Choose an attribute.

2. Use the attribute to subdivide the instances into subclasses.

3. If the instances in a subclass satisfy a predefined criterion, generate a defining rule. If not, repeat from step 1 for that subclass.
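The three steps above can be sketched as a short recursion. The sketch makes these assumptions:

- The predefined criterion is a minimum accuracy for the target class.
- Attributes are "chosen" simply in the order given, where a real rule generator would pick the best one at each step.
- Subclasses that contain no target instances are skipped.

This is not iDA's RuleMaker, and `generate_rules` is a hypothetical name. The data are the Sex, Watch Promotion and Life Insurance Promotion columns of Table 4.1.

```python
# Sex, Watch Promotion, Life Insurance Promotion for the 15 instances of Table 4.1.
COLUMNS = ("Sex", "Watch Promotion", "Life Insurance Promotion")
DATA = [
    ("Male", "No", "No"), ("Female", "Yes", "Yes"), ("Male", "No", "No"),
    ("Male", "Yes", "Yes"), ("Female", "No", "Yes"), ("Female", "No", "No"),
    ("Male", "No", "Yes"), ("Male", "Yes", "No"), ("Male", "No", "No"),
    ("Female", "Yes", "Yes"), ("Female", "Yes", "Yes"), ("Male", "Yes", "Yes"),
    ("Female", "Yes", "Yes"), ("Male", "Yes", "No"), ("Female", "No", "Yes"),
]
ROWS = [dict(zip(COLUMNS, t)) for t in DATA]


def generate_rules(rows, target, target_value, attributes, min_accuracy=1.0, conditions=()):
    """Return (conditions, accuracy, hits) tuples for rules predicting target_value."""
    rules = []
    if not attributes or not rows:
        return rules
    attr, rest = attributes[0], attributes[1:]        # step 1: choose an attribute
    for value in sorted({r[attr] for r in rows}):      # step 2: subdivide on its values
        subset = [r for r in rows if r[attr] == value]
        hits = sum(r[target] == target_value for r in subset)
        if hits == 0:
            continue  # no target instances in this subclass
        accuracy = hits / len(subset)
        cond = conditions + ((attr, value),)
        if accuracy >= min_accuracy:                   # step 3: criterion met -> rule
            rules.append((cond, accuracy, hits))
        else:                                          # otherwise repeat on the subclass
            rules.extend(generate_rules(subset, target, target_value, rest,
                                        min_accuracy, cond))
    return rules


for conditions, accuracy, hits in generate_rules(
        ROWS, "Life Insurance Promotion", "Yes", ["Sex", "Watch Promotion"]):
    print(" AND ".join(f"{a} = {v}" for a, v in conditions),
          f"-> accuracy {accuracy:.2f}, {hits} instances")
```

With the default criterion of 100% accuracy, the only rule produced is Sex = Female AND Watch Promotion = Yes → Life Insurance Promotion = Yes, covering 4 instances. Lowering `min_accuracy` to 0.6 stops the recursion earlier and yields the broader rule Sex = Female.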


4.6 Techniques for Generating Rules

1. Define the scope of the rules.

2. Choose the instances.

3. Set the minimum rule correctness.

4. Define the minimum rule coverage.

5. Choose an attribute significance value.
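Two of the parameters above, minimum rule correctness and minimum rule coverage, act as filters on candidate rules. The sketch uses these assumed definitions:

- Correctness is the share of instances matching a rule's antecedent that also match its consequent.
- Coverage is the share of the target class's instances that the rule correctly covers.

iDA's exact definitions may differ, and attribute significance is not modeled. `rule_stats` and `keep_rule` are hypothetical helpers, and the data are again three columns of Table 4.1.

```python
# Sex, Watch Promotion, Life Insurance Promotion for the 15 instances of Table 4.1.
COLUMNS = ("Sex", "Watch Promotion", "Life Insurance Promotion")
DATA = [
    ("Male", "No", "No"), ("Female", "Yes", "Yes"), ("Male", "No", "No"),
    ("Male", "Yes", "Yes"), ("Female", "No", "Yes"), ("Female", "No", "No"),
    ("Male", "No", "Yes"), ("Male", "Yes", "No"), ("Male", "No", "No"),
    ("Female", "Yes", "Yes"), ("Female", "Yes", "Yes"), ("Male", "Yes", "Yes"),
    ("Female", "Yes", "Yes"), ("Male", "Yes", "No"), ("Female", "No", "Yes"),
]
ROWS = [dict(zip(COLUMNS, t)) for t in DATA]


def rule_stats(rows, antecedent, target, target_value):
    """Return (correctness, coverage) for `antecedent -> target = target_value`."""
    covered = [r for r in rows if all(r[a] == v for a, v in antecedent.items())]
    in_class = [r for r in rows if r[target] == target_value]
    hits = sum(r[target] == target_value for r in covered)
    correctness = hits / len(covered) if covered else 0.0  # assumed definition
    coverage = hits / len(in_class) if in_class else 0.0   # assumed definition
    return correctness, coverage


def keep_rule(rows, antecedent, target, target_value, min_correctness, min_coverage):
    """Keep a rule only if it meets both user-set minimums."""
    correctness, coverage = rule_stats(rows, antecedent, target, target_value)
    return correctness >= min_correctness and coverage >= min_coverage


print(rule_stats(ROWS, {"Sex": "Female"}, "Life Insurance Promotion", "Yes"))
```

Under these definitions, Sex = Female → Life Insurance Promotion = Yes has correctness 6/7 and coverage 6/9, so it passes minimums of 0.8 and 0.5. The narrower rule that adds Watch Promotion = Yes is fully correct but covers only 4 of the 9 Yes instances, so it is rejected.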


4.7 Instance Typicality


Typicality Scores

• Identify prototypical and outlier instances.

• Select the best set of training instances.

• Compute classification confidence scores for individual instances.
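An instance's typicality score can be read as its average similarity to the other members of its class. The most typical member is then a prototype, and the least typical a candidate outlier. The sketch below follows that reading with two assumptions: similarity is simple matching over categorical values, which stands in for ESX's own measure, and the five class members are hypothetical. `typicality_scores` is a hypothetical name.

```python
def similarity(a, b):
    """Simple-matching similarity (an assumption): share of attributes that agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def typicality_scores(members):
    """Average similarity of each member to every other member of its class."""
    if len(members) < 2:
        raise ValueError("typicality needs at least two class members")
    scores = []
    for i, inst in enumerate(members):
        others = [m for j, m in enumerate(members) if j != i]
        scores.append(sum(similarity(inst, o) for o in others) / len(others))
    return scores


# Hypothetical class members described by three Yes/No attributes.
MEMBERS = [
    ("Yes", "Yes", "Yes"),
    ("Yes", "Yes", "No"),
    ("Yes", "No", "Yes"),
    ("No", "Yes", "Yes"),
    ("No", "No", "No"),
]
scores = typicality_scores(MEMBERS)
prototype = max(range(len(scores)), key=scores.__getitem__)  # most typical member
outlier = min(range(len(scores)), key=scores.__getitem__)    # least typical member
print([round(s, 3) for s in scores], prototype, outlier)
```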


4.8 Special Considerations and Features

• Avoid Mining Delays

• The Quick Mine Feature

• Erroneous and Missing Data
