![Page 1: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/1.jpg)
An Excel-based Data Mining Tool
Chapter 4
![Page 2: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/2.jpg)
4.1 The iData Analyzer
![Page 3: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/3.jpg)
Figure 4.1 The iDA system architecture
(Flowchart. Components: Data, Interface, PreProcessor, Heuristic Agent, Mining Technique, Neural Networks, ESX, RuleMaker, Rules, Report Generator, Excel Sheets, Explanation. Decision points: Large Dataset, Generate Rules, each with Yes/No branches.)
![Page 4: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/4.jpg)
Figure 4.2 A successful installation
![Page 5: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/5.jpg)
4.2 ESX: A Multipurpose Tool for Data Mining
![Page 6: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/6.jpg)
Figure 4.3 An ESX concept hierarchy
(Tree with three levels. Root Level: Root. Concept Level: C1, C2, . . ., Cn. Instance Level: I11, I12, . . ., I1j under C1; I21, I22, . . ., I2k under C2; In1, In2, . . ., Inl under Cn.)
![Page 7: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/7.jpg)
4.3 iDAV Format for Data Mining
![Page 8: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/8.jpg)
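Table 4.1 • Credit Card Promotion Database: iDAV Format

| Income Range | Magazine Promotion | Watch Promotion | Life Insurance Promotion | Credit Card Insurance | Sex | Age |
|---|---|---|---|---|---|---|
| C | C | C | C | C | C | R |
| I | I | I | I | I | I | I |
| 40–50K | Yes | No | No | No | Male | 45 |
| 30–40K | Yes | Yes | Yes | No | Female | 40 |
| 40–50K | No | No | No | No | Male | 42 |
| 30–40K | Yes | Yes | Yes | Yes | Male | 43 |
| 50–60K | Yes | No | Yes | No | Female | 38 |
| 20–30K | No | No | No | No | Female | 55 |
| 30–40K | Yes | No | Yes | Yes | Male | 35 |
| 20–30K | No | Yes | No | No | Male | 27 |
| 30–40K | Yes | No | No | No | Male | 43 |
| 30–40K | Yes | Yes | Yes | No | Female | 41 |
| 40–50K | No | Yes | Yes | No | Female | 43 |
| 20–30K | No | Yes | Yes | No | Male | 29 |
| 50–60K | Yes | Yes | Yes | No | Female | 39 |
| 40–50K | No | Yes | No | No | Male | 55 |
| 20–30K | No | No | Yes | Yes | Female | 19 |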
![Page 9: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/9.jpg)
Table 4.2 • Values for Attribute Usage

| Character | Usage |
|---|---|
| I | The attribute is used as an input attribute. |
| U | The attribute is not used. |
| D | The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports. |
| O | The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute. |
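The three-row header layout of Table 4.1 can be illustrated with a short Python sketch. It is only an illustration, not part of iDA: the function names `split_idav` and `columns_with_usage` are hypothetical, the sheet is an in-memory copy of the first two instances of Table 4.1, and the second header row is read here as an attribute type code (C or R), which is an interpretation of the table.

```python
# Hypothetical in-memory copy of the top of Table 4.1.
# Row 1 = attribute names, row 2 = type codes, row 3 = usage codes,
# remaining rows = instances.
sheet = [
    ["Income Range", "Magazine Promotion", "Watch Promotion",
     "Life Insurance Promotion", "Credit Card Insurance", "Sex", "Age"],
    ["C", "C", "C", "C", "C", "C", "R"],
    ["I", "I", "I", "I", "I", "I", "I"],
    ["40-50K", "Yes", "No", "No", "No", "Male", "45"],
    ["30-40K", "Yes", "Yes", "Yes", "No", "Female", "40"],
]

VALID_USAGE = {"I", "U", "D", "O"}  # usage characters listed in Table 4.2


def split_idav(rows):
    """Separate the three header rows from the instance rows."""
    names, types, usage = rows[0], rows[1], rows[2]
    for name, code in zip(names, usage):
        if code not in VALID_USAGE:
            raise ValueError(f"unknown usage code {code!r} for {name!r}")
    attributes = [
        {"name": n, "type": t, "usage": u}
        for n, t, u in zip(names, types, usage)
    ]
    return attributes, rows[3:]


def columns_with_usage(attributes, code):
    """Return the names of all attributes carrying the given usage code."""
    return [a["name"] for a in attributes if a["usage"] == code]


attributes, instances = split_idav(sheet)
input_columns = columns_with_usage(attributes, "I")
output_columns = columns_with_usage(attributes, "O")
```

With every usage code set to I, as in Table 4.1, `output_columns` is empty, which corresponds to an unsupervised clustering session.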
![Page 10: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/10.jpg)
4.4 A Five-step Approach for Unsupervised Clustering
Step 1: Enter the Data to be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules
![Page 11: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/11.jpg)
Step 1: Enter The Data To Be Mined
![Page 12: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/12.jpg)
Figure 4.4 The Credit Card Promotion Database
![Page 13: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/13.jpg)
Step 2: Perform A Data Mining Session
![Page 14: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/14.jpg)
Figure 4.5 Unsupervised settings for ESX
![Page 15: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/15.jpg)
Figure 4.6 RuleMaker options
![Page 16: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/16.jpg)
Step 3: Read and Interpret Summary Results
• Class Resemblance Scores
• Domain Resemblance Score
• Domain Predictability
![Page 17: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/17.jpg)
Summary Results
• Class Resemblance Score offers a first indication about how well the instances within each class (cluster) fit together.
• Domain Resemblance Score represents the overall similarity of all instances within the data set.
• It is highly desirable that class resemblance scores are higher than the domain resemblance score.
![Page 18: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/18.jpg)
Summary Results
• Given categorical attribute A with values v1, v2, v3, …, vi, …, vn, the Domain Predictability of vi tells us the percent of domain instances showing vi as a value for A.
• A predictability score near 100% for a domain-level categorical attribute value indicates that the attribute is not likely to be useful for supervised learning or unsupervised clustering
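As a minimal sketch of domain predictability, the snippet below counts how often each value of one categorical attribute appears across all instances. The function name `domain_predictability` is hypothetical, scores are returned as fractions rather than percentages, and the data is the Credit Card Insurance column of Table 4.1.

```python
from collections import Counter


def domain_predictability(values):
    """Fraction of all domain instances showing each attribute value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Credit Card Insurance column of Table 4.1, in row order (15 instances).
credit_card_insurance = [
    "No", "No", "No", "Yes", "No", "No", "Yes", "No",
    "No", "No", "No", "No", "No", "No", "Yes",
]

scores = domain_predictability(credit_card_insurance)
```

Here 12 of the 15 instances have the value No, so its domain predictability is 0.8. The closer such a score gets to 1.0, the less the attribute helps to tell instances apart.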
![Page 19: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/19.jpg)
Summary Results
• Given categorical attribute A with values v1, v2, v3, …, vi, …, vn, the Class C Predictability score for vi tells us the percent of instances within class C showing vi as a value for A.
• Given class C and categorical attribute A with values v1, v2, v3, …, vi,… vn, an Attribute-Value Predictiveness score for vi is defined as the probability an instance resides in C given the instance has value vi for A.
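The two class-level scores differ only in which group of instances is counted, as a short sketch can show. The function names and the five-instance data set below are hypothetical and chosen only to make the arithmetic easy to follow. Scores are fractions rather than percentages.

```python
def class_predictability(instances, attribute, value, cls):
    """Of the instances in class cls, the fraction that have the given value."""
    in_class = [x for x in instances if x["class"] == cls]
    if not in_class:
        return 0.0
    return sum(1 for x in in_class if x[attribute] == value) / len(in_class)


def class_predictiveness(instances, attribute, value, cls):
    """Of the instances with the given value, the fraction that reside in cls."""
    with_value = [x for x in instances if x[attribute] == value]
    if not with_value:
        return 0.0
    return sum(1 for x in with_value if x["class"] == cls) / len(with_value)


# Hypothetical clustered instances (not taken from the iDA output).
data = [
    {"class": "C1", "Sex": "Male"},
    {"class": "C1", "Sex": "Male"},
    {"class": "C1", "Sex": "Female"},
    {"class": "C2", "Sex": "Female"},
    {"class": "C2", "Sex": "Female"},
]

male_predictability_c1 = class_predictability(data, "Sex", "Male", "C1")
male_predictiveness_c1 = class_predictiveness(data, "Sex", "Male", "C1")
```

In this toy data, two of the three C1 instances are Male, so predictability is 2/3. Both Male instances sit in C1, so predictiveness is 1.0.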
![Page 20: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/20.jpg)
Domain Statistics for Numerical Attributes
• Attribute Significance Value measures the predictive value of each numerical attribute.
• To calculate the Attribute Significance Value for a numeric attribute: a) subtract the smallest class mean from the largest class mean; b) divide this result by the domain standard deviation.
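The two-step calculation can be sketched in a few lines of Python. The function name `attribute_significance` and the age values are hypothetical. The slides do not say whether the domain standard deviation is the population or the sample form, so the use of the population form (`statistics.pstdev`) is an assumption.

```python
from statistics import mean, pstdev


def attribute_significance(values_by_class):
    """(largest class mean - smallest class mean) / domain standard deviation."""
    class_means = [mean(values) for values in values_by_class.values()]
    all_values = [v for values in values_by_class.values() for v in values]
    domain_sd = pstdev(all_values)  # assumption: population standard deviation
    if domain_sd == 0:
        return 0.0  # every instance has the same value, so nothing to separate
    return (max(class_means) - min(class_means)) / domain_sd


# Hypothetical ages grouped by class.
ages = {"C1": [20, 30], "C2": [40, 50]}
significance = attribute_significance(ages)
```

For this data the class means are 25 and 45 and the domain standard deviation is about 11.18, giving a significance of roughly 1.79.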
![Page 21: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/21.jpg)
Figure 4.8 Summary statistics for the Acme credit card promotion database
![Page 22: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/22.jpg)
Figure 4.9 Statistics for numerical attributes and common categorical attribute values
![Page 23: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/23.jpg)
Step 4: Read and Interpret Individual Class Results
• Class Predictability is a within-class measure.
• Class Predictiveness is a between-class measure.
![Page 24: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/24.jpg)
Necessary and Sufficient Attribute Values
• If an attribute value has a predictability and predictiveness score of 1.0, the attribute value is said to be necessary and sufficient for membership in class C. That is, all instances within class C have the specified value for the attribute and all instances with this value for the attribute reside in class C.
![Page 25: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/25.jpg)
Sufficient Attribute Values
• If an attribute value has a predictiveness score of 1.0 and a predictability score less than 1.0, the attribute value is said to be sufficient but not necessary for membership in class C. That is, all instances with the value for the attribute reside in C, but there are other instances in C that have a different value for this attribute.
![Page 26: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/26.jpg)
Necessary Attribute Values
• If an attribute value has a predictability score of 1.0 and a predictiveness score less than 1.0, the attribute value is said to be necessary but not sufficient for membership in class C. That is, all instances in C have the same value for the attribute, but there are other instances outside C that have the same value for this attribute.
![Page 27: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/27.jpg)
Necessary and Sufficient Attribute Values in iDA
• Attribute values with predictiveness scores greater than or equal to 0.8 are considered highly sufficient.
• Attribute values with predictability scores greater than or equal to 0.8 are considered necessary.
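A small sketch shows how the 0.8 cutoffs turn a pair of scores into a label. The function name `label_attribute_value` and the returned strings are hypothetical. Setting `threshold=1.0` reproduces the strict definitions of necessary and sufficient given earlier.

```python
def label_attribute_value(predictability, predictiveness, threshold=0.8):
    """Label an attribute value from its two class-level scores."""
    necessary = predictability >= threshold    # within-class measure
    sufficient = predictiveness >= threshold   # between-class measure
    if necessary and sufficient:
        return "necessary and sufficient"
    if sufficient:
        return "sufficient"
    if necessary:
        return "necessary"
    return "neither"


label_relaxed = label_attribute_value(0.85, 0.60)
label_strict = label_attribute_value(0.85, 1.0, threshold=1.0)
```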
![Page 28: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/28.jpg)
Figure 4.10 Class 3 summary results
![Page 29: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/29.jpg)
Figure 4.11 Necessary and sufficient attribute values for Class 3
![Page 30: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/30.jpg)
Step 5: Visualize Individual Class Rules
![Page 31: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/31.jpg)
Figure 4.7 Rules for the credit card promotion database
![Page 32: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/32.jpg)
Rule Interpretation in iDA
• Each rule simply declares the precondition(s) necessary for an instance to be covered by the rule:
• if [(condition & condition &…& condition)=true] then an instance resides in a certain class.
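The rule form above is a conjunction, which can be sketched as follows. The sketch assumes every condition is an equality test on one attribute, and the example rule and class name are hypothetical. The two instances are the first two rows of Table 4.1.

```python
def covers(conditions, instance):
    """True when every condition of the rule holds for the instance."""
    return all(instance.get(attr) == value for attr, value in conditions.items())


# Hypothetical rule: if Life Insurance Promotion = Yes and Sex = Female
# then the instance resides in "Class 1" (class name is illustrative).
rule = {
    "conditions": {"Life Insurance Promotion": "Yes", "Sex": "Female"},
    "class": "Class 1",
}

row_1 = {"Income Range": "40-50K", "Magazine Promotion": "Yes",
         "Watch Promotion": "No", "Life Insurance Promotion": "No",
         "Credit Card Insurance": "No", "Sex": "Male", "Age": 45}
row_2 = {"Income Range": "30-40K", "Magazine Promotion": "Yes",
         "Watch Promotion": "Yes", "Life Insurance Promotion": "Yes",
         "Credit Card Insurance": "No", "Sex": "Female", "Age": 40}

row_1_covered = covers(rule["conditions"], row_1)
row_2_covered = covers(rule["conditions"], row_2)
```

Only the second row satisfies both conditions, so only it is covered by the rule.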
![Page 33: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/33.jpg)
Rule Interpretation in iDA
• Rule accuracy tells us the rule is accurate in …% of all cases where it applies.
• Rule coverage shows that the rule applies to …% of class instances.
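Both figures can be computed by counting instances, as the sketch below suggests. The function name, the equality-only conditions, and the eight-instance data set are hypothetical. Values are fractions rather than percentages.

```python
def rule_accuracy_and_coverage(conditions, target_class, instances):
    """Return (accuracy, coverage) of a rule for one class."""
    covered = [x for x in instances
               if all(x.get(a) == v for a, v in conditions.items())]
    in_class = [x for x in instances if x["class"] == target_class]
    correct = [x for x in covered if x["class"] == target_class]
    # Accuracy: of the instances the rule applies to, how many are in the class.
    accuracy = len(correct) / len(covered) if covered else 0.0
    # Coverage: of the class instances, how many the rule applies to.
    coverage = len(correct) / len(in_class) if in_class else 0.0
    return accuracy, coverage


# Hypothetical labeled instances.
data = (
    [{"class": "Yes", "Sex": "Female"}] * 3
    + [{"class": "Yes", "Sex": "Male"}] * 2
    + [{"class": "No", "Sex": "Female"}] * 1
    + [{"class": "No", "Sex": "Male"}] * 2
)

accuracy, coverage = rule_accuracy_and_coverage({"Sex": "Female"}, "Yes", data)
```

The rule Sex = Female applies to four instances, three of which are in class Yes, so accuracy is 0.75. Class Yes has five instances, three of them covered, so coverage is 0.6.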
![Page 34: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/34.jpg)
4.5 A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules
![Page 35: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/35.jpg)
Figure 4.12 Test set instance classification
Read and Interpret Test Set Results
![Page 36: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/36.jpg)
4.6 Techniques for Generating Rules
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
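Items 3 and 4 act as filters on candidate rules, which the following sketch illustrates. It is not the RuleMaker algorithm. The candidate rules, their accuracy and coverage figures, and the function name `filter_rules` are all hypothetical.

```python
# Hypothetical candidate rules with already-measured accuracy and coverage,
# both expressed as percentages.
candidate_rules = [
    {"rule": "Sex = Female", "accuracy": 75.0, "coverage": 60.0},
    {"rule": "Credit Card Insurance = Yes", "accuracy": 55.0, "coverage": 80.0},
    {"rule": "Watch Promotion = Yes", "accuracy": 90.0, "coverage": 20.0},
]


def filter_rules(rules, min_correctness, min_coverage):
    """Keep only rules meeting both the correctness and coverage minimums."""
    return [
        r for r in rules
        if r["accuracy"] >= min_correctness and r["coverage"] >= min_coverage
    ]


kept = filter_rules(candidate_rules, min_correctness=70.0, min_coverage=50.0)
kept_names = [r["rule"] for r in kept]
```

Raising the minimum correctness tends to yield fewer but more reliable rules, while lowering the minimum coverage admits rules that describe only a small part of a class.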
![Page 37: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/37.jpg)
4.7 Instance Typicality
![Page 38: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/38.jpg)
Typicality Scores
• Identify prototypical and outlier instances.
• Select a best set of training instances.
• Used to compute individual instance classification confidence scores.
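The slides do not define how a typicality score is computed. The sketch below assumes one common reading: the typicality of an instance is its average similarity to the other members of its class, with similarity taken as the fraction of matching categorical attribute values. Both choices are assumptions, and the function names and data are hypothetical.

```python
def similarity(a, b):
    """Assumed measure: fraction of attributes on which two instances agree."""
    keys = list(a.keys())
    return sum(1 for k in keys if a[k] == b[k]) / len(keys)


def typicality_scores(members):
    """Average similarity of each instance to the other members of its class."""
    scores = []
    for i, instance in enumerate(members):
        others = [m for j, m in enumerate(members) if j != i]
        if not others:
            scores.append(1.0)  # a lone member is trivially typical
            continue
        total = sum(similarity(instance, other) for other in others)
        scores.append(total / len(others))
    return scores


# Hypothetical members of a single class.
class_members = [
    {"Sex": "Female", "Watch Promotion": "Yes"},
    {"Sex": "Female", "Watch Promotion": "Yes"},
    {"Sex": "Male", "Watch Promotion": "No"},
]

scores = typicality_scores(class_members)
```

Under these assumptions the first two instances score 0.5 and the third scores 0.0, marking it as the outlier of the class.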
![Page 39: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/39.jpg)
Figure 4.13 Instance typicality
![Page 40: An Excel-based Data Mining Tool](https://reader036.vdocument.in/reader036/viewer/2022062519/56814d0d550346895dba4acf/html5/thumbnails/40.jpg)
4.8 Special Considerations and Features
• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data