dr. chen, data mining a/w & dr. chen, data mining chapter 2 data mining: a closer look jason c....

41
A/W & Dr. Chen, Data Mining Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration Gonzaga University Spokane, WA 99223 [email protected]

Upload: scott-carson

Post on 02-Jan-2016

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Chapter 2Data Mining: A Closer Look

Jason C. H. Chen, Ph.D.Professor of MIS

School of Business AdministrationGonzaga UniversitySpokane, WA 99223

[email protected]

Page 2: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

2.1 Data Mining Strategies

Page 3: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

3A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Figure 2.1 - A hierarchy of data mining strategies

Data Mining Strategies

Unsupervised Clustering

Supervised Learning

Market Basket Analysis

Classification EstimationPrediction

Categorical/discrete(current behavior)

NumericFuture outcome

(categorical/numeric)

No output attributes

Page 4: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Classification

• Learning is supervised.

• The dependent variable is categorical.

• Well-defined classes.

• Current rather than future behavior.

Estimation• Learning is supervised.

• The dependent variable is numeric.

• Well-defined classes.

• Current rather than future behavior.

Prediction

• The emphasis is on predicting future rather than current outcomes.

• The output attribute may be categorical or numeric.

Page 5: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

5A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

The Cardiology Patient Dataset

Page 6: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

6A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Attribute Mixed Numeric

Name Values Values Comments

Age Numeric Numeric Age in years

Sex Male, Female 1, 0 Patient gender

Chest Pain Type Angina, Abnormal Angina, 1–4 NoTang =Nonanginal

NoTang, Asymptomatic pain

Blood Pressure Numeric Numeric Resting blood pressure

upon hospital admission

Cholesterol Numeric Numeric Serum cholesterol

Fasting Blood True, False 1, 0 Is fasting blood sugar less

Sugar < 120 than 120?

Resting ECG Normal, Abnormal, Hyp 0, 1, 2 Hyp = Left ventricular

hypertrophy

Maximum Heart Numeric Numeric Maximum heart rate

Rate achieved

Induced Angina? True, False 1, 0 Does the patient experience angina

as a result of exercise?

Old Peak Numeric Numeric ST depression induced by exercise

relative to rest

Slope Up, f lat, dow n 1–3 Slope of the peak exercise ST

segment

Number Colored 0, 1, 2, 3 0, 1, 2, 3 Number of major vessels

Vessels colored by f luorosopy

Thal Normal f ix, rev 3, 6, 7 Normal, f ixed defect,

reversible defect

Concept Class Healthy, Sick 1, 0 Angiographic disease status

Table 2.1 • Cardiology Patient Data

Page 7: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

7A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Attribute Most Typical Least Typical Most Typical Least Typical

Name Healthy Class Healthy Class Sick Class Sick Class

Age 52 63 60 62

Sex Male Male Male Female

Chest Pain Type NoTang Angina Asymptomatic Asymptomatic

Blood Pressure 138 145 125 160

Cholesterol 223 233 258 164

Fasting Blood Sugar < 120 FALSE TRUE FALSE FALSE

Resting ECG Normal Hyp Hyp Hyp

Maximum Heart Rate 169 150 141 145

Induced Angina? FALSE FALSE TRUE FALSE

Old Peak 0 2.3 2.8 6.2

Slope Up Down Flat Down

Number of Colored Vessels 0 0 1 3

Thal Normal Fix Rev Rev

Table 2.2 • Most and Least Typical Instances from the Cardiology Domain

Page 8: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

A Healthy Class Rule for the Cardiology Patient Dataset

IF 169 <= Maximum Heart Rate <=202

THEN Concept Class = Healthy

Rule accuracy: 85.07%

Rule coverage: 34.55%

A Sick Class Rule for the Cardiology Patient Dataset

IF Thal = Rev & Chest Pain Type = Asymptomatic

THEN Concept Class = Sick

Rule accuracy: 91.14%

Rule coverage: 52.17%

Page 9: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Unsupervised Clustering

• Determine if concepts can be found in the data.• Evaluate the likely performance of a supervised model.• Determine a best set of input attributes for supervised learning.• Detect Outliers.

Page 10: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

10A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Market Basket Analysis

• Find interesting relationships among retail products.

• Uses association rule algorithms.

• The results of a market basket analysis help retailers– Design promotions,

– Arrange shelf or catalog items, and

– Develop cross-marketing strategies

Page 11: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

11A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

2.2 Supervised Data Mining Techniques

Page 12: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

12A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

The Credit Card Promotion Database

Page 13: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

13A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Income Magazine Watch Life Insurance Credit CardRange ($) Promotion Promotion Promotion Insurance Sex Age

40–50K Yes No No No Male 45

30–40K Yes Yes Yes No Female 40

40–50K No No No No Male 42

30–40K Yes Yes Yes Yes Male 43

50–60K Yes No Yes No Female 38

20–30K No No No No Female 55

30–40K Yes No Yes Yes Male 35

20–30K No Yes No No Male 27

30–40K Yes No No No Male 43

30–40K Yes Yes Yes No Female 41

40–50K No Yes Yes No Female 43

20–30K No Yes Yes No Male 29

50–60K Yes Yes Yes No Female 39

40–50K No Yes No No Male 55

20–30K No No Yes Yes Female 19

Table 2.3 • The Credit Card Promotion Database

Page 14: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

14A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Income RangeMagazine PromoWatch PromoLife Ins PromoCredit Card Ins.Sex AgeC C C C C C RI I I I I I I40-50,000 Yes No No No Male 4530-40,000 Yes Yes Yes No Female 4040-50,000 No No No No Male 4230-40,000 Yes Yes Yes Yes Male 4350-60,000 Yes No Yes No Female 3820-30,000 No No No No Female 5530-40,000 Yes No Yes Yes Male 3520-30,000 No Yes No No Male 2730-40,000 Yes No No No Male 4330-40,000 Yes Yes Yes No Female 4140-50,000 No Yes Yes No Female 4320-30,000 No Yes Yes No Male 2950-60,000 Yes Yes Yes No Female 3940-50,000 No Yes No No Male 5520-30,000 No No Yes Yes Female 19

Data file: CreditCardPromotion.xls

Page 15: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

A Hypothesis for the Credit Card Promotion Database

A combination of one or more of the dataset attributes differentiate Acme Credit Card Company card holders who have taken advantage of the life insurance promotion and those card holders who have chosen not to participate in the promotional offer.

Page 16: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

A Production Rule for theCredit Card Promotion Database

IF Sex = Female & 19 <=Age <= 43

THEN Life Insurance Promotion = Yes

Rule Accuracy: 100.00%

Rule Coverage: 66.67%

Question: Can we assume that two-thirds of all females in the specified age range will take advantage of the

promotion?

• Rule accuracy is a between-class measure.• Rule coverage is a within-class measure.

Page 17: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

17A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Neural Networks

• A neural network is a set of interconnected nodes designed to imitate the functioning of the human brain.

• Two phases of operations– Learning phase at the input layer until it reaches a

predetermined minimum error rate– Fixing weights and recompute output values for new

instances• A major shortcoming of the neural network approach

is a lack of explanation about what has been learned.

Page 18: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

18A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

InputLayer

OutputLayer

HiddenLayer

Page 19: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

19A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Table 2.3 • The Credit Card Promotion Database

Income Magazine Watch Life Insurance Credit Card

Range ($) Promotion Promotion Promotion Insurance Sex Age

40–50K Yes No No No Male 45

30–40K Yes Yes Yes No Female 40

(Note that blue: input attributes, red: output attributes)Therefore, there four input nodes and one output node and chose five hidden-layer nodes.

Page 20: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

20A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Table 2.4 - Neural Network Training: Actual and Computed Output

Instance Number Life Insurance Promotion Computed Output

1 0 0.024

2 1 0.998

3 0 0.023

4 1 0.986

5 1 0.999

6 0 0.05

7 1 0.999

8 0 0.262

9 0 0.06

10 1 0.997

11 1 0.999

12 1 0.776

13 1 0.999

14 0 0.023

15 1 0.999

Page 21: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

21A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Table 2.4 - Neural Network Training: Actual and Computed Output

Instance Number Life Insurance Promotion Computed Output

1 0 0.024

2 1 0.998

3 0 0.023

4 1 0.986

5 1 0.999

6 0 0.05

7 1 0.999

8 0 0.262

9 0 0.06

10 1 0.997

11 1 0.999

12 1 0.776

13 1 0.999

14 0 0.023

15 1 0.999

Page 22: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

22A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Statistical Regression

• Statistical regression is a supervised learning technique that generalizes a set of numeric data by creating a mathematical equation relating one or more input attributes to a single numeric output attributes.

• Linear regression model is characterized by an output attribute whose value is determined by a linear sum of weighted input attribute values.

Page 23: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Statistical RegressionLife insurance promotion =

0.5909 (credit card insurance) -

0.5455 (sex) + 0.7727

Example, a female who does not have credit card insurance is a likely candidate for the life insurance promotion.

Life insurance promotion = 0.5909 (0) –

0.5455 (0) + 0.7727 = 0.7727

Because the value 0.7727 is close to 1.0, we conclude that the individual is likely to take advantage of the promotional offer.

Page 24: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

24A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

2.3 Association Rules

• Association rule mining techniques are used to discover interesting associations between attributes contained in a database.

• Unlike traditional association rules, association rules can have one or several output attributes.

• An output attribute for one rule can be an input attribute for another rule.

• A popular technique for ‘market basket’ analysis.• Problem: we may have several rules with little value.

Page 25: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

An Association Rule for the Credit Card Promotion

DatabaseIF Sex = Female & Age = over40 &

Credit Card Insurance = No

THEN Life Insurance Promotion = Yes

Page 26: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

26A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

2.4 Clustering Techniques

• By applying unsupervised clustering to the instances of the Acme Credit Card Company database, we will find a subset of input attributes that differentiate card holders who have taken advantage of the life insurance promotion from those cardholders who have not accepted the promotion offer.

Page 27: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

27A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

# Instances: 5Sex: Male => 3

Female => 2Age: 37.0Credit Card Insurance: Yes => 1

No => 4Life Insurance Promotion: Yes => 2

No => 3

Cluster 1

Cluster 2

Cluster 3

# Instances: 3Sex: Male => 3

Female => 0Age: 43.3Credit Card Insurance: Yes => 0

No => 3Life Insurance Promotion: Yes => 0

No => 3

# Instances: 7Sex: Male => 2

Female => 5Age: 39.9Credit Card Insurance: Yes => 2

No => 5Life Insurance Promotion: Yes => 7

No => 0

IF Sex=Female & 43>=Age>=35 & Credit Card Insurance=NOTHEN Class = 3 Rule Accuracy: 100% Rule Coverage: 66.67%

Page 28: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

28A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

End here for now!

Page 29: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

29A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

2.5 Evaluating Performance

• Performance evaluation is probably the most critical of all the steps in the data mining process. Three general questions:– 1. Will the benefits received from a data mining

project more than offset the cost of the data mining process?

– 2. How do we interpret the results of a data mining session?

– 3. Can we use the results of a data mining process with confidence?

Page 30: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

30A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Evaluating Supervised Learner Models

• Supervised learner models are designed to classify, estimate, and/or predict future outcome.

• Applications on classification correctness:– Develop a model to accept to reject credit card

applications– Develop a model to accept or reject home mortgage

applicants– Develop a model to decide whether or not to drill for

oil

Page 31: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Confusion Matrix• A matrix used to summarize the results of a supervised classification. • Entries along the main diagonal are correct classifications.• Entries other than those on the main diagonal are classification errors.

Classification correctness is best calculated by presenting previously unseen data in the form of a test to the model being evaluated.A confusion matrix is of little use for evaluating supervised learner models offering numeric output.

Page 32: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

32A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Table 2.5 • A Three-Class Confusion Matrix

Computed Decision C1 C2 C3C1 C11 C12 C13C2 C21 C22 C23C3 C31 C32 C33

Rule 2: for C2, C21,C22,C23 are all actually members of C2; but C21 and C23 are incorrectly classified as members of another class.Rule 3: for C2, C12 and C32 are instances are incorrectly classified as members of class C2.

Rule 1: Values along the main diagonal represent correct classificatione.g., C11 represents the total number of class C1 instance correctly classified by the model

Computation questions: #1

Page 33: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

33A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Computed Computed Accept Reject

Accept True False Accept Reject Reject False True Accept Reject

Table 2.6 • A Simple Confusion Matrix

Two-Class Error Analysis

The more the better

The less the better

Page 34: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

34A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Model Computed Computed Model Computed Computed A Accept Reject B Accept Reject

Accept 600 25 Accept 600 75 Reject 75 300 Reject 25 300

Table 2.7 • Two Confusion Matrices Each Showing a 10% Error Rate

Which model is better?

Page 35: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Evaluating Numeric

• Mean absolute error

• Mean squared error

• Root mean squared error

Page 36: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

36A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Comparing Models by Measuring Lift

• Marketing applications that focus on response rates from mass mailings are less concerned with test set classification error and more interested in building models able to extract bias samples from large populations.

• Supervised learner models designed for extracting bias samples from a general population are often evaluated by a measure that comes directly from marketing known as lift.

Page 37: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Computing Lift

)|(

)|(

PopulationCP

SampleCPLift

i

i

Ci is the class of al zero-balance customers who, given the

opportunity, will take advantage of the promotional offer.

Page 38: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

38A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

0

200

400

600

800

1000

1200

0 10 20 30 40 50 60 70 80 90 100

NumberResponding

% Sampled

Figure 2.4 – A lift chart (Targeted vs. mass mailing)

Page 39: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

39A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

No Computed Computed Ideal Computed Computed Model Accept Reject Model Accept Reject

Accept 1,000 0 Accept 1,000 0 Reject 99,000 0 Reject 0 99,000

Table 2.8 • Two Confusion Matrices: No Model and an Ideal Model

Model Computed Computed Model Computed Computed X Accept Reject Y Accept Reject

Accept 540 460 Accept 450 550 Reject 23,460 75,540 Reject 19,550 79,450

Table 2.9 • Two Confusion Matrices for Alternative Models with Lift Equal to 2.25

Page 40: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

40A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Unsupervised Model Evaluation

• Evaluating unsupervised data mining is, in general, a more difficult task than supervised evaluation. This is true because the goals of an unsupervised data mining session are frequently not as clear as the goals for supervised learning.

Page 41: Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration

41A/W & Dr. Chen, Data MiningDr. Chen, Data Mining

Data MiningStrategies

SupervisedLearning

Market BasketAnalysis

UnsupervisedClustering

PredictionEstimationClassification