
Chapter 6. Classification and Prediction

Eager Learners: when given a set of training tuples, construct a generalization model before receiving new tuples to classify. Examples:

Classification by decision tree induction

Rule-based classification

Classification by back propagation

Support Vector Machines (SVM)

Associative classification

Lazy vs. Eager Learning

Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple

Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify

Lazy: less time in training but more time in predicting

Lazy Learner: Instance-Based Methods

Typical approaches

k-nearest neighbor approach

Instances represented as points in a Euclidean space.

The k-Nearest Neighbor Algorithm

All instances correspond to points in the n-D space

The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)

Target function could be discrete- or real-valued

For discrete-valued, k-NN returns the most common value among the k training examples nearest to xq (a minimal code sketch follows the figure below)

[Figure: training tuples labeled + and − scattered in 2-D space around a query point xq; k-NN assigns xq the class held by the majority of its k nearest neighbors.]
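As a concrete illustration of the discrete-valued case, here is a minimal sketch of majority-vote k-NN. The function names and the toy data are illustrative choices, not from the chapter:

```python
import math
from collections import Counter

def euclidean(x1, x2):
    """dist(X1, X2): Euclidean distance between two n-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(training, xq, k):
    """Return the most common class among the k training tuples nearest to xq.

    training is a list of (point, class_label) pairs; xq is the query point.
    """
    nearest = sorted(training, key=lambda tc: euclidean(tc[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data mirroring the figure: + and - training tuples around a query point.
training = [((1, 1), "+"), ((2, 1), "-"), ((0, 2), "+"),
            ((3, 3), "-"), ((2, 2), "-")]
print(knn_classify(training, xq=(2, 1.5), k=3))  # majority of the 3 nearest
```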

The k-Nearest Neighbor Algorithm

k-NN for real-valued prediction of a given unknown tuple

Returns the mean value of the k nearest neighbors

Distance-weighted nearest-neighbor algorithm: weight the contribution of each of the k neighbors according to its distance to the query xq, giving greater weight to closer neighbors (see the sketch below)

Robust to noisy data by averaging k-nearest neighbors
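The slides do not fix a weighting scheme; a common choice, assumed in the sketch below, is w_i = 1 / dist(xq, x_i)^2:

```python
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_predict_weighted(training, xq, k):
    """Distance-weighted k-NN for real-valued prediction.

    training is a list of (point, real_value) pairs. Each of the k nearest
    neighbors contributes its value with weight 1 / d^2, so closer neighbors
    count more; an exact match simply returns its stored value.
    """
    nearest = sorted(training, key=lambda tv: euclidean(tv[0], xq))[:k]
    num = den = 0.0
    for point, value in nearest:
        d = euclidean(point, xq)
        if d == 0:
            return value          # query coincides with a training tuple
        w = 1.0 / d ** 2
        num += w * value
        den += w
    return num / den
```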

The k-Nearest Neighbor Algorithm

How can I determine the value of k, the number of neighbors?

In general, the larger the number of training tuples, the larger the value of k can be; a common practice is to try several values on a test set and keep the one that minimizes the error rate

Nearest-neighbor classifiers can be extremely slow when classifying test tuples: a naive implementation makes O(n) distance comparisons per test tuple, where n is the number of training tuples

By presorting and arranging the stored tuples into search trees (e.g., kd-trees), the number of comparisons can be reduced to O(log n) (see the sketch below)
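A sketch of the search-tree speedup using SciPy's cKDTree; SciPy is not mentioned in the chapter, and the data here is synthetic:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))   # 10,000 synthetic training tuples in 3-D

tree = cKDTree(points)             # presort once: build a kd-tree

# Each query now takes roughly O(log n) comparisons instead of O(n).
dists, idx = tree.query([0.5, 0.5, 0.5], k=5)
print(idx)                         # indices of the 5 nearest training tuples
```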

The k-Nearest Neighbor Algorithm

Example: k = 5

Distance Measure
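The formula itself is not preserved in the transcript; the Euclidean distance referred to above is, for two n-dimensional tuples X1 = (x11, ..., x1n) and X2 = (x21, ..., x2n):

dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )

For binary attributes, as in the wine examples below, ranking by this distance is equivalent to ranking by the number of attributes on which two tuples differ (the Hamming distance).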

Distance Measure Example

WINE     CHERRY   CHEWY TANNINS   BEAUTY
WINE1    1        1               1
WINE2    0        0               1
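With the distance above, dist(WINE1, WINE2) = sqrt((1-0)^2 + (1-0)^2 + (1-1)^2) = sqrt(2) ≈ 1.41: the two wines differ on two of the three attributes.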

WINE      BLUEBERRY   BLACKBERRY   CHERRY   CHEWY TANNINS   SCORE
Wine1     1           0            0        1               90+
Wine2     1           1            1        0               90+
Wine3     0           1            0        1               90+
Wine4     0           1            1        1               90-
Wine5     0           0            1        0               90-
Wine6     0           0            1        1               90-
Unknown   1           1            0        1               ?

What is the predicted score for the unknown wine if k = 1? k = 3? k = 5?
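A sketch of the computation, using majority vote over Hamming distance on the four binary attributes (an assumption consistent with the distance-measure slide above):

```python
from collections import Counter

# Attribute order: (blueberry, blackberry, cherry, chewy tannins) -> score class
wines = [
    ((1, 0, 0, 1), "90+"),   # Wine1
    ((1, 1, 1, 0), "90+"),   # Wine2
    ((0, 1, 0, 1), "90+"),   # Wine3
    ((0, 1, 1, 1), "90-"),   # Wine4
    ((0, 0, 1, 0), "90-"),   # Wine5
    ((0, 0, 1, 1), "90-"),   # Wine6
]
unknown = (1, 1, 0, 1)

def hamming(a, b):
    """Number of binary attributes on which two tuples disagree."""
    return sum(x != y for x, y in zip(a, b))

for k in (1, 3, 5):
    nearest = sorted(wines, key=lambda w: hamming(w[0], unknown))[:k]
    vote = Counter(label for _, label in nearest).most_common(1)[0][0]
    print(f"k={k}: {vote}")
```

Wine1 and Wine3 tie at distance 1, and Wine2 and Wine4 at distance 2; however the ties break, the majority class among the neighbors is 90+ for k = 1, k = 3, and k = 5.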
