
Chapter 6. Classification and Prediction

Eager Learners: when given a set of training tuples, construct a generalization model before receiving new tuples to classify. Examples:

Classification by decision tree induction

Rule-based classification

Classification by back propagation

Support Vector Machines (SVM)

Associative classification

Lazy vs. Eager Learning

Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple

Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify

Lazy: less time in training but more time in predicting

Lazy Learner: Instance-Based Methods

Typical approaches

k-nearest neighbor approach

Instances represented as points in a Euclidean space.

The k-Nearest Neighbor Algorithm

All instances correspond to points in the n-D space

The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)

Target function could be discrete- or real-valued

For discrete-valued, k-NN returns the most common value among the k training examples nearest to xq (a minimal code sketch follows the figure below)

[Figure: training tuples labeled + and − scattered in 2-D space around a query point xq; k-NN assigns xq the class held by the majority of its k nearest neighbors.]
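As a concrete illustration of the discrete-valued case, here is a minimal sketch of majority-vote k-NN. The function names and the toy data are illustrative choices, not from the chapter:

```python
import math
from collections import Counter

def euclidean(x1, x2):
    """dist(X1, X2): Euclidean distance between two n-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(training, xq, k):
    """Return the most common class among the k training tuples nearest to xq.

    training is a list of (point, class_label) pairs; xq is the query point.
    """
    nearest = sorted(training, key=lambda tc: euclidean(tc[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data mirroring the figure: + and - training tuples around a query point.
training = [((1, 1), "+"), ((2, 1), "-"), ((0, 2), "+"),
            ((3, 3), "-"), ((2, 2), "-")]
print(knn_classify(training, xq=(2, 1.5), k=3))  # majority of the 3 nearest
```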

The k-Nearest Neighbor Algorithm

k-NN for real-valued prediction of a given unknown tuple

Returns the mean value of the k nearest neighbors

Distance-weighted nearest-neighbor algorithm: weight the contribution of each of the k neighbors according to its distance to the query xq, giving greater weight to closer neighbors (see the sketch below)

Robust to noisy data by averaging k-nearest neighbors
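The slides do not fix a weighting scheme; a common choice, assumed in the sketch below, is w_i = 1 / dist(xq, x_i)^2:

```python
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_predict_weighted(training, xq, k):
    """Distance-weighted k-NN for real-valued prediction.

    training is a list of (point, real_value) pairs. Each of the k nearest
    neighbors contributes its value with weight 1 / d^2, so closer neighbors
    count more; an exact match simply returns its stored value.
    """
    nearest = sorted(training, key=lambda tv: euclidean(tv[0], xq))[:k]
    num = den = 0.0
    for point, value in nearest:
        d = euclidean(point, xq)
        if d == 0:
            return value          # query coincides with a training tuple
        w = 1.0 / d ** 2
        num += w * value
        den += w
    return num / den
```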

The k-Nearest Neighbor Algorithm

How can I determine the value of k, the number of neighbors?

In general, the larger the number of training tuples, the larger the value of k can be; a common practice is to try several values on a test set and keep the one that minimizes the error rate

Nearest-neighbor classifiers can be extremely slow when classifying test tuples: a naive implementation makes O(n) distance comparisons per test tuple, where n is the number of training tuples

By presorting and arranging the stored tuples into search trees (e.g., kd-trees), the number of comparisons can be reduced to O(log n) (see the sketch below)
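A sketch of the search-tree speedup using SciPy's cKDTree; SciPy is not mentioned in the chapter, and the data here is synthetic:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))   # 10,000 synthetic training tuples in 3-D

tree = cKDTree(points)             # presort once: build a kd-tree

# Each query now takes roughly O(log n) comparisons instead of O(n).
dists, idx = tree.query([0.5, 0.5, 0.5], k=5)
print(idx)                         # indices of the 5 nearest training tuples
```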

The k-Nearest Neighbor Algorithm

Example: k = 5

Distance Measure
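The formula itself is not preserved in the transcript; the Euclidean distance referred to above is, for two n-dimensional tuples X1 = (x11, ..., x1n) and X2 = (x21, ..., x2n):

dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )

For binary attributes, as in the wine examples below, ranking by this distance is equivalent to ranking by the number of attributes on which two tuples differ (the Hamming distance).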

Distance Measure Example

WINE     CHERRY   CHEWY TANNINS   BEAUTY
WINE1    1        1               1
WINE2    0        0               1
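With the distance above, dist(WINE1, WINE2) = sqrt((1-0)^2 + (1-0)^2 + (1-1)^2) = sqrt(2) ≈ 1.41: the two wines differ on two of the three attributes.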

WINE      BLUEBERRY   BLACKBERRY   CHERRY   CHEWY TANNINS   SCORE
Wine1     1           0            0        1               90+
Wine2     1           1            1        0               90+
Wine3     0           1            0        1               90+
Wine4     0           1            1        1               90-
Wine5     0           0            1        0               90-
Wine6     0           0            1        1               90-
Unknown   1           1            0        1               ?

What is the predicted score for the unknown wine if k = 1? k = 3? k = 5?
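A sketch of the computation, using majority vote over Hamming distance on the four binary attributes (an assumption consistent with the distance-measure slide above):

```python
from collections import Counter

# Attribute order: (blueberry, blackberry, cherry, chewy tannins) -> score class
wines = [
    ((1, 0, 0, 1), "90+"),   # Wine1
    ((1, 1, 1, 0), "90+"),   # Wine2
    ((0, 1, 0, 1), "90+"),   # Wine3
    ((0, 1, 1, 1), "90-"),   # Wine4
    ((0, 0, 1, 0), "90-"),   # Wine5
    ((0, 0, 1, 1), "90-"),   # Wine6
]
unknown = (1, 1, 0, 1)

def hamming(a, b):
    """Number of binary attributes on which two tuples disagree."""
    return sum(x != y for x, y in zip(a, b))

for k in (1, 3, 5):
    nearest = sorted(wines, key=lambda w: hamming(w[0], unknown))[:k]
    vote = Counter(label for _, label in nearest).most_common(1)[0][0]
    print(f"k={k}: {vote}")
```

Wine1 and Wine3 tie at distance 1, and Wine2 and Wine4 at distance 2; however the ties break, the majority class among the neighbors is 90+ for k = 1, k = 3, and k = 5.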
