k nearest neighbor
Classification is done by relating the unknown to the known according to some distance/similarity function
kNN stores all available cases and classifies new cases based on a similarity measure
Different names
Memory-based reasoning
Example-based reasoning
Instance-based reasoning
Case-based reasoning
Lazy learning
kNN determines the decision boundary locally. For example, 1NN assigns each document to the class of its single closest neighbor
For kNN in general, we assign each document to the majority class of its k closest neighbors, where k is a parameter
The rationale of kNN classification is the contiguity hypothesis: we expect a test document to have the same label as the training documents located in the local region surrounding it
Voronoi tessellation of a set of objects decomposes the space into Voronoi cells, where each object's cell consists of all points that are closer to that object than to any other object.
It partitions the plane into convex polygons, each containing its corresponding document.
Example (test point shown as a star): let k = 3; the star's three nearest neighbors are one circle and two X's
P(circle class | star) = 1/3
P(X class | star) = 2/3
P(diamond class | star) = 0
3NN estimate: P(circle class | star) = 1/3
1NN estimate: P(circle class | star) = 1 (the single nearest neighbor is a circle)
So 3NN prefers the X class while 1NN prefers the circle class
Advantages
Non-parametric architecture
Simple
Powerful
Requires no training time
Disadvantages
Memory intensive
Classification/estimation is slow
The distance is calculated using the Euclidean distance:
D = sqrt((x1 - x2)^2 + (y1 - y2)^2)
Attribute values are standardized with min-max scaling so that no single attribute dominates the distance:
Xs = (X - Min) / (Max - Min)
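A minimal sketch of these two formulas in Python (function names are illustrative, not from the original slides):

import math

def min_max_scale(value, lo, hi):
    # Min-max standardization: maps value into [0, 1]
    return (value - lo) / (hi - lo)

def euclidean_distance(p, q):
    # Straight-line distance between two equal-length numeric vectors
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))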
If k = 1, select the single nearest neighbor
If k > 1
For classification, select the most frequent class among the k nearest neighbors
For regression, calculate the average of the k nearest neighbors' values
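A minimal kNN sketch along these lines, assuming numeric feature vectors (the function name and the task argument are illustrative):

from collections import Counter
import math

def knn_predict(train_X, train_y, query, k, task="classification"):
    # Keep the k training points closest to the query (Euclidean distance)
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    labels = [label for _, label in neighbors]
    if task == "classification":
        # k = 1 reduces to the class of the single nearest neighbor
        return Counter(labels).most_common(1)[0][0]
    # Regression: average of the k neighbors' target values
    return sum(labels) / k

print(knn_predict([[1, 1], [2, 2], [9, 9]], ["a", "a", "b"], [1.5, 1.5], k=3))  # -> a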
An inductive learning task – use particular facts to make more generalized conclusions
A predictive model based on a branching series of Boolean tests – each of these Boolean tests is less complex than a one-stage classifier
It learns from class-labeled tuples
Can be used as a visual aid to structure and solve sequential problems
Each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label
If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?
[Decision tree figure: root node "Leave At" with branches for 8 AM, 9 AM and 10 AM; 8 AM → Long; 9 AM → Accident? (Yes → Long, No → Medium); 10 AM → Stall? (Yes → Long, No → Short)]
In this decision tree, we made a series of Boolean decisions and followed the corresponding branches –
Did we leave at 10AM?
Did the car stall on road?
Is there an accident on the road?
By answering each of these questions as yes or no, we can come to a conclusion on how long our commute might take
We do not have to represent this tree graphically
We can represent this as a set of rules. However, it may be harder to read
if hour == 8am
    commute time = long
else if hour == 9am
    if accident == yes
        commute time = long
    else
        commute time = medium
else if hour == 10am
    if stall == yes
        commute time = long
    else
        commute time = short
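The same rules as a small, runnable function (purely illustrative; attribute values are assumed to be lowercase strings):

def commute_time(hour, accident, stall):
    # The decision tree above, written as nested conditionals
    if hour == "8am":
        return "long"
    if hour == "9am":
        return "long" if accident == "yes" else "medium"
    if hour == "10am":
        return "long" if stall == "yes" else "short"
    return None  # no rule covers other leave times

# Leaving at 10 AM with no stalled cars on the road -> short
print(commute_time("10am", accident="no", stall="no"))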
The algorithm is called with three parameters – the data partition, the attribute list, and the attribute selection method.
The data partition D is a set of training tuples and their associated class labels.
The attribute list is a list of attributes describing the tuples.
The attribute selection method specifies a heuristic procedure for selecting the attribute that best discriminates among the tuples.
The tree starts at node N. If all the tuples in D are of the same class, then node N becomes a leaf and is labelled with that class.
Otherwise, the attribute selection method is used to determine the splitting criterion.
Node N is labelled with the splitting criterion, which serves as a test at the node.
The previous experience decision table showed 4 attributes – hour, weather, accident and stall
But the decision tree showed three attributes – hour, accident and stall
So which attribute is to be kept and which is to be removed?
The attribute selection method shows that weather is not a discriminating attribute
Method (Occam's razor) – given a number of competing hypotheses, the simplest one is preferable
We will focus on ID3 algorithm
Basic idea
Choose the best attribute to split the remaining instances and make that attribute a decision node
Repeat this process recursively for each child (a skeleton of this recursion is sketched after the stopping conditions below)
Stop when
All instances have the same target attribute value
There are no more attributes
There are no more instances
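A rough skeleton of that recursion is sketched below; instances are assumed to be dictionaries mapping attribute names to values, and best_attribute is a placeholder for the attribute selection step (for ID3, the attribute with the highest information gain, computed from the entropy formula given later in these notes):

def build_tree(instances, attributes, target):
    labels = [inst[target] for inst in instances]
    # Stop: every remaining instance has the same target value
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left to split on -> return the majority label
    if not attributes:
        return max(set(labels), key=labels.count)
    best = best_attribute(instances, attributes, target)  # e.g. highest information gain
    tree = {best: {}}
    for value in set(inst[best] for inst in instances):
        subset = [inst for inst in instances if inst[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target)
    return tree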
ID3 chooses which attribute to split on based on entropy.
Entropy is a measure of disorder (uncertainty) in the data
Entropy is minimized when all values of target attribute are the same
If we know that the commute time will be short, the entropy=0
Entropy is maximized when there is an equal chance of values for the target attribute (i.e. result is random)
If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized
Calculation of entropy
Entropy(S) = Σ (i = 1 to l) −(|Si| / |S|) · log2(|Si| / |S|)
S = the set of examples
Si = the subset of S with value vi under the target attribute
l = the size of the range of the target attribute (the number of distinct target values)
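The same formula as a short Python function; the class_counts argument holds the subset sizes |Si|, and the example values are illustrative:

import math

def entropy(class_counts):
    # class_counts: sizes |Si| of the subsets of S, one per target attribute value
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total)
               for c in class_counts if c > 0)

print(entropy([9, 0, 0]))  # all examples share one target value -> 0.0
print(entropy([3, 3, 3]))  # 3 short, 3 medium, 3 long -> log2(3) ≈ 1.585 (maximal for 3 values)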
If we break down the leaving time to the minute, we might get something like this
Since the entropy would be very low for each branch, we would end up with n branches and n leaves. This would not be helpful for predictive modelling.
We use a technique called discretization: we choose cut points, such as 9 AM, for splitting continuous attributes.
[Figure: each leave time down to the minute (8:02 AM, 8:03 AM, 9:05 AM, 9:07 AM, 9:09 AM, 10:02 AM) gets its own branch and leaf labelled Long, Medium or Short]
Consider the leave-time attribute, with the commute-time class shown in parentheses
When we choose cut points and group the values between them, we increase the entropy of each branch slightly, but we no longer get a decision tree with as many cut points as leaves
8:00 (L), 8:02 (L), 8:07 (M), 9:00 (S), 9:20 (S), 9:25 (S), 10:00 (S), 10:02 (M)
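One common heuristic, sketched below with the example values converted to minutes after 8:00, is to place candidate cut points at the midpoints where the class label changes (the helper name is illustrative):

def candidate_cut_points(sorted_values, labels):
    # Midpoints between consecutive values whose class labels differ
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(zip(sorted_values, labels),
                                        zip(sorted_values[1:], labels[1:]))
            if la != lb]

# Leave times from the example as minutes after 8:00, with commute-time labels
minutes = [0, 2, 7, 60, 80, 85, 120, 122]
labels  = ["L", "L", "M", "S", "S", "S", "S", "M"]
print(candidate_cut_points(minutes, labels))  # [4.5, 33.5, 121.0]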
Binary decision trees
Classification of an input vector is done by traversing the tree beginning at the root node and ending at the leaf
Each node of the tree computes an inequality
Each leaf is assigned to a particular class
Splits of the input space are based on one input variable at each node
Each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis
[Binary decision tree figure: each node tests an inequality such as BMI < 24, with Yes/No branches; the leaves are assigned classes such as B and C]
Linear decision trees are similar to binary decision trees
The inequality computed at each node takes a linear form that may depend on multiple input variables, e.g.
aX1 + bX2
Chi-squared Automatic Interaction Detector (CHAID)
A non-binary decision tree
The decision made at each node is based on a single variable, but can result in multiple branches
Continuous variables are grouped into a finite number of bins to create categories
Equal-population bins are created for CHAID (a rough sketch follows below)
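A rough sketch of equal-population binning (illustrative only; real CHAID implementations differ in the details):

def equal_population_bins(values, n_bins):
    # Split the sorted values into n_bins groups of (roughly) equal size
    ordered = sorted(values)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])  # the last bin takes any remainder
    return bins

print(equal_population_bins([7, 3, 9, 1, 5, 8, 2, 6, 4, 10], 3))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]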
Classification and Regression Trees (CART) are binary decision trees which split on a single variable at each node
The CART algorithm goes through an exhaustive search of all variables and split values to find the optimal splitting rule for each node.
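For comparison, scikit-learn's DecisionTreeClassifier follows the CART approach (binary splits, one feature per node). A minimal usage sketch, with toy data adapted from the commute-time example (the numeric encoding is an assumption for illustration):

from sklearn.tree import DecisionTreeClassifier

# Toy data: [hour, accident, stall] encoded numerically; labels are commute-time classes
X = [[8, 0, 0], [9, 0, 0], [9, 1, 0], [10, 0, 0], [10, 0, 1]]
y = ["long", "medium", "long", "short", "long"]

clf = DecisionTreeClassifier(max_depth=3)  # limiting depth acts as a simple form of pre-pruning
clf.fit(X, y)
print(clf.predict([[10, 0, 0]]))           # -> ['short']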
There is another technique for reducing the number of attributes used in a tree – pruning
Two types of pruning
Pre-pruning (forward pruning)
Post-pruning (backward pruning)
Pre-pruning
We decide during the building process, when to stop adding attributes (possibly based on their information gain)
However, this may be problematic – why?
Sometimes attributes individually do not contribute much to a decision, but combined they may have a significant impact.
Post-pruning waits until full decision tree has been built and then prunes the attributes.
Two techniques:
Subtree replacement
Subtree raising
Subtree replacement
[Figure: before pruning, root A has child B; B has a subtree rooted at C with leaves 1, 2 and 3, plus leaves 4 and 5]
A single leaf node (6) replaces the subtree rooted at C
This may increase accuracy
[Figure: after pruning, root A has child B with leaves 6, 4 and 5]
Subtree raising
The entire subtree is raised onto another node
[Figure: before raising, root A has child B; B has a subtree rooted at C with leaves 1, 2 and 3, plus leaves 4 and 5]
[Figure: after raising, the subtree rooted at C replaces B, so root A has child C with leaves 1, 2 and 3]
While a decision tree classifies quickly, the time taken to build the tree may be higher than for many other types of classifiers.
Decision trees suffer from the problem of error propagation throughout the tree.
Since decision trees work by a series of local decisions, what happens if one of these decisions is wrong?
Every decision from that point on may be wrong.
We may never return to the correct path of the tree.