classification dr eamonn keogh computer science & engineering department university of...

Post on 20-Dec-2015


Classification

Dr Eamonn Keogh

Computer Science & Engineering Department

University of California - Riverside

Riverside, CA 92521

eamonn@cs.ucr.edu

Who is smarter, Humans or Pigeons?

Section 1.1 (again)

Section 4.1 Section 4.3

Read in Detail

Section 4.2.2

Section 4.34

Glance over

[Three example problems, each showing four examples of class A, four examples of class B, and two unlabeled objects.]

1) What class is this object?

2) What class is this object?

The “game” we have just been playing is Supervised Classification.

Why is it useful?

Examples of class A: People who contracted disease X.

Examples of class B: People who are disease free.

1) What class is this person? Is this person at risk of getting the disease?

2) What class is this person? Is this person at risk of getting the disease?

Patient records shown on the slide:

Patient temperature 99, Blood count 4214, Weight 167
Patient temperature 98, Blood count 3214, Weight 179
Patient temperature 97, Blood count 2763, Weight 121
Patient temperature 99, Blood count 3234, Weight 117
Patient temperature 97, Blood count 0012, Weight 190
Patient temperature 99, Blood count 0114, Weight 202
Patient temperature 98, Blood count 1014, Weight 345
Patient temperature 99, Blood count 1214, Weight 190
Patient temperature 97, Blood count 0118, Weight 280
Patient temperature 99, Blood count 3452, Weight 99

Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)

Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3)

1) What class is this object? (8, 1.5)

2) What class is this object? (4.5, 7)

Classification

There are many classification algorithms; in this class we will consider only…

• Simple Linear Classifier.
• Nearest Neighbor Classifier.
• Decision Tree.
• Naïve Bayes.

The classification problem

• The classification algorithm is shown a number of labeled examples from the problem domain of interest. (This collection of labeled data is called the training set.)

• The algorithm builds a model that “explains” the labeling of the examples. (This model may or may not be accessible to humans, depending on the algorithm.)

• At some future time the algorithm is shown an unlabeled example, and asked to classify it.

[Two figure panels, labeled Shape Domain and Cat Domain. Each shows examples of class A, examples of class B, and two unlabeled objects with the question: What class is this object?]

Class:  Income   Savings  Num_credit_cards  Is_married
A:      123,000  34,100   0                 N
B:      24,000   -2,000   13                Y
A:      45,200   12,100   3                 N
…       …..      ……       …                 …
…       …..      ……       …                 …
B:      423,020  23,440   0                 N
B:      14,000   87,000   0                 Y
A:      11,200   -2,000   2                 Y

Sample dataset for a credit worthiness problem

? 123,000 34,100 0 N

What is this instance's class?

The number of rows is the size of the training set, and the number of feature columns is the dimensionality of the training set. Each row is called an instance (or exemplar); each column is called a feature.
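As a rough illustration of this terminology, the visible rows of the credit-worthiness table can be held in a plain Python list. This is only a sketch: the elided rows (…) are left out, the variable names are ours rather than the slides', and the class label is assumed not to count as a feature.

```python
# Visible rows of the sample credit-worthiness table (elided rows omitted).
# Each row: (class label, income, savings, num_credit_cards, is_married)
training_set = [
    ("A", 123_000, 34_100, 0, "N"),
    ("B", 24_000, -2_000, 13, "Y"),
    ("A", 45_200, 12_100, 3, "N"),
    ("B", 423_020, 23_440, 0, "N"),
    ("B", 14_000, 87_000, 0, "Y"),
    ("A", 11_200, -2_000, 2, "Y"),
]

feature_names = ["Income", "Savings", "Num_credit_cards", "Is_married"]

# Rows = instances (exemplars); feature columns = dimensionality.
num_instances = len(training_set)
dimensionality = len(training_set[0]) - 1  # assumption: class label is not a feature

print(num_instances, "instances,", dimensionality, "features")
```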

Visualizing classification algorithms

We can visualize some classification algorithms in 2D…

Warning: This tends to make the problem look easy...

[Figure: the examples of class A and class B plotted in 2D; both axes run from 1 to 10.]

Class  feature 1 (height1)  feature 2 (height2)
A      3                    4
B      5                    2.5
A      1.5                  5
…      …                    ...

1) What class is this object? (8, 1.5)

2) What class is this object? (4.5, 7)

A trivial machine learning example represented in 2D Euclidean Space. The blue circles and red squares represent the two classes in our training data, and the black shapes are the objects we are trying to classify.

From now on we will only consider the 2D plots when explaining algorithms and problems. We should always remember that these plots are representations of real-world objects.

[Figure panels: Simple Linear Classifier; A dataset which is not linearly separable; Piecewise Linear Classifier; Simple Quadratic Classifier (or some other function)]

1) What class is this object? (8, 1.5)

2) What class is this object? (4.5, 7)

This example is one for which we know a perfect rule, “above the diagonal is circle class, below the diagonal is square class”. (Don’t forget that for real world problems we can never know a perfect rule, even if there is one).
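A minimal sketch of that diagonal rule, checked against the 2D points listed on the earlier slide. Mapping class A to the circle class and class B to the square class is our assumption, as is the choice to treat a point lying exactly on the diagonal as a square.

```python
def classify_by_diagonal(x, y):
    """Rule from the slide: above the diagonal (y > x) is the circle class."""
    # Assumption: points exactly on the diagonal are reported as "square".
    return "circle" if y > x else "square"

# Points from the earlier 2D example (assumed: A = circles, B = squares).
class_a = [(3, 4), (1.5, 5), (6, 8), (2.5, 5)]
class_b = [(5, 2.5), (5, 2), (8, 3), (4.5, 3)]

labels_a = [classify_by_diagonal(x, y) for x, y in class_a]
labels_b = [classify_by_diagonal(x, y) for x, y in class_b]

# The two unlabeled objects from the slide.
query_1 = classify_by_diagonal(8, 1.5)
query_2 = classify_by_diagonal(4.5, 7)

print(labels_a, labels_b, query_1, query_2)
```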

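What happens if we learn a piecewise linear classifier or a quadratic classifier on this dataset when the training set is small?

A flexible classifier can fit the few training examples perfectly and still draw a boundary far from the true diagonal, so it misclassifies new objects. This problem is called overfitting.

[Figure panels: Piecewise Linear Classifier; Simple Quadratic Classifier]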

The Nearest Neighbor Algorithm

The nearest neighbor algorithm (NN) works by projecting the item to be classified into the same space as the training data, then finding the labeled exemplar which is closest. The class of that nearest neighbor is then assigned to the item to be classified.

In this example, the item (6, 2) is correctly classified.

In spite of its amazing simplicity, Nearest Neighbor is one of the best algorithms for many problems.

We can use many different distance measures to measure the distance between objects. Typically Euclidean distance is used.
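The procedure can be sketched in a few lines of Python. This is an illustrative version only, using Euclidean distance and the 2D points from the earlier slide; the function names are ours.

```python
import math

# Labeled exemplars from the 2D example earlier in the slides.
training_set = [
    ((3, 4), "A"), ((1.5, 5), "A"), ((6, 8), "A"), ((2.5, 5), "A"),
    ((5, 2.5), "B"), ((5, 2), "B"), ((8, 3), "B"), ((4.5, 3), "B"),
]

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)

def nearest_neighbor(query, exemplars, distance=euclidean):
    """Return the label of the exemplar closest to the query."""
    _, label = min(exemplars, key=lambda ex: distance(query, ex[0]))
    return label

# The two unlabeled objects from the slide.
answer_1 = nearest_neighbor((8, 1.5), training_set)
answer_2 = nearest_neighbor((4.5, 7), training_set)
print(answer_1, answer_2)
```

Passing a different function as `distance` swaps in another distance measure without changing the rest of the code.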

Evaluation of Classification

• Leave-one-out

• K-fold cross-validation
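As a hedged sketch of the first evaluation scheme: each labeled exemplar is held out in turn, classified using all the others, and the fraction classified correctly is reported. The classifier here is a simple nearest neighbor on the 2D points from the earlier slide; the function names are ours.

```python
import math

# Labeled exemplars from the 2D example earlier in the slides.
training_set = [
    ((3, 4), "A"), ((1.5, 5), "A"), ((6, 8), "A"), ((2.5, 5), "A"),
    ((5, 2.5), "B"), ((5, 2), "B"), ((8, 3), "B"), ((4.5, 3), "B"),
]

def nearest_neighbor(query, exemplars):
    """Label of the exemplar closest to the query (Euclidean distance)."""
    _, label = min(exemplars, key=lambda ex: math.dist(query, ex[0]))
    return label

def leave_one_out_accuracy(data):
    """Hold out each exemplar once and classify it using the rest."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # everything except exemplar i
        if nearest_neighbor(point, rest) == label:
            correct += 1
    return correct / len(data)

accuracy = leave_one_out_accuracy(training_set)
print(accuracy)
```

Cross-validation with k folds follows the same pattern, except that a whole block of exemplars is held out at each step rather than a single one.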

Discussion of Nearest Neighbor I

• It is sensitive to irrelevant features. One possible solution is to search for good subsets of features.

• It is sensitive to noise. One possible solution is to use k-nearest neighbors (KNN), where the k closest exemplars vote on the class.
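The effect of noise, and of voting among several neighbors, can be sketched as follows. The first eight exemplars come from the earlier slide; the ninth, a mislabeled point at (7.5, 2), is hypothetical and added purely to illustrate a noisy label.

```python
import math
from collections import Counter

training_set = [
    ((3, 4), "A"), ((1.5, 5), "A"), ((6, 8), "A"), ((2.5, 5), "A"),
    ((5, 2.5), "B"), ((5, 2), "B"), ((8, 3), "B"), ((4.5, 3), "B"),
    ((7.5, 2), "A"),  # HYPOTHETICAL noisy exemplar, not from the slides
]

def knn_classify(query, exemplars, k):
    """Majority vote among the k exemplars closest to the query."""
    neighbors = sorted(exemplars, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# With k=1 the noisy exemplar decides the answer; with k=3 it is outvoted.
with_k1 = knn_classify((8, 1.5), training_set, k=1)
with_k3 = knn_classify((8, 1.5), training_set, k=3)
print(with_k1, with_k3)
```

With two classes, an odd k avoids tied votes.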


Suppose there is a disease. Although we don’t know this, it happens that if your blood sugar is over 5.5 you have the disease and below you don’t….

Discussion of Nearest Neighbor II

• It is sensitive to the units in which the features are measured. One possible solution is to normalize the features.

X axis measured in feet, Y axis measured in dollars

X axis measured in inches, Y axis measured in dollars
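One common way to normalize a feature is to subtract its mean and divide by its standard deviation (z-normalization). The sketch below uses that choice, which is an assumption on our part since the slides do not name a method, and the measurements are hypothetical. It shows that the normalized values are the same whether the feature was recorded in feet or in inches.

```python
import math
import statistics

def z_normalize(column):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumes the column is not constant
    return [(value - mean) / std for value in column]

# HYPOTHETICAL measurements of one feature, not taken from the slides.
lengths_feet = [1.0, 2.0, 4.0, 5.0]
lengths_inches = [12 * value for value in lengths_feet]

feet_normalized = z_normalize(lengths_feet)
inches_normalized = z_normalize(lengths_inches)

# After normalization the choice of unit no longer matters.
units_agree = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(feet_normalized, inches_normalized)
)
print(units_agree)
```

Applying the same rescaling to every feature before computing distances keeps any one feature from dominating just because of its unit.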

Discussion of Nearest Neighbor III

• Scalability

A Famous Problem.

R. A. Fisher’s Iris Dataset.

3 classes

50 of each class

The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width.

Iris Setosa Iris Versicolor Iris Virginica
