12 Classification
TRANSCRIPT
Classification Methods
Classification problems
Classification Methods
Supported by XLStat
Classification Methods
All of these methods are considered supervised learning.
Initial assumptions regarding membership or properties are made when developing a model.
An initial evaluation of the data using exploratory data analysis is useful.
Data sets
Needed to develop and evaluate a classification model.
Training set: Representative samples used to build the model. The modeling software uses the class information.
Evaluation set: Samples of known class, used to test the model. The modeling software does not know the classes.
Test set: True unknowns.
Data pre-processing
With any of these methods, you may choose to do some sort of data pre-processing.
Raw: Is fastest.
Scaled: Gives equal weight to the variables.
PCA: Can be used to reduce noise and insignificant variables.
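As a rough illustration of the three options - a minimal sketch using scikit-learn with placeholder data, not XLStat's implementation:

```python
# Sketch of the three pre-processing choices; X is a samples-by-variables array.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(30, 8))       # placeholder data

X_raw = X                                               # raw: fastest, untouched
X_scaled = StandardScaler().fit_transform(X)            # scaled: equal weight per variable
X_scores = PCA(n_components=3).fit_transform(X_scaled)  # PCA scores: noise and
                                                        # insignificant variables reduced
```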
Data pre-processing
With some data sets, you may also want to do some other types of pre-processing.
Example: spectral or chromatographic traces.
Options may include:
Smoothing, baseline correction, signal averaging, using the first or second derivative.
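For traces like these, the options map naturally onto scipy's signal tools. A hedged sketch - the synthetic trace, window length, and polynomial order are illustrative only:

```python
# Smoothing, derivatives, and signal averaging via the Savitzky-Golay filter.
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
trace = np.sin(np.linspace(0, 10, 500)) + rng.normal(0, 0.05, 500)  # noisy trace

smoothed = savgol_filter(trace, window_length=15, polyorder=3)  # smoothing
d1 = savgol_filter(trace, 15, 3, deriv=1)                       # first derivative
d2 = savgol_filter(trace, 15, 3, deriv=2)                       # second derivative

replicates = np.stack([trace + rng.normal(0, 0.05, 500) for _ in range(4)])
averaged = replicates.mean(axis=0)   # signal averaging of replicate scans
```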
Creating an evaluation set
The evaluation set is typically a sub-set of the training set that was omitted when building a model.
Randomly pick a subset of the data, or randomly pick members from each class (see the sketch below).
Any approach that selectively removes a portion of the data could cause bias.
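A minimal sketch of the random, per-class pick using scikit-learn's stratified split; the iris data and the 25% hold-out fraction are arbitrary choices:

```python
# Carve an evaluation set out of the training data, stratified by class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # members from every class
```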
Leave-one-out validation
A standardized approach for validation of a model where each sample serves as an evaluation set.
1. Omit a single sample from the set.
2. Build the model.
3. Test the omitted sample.
4. Repeat the above steps until each sample has been omitted and tested once.
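In scikit-learn terms, the whole loop above collapses to one call; the iris data and the LDA classifier are illustrative choices only:

```python
# Leave-one-out validation: each sample is omitted, the model is rebuilt,
# and the omitted sample is tested, until every sample is tested once.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.3f}")
```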
Your data
While leave-one-out testing is the best approach, it can be slow for large sets.
An alternate approach is to leave two or more samples out with each pass.
Samples should be randomly listed in the matrix.
The same two (or more) samples should never be omitted together more than once.
Rule building methods
Methods where a set of rules is created to discriminate between classes.
Linear learning machine: One or more linear vectors are created to discriminate between classes.
Discriminant analysis: Linear or quadratic equations are used to separate classes.
Classification trees: A series of rules is used to sequentially classify.
Linear learning machine
The assumption is that one or more vectors can be found that can be used to discriminate between our classes.
This can make use of our raw data or work in PC space.
PC space would be better as there would be noise reduction.
Linear learning machine
For simple classifications, there can be many linear vectors that give complete class discrimination.
You would select the one that gives the best partitioning.
You are not limited to just 1 or 2-D vectors.
Linear learning machine
As the number of classes increases, the potential number of usable vectors will decrease.
The problem can become complex very rapidly.
You can reach a point where simple linear lines can no longer solve the problem.
Linear learning machine
In this example, a linear solution can't be found that discriminates between the classes.
Clearly, there should be a way to discriminate - the classes appear to be well defined.
A non-linear function may offer the best approach (discriminant analysis).
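In spirit, the linear learning machine trains a linear discriminant vector iteratively, much like a perceptron. A hedged scikit-learn stand-in (not the classical LLM algorithm itself), working in PC space as suggested above:

```python
# A linear discriminant vector trained in PC space.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
llm = make_pipeline(StandardScaler(), PCA(n_components=2), Perceptron(random_state=0))
llm.fit(X, y)
print(f"training accuracy: {llm.score(X, y):.3f}")  # < 1.0 if classes are not
                                                    # linearly separable
```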
Discriminant Analysis (DA)
First described by Fisher in 1936.
Similar to LLM, but it can use both quantitative and qualitative variables.
The approach uses linear models when sample classes have similar covariance matrices.
It uses quadratic models when classes have dissimilar covariance matrices.
It can have problems if you have variables with null variance or multicollinearity - these must be eliminated.
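A sketch of the two variants in scikit-learn; which one applies depends on whether the class covariance matrices look similar. The iris data here is a stand-in:

```python
# LDA assumes classes share a covariance matrix; QDA fits one per class.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)     # similar covariances: linear model
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # dissimilar covariances: quadratic
print(lda.score(X, y), qda.score(X, y))
```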
Iris example
We'll return to the Iris example data set - using XLStat's built-in DA function.
We're going to use autoscaled data.
DA with XLStat.
[Figure: DA of the iris data - correlation circle for Petal width, Petal length, Sepal width, and Sepal length, and a score plot of the three species (labeled 1-3) on F1 (99.01%) vs. F2 (0.99%).]
Coffee example
This consisted of 6 types of coffee - identified based on MS data.
To avoid collinearity and null variable problems, PCA scores were used (first 5 components).
[Figure: DA score plot of the coffee data on F1 (56.19%) vs. F2 (22.79%); the six classes (C, E, K, R, S, U) form distinct clusters.]
Classification trees
Predicts class membership by sequential application of rules based on predictor variables.
With DA and LLM, you create a set of math models that are all applied at once.
With classification trees, the predictor variables are evaluated as ordinal rules, one at a time.
Classification trees
[Figure: a simple example tree - samples are first split into solid vs. liquid, branches are then tested with rules such as density > 1, and the leaves assign classes such as red or green.]
Iris example (yet again!)
XLStat supports the use of classification and regression trees.
Classification if the Y variable (class) is qualitative, regression if the Y variable is quantitative.
The iris example is a classification example.
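For comparison, a minimal sketch with scikit-learn's CART tree - not XLStat's algorithm, so the exact splits will differ:

```python
# Grow and print a small classification tree for the iris data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=iris.feature_names))  # one rule per line
```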
Iris example
Example rule: if Petal width is in [1, 8[ then assign to Species 1.
[Figure: XLStat classification tree for the iris data. The root node (150 samples, 50 per species) splits on Petal width: [1, 8[ gives a pure Species 1 node (50 samples), while [8, 25[ gives a 100-sample node that is split further on Petal width, Petal length, Sepal length, and Sepal width into progressively purer nodes.]
Purity is just the percentage of samples in that node that belong to the assigned class.
[Figures: iris classification results - using the classification tree vs. using DA.]
Wine example
Riesling vs. Chardonnay.
Ohio vs. California.
Assayed 5 organic and 4 trace metal components.
Yes, you'll do the same with your homework.
Node  Class  Freq.  Purity    Rules
1     CaC    17     41.46%    (root)
2     CaC    17     58.62%    If Ca in [17.5, 60.75[ then Class = CaC in 58.6% of cases
3     CaR    7      58.33%    If Ca in [60.75, 94.75[ then Class = CaR in 58.3% of cases
4     CaR    3      60.00%    If 2,3-butanediol in [0, 0.065[ and Ca in [17.5, 60.75[ then Class = CaR in 60% of cases
5     CaC    17     70.83%    If 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = CaC in 70.8% of cases
6     CaC    14     100.00%   If Mn in [0.82, 1.625[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = CaC in 100% of cases
7     OhC    7      70.00%    If Mn in [1.625, 3.51[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = OhC in 70% of cases
8     CaC    3      60.00%    If K in [735.5, 881.75[ and Mn in [1.625, 3.51[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = CaC in 60% of cases
9     OhC    5      100.00%   If K in [881.75, 1147.5[ and Mn in [1.625, 3.51[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = OhC in 100% of cases
10    OhC    2      100.00%   If 1-hexanol in [0.638, 0.723[ and K in [735.5, 881.75[ and Mn in [1.625, 3.51[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = OhC in 100% of cases
11    CaC    3      100.00%   If 1-hexanol in [0.723, 1.056[ and K in [735.5, 881.75[ and Mn in [1.625, 3.51[ and 2,3-butanediol in [0.065, 0.514[ and Ca in [17.5, 60.75[ then Class = CaC in 100% of cases
12    OhR    5      83.33%    If 1-hexanol in [0.409, 0.673[ and Ca in [60.75, 94.75[ then Class = OhR in 83.3% of cases
13    CaR    6      100.00%   If 1-hexanol in [0.673, 1.218[ and Ca in [60.75, 94.75[ then Class = CaR in 100% of cases
[Figure: classification tree for the wine data. The root node (41 samples) splits on Ca; deeper splits use 2,3-butanediol, Mn, K, and 1-hexanol, ending in the CaC, CaR, OhC, and OhR nodes listed above.]
Confusion matrix for the estimation sample:

from \ to  CaC  CaR  OhC  OhR  Total  % correct
CaC         17    0    0    0     17     100.0%
CaR          0    9    0    1     10      90.0%
OhC          0    1    7    0      8      87.5%
OhR          0    1    0    5      6      83.3%
Total       17   11    7    6     41      92.7%
K nearest neighbor classification
A similarity-based classification method.
It attempts to assign categories to unknown samples based on multivariate proximity to other samples.
It works best with discrete classification types and is tolerant of poor data sets.
K = the number of closest neighbors being compared.
Consider this as the supervised version of HCA.
K nearest neighbor classification
In its simplest form, KNN is conducted by:
First, a training set is collected that contains examples of each class.
Intersample distances are then calculated:
"
whereN = # of variables or components used.
da " b= aj-bb^ hj=1
N
!2
KNN
The distance matrix is sorted, and the distance of the unknown sample can be compared to:
1. The K nearest neighbors.
2. The nearest class cluster.
Option 2 requires that K = 1.
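A bare numpy sketch of both pieces - the intersample distance from the formula above and the sorted neighbor lookup (option 1). The array names are placeholders:

```python
import numpy as np

def distance(a, b):
    """Euclidean distance over the N variables or components."""
    return np.sqrt(np.sum((a - b) ** 2))

def k_nearest(unknown, X_train, k=3):
    """Option 1: indices of the K nearest training samples."""
    d = np.array([distance(unknown, x) for x in X_train])  # distances to the unknown
    return np.argsort(d)[:k]                               # sorted; keep the K nearest
```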
KNN
When using the distance to a class, you can use the same link options that were discussed earlier.
The distance can be based on:
Single link - closest member of the class.
Complete link - farthest member of the class.
Centroid - center of the class cluster.
KNN - single link
In this example, the unknown is compared to the 3 closest known samples.
In this case, the three closest samples are all red.
Single link
K = 3
KNN - centroid link
Centroid link
With this approach, the distance to the center of a class cluster is determined and compared.
KNN
Ideally, if a test sample falls well within a known class, its closest neighbors should all be of one class.
Here, all of the blue samples would be closer to the unknown than any of the green.
Mycobacteria - HCA
[Figure: HCA dendrogram of the mycobacteria data (distance scale 0-1000); sample labels from classes 42-47 and 49 are intermixed across the clusters.]
A quick review of ALL of the ways that this data set was difficult to get useful information from.
Mycobacteria - k means
Mycobacteria - PCA
[Figure: PCA score plot of the mycobacteria data, samples labeled by class (42, 43, 44, 45, 46, 47, 49), with considerable overlap among the classes.]
Mycobacteria - DA
Mycobacteria - DA
[Figure: DA score plot of the mycobacteria data on F1 (56.45%) vs. F2 (29.63%); samples labeled by class (42-47, 49) now group into class clusters.]
Mycobacteria - DA
[Figure: expanded view of the DA score plot (F1 56.45% vs. F2 29.63%) showing the class clusters in more detail.]
Getting out the vote
What if a sample's distances are such that it could be in more than one class?
When you have more than one possible class, we can take a vote. The class with the most votes wins.
K = 5
Getting out the vote
Example - K = 3

Sample  Class  Distance
1       A      0.134
2       B      0.145
3       A      0.158

Here you would end up with two votes for A and one for B - A would win, and A's distances are smaller as well.
Getting out the vote
Example - K = 5

Sample  Class  Distance
1       A      0.134
2       B      0.145
3       A      0.158
4       B      0.234
5       C      0.502

Here, A and B would tie. The tie-breaker would be that A averages a smaller distance, so A would be made the winner.
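A sketch of this voting scheme, including the mean-distance tie-breaker; `neighbors` is an assumed list of (class, distance) pairs for the K nearest samples:

```python
from collections import defaultdict

def vote(neighbors):
    votes, dists = defaultdict(int), defaultdict(list)
    for label, d in neighbors:
        votes[label] += 1
        dists[label].append(d)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    # Tie-breaker: of the tied classes, the smaller average distance wins.
    return min(tied, key=lambda c: sum(dists[c]) / len(dists[c]))

# The K = 5 example above: A and B tie 2-2, but A's average distance is smaller.
print(vote([("A", 0.134), ("B", 0.145), ("A", 0.158), ("B", 0.234), ("C", 0.502)]))  # A
```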
KNN validation
The optimum number for K can be found by trial and error, but for a close match it should make no difference.
The classifying power of your data can be evaluated by leave-one-out validation of your training set.
This should be done before any sort of real classification begins.
KNN validation
You can sequentially leave out each of your samples and test it for votes at several K values.
You end up with a vote matrix that will tell you the optimum K value for each class.
You will also get a misclassification matrix - this tells you how often one of your knowns is incorrectly classified.
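A sketch of that validation pass with scikit-learn, scanning a few K values by leave-one-out; the iris data and the K values tried are placeholders:

```python
# Leave-one-out validation of a KNN training set over several K values.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                          cv=LeaveOneOut()).mean()
    print(f"K = {k}: LOO accuracy = {acc:.3f}")
```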
K nearest neighbor classification
So KNN will always assign a class.
What if you have a material that is not a member of an existing class?
One option is to set a maximum distance.
Example
If your intraclass distances run about 0.2 for all of your classes, you might want to omit votes with distances that exceed 0.2.
Iris (of course)
The Iris data set is included with a demo of the program Pirouette.
We'll be using the Pirouette demo to show how to conduct KNN and SIMCA classifications.
You can download a copy of the demo from www.infometrix.com.
The demo is fully functional, but only with the data sets that are provided by Infometrix.
The actual software is pretty easy to use but too expensive for our use in the course.
Iris example
Iris - scores by class
Iris - voting results
Iris - class partitions
What? NOT the Iris data set?
Headspace MS of 4 cola classes.
Two cola brands.
Diet and regular.
m/e 44 - 149.
May need to preprocess to eliminate any nonvariant data.
Cola example
Class
1 Brand 1
2 Diet brand 1
3 Brand 2
4 Diet brand 2
PCA scores
PCA scores
PCA loadings
KNN classification
Not a bad job!
KNN classifications
SIMCA
Soft Independent Modeling of Class Analogy
A method of classification that provides:
Detection of outliers.
Estimates of confidence for a classification.
Determination of potential membership in more than a single class.
SIMCA
Basic approach:
For each class of samples, a PCA model is constructed.
This model is based on the optimum number of components that best clusters an individual class.
The optimum number of components can vary from class to class and can be determined by cross-validation.
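A bare-bones sketch of that basic approach - one PCA model per class, scored by residual distance. The fixed two components and the max-residual cutoff are crude stand-ins for the cross-validated choices a real SIMCA implementation would make:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
models = {}
for c in np.unique(y):
    Xc = X[y == c]
    pca = PCA(n_components=2).fit(Xc)                 # one PCA model per class
    resid = Xc - pca.inverse_transform(pca.transform(Xc))
    cutoff = np.sqrt((resid ** 2).sum(axis=1)).max()  # crude class boundary
    models[c] = (pca, cutoff)

def simca_match(sample):
    """Classes whose hypervolume (residual cutoff) the sample falls inside."""
    hits = []
    for c, (pca, cutoff) in models.items():
        r = sample - pca.inverse_transform(pca.transform(sample[None, :]))[0]
        if np.sqrt((r ** 2).sum()) <= cutoff:
            hits.append(c)
    return hits                                       # zero, one, or several classes

print(simca_match(X[0]))  # class 0 must match, since X[0] helped set its cutoff
```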
SIMCA models
Since the number of components used can vary, each class will be best described by its own hypervolume.
SIMCA models
Limitation of a class hypervolume:
You can limit the size of a hypervolume by setting a standard deviation cutoff. This results in better defined classes.
SIMCA models
Once a model has been created for each class, you are ready to classify unknowns.
For each model/sample combination:
+ The sample is transformed into PC space and compared to see if it is a likely class member.
+ If it is within the hypervolume of a single class, you have a match.
SIMCA classification
The potential still exists for a sample to be classified as a member of more than one class.
It may also not be a member of any known class.
SIMCA classification
SIMCA will give you an estimate as to the probability of class membership.
Example - two possible classes:

Class     Probability
Class A   0.90
Class B   0.45

Here, the sample is more likely to be a member of Class A.
SIMCA summary
Of the methods covered, SIMCA offers the most options for developing a classification model when the classes are well known.
It also requires the most development time, as you must determine the optimum model conditions for each class.
If used, plan on spending quite a bit of time working with all of the available options.
SIMCA example - Iris.
Of course we'll look at the iris data set again.
Note: We have a separate model for each class in the data set - in this case, three.
SIMCA example - Iris.
Pirouette will provide an estimate as to the class hypervolumes based on the first three PCs.
SIMCA example - Iris.
It appears that petal length is the most useful for classifying.
SIMCA example - Iris.
These plots show the relative positions of each sample when projected into any of the three class models - two classes at a time - with color coding based on known class.
Cola example
With the cola example (two brands, diet and regular), we have 4 classes.
Here you can see that the classes are pretty well resolved.
Cola example
Mycobacteria again
This data set is included with the Pirouette demo.
File = Mycosing.wks
It is a subset of the version I've been using (only 72 samples).
Mycobacteria SIMCA
Perfect classifications - a first for this data set.
Mycobacteria SIMCA
Discriminating power is a measure of which variables show the biggest class differences.
Mycobacteria SIMCA
The example shows that a different number of components was used in developing the individual SIMCA hypervolumes.
Mycobacteria SIMCA
Modeling power indicates the relative importance of each variable for classification.
Loadings, as always, show the relative significance of each variable in constructing each PC. Here they are relatively unimportant.
Mycobacteria SIMCA
PC plots are pretty boring since you only have one class per model. However, they can be used to see if you have any sub-classes.
Outliers are tested for by plotting each sample's residual (the difference between the sample and the center of the hypervolume) vs. its Mahalanobis distance from the center of the cluster - similar to a Euclidean distance, but one that takes into account correlations in the data and is scale invariant.
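For reference, the Mahalanobis distance of a sample $x$ from a cluster with mean $\mu$ and covariance matrix $\Sigma$ is

$$d_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}$$

which reduces to the ordinary Euclidean distance when $\Sigma$ is the identity; the $\Sigma^{-1}$ term is what accounts for correlations in the data and makes the measure scale invariant.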
Mycobacteria SIMCA