Support Vector Machine (SVM) with Iris and Mushroom Dataset


Upload: pawandeep-kaur

Post on 10-Jun-2015


DESCRIPTION

An SVM is used to classify the Iris and Mushroom datasets.

TRANSCRIPT

Page 1: Support Vector Machine (SVM) with Iris and Mushroom Dataset

Support Vector Machine with Iris and Mushroom Dataset


SVM

• In this presentation, we study the characteristics of SVM by applying it to two different datasets:

• 1) Iris

• 2) Mushroom

• Both are analysed with the WEKA data mining software.


What is SVM?

• Support vector machines (SVMs), also called support vector networks, are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis.

• The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier.

- Wikipedia
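The "binary linear classifier" idea can be illustrated with a minimal sketch. It assumes a separating hyperplane has already been found; the weights and bias below are hypothetical values chosen for illustration, not the result of training on either dataset.

```python
def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 3 = 0 (illustrative values only).
weights, bias = [1.0, 1.0], -3.0

print(predict(weights, bias, [2.5, 2.0]))  # score = 1.5, so class +1
print(predict(weights, bias, [0.5, 1.0]))  # score = -1.5, so class -1
```

Training an SVM amounts to choosing the weights and bias so that the margin between the two classes is as large as possible; the prediction step itself is just this sign test.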


IRIS and SVM

• Iris dataset: The Iris flower data set is a multivariate dataset which quantifies the structural variation of three related species of Iris flower.

• Classification is therefore done on the basis of the flower species, shown in the following colours:

• Iris-setosa: blue

• Iris-versicolor: red

• Iris-virginica: cyan


IRIS and SVM

• The data set consists of 50 samples (instances) from each of the three species, 150 in total.

• Four features were measured from each sample, all in centimetres:

• 1) Sepal length

• 2) Petal length

• 3) Sepal width

• 4) Petal width

• To distinguish between the species, a linear discriminant model is used.

• Linear discriminant analysis (LDA) is a method used to find a linear combination of features which characterizes or separates two or more classes of objects or events. (Wikipedia)


IRIS and SVM

• For our dataset, we simultaneously analyse the behaviour of the four features listed above across the three species of Iris flower.

• For Iris, we implement a multi-class SVM model, as there are more than two classes (three).

• The image below shows that the class Iris-setosa is linearly separable from the other two, while Iris-versicolor and Iris-virginica are not linearly separable from each other. The Iris dataset as a whole is therefore not linearly separable, which makes it a good example for applying SVM.


Implementation of SVM

• The multi-class SVM is implemented with the LIBSVM library. LIBSVM implements the SMO algorithm for kernelized support vector machines (SVMs), supporting classification and regression. LIBSVM uses the one-against-one strategy for multi-class problems. LIBSVM is used to build the SVM classifiers.

• The one-against-one strategy, also known as “pairwise coupling”, “all pairs” or “round robin”, consists in constructing one SVM for each pair of classes. Thus, for a problem with c classes, c(c-1)/2 SVMs are trained to distinguish the samples of one class from the samples of another class. Usually, classification of an unknown pattern is done according to the maximum voting, where each SVM votes for one class. [http://hal.archives-ouvertes.fr/docs/00/10/39/55/PDF/cr102875872670.pdf pp.4]
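The one-against-one scheme described above can be sketched in a few lines. This is an illustration of the voting logic only, not LIBSVM's code: the pairwise "classifier" is a hypothetical stand-in that picks the nearer of two class centres on a single feature, and the centre values are made up for the example rather than measured.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Hypothetical petal-length centres in cm (illustrative values only).
CENTRES = {"Iris-setosa": 1.5, "Iris-versicolor": 4.3, "Iris-virginica": 5.5}


def count_pairwise_classifiers(classes):
    """Number of binary classifiers needed: c(c-1)/2."""
    return len(list(combinations(classes, 2)))


def toy_pairwise_decide(class_a, class_b, x):
    """Stand-in for one trained binary SVM: vote for the nearer centre."""
    if abs(x - CENTRES[class_a]) <= abs(x - CENTRES[class_b]):
        return class_a
    return class_b


def one_vs_one_predict(classes, pairwise_decide, x):
    """Let every pairwise classifier vote and return the most-voted class."""
    votes = Counter(pairwise_decide(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


print(count_pairwise_classifiers(CLASSES))                    # 3 pairs for 3 classes
print(one_vs_one_predict(CLASSES, toy_pairwise_decide, 4.5))  # Iris-versicolor
```

For the three Iris classes this gives 3(3-1)/2 = 3 binary classifiers, so every prediction is decided by three votes.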


General Classification of IRIS

• The histogram shows how the different features of each training example, i.e. the measurements of petal and sepal width and length, classify each example into different classes. The classification below is on the basis of sepal length.


Classification-SVM algorithms

• To construct an optimal hyperplane, SVM employs an iterative training algorithm, which is used to minimize an error function. According to the form of the error function, SVM models can be classified into four distinct groups, two for classification and two for regression. The two classification groups are:

• Classification SVM Type 1 (also known as C-SVM classification)

• Classification SVM Type 2 (also known as nu-SVM classification)

• [https://www.statsoft.com/textbook/support-vector-machines]
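For reference, the two classification error functions can be written out as below. This is the standard textbook formulation and not something taken from the slides; the symbols (weight vector w, bias b, slack variables ξ_i, feature map φ, N training examples with labels y_i in {-1, +1}, and the parameters C, ν and ρ) are notation introduced here.

```latex
% C-SVM classification (Type 1)
\min_{w,\,b,\,\xi}\quad \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to}\quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0

% nu-SVM classification (Type 2)
\min_{w,\,b,\,\xi,\,\rho}\quad \frac{1}{2} w^{T} w - \nu \rho + \frac{1}{N} \sum_{i=1}^{N} \xi_i
\qquad \text{subject to}\quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \rho \ge 0
```

In C-SVM the constant C trades margin width against training errors, while in nu-SVM the parameter ν (between 0 and 1) plays that role.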


Testing both algorithms showed that C-SVM performs better than nu-SVM. The MSE and RSE for C-SVM were 0.22 and 0.149, whereas for nu-SVM they were 0.26 and 0.16.


Kernel type: because the classes are not all linearly separable, the kernel trick is used. There are four kernel functions available for selection.


SVM Kernels

• The radial basis function (RBF) kernel is the most popular and most widely used of these. Different kernel functions generate different confusion matrices.

• In general, the RBF kernel is a reasonable first choice. This kernel nonlinearly maps samples into a higher-dimensional space, so, unlike the linear kernel, it can handle the case where the relation between class labels and attributes is nonlinear.
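To make the kernel choices concrete, here is a small sketch of four kernel functions in the forms LIBSVM documents (linear, polynomial, radial basis function and sigmoid). The parameter names gamma, coef0 and degree follow LIBSVM's naming; the default values used below are arbitrary choices for illustration, not the settings used in the experiments.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))                # 11.0
print(polynomial_kernel(u, v, degree=2))  # 121.0
print(rbf_kernel(u, u))                   # 1.0, identical points are maximally similar
```

The RBF value depends only on the distance between the two points, which is why it can separate classes that no straight line in the original feature space can.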


SVM Kernels

• With Radial Basis

• With Polynomial Kernel


Testing Iris Dataset via SVM

• Using the same training set as the test set

• Using a different test set from the original training set

• Cross-validation method

• Percentage split: if it is 10%, then 10% of the data is used for training and 90% for testing
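The last two test options can be sketched as index bookkeeping. This is a simplified illustration and not WEKA's implementation: it does not shuffle or stratify the instances, and the function names are made up for the example.

```python
def percentage_split(n_instances, train_percent):
    """Split instance indices into a training part and a test part."""
    n_train = round(n_instances * train_percent / 100)
    indices = list(range(n_instances))
    return indices[:n_train], indices[n_train:]


def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_instances))
    folds = [indices[i::k] for i in range(k)]  # every k-th index goes to fold i
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


train, test = percentage_split(150, 10)
print(len(train), len(test))  # 15 135

for train, test in k_fold_indices(150, 10):
    print(len(train), len(test))  # 135 15 for each of the 10 folds
```

With 10-fold cross-validation on the 150 Iris instances, every instance is used for testing exactly once and for training nine times.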


Cross Validation Technique

Results with 10-Fold Results with 15-Folds


Percentage Split Test Set

50% 70%


ROC Curve for Iris-Setosa


ROC Curve for Iris-Versicolor


ROC Curve for Iris-Virginica


MUSHROOM DATASET

• This dataset contains samples from 23 different species of mushroom, each labelled as either poisonous or edible. The training set therefore assigns every sample to one of two classes, and the trained model classifies future mushroom samples into one of the two categories depending on their similarity to the samples from these 23 species.

• There are 8124 instances in total.

• In the following picture, edible is shown in blue and poisonous in red.


Mushroom and SVM

The following example shows how one feature of a mushroom, which takes one of 9 possible values, helps classify it as edible or poisonous. For example, a mushroom with a foul odour, coded 'f' (a count of 2160), is more likely to be poisonous.


Implementation of SVM

• For this dataset, the SVM model is used as a binary classifier (the default), performing linear classification.

• It is implemented with Weka’s default SVM algorithm, SMO (Sequential Minimal Optimization), which is also used in LibSVM.

• This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default.

• Linear kernel used: K(x, y) = <x, y>

• Like LibSVM, it offers different kernel functions. By default it uses PolyKernel (which, with its default exponent of 1, is the linear kernel above), and this produces the following result. I tried other kernels, but they were too slow to process the 8124 instances.
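The preprocessing steps mentioned in the list above (replacing missing values, turning nominal attributes into binary ones, normalizing) can be sketched as follows. This is an illustration of the ideas, not WEKA's code: it assumes missing values are marked with '?' and replaced by the most frequent value, and the sample values are hypothetical.

```python
from collections import Counter


def replace_missing_with_mode(values, missing="?"):
    """Replace missing entries with the most frequent known value (assumed rule)."""
    known = [v for v in values if v != missing]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]


def nominal_to_binary(values):
    """Turn one nominal attribute into one 0/1 attribute per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical odour codes for four mushrooms, one of them missing.
odours = replace_missing_with_mode(["f", "n", "?", "f"])
categories, rows = nominal_to_binary(odours)
print(categories)  # ['f', 'n']
print(rows)        # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
```

Because every mushroom attribute is nominal, this expansion turns each attribute into several 0/1 columns, which is what lets a linear kernel work on purely categorical data.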



Cross Validation Technique

Results with 10-Fold Results with 90-Folds


Percentage Split Test Set

50% 70%


ROC for Edible Mushroom


ROC for Poisonous Mushroom
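As a rough guide to how ROC curves like these are built, the sketch below sweeps a threshold over classifier scores and records the false-positive and true-positive rates. One class is treated as positive and everything else as negative, which is also how a per-class curve can be drawn for a three-class problem such as Iris. The scores and labels are hypothetical, and the sketch assumes all scores are distinct.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    labels holds True for the positive class and False otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_positive in ranked:  # lower the threshold one instance at a time
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: higher means "more likely poisonous".
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, True, False, False]
print(area_under_curve(roc_points(scores, labels)))  # 1.0, a perfect ranking
```

An area of 1.0 means every positive instance is ranked above every negative one, while 0.5 corresponds to random guessing.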