Transcript
Page 1:

An Improved Algorithm for Decision-Tree-Based SVM

Sindhu Kuchipudi, INSTRUCTOR: Dr. DONGCHUL KIM

Page 2:

OUTLINE:

• Introduction
• Decision-tree-based SVM
• The class separability measure in feature space
• The improved algorithm for decision-tree-based SVM
• Experiments and results
• Conclusion

Page 3:

INTRODUCTION:

• Decision-tree-based support vector machine, which combines support vector machines and decision trees, is an effective way of solving multi-class problems.

• Support vector machines (SVMs) are classifiers that were originally designed for binary classification.

• Distance measures such as the Euclidean distance and the Mahalanobis distance are often used as separability measures.

Page 4:

Decision-tree-based SVM:

• Decision-tree-based SVM for multi-class problems can resolve the existence of unclassifiable regions and has higher generalization ability than conventional methods.

• Different tree structures correspond to different divisions of the feature space, and the classification performance of the classifier is closely related to the tree structure.

Page 5:

Example:

[Figure: a) the division of the feature space; b) its expression by a decision tree]

Page 6:

Example:

[Figure: a) the division of the feature space; b) its expression by a decision tree]

Page 7:

THE CLASS SEPARABILITY MEASURE IN FEATURE SPACE:

• The Euclidean distance is commonly used as the separability measure.

• The Euclidean distance between the centers of two classes cannot always correctly represent the separability between the classes.

Page 8:

Example:

[Figure: the comparison of separability among classes with equal center distances]

• The Euclidean distances among the centers of the three classes are the same, but it is obvious that class k can be separated more easily than the other classes. Therefore, the distribution of the classes is also an important factor in the between-class separability measure.

Page 9:

For a problem with k classes, suppose X_i, i = 1, ..., k are the sets of training data included in class i, and let sm_ij be the separability measure between class i and class j:

sm_ij = d_ij / (σ_i + σ_j),  i, j = 1, ..., k

where d_ij = ||c_i − c_j|| is the Euclidean distance between the centers of class i and class j, and c_i is the center of class i computed from the training samples.

Page 10:

• n_i is the sample number of class i, and σ_i is the class variance,

σ_i = (1/n_i) Σ_{x ∈ X_i} ||x − c_i||

It is an index of the class distribution.

• If sm_ij ≥ 1, then there is no overlap between class i and class j.

• If sm_ij < 1, there is overlap between class i and class j.

• From the formula for sm_ij we can say that the bigger sm_ij is, the more easily class i and class j are separated.
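
As a concrete illustration, the following is a minimal NumPy sketch of this input-space measure (the helper names and the toy data are ours; the formulas sm_ij = d_ij/(σ_i + σ_j), d_ij = ||c_i − c_j||, and σ_i as the mean distance to the center are the ones defined above):

```python
import numpy as np

def class_stats(X):
    """Center c_i and class variance sigma_i (mean distance to the center)."""
    c = X.mean(axis=0)
    sigma = np.linalg.norm(X - c, axis=1).mean()
    return c, sigma

def separability_matrix(classes):
    """sm_ij = d_ij / (sigma_i + sigma_j) for every pair of classes."""
    stats = [class_stats(X) for X in classes]
    k = len(classes)
    sm = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                d_ij = np.linalg.norm(stats[i][0] - stats[j][0])
                sm[i, j] = d_ij / (stats[i][1] + stats[j][1])
    return sm

# Toy data: three Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
classes = [rng.normal(loc=m, scale=s, size=(50, 2))
           for m, s in [((0, 0), 0.5), ((4, 0), 0.5), ((2, 4), 1.5)]]
print(separability_matrix(classes))   # entries >= 1 suggest no class overlap
```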

Page 11:

• Let the separability measure of class i be sm_i; it can be defined as the minimum of the separability measures between class i and the other classes: sm_i = min_{j ≠ i} sm_ij.

• The separability measure of class i indicates the separability of class i from the others.

• The most easily separated class is the class with the maximum separability measure: i_0 = arg max_i sm_i.
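
Continuing the sketch above, sm_i and the most easily separated class follow directly from the matrix of sm_ij values (again, the function name is ours):

```python
import numpy as np

def most_separable_class(sm):
    """sm_i = min_{j != i} sm_ij; return i0 = argmax_i sm_i and all sm_i."""
    masked = sm.astype(float).copy()
    np.fill_diagonal(masked, np.inf)    # exclude j == i from the minimum
    sm_i = masked.min(axis=1)
    return int(np.argmax(sm_i)), sm_i
```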

Page 12:

• The above separability measure sm_ij is defined in the input space. To get better separability, the input space is mapped into a high-dimensional feature space.

• Suppose Φ is the mapping, H is the feature space, and k(·,·) is the kernel function.

• For input samples x_1 and x_2, Φ maps them into the feature space H; the Euclidean distance between x_1 and x_2 in H is then

d_H(x_1, x_2) = ||Φ(x_1) − Φ(x_2)|| = √( k(x_1, x_1) − 2 k(x_1, x_2) + k(x_2, x_2) )
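
This is the usual kernel trick: the distance needs only kernel evaluations, never Φ itself. A small sketch, assuming a Gaussian kernel for concreteness:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))

def feature_space_distance(x1, x2, k=gaussian_kernel):
    """d_H(x1, x2) = sqrt(k(x1, x1) - 2 k(x1, x2) + k(x2, x2))."""
    return np.sqrt(k(x1, x1) - 2.0 * k(x1, x2) + k(x2, x2))
```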

Page 13:

• In the feature space H, suppose m_Φ is the class center,

m_Φ = (1/n) Σ_{i=1}^{n} Φ(x_i)

where n is the number of samples within the class.

• Suppose {x_1, x_2, ..., x_{n1}} and {x_1′, x_2′, ..., x_{n2}′} are the training samples of two classes; Φ maps them into the feature space H, and m_Φ and m_Φ′ are the two class centers in H.

• Let d_H(m_Φ, m_Φ′) be the distance between m_Φ and m_Φ′ in the feature space; then

d_H(m_Φ, m_Φ′) = √( (1/n1²) Σ_{i,j} k(x_i, x_j) − (2/(n1 n2)) Σ_{i,j} k(x_i, x_j′) + (1/n2²) Σ_{i,j} k(x_i′, x_j′) )

Page 14:

• For the training samples {x_1, x_2, ..., x_n} of a given class, let d_H(x, m_Φ) be the distance between a training sample x and the class center m_Φ in the feature space H; then

d_H(x, m_Φ) = √( k(x, x) − (2/n) Σ_i k(x, x_i) + (1/n²) Σ_{i,j} k(x_i, x_j) )

• Therefore, the separability measure between class i and class j in the feature space H can be defined as

sm_ij = d_H(m_Φi, m_Φj) / (σ_i^Φ + σ_j^Φ)

where σ_i^Φ = (1/n_i) Σ_{x ∈ X_i} d_H(x, m_Φi) is the class variance in the feature space.

• The newly defined separability measure will be used in the formation of the decision tree.
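
All three feature-space quantities reduce to sums of kernel evaluations, so they can be computed from a kernel matrix alone. A minimal sketch under a Gaussian kernel (function names are ours; the formulas are the ones on these slides):

```python
import numpy as np

def kernel_matrix(A, B, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = k(A[i], B[j])."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_distance(X1, X2, sigma=1.0):
    """d_H(m_phi, m_phi') between the two class centers in feature space."""
    return np.sqrt(kernel_matrix(X1, X1, sigma).mean()
                   - 2.0 * kernel_matrix(X1, X2, sigma).mean()
                   + kernel_matrix(X2, X2, sigma).mean())

def class_variance(X, sigma=1.0):
    """sigma_i^phi: mean of d_H(x, m_phi) over the class samples."""
    K = kernel_matrix(X, X, sigma)
    d2 = np.clip(np.diag(K) - 2.0 * K.mean(axis=1) + K.mean(), 0.0, None)
    return np.sqrt(d2).mean()

def sm_feature_space(Xi, Xj, sigma=1.0):
    """sm_ij in feature space H."""
    return center_distance(Xi, Xj, sigma) / (class_variance(Xi, sigma)
                                             + class_variance(Xj, sigma))
```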

Page 15:

The Improved Algorithm for Decision-Tree-Based SVM:

• Suppose one class is separated from the remaining classes by the hyperplane corresponding to each node of the decision tree.

• For a problem with k classes, the number of hyperplanes to be calculated is k − 1; i.e., the decision tree has k − 1 non-leaf nodes.

Page 16:

[Algorithm: improved decision-tree-based SVM]

Suppose X_i, i = 1, ..., k are the sets of training data included in class i; together they constitute the set of active training data X. Let t = k.

Step 1: Calculate the separability measures in feature space sm_ij, i, j = 1, ..., k; the sm_ij constitute a matrix of separability measures.

Step 2: Select the most easily separated class i_0 = arg max_i sm_i, where sm_i is the separability measure of class i.

Step 3: Using X_{i0} and X − X_{i0} as the training data set, calculate a hyperplane f_{i0}.

Step 4: Update the set of active training data: X ← X − X_{i0}, t ← t − 1.

Step 5: If t > 1, go to Step 2; else end.
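
Putting the pieces together, here is a hedged end-to-end sketch of the five steps in Python, using scikit-learn's SVC for the node hyperplanes; the structure follows the algorithm above, while the class handling and function names are ours:

```python
import numpy as np
from sklearn.svm import SVC

def kernel_matrix(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_distance(X1, X2, sigma):
    return np.sqrt(kernel_matrix(X1, X1, sigma).mean()
                   - 2.0 * kernel_matrix(X1, X2, sigma).mean()
                   + kernel_matrix(X2, X2, sigma).mean())

def class_variance(X, sigma):
    K = kernel_matrix(X, X, sigma)
    d2 = np.clip(np.diag(K) - 2.0 * K.mean(axis=1) + K.mean(), 0.0, None)
    return np.sqrt(d2).mean()

def train_dt_svm(X, y, sigma=1.0, C=1000.0):
    """Improved decision-tree-based SVM: peel off the most separable
    class at each node (Steps 1-5 of the algorithm above)."""
    active = list(np.unique(y))          # active classes; t = len(active)
    nodes = []                           # [(class label, binary SVC), ...]
    while len(active) > 1:               # Step 5: repeat while t > 1
        data = {c: X[y == c] for c in active}
        # Steps 1-2: feature-space sm_ij; pick i0 = argmax_i min_{j != i} sm_ij
        sm_i = {c: min(center_distance(data[c], data[o], sigma)
                       / (class_variance(data[c], sigma)
                          + class_variance(data[o], sigma))
                       for o in active if o != c)
                for c in active}
        i0 = max(sm_i, key=sm_i.get)
        # Step 3: hyperplane separating class i0 from the remaining classes
        mask = np.isin(y, active)
        clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=C)
        clf.fit(X[mask], (y[mask] == i0).astype(int))
        nodes.append((i0, clf))
        active.remove(i0)                # Step 4: X <- X - X_i0, t <- t - 1
    nodes.append((active[0], None))      # the last remaining class is a leaf
    return nodes

def predict_dt_svm(nodes, x):
    """Walk down the tree: stop at the first node that claims the sample."""
    for label, clf in nodes:
        if clf is None or clf.predict(x.reshape(1, -1))[0] == 1:
            return label
```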

Page 17:

EXPERIMENTS AND RESULTS

• To evaluate the effectiveness and the performance improvement of the improved algorithm for decision-tree-based SVM, experiments were carried out on:

– the spiral data set;
– the wine data set.

Page 18:

Experiment for spiral data:

• Recognizing two- or three-spiral data is a difficult task for many pattern recognition approaches, since spiral data are highly non-linear.

• A synthetic 2D three-spiral data set has been used in our classification experiments; each spiral line belongs to a different class.

• The synthetic 2D spiral can be expressed as a parametric equation, where k and α are constants and θ is the angle in radians and the variable.
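
The slide leaves the parametric form to a figure. A common choice consistent with "constants k and α, variable θ" is the Archimedean spiral r = k + αθ, with each class rotated by 2π/3; the generator below is hypothetical under that assumption:

```python
import numpy as np

def three_spirals(n_per_class=240, k=1.0, alpha=0.5, cycles=3, seed=0):
    """Assumed form: x = (k + alpha*theta) cos(theta + phi_c),
    y = (k + alpha*theta) sin(theta + phi_c), with phi_c = 2*pi*c/3."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    theta = np.linspace(0.0, 2.0 * np.pi * cycles, n_per_class)
    for c in range(3):
        phi = 2.0 * np.pi * c / 3.0
        r = k + alpha * theta
        pts = np.stack([r * np.cos(theta + phi),
                        r * np.sin(theta + phi)], axis=1)
        pts += rng.normal(scale=0.05, size=pts.shape)   # small jitter
        X.append(pts)
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.concatenate(y)
```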

Page 19:

• There are 720 data points altogether, 240 for each spiral.

[Figure: three spirals in three cycles]

• The training of the SVMs is under the same conditions: C = 1000, and Gaussian kernel functions with the same kernel size σ are used.
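
Under these settings (C = 1000, Gaussian kernel, one shared kernel size σ), the experiment could be reproduced roughly as follows, reusing the hypothetical generator and the trainer sketched earlier:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# three_spirals(), train_dt_svm(), predict_dt_svm() are the sketches above.
X, y = three_spirals(n_per_class=240)           # 720 points, 240 per spiral
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

sigma = 1.0                                     # one shared kernel size
nodes = train_dt_svm(X_tr, y_tr, sigma=sigma, C=1000.0)
pred = np.array([predict_dt_svm(nodes, x) for x in X_te])
print("tree-SVM accuracy:", (pred == y_te).mean())

# Baseline under the same conditions, for comparison.
base = SVC(kernel="rbf", C=1000.0, gamma=1.0 / (2 * sigma ** 2)).fit(X_tr, y_tr)
print("one-vs-rest SVC accuracy:", base.score(X_te, y_te))
```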

Page 20:

• Classification results for the synthetic three-spiral data set demonstrate the performance improvement of the improved decision-tree-based SVM.

Page 21:

Experiments for the wine data set:

• The wine data set from the UCI repository consists of 178 samples in 3 classes: 59 in class 1, 71 in class 2, and 48 in class 3; each sample has 13 attributes.

• The training of the SVMs is under the same conditions: Gaussian kernel functions with the same kernel size σ are used, with σ varied over 5, 40, and 90.
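
A sketch of the corresponding setup using the copy of the UCI wine data shipped with scikit-learn (the σ values follow the slide; standardization and the C value carried over from the spiral experiment are our own assumptions):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)          # 178 samples, 3 classes, 13 attributes
X = StandardScaler().fit_transform(X)      # our addition: rescale the attributes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

for sigma in (5.0, 40.0, 90.0):            # kernel sizes from the slide
    clf = SVC(kernel="rbf", C=1000.0, gamma=1.0 / (2 * sigma ** 2))
    clf.fit(X_tr, y_tr)
    print(f"sigma={sigma:5.1f}  accuracy={clf.score(X_te, y_te):.3f}")
```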

Page 22:

• Classification results for this data set also demonstrate the performance improvement of the improved algorithm for decision-tree-based SVM.

Page 23:

CONCLUSION:

• In this paper we discussed decision-tree-based SVM and the separability measure between classes based on the distribution of the classes.

• In order to improve the generalization ability of SVM decision tree, a novel separability measure is given based on the distribution of the training samples in the feature space.

• Based on this idea, experiments on different data sets demonstrate the performance improvement of the improved algorithm for decision-tree-based SVM.

Page 24:

THANK YOU

