Using SVMs for Scientists and Engineers - PRT Blog
http://newfolder.github.io/blog/2013/07/24/using-svms/
Jul 24th, 2013
In the mid-90s, support vector machines became extremely popular machine learning algorithms due to a number of very nice properties, and because they can also achieve state-of-the-art performance on a number of data sets. Although the statistical underpinnings of why SVMs work rely on somewhat abstract statistical theory, modern statistical packages (like LibSVM, and the PRT) make training and using SVMs almost trivial for the average engineer. That said, getting good performance out of an SVM is often not as easy as simply running pre-existing code on your data, and for some data sets, SVM classification may not be appropriate.
This blog entry will serve two purposes - 1) to provide an introduction to practical issues you (as an engineer or scientist) may encounter when using an SVM on your data, and 2) to be the first in a series of similar "for Engineers & Scientists" posts dedicated to helping engineers understand the tradeoffs, assumptions, and practical details of using various machine learning approaches on their data.
Contents
Quick Notes
SVM Formulation
Appropriate Data Sets
SVM Parameters & Notes
Parameter: Cost (Scalar)
Parameter: Relative Class Error Weights
Parameter: Kernel Choice & Associated Parameters
SVM Pre-Processing
Optimizing Parameters
Some Rules-Of-Thumb
Concluding
Quick Notes
Throughout this post, we'll be using prtClassLibSvm, which is built directly on top of the fantastic LibSVM library, available here:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
The parameter nomenclature we're using matches theirs pretty closely, so feel free to leverage their
documentation as well.
SVM Formulation
Typical SVM formulations assume that you have a set of n-dimensional real training vectors, {x_i} for i = 1, ..., N, and corresponding labels {y_i}, y_i \in {-1,1}. Let x_ik represent the kth element of the vector x_i. Also assume that you have a relevant kernel function (https://en.wikipedia.org/wiki/Kernel_methods), P, which takes two input arguments, both n-dimensional real vectors, and outputs a scalar metric - P(x_i,x_j) = z_ij. The most common choice of P is a radial basis function (http://en.wikipedia.org/wiki/Radial_basis_function): P(x_i,x_j) = exp(-(\sum_{k} (x_ik - x_jk)^2)/s^2)
SVMs perform prediction of new labels by calculating:
f(x) = \hat{y} = ( \sum_{i} w_i*P(x_i,x) - b ) > 0
i.e., the SVM learns a representation for the labels (y) based on the data (x) with a linear combination (w) of a set of functions of the training data (x_i) and the test data (x).
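To make the prediction rule concrete, here is a minimal pure-Python sketch of that decision function (an illustration, not PRT or LibSVM code; the support vectors, weights w, and offset b are assumed to come from some already-completed training procedure):

```python
import math

def rbf_kernel(xi, xj, s=1.0):
    """P(x_i, x_j) = exp(-sum_k (x_ik - x_jk)^2 / s^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / s ** 2)

def svm_predict(x, support_vectors, w, b, s=1.0):
    """f(x): sign of the weighted kernel sum, offset by b."""
    score = sum(wi * rbf_kernel(xi, x, s)
                for wi, xi in zip(w, support_vectors)) - b
    return 1 if score > 0 else -1
```

In practice the solver (e.g., LibSVM) determines which training points survive as support vectors and what their weights are; at run-time only this sum needs to be evaluated.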
Appropriate Data Sets
Binary/M-Ary: Typically, SVMs are appropriate for binary classification problems - multi-class problems require some extensions of SVMs, although in the PRT, SVMs can be used in prtClassBinaryToMaryOneVsAll to emulate multi-class classification.
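The one-vs-all idea behind prtClassBinaryToMaryOneVsAll can be sketched in a few lines of plain Python (a hypothetical illustration, not the PRT implementation): train one binary scorer per class, then predict the class whose scorer is most confident. The centroid_scorer below is an arbitrary stand-in for any binary classifier.

```python
def train_one_vs_all(X, y, train_binary):
    """Train one binary scoring function per class (label c vs. everything else)."""
    models = {}
    for c in sorted(set(y)):
        labels = [1 if yi == c else -1 for yi in y]
        models[c] = train_binary(X, labels)
    return models

def predict_one_vs_all(models, x):
    """Predict the class whose binary scorer returns the highest score."""
    return max(models, key=lambda c: models[c](x))

def centroid_scorer(X, labels):
    """Stand-in binary 'classifier' on 1-D data: score = negative distance
    to the mean of the positive-class points."""
    pos = [xi for xi, li in zip(X, labels) if li == 1]
    mu = sum(pos) / len(pos)
    return lambda x: -abs(x - mu)
```

For example, with three well-separated 1-D clusters labeled 0, 1, and 2, a query near the middle cluster is assigned label 1 because that class's scorer wins the argmax.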
Data: SVM formulations often assume vector-valued training data; however, as long as a suitable kernel function can be constructed, SVMs can be used on arbitrary data (e.g., string-match distances can be used as a kernel for calculating the distances between character strings). Note, however, that SVMs do assume that the kernel used is a Mercer kernel, so some functions are not appropriate as SVM kernels - http://en.wikipedia.org/wiki/Mercer's_theorem.
Computational Considerations: Depending on the kernel and the particular algorithm under consideration, training an SVM can be very time-consuming for very large data sets. Proper selection of SVM parameters can significantly improve training time. At run-time, SVMs are typically very fast, with computational complexity that grows approximately linearly with the size of the training data set.
SVM Parameters & Notes
As you might imagine, several SVM parameters have a significant effect on overall classification performance. Good performance requires careful selection of each of these, though some general rules-of-thumb can help provide reasonable performance with a minimum of headaches.
Parameter: Cost (Scalar)
Internally, the SVM is going to try and ignore a whole bunch of your training data, by setting their corresponding w_i to zero. This might sound counter-intuitive, but it's very important, because it makes for fast run-time, and also (it turns out) setting a bunch of w's to zero is fundamental to why the SVM performs so well in general (see any number of articles on V-C theory for more information).
Unfortunately, this presents a dilemma - how much should it try and make w's zero vs. how much
should it try and classify your data absolutely perfectly? Fewer zero-w's might improve performance on the training set, but reduce the performance of the SVM on an unseen testing set!
The Cost parameter in the SVM enables you to control this trade-off. Higher cost leads to more non-zero w's, and more correctly classified training points, while lower costs tend to generate w vectors with lots of zeros, and slightly worse performance on training data (though performance on testing data may be better).
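For reference, in the standard soft-margin formulation (terminology the post does not spell out), Cost is the constant C that trades the margin term against the training errors, represented by slack variables \xi_i:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0.
```

A large C makes training errors expensive (more non-zero weights, tighter fit to the training data); a small C favors the margin term (sparser, smoother solutions).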
We usually run a number of experiments for different cost values across a range of, say, 0.01 to 100, though if performance has not plateaued at the edges of this range, it might make sense to extend it. The following figures show how the SVM decision boundaries change with varying costs in the PRT.
close all;
ds = prtDataGenUnimodal;
c = prtClassLibSvm;
count = 1;
for w = logspace(-2,2,4)
    c.cost = w;
    c = c.train(ds);
    subplot(2,2,count);
    plot(c);
    legend off;
    title(sprintf('Cost: %.2f',c.cost));
    count = count + 1;
end
Parameter: Relative Class Error Weights
In typical discussions of cost, errors in both classes are treated equally, e.g., it's equally bad to call a -1 a 1 and vice-versa. In realistic operations, that may not be the case - for example, failing to detect a landmine is significantly worse than calling a coke-can a landmine.
Luckily, SVMs enable us to specify class-specific error costs, so if class 1 has an error cost of 1, and class -1 has an error cost of 100, it's 100x as bad to mistake a -1 for a 1 as the opposite.
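The asymmetry is easy to state as a sketch (plain Python for illustration only; the default weight values here are arbitrary): the total cost of a set of predictions charges each kind of mistake at its own rate.

```python
def weighted_error(y_true, y_pred, w_pos=1.0, w_neg=100.0):
    """Total cost: mistaking a -1 for a 1 costs w_neg,
    mistaking a 1 for a -1 costs w_pos."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t != p:
            cost += w_neg if t == -1 else w_pos
    return cost
```

With w_neg = 100, a classifier that misses a single -1 is penalized as heavily as one that misses one hundred 1's, which is exactly the behavior the class-specific weights induce during training.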
LibSVM implements these class-specific weights using parameters called w-1, w1, etc. In the PRT, these are implemented as a vector, weights. The following example shows how changing the error weight on class 1 affects the overall SVM contours. Clearly, as the cost on class 1 increases, the SVM spends more effort to correctly classify red elements.
close all;
ds = prtDataGenUnimodal;
c = prtClassLibSvm;
count = 1;
for w = logspace(-1,1,4)
    c.weight = [1 w]; % Class 0: 1, Class 1: w
    c = c.train(ds);
    subplot(2,2,count);
    c.plot();
    legend off;
    title(sprintf('Weight: [%.2f,%.2f]',c.weight(1),c.weight(2)));
    count = count + 1;
end
Parameter: Kernel Choice & Associated Parameters
The proper choice of kernel makes a huge difference in the resulting performance of your classifier. We tend to stick with RBF and linear kernels (kernelType = 2 or 0, respectively, in prtClassLibSvm), but several other options (including hand-made kernels) are also possible. The linear kernel doesn't have any parameters to set, but the RBF has a parameter that can significantly impact performance. In most formulations the parameter is referred to as sigma, but in LibSVM the parameter is gamma, which plays the role of 1/s^2 in the RBF equation above. For the RBF, you can set it to any positive value. You can also use the special character k, and specify a coefficient as a string; k will evaluate to the number of features in the data set, e.g., '5k' evaluates to 10 for a 2-dimensional data set.
In general, we find that for normalized data (see below), the default gamma value of k (the number of dimensions) works well.
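To see what gamma does numerically, here is a small pure-Python check (illustration only), using LibSVM's convention K(u,v) = exp(-gamma * ||u - v||^2): for any fixed pair of distinct points, a larger gamma shrinks the kernel value, i.e., narrows the kernel's reach.

```python
import math

def rbf(u, v, gamma):
    """LibSVM-style RBF: K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# The same pair of points seen through a wide (small gamma)
# and a narrow (large gamma) kernel:
wide = rbf([0, 0], [1, 1], gamma=0.1)
narrow = rbf([0, 0], [1, 1], gamma=10)
```

With a narrow kernel, only training points very close to x contribute to the decision sum, which is why large gamma values produce tight, island-like decision boundaries in the figures below.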
The following example code generates 4 example images for SVM decision boundaries for varying
gamma parameters.
close all;
c = prtClassLibSvm;
count = 1;
d = prtDataGenUnimodal;
for kk = logspace(-1,.5,4)
    c.gamma = sprintf('%.2fk',kk);
    c = c.train(d);
    subplot(2,2,count);
    c.plot();
    title(sprintf('\\gamma = %s',c.gamma));
    legend off;
    count = count + 1;
end
SVM Pre-Processing
Note that for many kernel choices (e.g., the RBF, and many others; see http://en.wikipedia.org/wiki/Kernel_methods#Popular_kernels), the kernel output P(x_i,x_j) depends strongly and non-linearly on the magnitudes of the data vectors. E.g., exp(-1000) is not equal to 1000*exp(-1). In fact, if you refer to the RBF equation above, you'll notice that if two elements of your vectors have a difference approaching 1000, P(x_1,x_2) will be dominated by a term like exp(-1000), which by any reasonable metric (and certainly in floating point precision) is exactly 0. This is a bad thing.
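A common remedy, and the idea behind prtPreProcZmuv, is zero-mean unit-variance (ZMUV) standardization of each feature. A minimal pure-Python sketch of that idea (an illustration, not the PRT implementation):

```python
def zmuv(columns):
    """Standardize each feature (a list of values) to zero mean, unit variance."""
    out = []
    for col in columns:
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / len(col)
        sd = var ** 0.5
        if sd == 0.0:
            sd = 1.0  # guard against constant features
        out.append([(v - mu) / sd for v in col])
    return out
```

After this transform, feature differences sit in a small range regardless of the original units, so the RBF argument stays far away from the exp(-1000) underflow regime described above.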
In general, non-linear kernel functions should only be applied to data that is guaranteed to be in a reasonable range (e.g., -10 to 10), or data that has been pre-processed to remove outliers or control for data magnitude. The PRT makes several such techniques available - compare and contrast the performance in the following example:
close all;
ds = prtDataGenBimodal;
ds.X = 100*ds.X; %scale the data
yOutNaive = kfolds(prtClassLibSvm,ds,3);
yOutNorm = kfolds(prtPreProcZmuv + prtClassLibSvm,ds,3);
[pfNaive,pdNaive] = prtScoreRoc(yOutNaive);
[pfNorm,pdNorm] = prtScoreRoc(yOutNorm);
h = plot(pfNaive,pdNaive,pfNorm,pdNorm);
set(h,'linewidth',3);
legend(h,{'Naive','Pre-Proc'});
title('ROC Curves for Naive and Pre-Processed Application of SVM to Bimodal Data');
Clearly, performance on un-normalized data is atrocious, but simple re-scaling achieves good results.
Optimizing Parameters
The general procedure in developing an SVM is to optimize both the cost and gamma parameters for your particular data set. You can do this using two for-loops and the PRT:
close all;
gammaVec = logspace(-2,1,10);
costVec = logspace(-2,1,10);
ds = prtDataGenUnimodal;
auc = nan(length(gammaVec),length(costVec));
kfoldsInds = ds.getKFoldKeys(3);
for gammaInd = 1:length(gammaVec)
    for costInd = 1:length(costVec)
        c = prtClassLibSvm;
        c.cost = costVec(costInd);
        c.gamma = gammaVec(gammaInd);
        yOut = crossValidate(c,ds,kfoldsInds);
        auc(gammaInd,costInd) = prtScoreAuc(yOut);
        imagesc(auc,[.95 1]);
        colorbar; drawnow;
    end
end
title('AUC vs. Gamma Index (Vertical) and Cost Index (Horizontal)');
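Note the call to getKFoldKeys above: it fixes one fold assignment that is reused for every (gamma, cost) pair, so all grid points are scored on the same partition rather than on freshly randomized folds. A minimal pure-Python version of that idea (illustration only, not the PRT implementation):

```python
import random

def kfold_keys(n, k, seed=0):
    """Assign each of n observations a fold index in 0..k-1,
    roughly balanced in size, shuffled once."""
    keys = [i % k for i in range(n)]
    random.Random(seed).shuffle(keys)
    return keys
```

Reusing one such key vector across the whole grid removes fold-to-fold randomness as a source of noise when comparing AUC values between parameter settings.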
Some Rules-Of-Thumb
In general, you may not have the time to, or may simply not want to, optimize over your SVM parameters. In this case, you can usually get by using ZMUV pre-processing and the default SVM parameters (RBF kernel, cost = 1, gamma = k):
algo = prtPreProcZmuv + prtClassLibSvm;
Concluding
We hope this entry helps you make sense of how to use an SVM in real-world scenarios, and how to
optimize the SVM parameters for your particular data set. As always, proper cross-validation is
fundamental to good generalizability.
Happy coding.
Posted by Pete Jul 24th, 2013
9 Comments
Sunil Dadhich
Why do we need to optimize the C and g values in SVM?
Peter Torrione
Hi Sunil, The parameters in the SVM control the relative tradeoffs between sparsity and
accuracy on the training data set - even though the default parameters may work well, they
are not guaranteed to work ideally on all data sets. As a result, optimizing the parameters is
recommended. Not sure if that answers your question...
Mauro Baldi
Hello and many thanks again for the previous help!
This time I tried to build and compare different classifiers with the fantastic PRT you developed, and the last classifier is an SVM.
I thoroughly read this guide and I tried, at first, to skip the "manual" pre-processing phase. Instead, I used the ZMUV pre-processing which, as stated in the guide, avoids having to optimize the SVM parameters manually.
Nevertheless, the resulting ROC curves are not as satisfactory as those coming from the
other classifiers.
What I am wondering is whether this is normal (as I skipped a more detailed preprocessing) or whether there might be something wrong with my code.
My code is:
%% CLASSIFIER (PREPROCZMUV + SVM) %%
algoSVM = prtPreProcZmuv + prtClassLibSvm;
algoSVM = algoSVM.train(TrainingSet);
%% TEST %%
yOutTest = algoSVM.run(TestSet);
kennethmorton Mod
I don't see anything immediately wrong with your code. The default options for LibSVM use an RBF kernel. If your data is high dimensional you may need to use
something to reduce the dimensionality first. Have any other kernel classifiers
worked?
Mauro Baldi
Hello Kenny and thank you for your reply.
My data set is not very big. It consists of 1393 rows, 3 columns (the features)
and the corresponding target values (either 0 or 1).
So far I used the RBF kernel as default. I am trying to change the kernel type.
In particular, I read in the help that the kernel attribute is kernelType.
But if I type
algoSVM.kernelType = 0;
to set a linear kernel, the following error appears:
No public field kernelType exists for class prtAlgorithm.
So this means that the kernelType attribute is a private one and might be
changed through a set method.
How can I do that?
I also have several questions about this procedure and I apologize in advance if the message is too long.
kennethmorton Mod
Mauro,
When you use the following line:
>> algoSVM = prtPreProcZmuv + prtClassLibSvm;
you are constructing a prtAlgorithm. This is why the properties of the
SVM cannot be set directly using algoSVM. Referencing the individual
components of the algorithm can be done by accessing the actionCell
property of prtAlgorithm
>> algoSVM.actionCell{2}.kernelType = 0;
In general I don't like to do things this way. I find it is cleaner to
construct the algorithm with the properties you want using string value
pairs. For example
>> algoSVM = prtPreProcZmuv + prtClassLibSvm('kernelType',0);
1. I am confused by your code. Should there be two SVM algorithms,
Mauro Baldi
Hello Kenny and thank you very much for your, as always, fast and
very detailed replies.
My goal is this: I have a data set made up of a training set and a test
set.
What I would like to do is to build many classifiers (including SVMs)
and, at the end, pick the most promising one. So far I have built RVM, KNN and SVM classifiers, all thanks to your
PRT toolbox and help.
So, I am really very grateful to you and Peter.
Although this post is just devoted to SVM, I have questions both on
SVMs but also on other issues I have encountered while trying to
implement your suggestions.
Therefore, I'd like to ask you whether I can contact you or Peter
privately.
Anyway, here are my questions:
1) You asked me " I am confused by your code. Should there be two
SVM algorithms. One with a linear kernel and one with an RBF
kennethmorton Mod
Mauro,
This is getting a bit detailed for the comments section. Let's talk this
offline. Please feel free to email me at [email protected]
Kenny
Mauro Baldi
Hello Kenny,
this time I am writing here because the questions I am gonna ask
might interest other people.
In a previous post you said that it is not a problem if you calibrate an SVM with an RBF kernel with or without any preprocessing.
Just to check, I tried the following calibrations:
1) Preprocessing with prtPreProcZmuv and automatic training (i.e.,
without the double loop on parameters Cost and gamma)
2) Manual calibration with prtPreProcPca preprocessing:
algoSVManual = prtPreProcPca + prtClassLibSvm;
3) Manual calibration without any preprocessing
algoSVManual = prtClassLibSvm;
4) Manual calibration with prtPreProcZmuv preprocessing:
7/21/2019 Using SVMs for Scientists and Engineers - PRT Blog
13/13
9/23/2014 Using SVMs for Scientists and Engineers - PRT Blog
Copyright 2013 - Kenneth Morton and Peter Torrione - Powered by Octopress - Theme by Brian Armstrong