TRANSCRIPT
Machine Learning Using Support Vector Machines
(Paper Review)
Presented to: Prof. Dr. Mohamed Batouche
Prepared By: Asma B. Al-Saleh (427220094), Amani A. Al-Ajlan (427220111)
King Saud University
The College of Computer & Information Science
Computer Science Department (Master)
Neural Networks and Machine Learning Applications (CSC 563)
Spring 2008
Paper Information
Title: Machine Learning Using Support Vector Machines.
Authors: Abdul Rahim Ahmad, Marzuki Khalid, Rubiyah Yusof.
Publisher: MSTC 2002, Johor Bahru.
Date: September 2002.
Review Outline
Introduction
Artificial Neural Network (ANN)
Support Vector Machine (SVM): Support Vectors, Theory of SVM, Quadratic Programming, Non-linear SVM, SVM Implementations, SVM for Multi-class Classification
Handwriting Recognition
Experimental Results
ANN vs. SVM
Conclusion
Introduction
The aim of this paper is to present SVM as an alternative to ANN, and to convey the concept of SVM by providing some details about how it works.
Machine Learning
Machine Learning (ML): constructing computer programs that automatically improve their performance with experience.
Machine Learning (ML) Applications
1. Data mining programs.
2. Information filtering systems.
3. Autonomous vehicles.
4. Pattern recognition systems: speech recognition, handwriting recognition, face recognition, text categorization.
Artificial Neural Network
Artificial Neural Network (ANN)
Massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections.
Artificial Neural Network (ANN)
The main characteristics of ANN are:
1. The ability to learn complex nonlinear input-output relationships.
2. The use of sequential training procedures to update (adapt) the network architecture and connection weights so that the network works efficiently.
Artificial Neural Network (ANN) In the area of pattern classification,
the feed-forward network is most popularly used.
Pattern classification: Multilayer Perceptron (MLP), Radial-Basis Function (RBF) networks.
Data clustering: Kohonen Self-Organizing Map (SOM).
ANN and Pattern Recognition
ANN has low dependence on domain-specific knowledge compared to rule-based approaches.
Efficient learning algorithms are available to use.
Support Vector Machine
Support Vector Machine (SVM)
SVM was introduced in 1992 by Vapnik and his coworkers.
In its original form, SVM is a binary classifier: it separates two classes and is designed for linear, separable data sets.
SVM is used for classification and regression.
Support Vector Machine – (SVM)
Theory of SVM
[Figure: separating hyperplane H with margin hyperplanes H1 and H2 and normal vector w]
Constraints:
1. No data points between H1 and H2.
2. The margin between H1 and H2 is maximized.
Support Vectors
[Figure: the same hyperplane, with the support vectors lying on H1 and H2]
The solution is expressed as a linear combination of the support vectors:
• A subset of the training patterns.
• Close to the decision boundary.
Theory of SVM
Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i ∈ R^d are the input features and y_i ∈ {−1, +1} is the class (or label) the SVM predicts.
Theory of SVM
[Figure: hyperplane H with margins H1 and H2 and normal vector w]
Class 1: w·x + b ≥ +1
Class 2: w·x + b ≤ −1
Combined into one constraint: y_i (w·x_i + b) ≥ 1
Theory of SVM
Learn a linear separating hyperplane classifier:
f(x) = sgn(w·x + b)
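As a toy illustration, the linear classifier above can be sketched in a few lines of Python; the weight vector w and bias b here are made-up illustrative values, not ones learned from data:

```python
def sign(v):
    # sgn(v): +1 on or above the hyperplane, -1 below it
    return 1 if v >= 0 else -1

def decision(w, x, b):
    # f(x) = sgn(w.x + b)
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

w = [2.0, -1.0]   # hypothetical weight vector
b = -1.0          # hypothetical bias
print(decision(w, [2.0, 1.0], b))   # -> 1  (positive side)
print(decision(w, [0.0, 2.0], b))   # -> -1 (negative side)
```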
Quadratic Programming
The margin is 2 / ||w||, so to maximize the margin we need to minimize (1/2) ||w||².
This constrained problem is solved by introducing Lagrange multipliers α_i ≥ 0:
L(w, b, α) = (1/2) ||w||² − Σ_{i=1..N} α_i [ y_i (w·x_i + b) − 1 ]
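To make the margin formula concrete, a small sketch (the example weight vectors are arbitrary): the distance between H1 and H2 is 2 / ||w||, so minimizing ||w|| maximizes the margin.

```python
import math

def margin_width(w):
    # Distance between the margin hyperplanes H1 and H2: 2 / ||w||
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# For w = (3, 4), ||w|| = 5, so the margin is 2/5:
print(margin_width([3.0, 4.0]))  # -> 0.4
# Scaling w up shrinks the margin, which is why we minimize (1/2)||w||^2:
print(margin_width([6.0, 8.0]))  # -> 0.2
```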
Lagrange Multipliers
Setting the derivatives of L(w, b, α) with respect to w and b to zero eliminates them, leaving the dual problem: maximize
L_D = Σ_{i=1..N} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i·x_j)
subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
Theory of SVM
Discriminant function:
f(x) = sgn( Σ_{i=1..N} α_i y_i (x_i·x) + b )
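The dual-form discriminant only needs dot products with the support vectors. A minimal sketch, where the support vectors, multipliers α, labels, and bias are hand-picked illustrative values rather than the output of an actual optimization:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def discriminant(svs, alphas, labels, b, x):
    # f(x) = sgn( sum_i alpha_i * y_i * (x_i . x) + b )
    s = sum(a * y * dot(sv, x) for sv, a, y in zip(svs, alphas, labels)) + b
    return 1 if s >= 0 else -1

# Hypothetical support vectors, one on each margin hyperplane:
svs = [[1.0, 0.0], [-1.0, 0.0]]
alphas = [0.5, 0.5]
labels = [1, -1]
print(discriminant(svs, alphas, labels, 0.0, [2.0, 3.0]))   # -> 1
print(discriminant(svs, alphas, labels, 0.0, [-2.0, 1.0]))  # -> -1
```

With these values the implied weight vector Σ α_i y_i x_i is (1, 0), so the sketch agrees with the primal form f(x) = sgn(w·x + b).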
Non-linear SVM
1. SVM maps the input space into a higher-dimensional feature space.
2. The linear large-margin learning algorithm is then applied in that feature space.
Non-linear SVM
Input (data) space (non-linear) → x ↦ Φ(x) → feature space (linear)
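A classic example of such a mapping, sketched below with one common degree-2 feature map (an assumption for illustration, not the paper's specific choice): XOR-style data is not linearly separable in the input space, but after mapping, a single feature separates the classes.

```python
import math

def phi(x):
    # Degree-2 polynomial feature map R^2 -> R^3:
    # (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    return (x[0] * x[0], math.sqrt(2) * x[0] * x[1], x[1] * x[1])

# XOR-style labels: same-sign points are +1, mixed-sign points are -1.
points = {(1, 1): +1, (-1, -1): +1, (1, -1): -1, (-1, 1): -1}
for x, label in points.items():
    # The second mapped feature (x1*x2 term) alone separates the classes.
    print(x, "->", phi(x), "label:", label)
```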
Non-linear SVM
If the mapping function is Φ, we just solve:
L_D = Σ_{i=1..N} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j ( Φ(x_i)·Φ(x_j) )
However, the mapping can be done implicitly by a kernel function k(x_i, x_j) = Φ(x_i)·Φ(x_j):
L_D = Σ_{i=1..N} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j k(x_i, x_j)
Non-linear SVM
Discriminant function:
f(x) = sgn( Σ_{i=1..N} α_i y_i k(x_i, x) + b )
Kernel
Many kernels can be used in this way: any kernel that satisfies Mercer's condition is valid.
Kernel - Examples
Polynomial kernels: k(x, y) = (x·y + c)^d
Hyperbolic tangent: k(x, y) = tanh(κ x·y + θ)
Radial basis function (Gaussian kernel): k(x, y) = exp(−||x − y||² / 2σ²)
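The point of the kernel trick is that k(x, y) equals a dot product in feature space without ever computing Φ. A minimal sketch, using the homogeneous degree-2 polynomial kernel (x·y)² and its explicit feature map (both standard textbook choices, assumed here for illustration):

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y):
    # Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2
    return dot(x, y) ** 2

def phi(x):
    # Explicit feature map whose inner product this kernel reproduces.
    return (x[0] * x[0], math.sqrt(2) * x[0] * x[1], x[1] * x[1])

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly_kernel(x, y))      # -> 16.0, computed in R^2
print(dot(phi(x), phi(y)))    # same value, computed explicitly in R^3
```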
Non-separable Input Space
In real-world problems there is always noise, and noise makes the data non-separable.
A slack variable ξ_i is introduced for each input:
y_i (w·x_i + b) ≥ 1 − ξ_i
A penalty parameter C controls overfitting.
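Concretely, each slack variable measures how far a point falls short of its margin constraint, ξ_i = max(0, 1 − y_i(w·x_i + b)). A sketch with a hypothetical hyperplane (w and b below are made-up values):

```python
def slack(w, b, x, y):
    # xi = max(0, 1 - y*(w.x + b)): zero for points outside the margin,
    # positive for points inside the margin or misclassified.
    return max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))

w, b = [1.0, 0.0], 0.0  # hypothetical hyperplane x1 = 0
print(slack(w, b, [2.0, 1.0], 1))    # -> 0.0 (correct, outside the margin)
print(slack(w, b, [0.5, 1.0], 1))    # -> 0.5 (correct, but inside the margin)
print(slack(w, b, [-1.0, 1.0], 1))   # -> 2.0 (misclassified)
# The soft-margin objective is (1/2)||w||^2 + C * sum(xi); a larger C
# penalizes slack more heavily and can overfit noisy data.
```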
Non-separable Input Space
[Figure: hyperplane w·x + b = 0 with margins w·x + b = +1 (H1) and w·x + b = −1 (H2); points inside or beyond the margin have slack variables ξ_i, ξ_j]
SVM for Multi-class Classification
The basic SVM is a binary classifier; it separates two classes.
In real-world problems, more than two classes are usually needed.
Ex: handwriting recognition.
SVM for Multi-class Classification
Methods:
1. Modifying the binary SVM to incorporate multi-class learning directly.
2. Combining binary classifiers:
   One vs. One: requires K(K−1)/2 classifiers.
   One vs. All: requires K classifiers.
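The combining approach can be sketched as follows, with One-vs-One majority voting; the pairwise classifier here is a hypothetical stand-in (a real one would evaluate a trained SVM decision function):

```python
from itertools import combinations

def ovo_predict(classes, binary_predict, x):
    # One-vs-One: K(K-1)/2 binary classifiers, one per pair of classes;
    # the final label is chosen by majority vote.
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, x)] += 1
    return max(votes, key=votes.get)

def fake_pairwise(a, b, x):
    # Stand-in pairwise classifier that simply "knows" the true class of x.
    return x if x in (a, b) else a

classes = [0, 1, 2, 3]
print(len(list(combinations(classes, 2))))      # -> 6, i.e. K(K-1)/2 for K = 4
print(ovo_predict(classes, fake_pairwise, 2))   # -> 2
```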
SVM for Multi-class Classification
One vs. One and DAGSVM (Directed Acyclic Graph SVM) are the best choices for practical use, because they are less complex, easier to construct, and faster to train (Tapia et al., 2005).
SVM Implementation
Training requires quadratic programming (QP), which is computationally intensive.
However, many decomposition methods have been proposed that avoid the full QP and make SVM learning practical for many current problems.
Ex: Sequential Minimal Optimization (SMO).
Results of Experimental Studies
Data: handwritten digit databases:
MNIST dataset.
USPS dataset: more difficult; the human recognition error rate is as high as 2.5%.
Error rate comparison of ANN, SVM, and other algorithms on the MNIST and USPS databases:
1. The SVM error rate is significantly lower than that of most other algorithms, except the LeNet 5 NN.
2. Training time for SVM was significantly slower, but the higher recognition rate (lower error rate) justifies its usage.
3. SVM usage should increase, replacing ANN in the area of handwriting recognition, since faster methods of implementing SVM have been introduced recently.
SVM vs. ANN
SVM: A multi-class implementation needs to be performed. ANN: Naturally handles multi-class classification.
SVM: Does not overfit the data (Structural Risk Minimization). ANN: Known to overfit the data unless cross-validation is applied.
SVM: Finds a global minimum. ANN: Can get stuck in a local minimum.
Conclusion
SVM is powerful and a useful alternative to neural networks.
SVM finds a global, unique solution.
Two key concepts of SVM: maximizing the margin and the choice of kernel.
Performance depends on the choice of kernel and its parameters; this is still a research topic.
Training is memory-intensive due to QP.
Conclusion
Much active research is taking place in areas related to SVM.
Many SVM implementations are available on the Web:
SVMLight
LIBSVM
Thank you…..
Questions?