SVM—Support Vector Machines
• A new classification method for both linear and nonlinear data
• It uses a nonlinear mapping to transform the original training data into a higher dimension
• With the new dimension, it searches for the linear optimal separating hyperplane (i.e., “decision boundary”)
• With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane
• SVM finds this hyperplane using support vectors (“essential” training tuples) and margins (defined by the support vectors)
SVM—History and Applications
• Vapnik and colleagues (1992)—groundwork from Vapnik & Chervonenkis' statistical learning theory in the 1960s
• Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization)
• Used for both classification and prediction
• Applications: handwritten digit recognition, object recognition, speaker identification, benchmarking time-series prediction tests
SVM—Linearly Separable
• A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias)
• In 2-D it can be written as
w0 + w1 x1 + w2 x2 = 0
• The hyperplanes defining the sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1
• Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
• This becomes a constrained (convex) quadratic optimization problem: a quadratic objective function with linear constraints, solvable by Quadratic Programming (QP) using Lagrangian multipliers
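For a concrete picture of this optimization, here is a minimal sketch of fitting a (near) hard-margin linear SVM with an off-the-shelf QP-based solver. This assumes scikit-learn is available (the slides themselves use WEKA), and the toy data is made up for illustration:

```python
# Sketch (assumption: scikit-learn installed): fit a linear SVM on
# separable toy data; a very large C approximates the hard margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Support vectors are the "essential" training tuples; they lie on the
# margin hyperplanes H1 and H2, i.e. their decision value is +/-1.
print(clf.support_vectors_)
print(clf.decision_function(clf.support_vectors_))
```

The decision values of the support vectors come out at ±1 (up to numerical tolerance), matching the H1/H2 margin constraints above.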
Support vectors
• This means the hyperplane can be written as
x = w0 + w1 a1 + w2 a2
or, in terms of the support vectors,
x = b + Σ (i is supp. vector) αi yi a(i) · a
where a(i) is a support vector, a is a test instance, and yi is the class of a(i)
• The support vectors define the maximum margin hyperplane!
– All other instances can be deleted without changing its position and orientation
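That last point can be checked directly: refitting on the support vectors alone reproduces the same hyperplane. A small sketch, assuming scikit-learn (not used in the slides, which mention WEKA) and made-up toy data:

```python
# Sketch (assumption: scikit-learn installed): deleting all non-support
# vectors leaves the maximum margin hyperplane unchanged.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

full = SVC(kernel="linear", C=1e6).fit(X, y)                      # all data
sv = SVC(kernel="linear", C=1e6).fit(X[full.support_], y[full.support_])

# Same weight vector and bias => same position and orientation.
print(np.allclose(full.coef_, sv.coef_, atol=1e-3),
      np.allclose(full.intercept_, sv.intercept_, atol=1e-3))
```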
Finding support vectors
x = b + Σ (i is supp. vector) αi yi a(i) · a
• Support vector: training instance for which αi > 0
• How to determine the αi and b?—A constrained quadratic optimization problem
– Off-the-shelf tools exist for solving these problems
– However, special-purpose algorithms are faster
– Example: Platt's sequential minimal optimization (SMO) algorithm (implemented in WEKA)
• Note: all this assumes separable data!
Extending linear classification
• Linear classifiers can't model nonlinear class boundaries
• Simple trick:
– Map attributes into a new space consisting of combinations of attribute values
– E.g.: all products of n factors that can be constructed from the attributes
• Example with two attributes and n = 3:
x = w1 a1^3 + w2 a1^2 a2 + w3 a1 a2^2 + w4 a2^3
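The mapped space can be enumerated directly. A minimal sketch in plain Python (the helper name is my own): for each combination of n attribute indices with repetition, take the product of the corresponding values.

```python
# Sketch: all products of n factors built from the attribute values.
from itertools import combinations_with_replacement
from math import prod

def degree_n_products(attrs, n):
    # One mapped attribute per multiset of n original attributes.
    return [prod(c) for c in combinations_with_replacement(attrs, n)]

# Two attributes a1 = 2, a2 = 3 and n = 3 give a1^3, a1^2*a2, a1*a2^2, a2^3:
print(degree_n_products([2.0, 3.0], 3))  # [8.0, 12.0, 18.0, 27.0]
```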
Nonlinear SVMs
• "Pseudo attributes" represent attribute combinations
• Overfitting is not a problem because the maximum margin hyperplane is stable
– There are usually few support vectors relative to the size of the training set
• Computation time is still an issue
– Each time the dot product is computed, all the "pseudo attributes" must be included
A mathematical trick
• Avoid computing the "pseudo attributes"!
• Compute the dot product before doing the nonlinear mapping
• Example: for
x = b + Σ (i is supp. vector) αi yi a(i) · a
compute
x = b + Σ (i is supp. vector) αi yi (a(i) · a)^n
• Corresponds to a map into the instance space spanned by all products of n attributes
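The trick can be verified numerically: mapping each vector to all ordered products of n attribute values and then taking the dot product gives exactly (a(i) · a)^n. A sketch in plain Python (function names are my own):

```python
# Sketch: phi maps x to all ordered n-fold products of its attributes,
# so that dot(phi(x), phi(y)) == dot(x, y) ** n.
from itertools import product
from math import prod

def phi(x, n):
    return [prod(x[i] for i in idx)
            for idx in product(range(len(x)), repeat=n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y, n = [1.0, 2.0], [3.0, 0.5], 3
print(dot(phi(x, n), phi(y, n)))  # dot product after the mapping: 64.0
print(dot(x, y) ** n)             # kernel value (x . y)^3 = 4^3 = 64.0
```

Computing the right-hand side is far cheaper: one dot product in the original space and one power, instead of enumerating every pseudo attribute.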
Other kernel functions
• The mapping is called a "kernel function"
• Polynomial kernel:
x = b + Σ (i is supp. vector) αi yi (a(i) · a)^n
• We can use others:
x = b + Σ (i is supp. vector) αi yi K(a(i) · a)
• Only requirement:
K(xi, xj) = Φ(xi) · Φ(xj)
• Examples:
K(xi, xj) = (xi · xj + 1)^d
K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
K(xi, xj) = tanh(κ xi · xj − b)
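These three example kernels are easy to write down directly. A sketch in plain Python (parameter names d, sigma, kappa, b are my own notation for the symbols above):

```python
# Sketch of the polynomial, RBF (Gaussian), and sigmoid kernels.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, d=2):
    # (x . y + 1)^d
    return (dot(x, y) + 1) ** d

def rbf_kernel(x, y, sigma=1.0):
    # exp(-||x - y||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, b=0.0):
    # tanh(kappa * x . y - b)
    return math.tanh(kappa * dot(x, y) - b)

x, y = [1.0, 2.0], [2.0, 1.0]
print(poly_kernel(x, y))               # (4 + 1)^2 = 25.0
print(round(rbf_kernel(x, y), 4))      # exp(-1) ~ 0.3679
print(round(sigmoid_kernel(x, y), 4))  # tanh(4) ~ 0.9993
```

Each choice corresponds to a different implicit mapping Φ; the SVM algorithm only ever sees the kernel values.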
Problems with this approach
• 1st problem: speed
– 10 attributes and n = 5 already give > 2000 coefficients
– Use linear regression with attribute selection
– Run time is cubic in the number of attributes
• 2nd problem: overfitting
– The number of coefficients is large relative to the number of training instances
– The curse of dimensionality kicks in
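The coefficient count in the speed point above is quick to check: the number of distinct products of n factors drawn from k attributes is the number of combinations with repetition, C(k + n − 1, n).

```python
# Sketch: count the distinct products of n = 5 factors from k = 10
# attributes (combinations with repetition).
from math import comb

k, n = 10, 5
print(comb(k + n - 1, n))  # 2002 > 2000 coefficients
```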
Sparse data
• SVM algorithms speed up dramatically if the data is sparse (i.e., many values are 0)
• Why? Because they compute lots and lots of dot products
• With sparse data, dot products can be computed very efficiently
– Iterate only over the non-zero values
• SVMs can process sparse datasets with 10,000s of attributes
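The "iterate only over the non-zero values" idea can be sketched in plain Python by storing each vector as a dict of its non-zero entries (the representation and names are my own):

```python
# Sketch: dot product of two sparse vectors stored as {index: value}
# dicts; only indices that are non-zero in both contribute.
def sparse_dot(u, v):
    if len(u) > len(v):          # iterate over the smaller vector
        u, v = v, u
    return sum(val * v[i] for i, val in u.items() if i in v)

# Vectors with 10,000s of conceptual attributes, almost all zero:
u = {0: 2.0, 7: 1.5, 9999: 3.0}
v = {7: 4.0, 42: 1.0, 9999: 0.5}
print(sparse_dot(u, v))  # 1.5*4.0 + 3.0*0.5 = 7.5
```

The cost depends only on the number of non-zeros, not on the nominal number of attributes, which is why sparse datasets with huge attribute counts remain tractable.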
Applications
• Machine vision: e.g., face identification
– Outperforms alternative approaches (1.5% error)
• Handwritten digit recognition: USPS data
– Comparable to the best alternative (0.8% error)
• Bioinformatics: e.g., prediction of protein secondary structure
• Text classification
• The SVM technique can be modified for numeric prediction problems