Data Mining via Support Vector Machines
Olvi L. Mangasarian
University of Wisconsin - Madison
IFIP TC7 Conference on
System Modeling and Optimization
Trier July 23-27, 2001
What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
What are Support Vector Machines Used For?
- Classification
- Regression & Data Fitting
- Supervised & Unsupervised Learning
(Will concentrate on classification)
Example of Nonlinear Classifier: Checkerboard Classifier
Outline of Talk
- Generalized support vector machines (SVMs)
  - Completely general kernel allows complex classification (No positive definiteness "Mercer" condition!)
- Smooth support vector machines
  - Smooth & solve SVM by a fast global Newton method
- Reduced support vector machines
  - Handle large datasets with nonlinear rectangular kernels
  - Nonlinear classifier depends on 1% to 10% of data points
- Proximal support vector machines
  - Proximal planes replace halfspaces
  - Solve linear equations instead of QP or LP
  - Extremely fast & simple
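Generalized Support Vector Machines: 2-Category Linearly Separable Case
[Figure: point sets A+ and A- separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$.]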
Generalized Support Vector Machines: Algebra of 2-Category Linearly Separable Case
- Given m points in n-dimensional space
- Represented by an m-by-n matrix A
- Membership of each $A_i$ in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries
- Separate by two bounding planes, $x'w = \gamma \pm 1$:

$$A_i w \ge \gamma + 1 \text{ for } D_{ii} = +1; \qquad A_i w \le \gamma - 1 \text{ for } D_{ii} = -1.$$

- More succinctly:

$$D(Aw - e\gamma) \ge e,$$

where e is a vector of ones.
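The succinct condition can be checked numerically. The following is a minimal sketch, assuming a made-up four-point dataset and a hypothetical helper name `separates`; neither comes from the talk. It tests whether a given plane $x'w = \gamma$ satisfies $D(Aw - e\gamma) \ge e$ row by row.

```python
# Toy data (assumed for illustration only): four points in the plane.
A = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]]  # m-by-n matrix, one point per row
d = [1, 1, -1, -1]                                      # diagonal of D: class of each row


def separates(A, d, w, gamma):
    """Return True if D(Aw - e*gamma) >= e holds in every component."""
    for row, label in zip(A, d):
        score = sum(a * wi for a, wi in zip(row, w)) - gamma  # A_i w - gamma
        if label * score < 1.0:                               # row violates its bounding plane
            return False
    return True


# Hypothetical candidate planes x'w = gamma.
print(separates(A, d, [1.0, 1.0], 2.0))   # both classes lie outside their bounding planes
print(separates(A, d, [0.1, 0.1], 0.2))   # same direction, but scaled too small
```

Scaling $w$ and $\gamma$ down keeps the separating plane in place but moves the bounding planes apart, which is why the second call fails the condition.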
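Generalized Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- with the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, normal vector $w$, and margin $\frac{2}{\|w\|_2}$ between the planes.]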
Generalized Support Vector Machines: The Linear Support Vector Machine Formulation
Solve the following mathematical program for some $\nu > 0$:

$$\min_{w,\gamma,y} \; \nu e'y + \frac{\|w\|}{2} \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$

The nonnegative slack variable $y$ is zero iff:
- Convex hulls of A+ and A- do not intersect
- $\nu$ is sufficiently large

With $y = 0$ the constraint reduces to $D(Aw - e\gamma) \ge e$.
Breast Cancer Diagnosis Application
97% Tenfold Cross Validation Correctness
780 Samples: 494 Benign, 286 Malignant
Another Application: Disputed Federalist Papers
Bosch & Smith 1998
56 Hamilton, 50 Madison, 12 Disputed
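SVM as an Unconstrained Minimization Problem
Changing to 2-norm and measuring margin in $(w, \gamma)$ space:

$$\text{(QP)} \quad \min_{w,\gamma,\,y \ge 0} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e$$

At the solution of (QP):

$$y = (e - D(Aw - e\gamma))_+, \quad \text{where } (\cdot)_+ = \max\{\cdot, 0\}.$$

Hence (QP) is equivalent to the nonsmooth SVM:

$$\min_{w,\gamma} \; \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$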
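To make the elimination of the slack variable concrete, here is a small sketch that evaluates the slack $y = (e - D(Aw - e\gamma))_+$ and the resulting unconstrained objective. The one-dimensional data, the point $(w, \gamma)$, the value of $\nu$ and the function names are all assumptions chosen for illustration.

```python
def plus(values):
    """Componentwise plus function: max(v, 0)."""
    return [max(v, 0.0) for v in values]


def residual(A, d, w, gamma):
    """e - D(Aw - e*gamma), one entry per data point."""
    return [1.0 - di * (sum(a * wi for a, wi in zip(row, w)) - gamma)
            for row, di in zip(A, d)]


def nonsmooth_objective(A, d, w, gamma, nu):
    """(nu/2)*||(e - D(Aw - e*gamma))_+||^2 + (1/2)*||(w, gamma)||^2."""
    y = plus(residual(A, d, w, gamma))
    return 0.5 * nu * sum(v * v for v in y) + 0.5 * (sum(wi * wi for wi in w) + gamma ** 2)


# Assumed toy problem: three scalar points, the last one inside the margin.
A = [[2.0], [0.0], [1.5]]
d = [1, -1, -1]
w, gamma, nu = [1.0], 1.0, 1.0

y = plus(residual(A, d, w, gamma))
print(y)                                       # only the third point has nonzero slack
print(nonsmooth_objective(A, d, w, gamma, nu))
```

Only the point that violates its bounding plane contributes to the penalty term, which is exactly the role of the plus function.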
Smoothing the Plus Function: Integrate the Sigmoid Function
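SSVM: The Smooth Support Vector Machine. Smoothing the Plus Function
Integrating the sigmoid approximation to the step function:

$$s(x, \alpha) = \frac{1}{1 + \varepsilon^{-\alpha x}},$$

gives a smooth, excellent approximation to the plus function:

$$p(x, \alpha) = x + \frac{1}{\alpha}\log(1 + \varepsilon^{-\alpha x}), \quad \alpha > 0,$$

where $\varepsilon$ is the base of natural logarithms.
Replacing the plus function in the nonsmooth SVM by the smooth approximation gives our SSVM:

$$\min_{w,\gamma} \Phi_\alpha(w, \gamma) := \min_{w,\gamma} \; \frac{\nu}{2}\|p(e - D(Aw - e\gamma), \alpha)\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$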
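How close the smooth function $p(x, \alpha)$ stays to the plus function can be seen with a few evaluations. This is a sketch only: the sample points and the values of $\alpha$ are arbitrary choices, and the overflow-safe branch for negative arguments is an implementation detail not taken from the talk (it uses the identity $x + \frac{1}{\alpha}\log(1 + \varepsilon^{-\alpha x}) = \frac{1}{\alpha}\log(1 + \varepsilon^{\alpha x})$).

```python
import math


def plus(x):
    """Plus function max(x, 0)."""
    return max(x, 0.0)


def smooth_plus(x, alpha):
    """p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)), evaluated without overflow."""
    if x >= 0:
        return x + math.log1p(math.exp(-alpha * x)) / alpha
    # For x < 0 use the equivalent form (1/alpha) * log(1 + exp(alpha * x)).
    return math.log1p(math.exp(alpha * x)) / alpha


for alpha in (1.0, 5.0, 25.0):                 # arbitrary smoothing parameters
    gaps = [smooth_plus(x, alpha) - plus(x) for x in (-2.0, -0.5, 0.0, 0.5, 2.0)]
    print(alpha, max(gaps))                    # largest gap occurs at x = 0
```

The gap is always nonnegative and is largest at $x = 0$, where it equals $\log 2 / \alpha$, so increasing $\alpha$ tightens the approximation.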
- Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables. (Small dimensional input space.)
- Armijo: Shorten the distance between successive iterates so as to generate sufficient decrease in the objective function. (In computational reality, not needed!)
- Global Quadratic Convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. errors get squared. (Typically, 6 to 8 iterations without an Armijo.)
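The Newton and Armijo steps can be sketched for the smallest possible case. The code below is an illustration under stated assumptions, not the author's implementation: it takes scalar inputs (n = 1), so each Newton system is 2-by-2 and is solved in closed form; the data, $\nu = 1$, $\alpha = 5$, the Armijo constant $10^{-4}$ and all function names are made up for the example.

```python
import math


def sigmoid(x, alpha):
    """s(x, alpha) = 1 / (1 + exp(-alpha * x)), evaluated without overflow."""
    t = alpha * x
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def smooth_plus(x, alpha):
    """p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha, evaluated without overflow."""
    t = alpha * x
    if t >= 0:
        return x + math.log1p(math.exp(-t)) / alpha
    return math.log1p(math.exp(t)) / alpha


def ssvm_parts(a, d, z, nu, alpha):
    """Objective, gradient and Hessian of the SSVM objective for scalar inputs a_i."""
    w, gamma = z
    f = 0.5 * (w * w + gamma * gamma)
    g = [w, gamma]
    h = [[1.0, 0.0], [0.0, 1.0]]
    for ai, di in zip(a, d):
        r = 1.0 - di * (ai * w - gamma)                 # entry of e - D(Aw - e*gamma)
        p, s = smooth_plus(r, alpha), sigmoid(r, alpha)
        dr = (-di * ai, di)                             # derivative of r w.r.t. (w, gamma)
        c1 = nu * p * s                                 # uses p' = s
        c2 = nu * (s * s + p * alpha * s * (1.0 - s))   # uses s' = alpha * s * (1 - s)
        f += 0.5 * nu * p * p
        for j in range(2):
            g[j] += c1 * dr[j]
            for k in range(2):
                h[j][k] += c2 * dr[j] * dr[k]
    return f, g, h


def newton_armijo(a, d, nu=1.0, alpha=5.0, tol=1e-8, max_iter=50):
    """Newton iteration with Armijo step halving, started from the origin."""
    z = [0.0, 0.0]
    for it in range(max_iter):
        f, g, h = ssvm_parts(a, d, z, nu, alpha)
        if math.hypot(g[0], g[1]) < tol:
            break
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(-g[0] * h[1][1] + g[1] * h[0][1]) / det,   # solves h * step = -g
                (-g[1] * h[0][0] + g[0] * h[1][0]) / det]
        slope = g[0] * step[0] + g[1] * step[1]
        lam = 1.0
        for _ in range(30):                                 # Armijo: halve until sufficient decrease
            trial = [z[0] + lam * step[0], z[1] + lam * step[1]]
            if ssvm_parts(a, d, trial, nu, alpha)[0] <= f + 1e-4 * lam * slope:
                break
            lam *= 0.5
        z = trial
    return z, it


a, d = [2.0, 3.0, 0.0, -1.0], [1, 1, -1, -1]                # assumed toy data
z, iterations = newton_armijo(a, d)
print(z, iterations)
```

For realistic n the closed-form 2-by-2 solve would be replaced by a general linear solver for the (n+1)-by-(n+1) Newton system.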
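Nonlinear SSVM Formulation (Prior to Smoothing)
Linear SSVM (linear separating surface: $x'w = \gamma$):

$$\text{(QP)} \quad \min_{w,\gamma,\,y \ge 0} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|(e - D(AA'Du - e\gamma))_+\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|(e - D(K(A,A')Du - e\gamma))_+\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$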
The Nonlinear Classifier
The nonlinear classifier:

$$K(x', A')Du = \gamma$$

$$K(A, B): R^{m \times n} \times R^{n \times l} \longrightarrow R^{m \times l}$$

Where K is a nonlinear kernel, e.g.:
- Polynomial Kernel: $(AA' + \mu a a')^{d}_{\bullet}$
- Gaussian (Radial Basis) Kernel: $\varepsilon^{-\mu\|A_i - A_j\|_2^2}, \; i, j = 1, \ldots, m$
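As an illustration of the Gaussian kernel and of evaluating the classifier $K(x', A')Du = \gamma$ at a new point, here is a minimal sketch. The data, $\mu$, $u$ and $\gamma$ are placeholder values rather than a trained model, and the function names are hypothetical; the second argument of the kernel is passed as a list of rows instead of an explicit transpose.

```python
import math


def gaussian_kernel(A, B_rows, mu):
    """K[i][j] = exp(-mu * ||A_i - B_j||_2^2) for rows A_i of A and B_j of B_rows."""
    return [[math.exp(-mu * sum((a - b) ** 2 for a, b in zip(ai, bj)))
             for bj in B_rows] for ai in A]


def classify(x, A, d, u, gamma, mu):
    """Sign of K(x', A') D u - gamma for a single point x."""
    k_row = gaussian_kernel([x], A, mu)[0]
    score = sum(k * di * ui for k, di, ui in zip(k_row, d, u)) - gamma
    return 1 if score >= 0 else -1


A = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # assumed data, one point per row
K = gaussian_kernel(A, A, 0.5)             # m-by-m and fully dense
for row in K:
    print(row)

# Placeholder multipliers, for illustration only (not the result of training).
print(classify([0.1, 0.0], A, [1, -1, -1], [1.0, 1.0, 1.0], 0.0, 0.5))
```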
Checkerboard Polynomial Kernel Classifier
Best Previous Result: [Kaufman 1998]
Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel $K(A, A') \in R^{m \times m}$ is fully dense
  - Long CPU time to compute $m^2$ numbers
  - Large memory to store an $m \times m$ kernel matrix
  - Runs out of memory even before solving the optimization problem
- Computational complexity depends on m
  - Complexity of nonlinear SSVM $\sim O((m+1)^3)$
  - Need to solve a huge unconstrained or constrained optimization problem with $m^2$ entries
- Nonlinear separator depends on almost the entire dataset
  - Have to store the entire dataset after solving the problem
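A quick calculation shows why the dense kernel is a memory problem. The sketch below assumes 8-byte double-precision entries (an assumption, not stated in the talk) and uses m = 32562, the largest Adult training set size quoted in the talk.

```python
def dense_kernel_cost(m, bytes_per_entry=8):
    """Entries and bytes needed to hold a fully dense m-by-m kernel matrix."""
    entries = m * m
    return entries, entries * bytes_per_entry


entries, nbytes = dense_kernel_cost(32562)   # assumes 8-byte floating-point entries
print(f"{entries:,} entries, about {nbytes / 1e9:.2f} GB")
```

Under these assumptions the kernel alone needs roughly 8.5 GB, before any optimization work starts.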
Reduced Support Vector Machines (RSVM): Large Nonlinear Kernel Classification Problems
- Key idea: Use a rectangular kernel $K(A, \bar{A}')$, where $\bar{A}'$ is a small random sample of $A'$
- Typically $\bar{A}$ has 1% to 10% of the rows of $A$
- Two important consequences:
  - RSVM can solve very large problems
  - Nonlinear separator depends on $\bar{A}$ only
- Separating surface: $K(x', \bar{A}')\bar{D}\bar{u} = \gamma$
- $K(\bar{A}, \bar{A}')$ gives lousy results

$$\min_{\bar{u},\gamma,y} \; \frac{\nu}{2}y'y + \frac{1}{2}(\bar{u}'\bar{u} + \gamma^2) \quad \text{s.t.} \quad D(K(A, \bar{A}')\bar{D}\bar{u} - e\gamma) + y \ge e, \quad y \ge 0$$
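The rectangular kernel can be sketched as follows: draw a random subset of rows to form the reduced matrix, then compute the kernel between all rows of A and only the sampled rows. The random data, the 5% sampling fraction, the seed, the Gaussian kernel choice and the helper names are assumptions for illustration.

```python
import math
import random


def gaussian_kernel(A, B_rows, mu):
    """K[i][j] = exp(-mu * ||A_i - B_j||_2^2) for rows A_i of A and B_j of B_rows."""
    return [[math.exp(-mu * sum((a - b) ** 2 for a, b in zip(ai, bj)))
             for bj in B_rows] for ai in A]


def reduced_kernel(A, fraction, mu, seed=0):
    """Sample a fraction of the rows of A and return (indices, rectangular kernel)."""
    rng = random.Random(seed)
    m = len(A)
    m_bar = max(1, round(fraction * m))
    idx = sorted(rng.sample(range(m), m_bar))      # rows forming the reduced matrix
    A_bar = [A[i] for i in idx]
    return idx, gaussian_kernel(A, A_bar, mu)      # m-by-m_bar instead of m-by-m


data_rng = random.Random(1)
A = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]  # assumed data
idx, K = reduced_kernel(A, fraction=0.05, mu=1.0)
print(len(K), "rows by", len(K[0]), "columns")
```

All m rows still appear in the constraints, but the number of unknowns and the stored separator shrink to the size of the sample.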
Checkerboard 50-by-50 Square Kernel Using 50 Random Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Large UCI Adult Dataset: Standard Deviation over 50 Runs = 0.001
Average Correctness % & Standard Deviation, 50 Runs
UCI Adult, $A \in R^{m \times 123}$

| Dataset Size (Train, Test) | $K(A, \bar{A}')_{m \times \bar{m}}$ Testing % | Std. Dev. | $K(\bar{A}, \bar{A}')_{\bar{m} \times \bar{m}}$ Testing % | Std. Dev. | $\bar{m}$ | $\bar{m}/m$ |
|---|---|---|---|---|---|---|
| (6414, 26148) | 84.47 | 0.001 | 77.03 | 0.014 | 210 | 3.2% |
| (11221, 21341) | 84.71 | 0.001 | 75.96 | 0.016 | 225 | 2.0% |
| (16101, 16461) | 84.90 | 0.001 | 75.45 | 0.017 | 242 | 1.5% |
| (22697, 9865) | 85.31 | 0.001 | 76.73 | 0.018 | 284 | 1.2% |
| (32562, 16282) | 85.07 | 0.001 | 76.95 | 0.013 | 326 | 1.0% |
CPU Times on UCI Adult Dataset: RSVM, SMO and PCGC with a Gaussian Kernel
Adult Dataset: CPU Seconds for Various Dataset Sizes

| Size | 3185 | 4781 | 6414 | 11221 | 16101 | 22697 | 32562 |
|---|---|---|---|---|---|---|---|
| RSVM | 44.2 | 83.6 | 123.4 | 227.8 | 342.5 | 587.4 | 980.2 |
| SMO (Platt) | 66.2 | 146.6 | 258.8 | 781.4 | 1784.4 | 4126.4 | 7749.6 |
| PCGC (Burges) | 380.5 | 1137.2 | 2530.6 | 11910.6 | Ran out of memory | | |
CPU Time Comparison on UCI Dataset: RSVM, SMO and PCGC with a Gaussian Kernel
[Figure: CPU time (sec.) versus training set size.]
PSVM: Proximal Support Vector Machines
Fast new support vector machine classifier
Proximal planes replace halfspaces
Order(s) of magnitude faster than standard classifiers
Extremely simple to implement
4 lines of MATLAB code
NO optimization packages (LP,QP) needed
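Proximal Support Vector Machine: Use 2 Proximal Planes Instead of 2 Halfspaces
[Figure: point sets A+ and A- clustered around the proximal planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$ and margin $\frac{2}{\|[w; \gamma]\|_2}$.]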
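PSVM Formulation
We have the SSVM formulation:

$$\text{(QP)} \quad \min_{w,\gamma,\,y \ge 0} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e$$

PSVM replaces the inequality constraint by an equality:

$$D(Aw - e\gamma) + y = e$$

This simple but critical modification changes the nature of the optimization problem significantly!
Solving for $y$ in terms of $w$ and $\gamma$ gives the PSVM:

$$\min_{w,\gamma} \; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$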
Advantages of New Formulation
Objective function remains strongly convex
An explicit exact solution can be written in terms of the problem data
PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
Exact leave-one-out-correctness can be obtained in terms of problem data
Linear PSVM
We want to solve:

$$\min_{w,\gamma} \; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$

Setting the gradient equal to zero gives a nonsingular system of linear equations.
Solution of the system gives the desired PSVM classifier.
Linear PSVM Solution

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

Here, $H = [A \;\; -e]$.
The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.
$n$ is usually much smaller than $m$.
Linear Proximal SVM Algorithm
- Input $A, D$
- Define $H = [A \;\; -e]$
- Calculate $v = H'De$
- Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$
- Classifier: $\text{sign}(w'x - \gamma)$
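The steps of the linear algorithm can be sketched in plain Python. This is an illustrative version only, assuming a tiny made-up dataset and $\nu = 10$; the hand-written Gaussian elimination stands in for whatever linear solver one would normally use, and all function names are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def psvm_linear(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H) [w; gamma] = H'De with H = [A  -e]."""
    m, n = len(A), len(A[0])
    H = [row + [-1.0] for row in A]                      # H = [A  -e]
    v = [sum(d[i] * H[i][j] for i in range(m)) for j in range(n + 1)]   # v = H'De
    M = [[sum(H[i][j] * H[i][k] for i in range(m)) + (1.0 / nu if j == k else 0.0)
          for k in range(n + 1)] for j in range(n + 1)]  # I/nu + H'H
    r = solve(M, v)
    return r[:n], r[n]


def classify(x, w, gamma):
    """Classifier sign(w'x - gamma)."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) - gamma >= 0 else -1


# Assumed toy data: three points per class, d holds the diagonal of D.
A = [[2.0, 2.0], [3.0, 1.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]
d = [1, 1, 1, -1, -1, -1]
w, gamma = psvm_linear(A, d, nu=10.0)
print(w, gamma, [classify(x, w, gamma) for x in A])
```

The system has only n+1 unknowns regardless of how many data points there are, which is the source of the speed claimed for PSVM.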
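Nonlinear PSVM Formulation
Linear PSVM (linear separating surface: $x'w = \gamma$):

$$\text{(QP)} \quad \min_{w,\gamma} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$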
Nonlinear PSVM
Define $H$ slightly differently: $H = [K(A, A') \;\; -e]$
Similar to the linear case, setting the gradient equal to zero, we obtain:

$$\begin{bmatrix} \hat{u} \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De, \qquad \hat{u} = Du$$

Here, the linear system to solve is of size $(m+1) \times (m+1)$.
However, the reduced kernel technique (RSVM) can be used to reduce dimensionality.
Nonlinear Proximal SVM Algorithm
- Input $A, D$
- $K = K(A, A')$
- Define $H = [K \;\; -e]$
- Calculate $v = H'De$
- Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} \hat{u} \\ \gamma \end{bmatrix} = v$
- Classifier: $\text{sign}(K(x', A')\hat{u} - \gamma)$, where $\hat{u} = Du$
PSVM MATLAB Code
    function [w, gamma] = psvm(A,d,nu)
    % PSVM: linear and nonlinear classification
    % INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
    % [w, gamma] = psvm(A,d,nu);
    [m,n]=size(A);e=ones(m,1);H=[A -e];
    v=(d'*H)';                   % v=H'*D*e;
    r=(speye(n+1)/nu+H'*H)\v;    % solve (I/nu+H'*H)r=v
    w=r(1:n);gamma=r(n+1);       % getting w,gamma from r
Linear PSVM Comparisons with Other SVMs: Much Faster, Comparable Correctness

| Data Set (m x n) | PSVM Ten-fold test % | PSVM Time (sec.) | SSVM Ten-fold test % | SSVM Time (sec.) | SVM-light Ten-fold test % | SVM-light Time (sec.) |
|---|---|---|---|---|---|---|
| WPBC (60 mo.), 110 x 32 | 68.5 | 0.02 | 68.5 | 0.17 | 62.7 | 3.85 |
| Ionosphere, 351 x 34 | 87.3 | 0.17 | 88.7 | 1.23 | 88.0 | 2.19 |
| Cleveland Heart, 297 x 13 | 85.9 | 0.01 | 86.2 | 0.70 | 86.5 | 1.44 |
| Pima Indians, 768 x 8 | 77.5 | 0.02 | 77.6 | 0.78 | 76.4 | 37.00 |
| BUPA Liver, 345 x 6 | 69.4 | 0.02 | 70.0 | 0.78 | 69.5 | 6.65 |
| Galaxy Dim, 4192 x 14 | 93.5 | 0.34 | 95.0 | 5.21 | 94.1 | 28.33 |
Gaussian Kernel PSVM Classifier Spiral Dataset: 94 Red Dots & 94 White Dots
Conclusion
- Mathematical Programming plays an essential role in SVMs
- Theory
  - New formulations: Generalized & proximal SVMs
  - New algorithm-enhancement concepts: Smoothing (SSVM), Data reduction (RSVM)
- Algorithms
  - Fast: SSVM, PSVM
  - Massive: RSVM
Future Research
- Theory
  - Concave minimization
    - Concurrent feature & data reduction
    - Multiple-instance learning
  - SVMs as complementarity problems
- Algorithms
  - Multicategory classification algorithms
  - Incremental algorithms
  - Kernel methods in nonlinear programming
  - Chunking for massive classification: $10^8$
Talk & Papers Available on Web
www.cs.wisc.edu/~olvi