Difficulties with Nonlinear SVM for Large Problems
The nonlinear kernel $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense.
Computational complexity depends on $m$.
The separating surface depends on almost the entire dataset.
Complexity of nonlinear SSVM: $\approx O((m+1)^3)$.
Runs out of memory while storing the kernel matrix.
Long CPU time to compute the dense kernel matrix: $O(m^2)$ entries need to be generated and stored.
Need to store the entire dataset even after solving the problem.
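To make the storage issue concrete, here is a minimal back-of-the-envelope sketch; the sizes m = 100,000 and m̄ = 1,000 and the use of float64 storage are illustrative assumptions, not figures from the slides:

```python
def kernel_memory_gb(rows, cols, bytes_per_entry=8):
    """Memory (GB) needed to store a dense rows x cols kernel matrix in float64."""
    return rows * cols * bytes_per_entry / 1e9

m, m_bar = 100_000, 1_000            # full dataset size vs. random subset size (illustrative)
print(kernel_memory_gb(m, m))        # full kernel K(A, A'):        80.0 GB
print(kernel_memory_gb(m, m_bar))    # reduced kernel K(A, A-bar'):  0.8 GB
```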
Reduced Support Vector Machine
(i) Choose a random subset matrix $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the entire data matrix $A \in \mathbb{R}^{m \times n}$, with $\bar{m} \ll m$.
(ii) Solve the following problem by the Newton method, with $\bar{D} \subset D$ the diagonal label matrix corresponding to $\bar{A}$:
$$\min_{(\bar{u},\gamma)\in\mathbb{R}^{\bar{m}+1}} \ \frac{\nu}{2}\,\big\| p\big(e - D(K(A,\bar{A}')\bar{D}\bar{u} - e\gamma),\ \alpha\big)\big\|_2^2 + \frac{1}{2}\,\big\|(\bar{u},\gamma)\big\|_2^2$$
(iii) The nonlinear classifier is defined by the optimal $(\bar{u},\gamma)$ solution of step (ii):
Nonlinear Classifier: $K(x',\bar{A}')\bar{D}\bar{u} = \gamma$
Using $K(\bar{A},\bar{A}')$ gives lousy results!
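A minimal sketch of steps (i) to (iii), assuming a Gaussian kernel and substituting SciPy's BFGS optimizer for the Newton-Armijo method named on the slide; the function names and default parameter values are illustrative, not the original implementation:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, mu=1.0):
    """Rectangular Gaussian kernel: K[i, j] = exp(-mu * ||X_i - Y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * d2)

def p(x, alpha=5.0):
    """Smooth plus function p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x))."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def rsvm_fit(A, d, m_bar=50, nu=10.0, mu=1.0, alpha=5.0, seed=0):
    """RSVM sketch: (i) random subset, (ii) smoothed unconstrained problem, (iii) classifier."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(A), size=m_bar, replace=False)     # (i) random subset A-bar of A
    A_bar, d_bar = A[idx], d[idx]                           # d holds +1/-1 labels
    KD = gaussian_kernel(A, A_bar, mu) * d_bar              # K(A, A-bar') D-bar
    def objective(z):                                       # (ii) smoothed SVM objective
        u, g = z[:-1], z[-1]
        r = p(1.0 - d * (KD @ u - g), alpha)
        return 0.5 * nu * (r @ r) + 0.5 * (u @ u + g * g)
    z = minimize(objective, np.zeros(m_bar + 1), method="BFGS").x
    u, g = z[:-1], z[-1]
    def classify(X):                                        # (iii) sign(K(x', A-bar') D-bar u - gamma)
        return np.sign(gaussian_kernel(X, A_bar, mu) @ (d_bar * u) - g)
    return classify
```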
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in $\mathbb{R}^2$; Separate 486 Asterisks from 514 Dots
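The exact checkerboard data from the slide is not reproduced in the transcript; the generator below is a hypothetical stand-in that produces a similar two-class checkerboard in $\mathbb{R}^2$ (the number of cells and the unit-square domain are assumptions):

```python
import numpy as np

def make_checkerboard(n=1000, cells=4, seed=0):
    """Two-class checkerboard in R^2: label each point by the parity of its grid cell."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    cell = np.floor(X * cells).astype(int)
    d = np.where((cell[:, 0] + cell[:, 1]) % 2 == 0, 1.0, -1.0)
    return X, d
```

Feeding such X, d to the rsvm_fit sketch above with a 50-point subset mirrors the experiment on the next two slides.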
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
$K(\bar{A},\bar{A}') \in \mathbb{R}^{50 \times 50}$
RSVM Result on Checkerboard Using the SAME 50 Random Points Out of 1000
$K(A,\bar{A}') \in \mathbb{R}^{1000 \times 50}$
RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU Seconds)

| Dataset ($m \times n$, $\bar{m}$) | $K(A,\bar{A}')_{m\times\bar{m}}$ (RSVM) | $K(A,A')_{m\times m}$ | $K(\bar{A},\bar{A}')_{\bar{m}\times\bar{m}}$ |
|---|---|---|---|
| Cleveland Heart, 297 × 13, 30 | 86.47 %, 3.04 s | 85.92 %, 32.42 s | 76.88 %, 1.58 s |
| BUPA Liver, 345 × 6, 35 | 74.86 %, 2.68 s | 73.62 %, 32.61 s | 68.95 %, 2.04 s |
| Ionosphere, 351 × 34, 35 | 95.19 %, 5.02 s | 94.35 %, 59.88 s | 88.70 %, 2.13 s |
| Pima Indians, 768 × 8, 50 | 78.64 %, 5.72 s | 76.59 %, 328.3 s | 57.32 %, 4.64 s |
| Tic-Tac-Toe, 958 × 9, 96 | 98.75 %, 14.56 s | 98.43 %, 1033.5 s | 88.24 %, 8.87 s |
| Mushroom, 8124 × 22, 215 | 89.04 %, 466.20 s | N/A | 83.90 %, 221.50 s |
RSVM on Large UCI Adult Dataset ($A \in \mathbb{R}^{m \times 123}$)
Average Test Correctness % and Standard Deviation over 50 Runs

| Dataset Size (Train, Test) | $K(A,\bar{A}')_{m\times\bar{m}}$ Testing % | Std. Dev. | $K(\bar{A},\bar{A}')_{\bar{m}\times\bar{m}}$ Testing % | Std. Dev. | $\bar{m}$ | $\bar{m}/m$ |
|---|---|---|---|---|---|---|
| (6414, 26148) | 84.47 | 0.001 | 77.03 | 0.014 | 210 | 3.2% |
| (11221, 21341) | 84.71 | 0.001 | 75.96 | 0.016 | 225 | 2.0% |
| (16101, 16461) | 84.90 | 0.001 | 75.45 | 0.017 | 242 | 1.5% |
| (22697, 9865) | 85.31 | 0.001 | 76.73 | 0.018 | 284 | 1.2% |
| (32562, 16282) | 85.07 | 0.001 | 76.95 | 0.013 | 326 | 1.0% |
[Figure: training time (CPU sec.) versus training set size for RSVM, SMO, and PCGC.]
Support Vector Regression (Linear Case: $f(x) = x'w + b$)
Given the training set $S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \mathbb{R},\ i = 1, \dots, l\}$.
Find a linear function $f(x) = x'w + b$, where $(w, b)$ is determined by solving a minimization problem that guarantees the smallest overall error made by $f(x) = x'w + b$.
Motivated by SVM: $\|w\|_2$ should be as small as possible.
Small errors (within the tolerance $\varepsilon$) should be discarded.
ε-Insensitive Loss Function
The loss made by the estimation function $f$ at the data point $(x_i, y_i)$ is
$$|y_i - f(x_i)|_\varepsilon = \max\{0,\ |y_i - f(x_i)| - \varepsilon\}$$
In general,
$$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$$
If $\xi \in \mathbb{R}^n$ then $|\xi|_\varepsilon \in \mathbb{R}^n$ is defined componentwise: $(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon,\ i = 1, \dots, n$.
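A direct NumPy transcription of this componentwise loss; a minimal sketch, the function name is ours:

```python
import numpy as np

def eps_insensitive(residual, eps):
    """|r|_eps = max(0, |r| - eps), applied componentwise to an array of residuals."""
    return np.maximum(0.0, np.abs(residual) - eps)
```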
ε-Insensitive Linear Regression
[Figure: data points around the regression line $f(x) = x'w + b$ with an $\varepsilon$-tube; points outside the tube incur errors $y_j - f(x_j) - \varepsilon$ and $f(x_k) - y_k - \varepsilon$.]
Find $(w, b)$ with the smallest overall error.
ε-insensitive Support Vector Regression Model
Motivated by SVM: $\|w\|_2$ should be as small as possible, and small errors (within $\varepsilon$) should be discarded.
$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+m}} \ \tfrac{1}{2}\|w\|_2^2 + C\, e'|\xi|_\varepsilon$$
where $|\xi|_\varepsilon \in \mathbb{R}^m$ and $(|\xi|_\varepsilon)_i = \max\big(0,\ |A_i w + b - y_i| - \varepsilon\big)$.
Reformulated ε-SVR as a Constrained Minimization Problem
$$\min_{(w,b,\xi,\xi^*)\in\mathbb{R}^{n+1+2m}} \ \tfrac{1}{2} w'w + C\, e'(\xi + \xi^*)$$
subject to
$$y - Aw - eb \le e\varepsilon + \xi, \qquad Aw + eb - y \le e\varepsilon + \xi^*, \qquad \xi,\ \xi^* \ge 0$$
This is a minimization problem with $n+1+2m$ variables and $2m$ constraints, which enlarges the problem size and the computational complexity of solving it.
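For reference, the constrained formulation can be written almost verbatim with a convex-modeling tool. This sketch assumes CVXPY, which is not mentioned on the slides, purely to make the $n+1+2m$ variables and $2m$ constraints explicit:

```python
import cvxpy as cp

def eps_svr_qp(A, y, C=1.0, eps=0.1):
    """Primal epsilon-SVR QP: variables (w, b, xi, xi*), 2m inequality constraints."""
    m, n = A.shape
    w, b = cp.Variable(n), cp.Variable()
    xi, xi_star = cp.Variable(m, nonneg=True), cp.Variable(m, nonneg=True)
    objective = 0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star)
    constraints = [y - A @ w - b <= eps + xi,
                   A @ w + b - y <= eps + xi_star]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value
```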
SV Regression by Minimizing the Quadratic ε-Insensitive Loss
We minimize $\|(w,b)\|_2^2$ at the same time (Occam's razor: the simplest is the best), giving the following (nonsmooth) problem:
$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+l}} \ \tfrac{1}{2}\big(\|w\|_2^2 + b^2\big) + \tfrac{C}{2}\,\big\| |\xi|_\varepsilon \big\|_2^2$$
where $(|\xi|_\varepsilon)_i = |y_i - (w'x_i + b)|_\varepsilon$.
This formulation gives strong convexity of the problem.
ε-insensitive Loss Function
$$|x|_\varepsilon = (x - \varepsilon)_+ + (-x - \varepsilon)_+$$
Quadratic ε-insensitive Loss Function
$$|x|_\varepsilon^2 = \big((x-\varepsilon)_+ + (-x-\varepsilon)_+\big)^2 = (x-\varepsilon)_+^2 + (-x-\varepsilon)_+^2$$
since $(x-\varepsilon)_+ \cdot (-x-\varepsilon)_+ = 0$.
Replace the Quadratic ε-insensitive Function by the $p_\varepsilon^2$-Function
$$p_\varepsilon^2(x, \beta) = \big(p(x-\varepsilon, \beta)\big)^2 + \big(p(-x-\varepsilon, \beta)\big)^2$$
where the $p$-function is defined by
$$p(x, \beta) = x + \tfrac{1}{\beta}\log\big(1 + \exp(-\beta x)\big)$$
$p$-function with $\beta = 10$: $p(x, 10),\ x \in [-3, 3]$.
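A small numerical check of the smooth replacement, using the definitions above; the grid of test points and the β value are arbitrary choices:

```python
import numpy as np

def p(x, beta):
    """p(x, beta) = x + (1/beta) * log(1 + exp(-beta * x)), a smooth plus function."""
    return x + np.logaddexp(0.0, -beta * x) / beta

def p2_eps(x, eps, beta):
    """Smooth quadratic eps-insensitive loss: p(x - eps, beta)^2 + p(-x - eps, beta)^2."""
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

def abs2_eps(x, eps):
    """Exact quadratic eps-insensitive loss: max(0, |x| - eps)^2."""
    return np.maximum(0.0, np.abs(x) - eps) ** 2

x = np.linspace(-3, 3, 7)
# The gap between the smooth and the exact loss shrinks as beta grows.
print(np.max(np.abs(p2_eps(x, 1.0, 5.0) - abs2_eps(x, 1.0))))
```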
[Figure: $|x|_\varepsilon^2$ versus its smooth approximation $p_\varepsilon^2(x, \beta)$ with $\varepsilon = 1$, $\beta = 5$.]
ε-insensitive Smooth Support Vector Regression
$$\min_{(w,b)\in\mathbb{R}^{n+1}} \Phi_{\varepsilon,\beta}(w,b) := \tfrac{1}{2}\big(w'w + b^2\big) + \tfrac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2\big(A_i w + b - y_i,\ \beta\big) \;\approx\; \tfrac{1}{2}\big(w'w + b^2\big) + \tfrac{C}{2}\sum_{i=1}^{m} |A_i w + b - y_i|_\varepsilon^2$$
This is a strongly convex minimization problem without any constraints. The objective function is twice differentiable, so a fast Newton-Armijo method can be used to solve it.
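A minimal linear ε-SSVR sketch that minimizes the unconstrained smooth objective above. As before, SciPy's BFGS is used in place of the Newton-Armijo method, and the default parameters are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def p2_eps(x, eps, beta):
    """Smooth quadratic eps-insensitive loss (see the previous slide)."""
    p = lambda t: t + np.logaddexp(0.0, -beta * t) / beta
    return p(x - eps) ** 2 + p(-x - eps) ** 2

def ssvr_fit_linear(A, y, C=1.0, eps=0.1, beta=5.0):
    """epsilon-SSVR, linear case: unconstrained smooth minimization over (w, b)."""
    m, n = A.shape
    def objective(z):
        w, b = z[:-1], z[-1]
        r = A @ w + b - y
        return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(p2_eps(r, eps, beta))
    z = minimize(objective, np.zeros(n + 1), method="BFGS").x
    return z[:-1], z[-1]          # (w, b)
```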
Nonlinear ε-SVR
Based on the duality theorem and the KKT optimality conditions: $w = A'\alpha,\ \alpha \in \mathbb{R}^m$.
$$y \approx Aw + eb \;\Rightarrow\; y \approx AA'\alpha + eb \;\Rightarrow\; y \approx K(A, A')\alpha + eb \quad \text{(nonlinear case)}$$
Nonlinear ε-SVR
$$\min_{(\alpha,b)\in\mathbb{R}^{m+1}} \ \tfrac{1}{2}\|\alpha\|_2^2 + C\sum_{i=1}^{m} \big|K(A_i, A')\alpha + b - y_i\big|_\varepsilon$$
Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times l}$; then $K(A, B): \mathbb{R}^{m\times n} \times \mathbb{R}^{n\times l} \Rightarrow \mathbb{R}^{m\times l}$, and $K(A_i, A') \in \mathbb{R}^{1\times m}$.
Nonlinear regression function: $f(x) = K(x, A')\alpha + b$.
Nonlinear Smooth Support Vector ε-insensitive Regression
$$\min_{(\alpha,b)\in\mathbb{R}^{m+1}} \ \tfrac{1}{2}\big(\alpha'\alpha + b^2\big) + \tfrac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2\big(K(A_i, A')\alpha + b - y_i,\ \beta\big) \;\approx\; \tfrac{1}{2}\big(\alpha'\alpha + b^2\big) + \tfrac{C}{2}\sum_{i=1}^{m} \big|K(A_i, A')\alpha + b - y_i\big|_\varepsilon^2$$
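The same idea in the kernel case: replace $A_i w$ by $K(A_i, A')\alpha$ and regularize $(\alpha, b)$. As before this is a sketch under assumptions, with a Gaussian kernel, SciPy's BFGS in place of Newton-Armijo, and illustrative default parameters:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, mu=1.0):
    """K[i, j] = exp(-mu * ||X_i - Y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * d2)

def p2_eps(x, eps, beta):
    """Smooth quadratic eps-insensitive loss."""
    p = lambda t: t + np.logaddexp(0.0, -beta * t) / beta
    return p(x - eps) ** 2 + p(-x - eps) ** 2

def kernel_ssvr_fit(A, y, C=1.0, eps=0.1, mu=1.0, beta=5.0):
    """Nonlinear epsilon-SSVR sketch: minimize over (alpha, b) with the full kernel K(A, A')."""
    m = len(A)
    K = gaussian_kernel(A, A, mu)
    def objective(z):
        alpha, b = z[:-1], z[-1]
        r = K @ alpha + b - y
        return 0.5 * (alpha @ alpha + b * b) + 0.5 * C * np.sum(p2_eps(r, eps, beta))
    z = minimize(objective, np.zeros(m + 1), method="BFGS").x
    alpha, b = z[:-1], z[-1]
    return lambda X: gaussian_kernel(X, A, mu) @ alpha + b   # f(x) = K(x, A') alpha + b
```

Swapping the square kernel $K(A, A')$ for a rectangular $K(A, \bar{A}')$ built from a random subset gives the reduced-kernel variant used later for the larger datasets.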
Numerical Results
Training and testing sets are generated by the slice method. A Gaussian kernel is used for the nonlinear ε-SVR in all experiments. The reduced kernel technique is used when the training dataset is bigger than 1000 points.
Error measure: 2-norm relative error $\dfrac{\|y - \hat{y}\|_2}{\|y\|_2}$, where $y$ are the observations and $\hat{y}$ the predicted values.
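The error measure above as a short helper; a straightforward transcription, the function name is ours:

```python
import numpy as np

def relative_error(y, y_hat):
    """2-norm relative error ||y - y_hat||_2 / ||y||_2."""
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)
```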
101 Data Points in $\mathbb{R} \times \mathbb{R}$; Nonlinear SSVR with Kernel $\exp(-\mu\|x_i - x_j\|_2^2)$
$f(x) = 0.5\,\mathrm{sinc}(\pi x / 10) + \text{noise}$
Noise: mean $= 0$, $\sigma = 0.04$
$x \in [-1, 1]$, 101 points
Parameters: $\nu = 50,\ \mu = 5,\ \varepsilon = 0.02$
Training time: 0.3 sec.
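A sketch of generating this kind of data; the target function is reconstructed from garbled slide text, so treat $0.5\,\mathrm{sinc}(\pi x/10)$ as a best guess rather than the authors' exact function:

```python
import numpy as np

def make_sinc_data(n=101, sigma=0.04, seed=0):
    """n points on [-1, 1] from y = 0.5*sinc(pi*x/10) plus Gaussian noise (mean 0, std sigma)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    y = 0.5 * np.sinc(x / 10.0) + rng.normal(0.0, sigma, size=n)   # np.sinc(t) = sin(pi*t)/(pi*t)
    return x.reshape(-1, 1), y
```

The returned arrays can be fed to the kernel_ssvr_fit sketch above to reproduce a fit of this shape.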
First Artificial Dataset
$f(x) = 0.5\,\mathrm{sinc}(\pi x / 30) + \text{random noise}$ with mean $= 0$, standard deviation $= 0.04$
ε-SSVR: training time 0.016 sec., error 0.059
LIBSVM: training time 0.015 sec., error 0.068
481 Data Points in $\mathbb{R}^2 \times \mathbb{R}$
[Figure: original function vs. estimated function.]
Noise: mean $= 0$, $\sigma = 0.4$
Parameters: $\nu = 50,\ \mu = 1,\ \varepsilon = 0.5$
Training time: 9.61 sec. Mean absolute error (MAE) over the 49×49 mesh points: 0.1761
[Figure: original function vs. estimated function.]
Using Reduced Kernel: $K(A, \bar{A}') \in \mathbb{R}^{28900 \times 300}$
Noise: mean $= 0$, $\sigma = 0.4$
Parameters: $C = 10000,\ \mu = 1,\ \varepsilon = 0.2$
Training time: 22.58 sec. MAE over the 49×49 mesh points: 0.0513
Real Datasets
Linear ε-SSVR: Tenfold Numerical Results
Nonlinear ε-SSVR: Tenfold Numerical Results (1/2)
Nonlinear ε-SSVR: Tenfold Numerical Results (2/2)