Support Vector Machines (SVM):
Recent Research
Panos M. Pardalos
www.ise.ufl.edu/pardalos
https://nnov.hse.ru/en/latna/
Winter School on Data Analytics (Nov 20-22, 2020, HSE)
Classification and Clustering in
Data Analysis
Classification (supervised learning) assigns objects to predefined classes, while clustering (unsupervised learning) identifies similarities between objects and groups them according to the characteristics they share and that differentiate them from other groups of objects. These groups are known as "clusters".
2020/11/23
2
Applications of Classification
Algorithms
Speech recognition
Face recognition
Handwriting recognition
Biometric identification
Document classification
Fraud detection in finance
Biomedicine
Classification Algorithms
Neural Networks
Random Forest
Decision Trees
Nearest Neighbor
Boosted Trees
Linear Classifiers: Logistic Regression, Naïve Bayes Classifier
Support Vector Machines
Fuzzy approaches to classification
Ducange, P., Fazzolari, M. & Marcelloni, F. An overview of
recent distributed algorithms for learning fuzzy models in
Big Data classification. J Big Data 7, 19
(2020). https://doi.org/10.1186/s40537-020-00298-6
Quantum approaches to classification
Is Quantum Machine Learning the next thing?
https://medium.com/illumination-curated/is-quantum-machine-learning-the-next-thing-6328b594f424
Quantum Machine Learning Is The Next Big Thing
https://thequantumdaily.com/2020/05/28/quantum-machine-learning-is-the-next-big-thing/
Daniel K. Park, Carsten Blank, Francesco Petruccione. The theory of the quantum kernel-based binary classifier. Physics Letters A, Volume 384, Issue 21, 2020, 126422.
Complexity of Classification
Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann,
Marcilio C. P. Souto, and Tin Kam Ho. 2019. How
Complex Is Your Classification Problem?: A Survey on
Measuring Classification Complexity. ACM Comput.
Surv. 52, 5, Article 107 (September 2019), 34 pages.
https://doi.org/10.1145/3347711
Each measure provides a distinct perspective on classification complexity, so a combination of different measures is advised. Nonetheless, whether there is a subset of the complexity measures that can be considered core for stressing the difficulty of problems from different application domains is still an open issue.
What about clustering?
A density-based statistical analysis of graph clustering
algorithm performance
Pierre Miasnikof, Alexander Y Shestopaloff, Anthony J
Bonner, Yuri Lawryshyn, Panos M Pardalos
Journal of Complex Networks, Volume 8, Issue 3, June
2020, cnaa012, https://doi.org/10.1093/comnet/cnaa012
Complexity measures
(1) Feature-based measures, which characterize how informative the available features are to separate the classes;
(2) Linearity measures, which try to quantify whether the classes can be linearly separated;
(3) Neighborhood measures, which characterize the presence and density of same or different classes in local neighborhoods;
(4) Network measures, which extract structural information from the dataset by modeling it as a graph;
(5) Dimensionality measures, which evaluate data sparsity based on the number of samples relative to the data dimensionality;
(6) Class imbalance measures, which consider the ratio of the
numbers of examples between classes.
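As an illustration of category (6), a simple imbalance ratio can be computed directly from the class label counts. This is a hedged sketch of the idea, not one of the specific measures defined in the survey:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest to the smallest class size.
    A value of 1.0 means perfectly balanced classes."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# 90 samples of one class vs. 10 of the other: heavily imbalanced.
labels = ["a"] * 90 + ["b"] * 10
print(imbalance_ratio(labels))  # -> 9.0
```

The survey's own class-imbalance measures are refinements of this basic ratio that remain comparable across different numbers of classes.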
Clustering and Classification (P. Arabie, L. J. Hubert, and G. De Soete, January 1996). https://doi.org/10.1142/1930
Any issues with data analysis?
Machine Learning Paradoxes
Five Machine Learning Paradoxes that will Change the Way You Think About Data
https://medium.com/dataseries/five-machine-learning-paradoxes-that-will-change-the-way-you-think-about-data-3b82513482b8
Basic Support Vector Machines (SVM)
Twin support vector machines
Many Models of SVM
Wang, X., Pardalos, P.M. A Survey of Support Vector Machines with Uncertainties. Ann. Data. Sci. 1, 293–309 (2014). https://doi.org/10.1007/s40745-014-0022-8
Explosive Research on SVM
Kernels - see e.g.
https://www.educba.com/kernel-methods/
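As a concrete illustration of the kernel idea, the widely used Gaussian (RBF) kernel and the Gram matrix it induces can be sketched in a few lines (an illustrative example, not tied to any particular SVM library):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, gamma=1.0):
    """Kernel (Gram) matrix, the object the dual SVM/SVR works with."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(X, gamma=0.5)
print(K[0][0], round(K[0][1], 4))  # 1.0 on the diagonal
```

The kernel trick replaces every inner product in the dual problem with such a kernel evaluation, so the classifier or regressor becomes nonlinear in the input space without ever forming the feature map explicitly.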
Nonparallel support vector
regression
Structural risk minimization (SRM) principle. The SRM principle addresses overfitting by balancing the model's complexity against its success at fitting the training data. It was first set out in a 1974 paper by Vladimir Vapnik and Alexey Chervonenkis.
Sparsity of the model (number of support vectors). The decision functions constructed by support vector machines usually depend only on a subset of the training set, the so-called support vectors.
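The sparseness comes from the ε-insensitive loss used in standard SVR, which can be sketched in a few lines: samples whose residual lies inside the ε-tube incur zero loss and therefore do not become support vectors (an illustrative sketch, not the paper's code):

```python
def eps_insensitive_loss(residual, eps=0.1):
    """Standard SVR loss: zero inside the eps-tube, linear outside."""
    return max(0.0, abs(residual) - eps)

# Samples whose residual lies inside the tube contribute nothing to
# the loss; only the remaining samples can become support vectors.
residuals = [0.05, -0.08, 0.3, -0.5]
losses = [eps_insensitive_loss(r) for r in residuals]
print(losses)  # first two are 0.0
```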
Nonparallel support vector regression
Primal problem
NPSVR constructs a lower-bound function and an upper-bound function from two primal problems.

Lower bound:
\[
\begin{aligned}
\min_{\mathbf{w}_1,b_1,\boldsymbol{\eta}_1,\boldsymbol{\eta}_1^*,\boldsymbol{\xi}_1}\quad & \tfrac{1}{2}\mathbf{w}_1^T\mathbf{w}_1 + C_1\left(\|\boldsymbol{\eta}_1\|_1 + \|\boldsymbol{\eta}_1^*\|_1\right) + C_3\|\boldsymbol{\xi}_1\|_1 \\
\text{s.t.}\quad & \mathbf{y} - \mathbf{e}\varepsilon_1 - (\mathbf{A}\mathbf{w}_1 + \mathbf{e}b_1) \le \boldsymbol{\eta}_1 + \mathbf{e}\varepsilon \\
& -\mathbf{y} + \mathbf{e}\varepsilon_1 + (\mathbf{A}\mathbf{w}_1 + \mathbf{e}b_1) \le \boldsymbol{\eta}_1^* + \mathbf{e}\varepsilon \\
& \mathbf{y} - (\mathbf{A}\mathbf{w}_1 + \mathbf{e}b_1) \ge \mathbf{e}\varepsilon_1 - \boldsymbol{\xi}_1 \\
& \boldsymbol{\eta}_1,\ \boldsymbol{\eta}_1^*,\ \boldsymbol{\xi}_1 \ge 0
\end{aligned}
\]

Upper bound:
\[
\begin{aligned}
\min_{\mathbf{w}_2,b_2,\boldsymbol{\eta}_2,\boldsymbol{\eta}_2^*,\boldsymbol{\xi}_2}\quad & \tfrac{1}{2}\mathbf{w}_2^T\mathbf{w}_2 + C_2\left(\|\boldsymbol{\eta}_2\|_1 + \|\boldsymbol{\eta}_2^*\|_1\right) + C_4\|\boldsymbol{\xi}_2\|_1 \\
\text{s.t.}\quad & \mathbf{y} + \mathbf{e}\varepsilon_2 - (\mathbf{A}\mathbf{w}_2 + \mathbf{e}b_2) \le \boldsymbol{\eta}_2 + \mathbf{e}\varepsilon \\
& -\mathbf{y} - \mathbf{e}\varepsilon_2 + (\mathbf{A}\mathbf{w}_2 + \mathbf{e}b_2) \le \boldsymbol{\eta}_2^* + \mathbf{e}\varepsilon \\
& (\mathbf{A}\mathbf{w}_2 + \mathbf{e}b_2) - \mathbf{y} \ge \mathbf{e}\varepsilon_2 - \boldsymbol{\xi}_2 \\
& \boldsymbol{\eta}_2,\ \boldsymbol{\eta}_2^*,\ \boldsymbol{\xi}_2 \ge 0
\end{aligned}
\]

(Figures: the lower-bound function f1(x) with the shifted tube f1(x)+ε1 and f1(x)+ε1±ε, and the upper-bound function f2(x) with f2(x)−ε2 and f2(x)−ε2±ε.)
NPSVR
Advantages of NPSVR
Equivalent sparseness to the standard SVR;
Does not involve computing an inverse matrix;
Same formulation as the standard SVR, so an SMO-type solver can be developed to accelerate the training process.
NPSVR sparseness and convergence of the SMO-type solver (figures: the down-bound function f1(x) with f1(x)+ε1 and f1(x)+ε1+ε, and the up-bound function f2(x) with f2(x)−ε2 and f2(x)−ε2−ε, with training samples and support vectors marked; convergence curves of Z1 (down-bound function) and Z2 (up-bound function) over the iterations of the solver).
NPSVR
Training speed test on large-scale data sets (figure: training time vs. training size for NPSVR, TSVR, RLTSVR, L1-TWSVR and SVR); accuracy test on UCI data sets.
Tang Long, Tian Yingjie, Yang Chunyan. Nonparallel support vector regression and its SMO-type solver. Neural Networks, 2018, 105: 431-446.
Ramp loss function based nonparallel
support vector regression (RL-NPSVR)
A ramp ε-insensitive loss function is constructed to compel as many training samples as possible to lie within a 2ε-wide band around the down (up) bound hyperplane.
A ramp loss function is constructed to keep as many training samples as possible above (below) the down (up) bound hyperplane.
A regularized term is added to each primal problem, rigidly following the SRM principle.
Trading Convexity for Scalability
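A minimal sketch of the ramp idea, assuming a cap parameter s > ε (the exact parameterization in the paper may differ): the ε-insensitive loss is truncated at a ceiling, so residuals beyond the cap all incur the same bounded loss.

```python
def eps_loss(u, eps):
    """Plain eps-insensitive loss."""
    return max(0.0, abs(u) - eps)

def ramp_eps_loss(u, eps, s):
    """Ramp (truncated) eps-insensitive loss: grows like eps_loss up
    to the cap s - eps, then stays flat, so outliers cannot dominate.
    It equals eps_loss(u, eps) - max(0, abs(u) - s), a difference of
    two convex functions, which is what makes CCCP applicable."""
    assert s > eps
    return min(eps_loss(u, eps), s - eps)

# An extreme outlier is capped instead of pulling the regressor:
print(eps_loss(100.0, 0.1))            # ~99.9: unbounded
print(ramp_eps_loss(100.0, 0.1, 1.0))  # 0.9: bounded
```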
Ramp-loss NPSVR
Compared to the existing TSVRs, the proposed RL-NPSVR has the following merits:
(1) It can explicitly filter noise and suppress outliers in the training process.
(2) RL-NPSVR has the same inherent sparseness as the standard SVR, and the adopted ramp-type loss functions make it even sparser.
(3) The dual of each reconstructed convex optimization problem has the same formulation as that of the standard SVR, so computing an inverse matrix is avoided and the kernel trick can be applied directly in the nonlinear case.
(4) An SMO-type fast algorithm is available to solve the problem.
Ramp-loss NPSVR
The original loss function is sensitive to outlier data, which limits the generalization ability. The ramp loss is adopted to improve the robustness of the model to outliers.
Ramp-loss NPSVR
Dual problem
\[
\begin{aligned}
\min_{\tilde{\boldsymbol{\alpha}}_1,\bar{\boldsymbol{\alpha}}_1,\bar{\boldsymbol{\beta}}_1}\quad & \tfrac{1}{2}\left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T \mathbf{A}\mathbf{A}^T \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right) - \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T \mathbf{y} + \left(\tilde{\boldsymbol{\alpha}}_1 + \bar{\boldsymbol{\alpha}}_1\right)^T \mathbf{e}\varepsilon \\
\text{s.t.}\quad & \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T \mathbf{e} = 0 \\
& -\tilde{\boldsymbol{\theta}}_1^t \le \tilde{\boldsymbol{\alpha}}_1 \le C_1\mathbf{e} - \tilde{\boldsymbol{\theta}}_1^t \\
& \bar{\boldsymbol{\theta}}_1^t \le \bar{\boldsymbol{\alpha}}_1 \le C_1\mathbf{e} + \bar{\boldsymbol{\theta}}_1^t \\
& \boldsymbol{\delta}_1^t \le \bar{\boldsymbol{\beta}}_1 \le C_3\mathbf{e} + \boldsymbol{\delta}_1^t
\end{aligned}
\]
The SMO-type solver of NPSVR can be used to solve each sub-optimization. The non-convexity is handled by CCCP (concave–convex programming).
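CCCP itself can be illustrated on a toy difference-of-convex objective (an illustrative 1-D example, not the RL-NPSVR subproblem): write f = f_vex + f_cave, replace f_cave by its tangent at the current iterate, and solve the resulting convex problem, here in closed form.

```python
import math

def cccp_minimize(x=3.0, iters=50):
    """CCCP on f(x) = 2*x**2 - 2*sqrt(x**2 + 1): the concave part
    -2*sqrt(x**2 + 1) is replaced by its tangent at the current point,
    and the resulting convex quadratic is minimized in closed form."""
    for _ in range(iters):
        slope = -2.0 * x / math.sqrt(x * x + 1.0)  # gradient of the concave part
        x = -slope / 4.0  # argmin of the surrogate 2*x**2 + slope*x
        # each step decreases f (the standard CCCP monotonicity property)
    return x

print(round(cccp_minimize(), 6))  # converges to the minimizer x = 0
```

The same majorize-then-minimize loop, with the convex surrogate solved by the SMO-type solver instead of in closed form, is the structure of the RL-NPSVR training procedure.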
Ramp-loss NPSVR
Capacity of filtering outlier data
Test functions: f(x) = sin(x)/x for x ∈ [−4π, 4π]\{0}, and f(x) = sin(9π/(0.35x + 1)) for x ∈ [0, 10]. In each case, 200 training points are stochastically generated, 5% of which are set to outlier points.
(Figures: fitted curves of RL-NPSVR, TSVR and RLTSVR against the observed function, with the outlier points marked.)
Ramp-loss NPSVR
Accuracy test of UCI data sets.
Ramp-loss NPSVR
Training speed test on large-scale data sets (figures: training time vs. training size for RL-NPSVR, TSVR and RLTSVR).
Tang Long, Tian Yingjie, Pardalos P. M., Yang Chunyan. Ramp-loss nonparallel support vector regression: robust, sparse and scalable approximation. Knowledge-Based Systems, 2018, 147: 55-67.
Regular simplex support vector machine (RSSVM) for K-class classification
RSSVM maps the K classes to the K vertices of a (K−1)-dimensional regular simplex, so that K-class classification becomes a (K−1)-output learning task.
The training loss is measured by comparing the squared distance between the output point of each sample and the vertices.
Adding an appropriate regularized term to the primal problem makes the dual problem a quadratic programming problem, and an exclusive sequential minimal optimization (SMO)-type solver was developed to accelerate solving it.
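The vertex mapping can be constructed incrementally; a hedged sketch of one standard construction, whose unit-edge coordinates match the usual regular simplex, e.g. (0, 0), (1, 0), (0.5, 0.866) for K = 3:

```python
import math

def simplex_vertices(k):
    """K vertices of a regular simplex with unit edge length in K-1
    dimensions, built one dimension at a time: embed the previous
    vertices with a zero coordinate, then place the new vertex above
    their centroid at the height that keeps every edge at length 1."""
    vertices = [[0.0], [1.0]]  # the 1-simplex (K = 2)
    for _ in range(2, k):
        for v in vertices:
            v.append(0.0)
        centroid = [sum(c) / len(vertices) for c in zip(*vertices)]
        d2 = sum((a - b) ** 2 for a, b in zip(centroid, vertices[0]))
        centroid[-1] = math.sqrt(1.0 - d2)  # height of the new vertex
        vertices.append(centroid)
    return vertices

for v in simplex_vertices(4):
    print([round(c, 4) for c in v])
# prints the four vertices, ending with [0.5, 0.2887, 0.8165]
```

Each class c is then encoded by its vertex V_c, and a sample is assigned to the class whose vertex is closest to the (K−1)-dimensional output point.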
Regular simplex SVM for multi-classification
Limitations of the traditional partitioning strategies, one-versus-one (1-v-1) and one-versus-rest (1-v-r):
They establish multiple sub-binary classifiers, limiting the sparseness of the model.
They lack definite classifying boundaries.
An individual classifier can hardly use the complete information of the training samples.
RSSVM
Primal problem
The classes are mapped to different vertices of a regular simplex, and the squared distance is used to measure the loss. For K = 3 the vertices are V1 = (0, 0)ᵀ, V2 = (1, 0)ᵀ, V3 = (0.5, 0.866)ᵀ; for K = 4 they are V1 = (0, 0, 0)ᵀ, V2 = (1, 0, 0)ᵀ, V3 = (0.5, 0.866, 0)ᵀ, V4 = (0.5, 0.2887, 0.8165)ᵀ.
\[
\begin{aligned}
\min_{\mathbf{w},\mathbf{b}}\quad & \sum_{j=1}^{K-1}\left(\tfrac{1}{2}\mathbf{w}_j^T\mathbf{w}_j + b_j^2\right) + C\sum_{i=1}^{N}\sum_{k\ne c_i}\xi_{i,k} \\
\text{s.t.}\quad & \sum_{j=1}^{K-1}\left[2\left(V_{c_i,j} - V_{k,j}\right)\left(\mathbf{w}_j^T\mathbf{x}_i + b_j\right) + V_{k,j}^2 - V_{c_i,j}^2\right] \ge \varepsilon - \xi_{i,k},\quad i = 1,2,\dots,N \\
& \xi_{i,k} \ge 0,\quad i = 1,2,\dots,N
\end{aligned}
\]
RSSVM
Advantages of RSSVM
The primal includes only a single optimization problem.
The adapted loss function preserves the equivalent sparseness of the original SVM in the RSSVM.
A matched SMO-type solver can be developed for training.
Dual problem
\[
\begin{aligned}
\min_{\hat{\boldsymbol{\alpha}}}\quad & \tfrac{1}{2}\hat{\boldsymbol{\alpha}}^T\left[\sum_{j=1}^{K-1}\mathbf{E}_j\left(\mathbf{A}\mathbf{A}^T + \mathbf{e}\mathbf{e}^T\right)\mathbf{E}_j^T\right]\hat{\boldsymbol{\alpha}} - \hat{\boldsymbol{\alpha}}^T\left(\sum_{j=1}^{K-1}\mathbf{F}_j + \varepsilon\mathbf{e}\right) \\
\text{s.t.}\quad & 0 \le \hat{\boldsymbol{\alpha}} \le C\mathbf{e}
\end{aligned}
\]
Classifying mode
RSSVM
The developed SMO-type solver has excellent scalability: training speed test on large-scale data sets and accuracy test on UCI data sets.
Tang Long, Tian Yingjie, Pardalos P. M. A novel perspective on multiclass classification: regular simplex support vector machine. Information Sciences, 2019, 480: 324-338.
Structural improved RSSVM
Shortcomings of directly combining the partitioning (1-v-1, 1-v-r) strategies with RSSVM:
Repeatedly computing the clustering information matrices under different partitions increases the training time.
An individual classifier can hardly use the complete information of the training samples.
SIRSSVM is an all-in-one multi-classification model that embeds the cluster granularity into the binary-classification SVM (diagram: RSSVM and SRSVM combined into SIRSSVM).
Structural improved RSSVM
Primal problem
\[
\begin{aligned}
\min_{\mathbf{w},\mathbf{b}}\quad & \sum_{j=1}^{K-1}\left(\tfrac{1}{2}\mathbf{w}_j^T\mathbf{w}_j + b_j^2\right) + d_1\sum_{i=1}^{N}\sum_{k\ne c_i}\xi_{i,k} + \sum_{j=1}^{K-1}\tfrac{d_2}{2}\mathbf{w}_j^T\boldsymbol{\Sigma}\mathbf{w}_j \\
\text{s.t.}\quad & \sum_{j=1}^{K-1}\left[2\left(V_{c_i,j} - V_{k,j}\right)\left(\mathbf{w}_j^T\mathbf{x}_i + b_j\right) + V_{k,j}^2 - V_{c_i,j}^2\right] \ge \varepsilon - \xi_{i,k},\quad i = 1,2,\dots,N \\
& \xi_{i,k} \ge 0,\quad i = 1,2,\dots,N
\end{aligned}
\]
The complete cluster information matrix is computed once, and an improved SMO-type solver is used.
Convergence process and accuracy test (figures).
Structural improved RSSVM
Comparison of training speed (figure). SIRSSVM has better convergence than RSSVM.
Long Tang, Yingjie Tian, Wenjun Li, Panos M. Pardalos. Structural improved regular simplex support vector machine for multiclass classification. Applied Soft Computing, 2020, 91. https://doi.org/10.1016/j.asoc.2020.106235
Challenging issues with SVM
Unbalanced data
Structural data sets
Multi-label classification
Semi-supervised learning
Massive data sets
Jair Cervantes, Farid Garcia-Lamont, Lisbeth Rodríguez-Mazahua, Asdrubal Lopez. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, Volume 408, 2020, Pages 189-215.
https://www.sciencedirect.com/science/article/pii/S0925231220307153
Thank you!