Incremental Support Vector Machine Classification
Second SIAM International Conference on Data Mining, Arlington, Virginia, April 11-13, 2002
Glenn Fung & Olvi Mangasarian
Data Mining Institute
University of Wisconsin - Madison
Key Contributions
Fast incremental classifier based on PSVM (Proximal Support Vector Machine)
Capable of modifying an existing linear classifier by both adding and retiring data
Extremely simple to implement
Small memory requirement, even for huge problems (1 billion points)
No optimization packages (LP, QP) needed
Outline of Talk
(Standard) support vector machines (SVM): classification by halfspaces
Proximal linear support vector machines (PSVM): classification by proximity to planes
The incremental and decremental algorithm: option of keeping or retiring old data
Numerical results: 1 billion points in 10-dimensional space classified in less than 3 hours. Results confirm that algorithm time is linear in the number of data points.
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A-, bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, normal $w$, margin $2/\|w\|_2$]
Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[Figure: point sets A+ and A-, parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, normal $w$, distance $2/\|w\|_2$]
Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m-by-n matrix A.
Membership of each point $A_i$ in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 and -1 entries.
Separate by two bounding planes, $x'w = \gamma \pm 1$:
$$A_i w \geq \gamma + 1 \ \text{for} \ D_{ii} = +1; \qquad A_i w \leq \gamma - 1 \ \text{for} \ D_{ii} = -1.$$
More succinctly:
$$D(Aw - e\gamma) \geq e,$$
where e is a vector of ones.
Standard Support Vector Machine Formulation
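Margin is maximized by minimizing $\frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2^2$.
Solve the quadratic program for some $\nu > 0$:
$$\min_{y,\,w,\,\gamma} \ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \geq e \qquad \text{(QP)}$$
where $D_{ii} = \pm 1$ denotes A+ or A- membership.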
PSVM Formulation
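We have from the standard QP SVM formulation, with the inequality constraint replaced by an equality:
$$\min_{y,\,w,\,\gamma} \ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e \qquad \text{(QP)}$$
This simple but critical modification changes the nature of the optimization problem tremendously!
Solving for $y$ in terms of $w$ and $\gamma$ gives:
$$\min_{w,\,\gamma} \ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2^2$$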
Advantages of New Formulation
Objective function remains strongly convex.
An explicit exact solution can be written in terms of the problem data.
PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space.
Exact leave-one-out-correctness can be obtained in terms of problem data.
Linear PSVM
We want to solve:
$$\min_{w,\,\gamma} \ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|_2^2$$
Setting the gradient equal to zero gives a nonsingular system of linear equations.
Solution of the system gives the desired PSVM classifier.
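
The gradient step can be written out as a short worked derivation. The shorthand $z$ for the stacked vector of $w$ and $\gamma$ is an assumption introduced here for brevity; $H = [A \ \ -e]$, so that $Aw - e\gamma = Hz$. The derivation uses only $D' = D$ and $DD = I$, which hold because $D$ is diagonal with $\pm 1$ entries.

$$
\begin{aligned}
f(z) &= \frac{\nu}{2}\,\|e - DHz\|_2^2 + \frac{1}{2}\,\|z\|_2^2, \qquad z = \begin{bmatrix} w \\ \gamma \end{bmatrix} \\
\nabla f(z) &= -\nu\,H'D\,(e - DHz) + z \;=\; -\nu\,H'De + \nu\,H'Hz + z \\
\nabla f(z) = 0 \ &\Longleftrightarrow \ \left(\frac{I}{\nu} + H'H\right) z = H'De
\end{aligned}
$$

For $\nu > 0$ the matrix $\frac{I}{\nu} + H'H$ is positive definite, which is why the system is nonsingular.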
Linear PSVM Solution
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$
Here, $H = [A \ \ -e]$.
The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.
$n$ is usually much smaller than $m$.
Linear Proximal SVM Algorithm
Input $A$, $D$
Define $H = [A \ \ -e]$
Calculate $v = H'De$
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$
Classifier: $\mathrm{sign}(w'x - \gamma)$
Linear & Nonlinear PSVM MATLAB Code
function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                    % v=H'*D*e;
r=(speye(n+1)/nu+H'*H)\v;     % solve (I/nu+H'*H)r=v
w=r(1:n); gamma=r(n+1);       % getting w,gamma from r
Incremental PSVM Classification
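Suppose we have two "blocks" of data, $A_1 \in R^{m_1 \times n}$ and $A_2 \in R^{m_2 \times n}$:
$$E = \begin{bmatrix} A_1 & -e \\ A_2 & -e \end{bmatrix} = \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} \ \Longrightarrow \ E'E = \begin{bmatrix} E_1 \\ E_2 \end{bmatrix}' \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} = E_1'E_1 + E_2'E_2$$
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + E_1'E_1 + E_2'E_2\right)^{-1} \left(E_1'D_1e + E_2'D_2e\right)$$
The linear system to solve depends on the compressed blocks $E_1'E_1$ and $E_2'E_2$, which are of size $(n+1) \times (n+1)$.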
Linear Incremental Proximal SVM Algorithm
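Initialization: $E'E = 0$, $d = 0$, $i = 1$.
Read $A_i$, $D_i$ from disk.
Compute $E_i'E_i$, of size $(n+1) \times (n+1)$, and $d_i = E_i'D_ie$, of size $(n+1) \times 1$, and store them in memory.
Update in memory: $E'E = E'E + E_i'E_i$, $d = d + d_i$.
Discard: $A_i$, $D_i$, $E_i$, $d_i$. Keep: $E'E$, $d$.
Is $i = i_{max}$? If no, set $i = i + 1$ and read the next block. If yes, compute the output $w$, $\gamma$.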
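
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `block_summary`, `solve`, and `incremental_psvm` are hypothetical, `labels` stands for the diagonal of $D_i$, and plain lists with a hand-written Gaussian elimination are used only to keep the sketch dependency-free. A real implementation would likely use a linear algebra library.

```python
def block_summary(A, labels):
    """Return (E'E, E'De) for one block, where E = [A  -e] and labels = diag(D)."""
    n = len(A[0])
    EtE = [[0.0] * (n + 1) for _ in range(n + 1)]
    d = [0.0] * (n + 1)
    for row, label in zip(A, labels):
        e_row = list(row) + [-1.0]            # one row of E = [A  -e]
        for j in range(n + 1):
            d[j] += e_row[j] * label          # accumulates E' D e
            for k in range(n + 1):
                EtE[j][k] += e_row[j] * e_row[k]
    return EtE, d


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def incremental_psvm(blocks, nu):
    """Consume (A_i, labels_i) blocks one at a time and return (w, gamma).

    Only the (n+1)x(n+1) sum of E_i'E_i and the (n+1)-vector d persist
    between blocks; each block can be discarded after it is summarized.
    Assumes at least one block is supplied.
    """
    EtE, d = None, None
    for A_i, labels_i in blocks:
        EtE_i, d_i = block_summary(A_i, labels_i)
        if EtE is None:
            EtE, d = EtE_i, d_i
            continue
        for j in range(len(d)):
            d[j] += d_i[j]
            for k in range(len(d)):
                EtE[j][k] += EtE_i[j][k]
    size = len(d)
    # Form I/nu + E'E, then solve for the stacked vector [w; gamma].
    M = [[EtE[j][k] + (1.0 / nu if j == k else 0.0) for k in range(size)]
         for j in range(size)]
    r = solve(M, d)
    return r[:-1], r[-1]
```

Because `blocks` can be any iterable, it may be a generator that reads one block from disk per iteration, so the raw data never needs to be held in memory all at once. Feeding the same points as one block or as several blocks yields the same accumulated sums, and therefore the same classifier up to floating-point rounding.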
Linear Incremental Proximal SVM: Adding and Retiring Data
Capable of modifying an existing linear classifier by both adding and retiring data
Option of retiring old data, which is similar to adding new data:
Financial data: old data is obsolete.
Option of keeping old data and merging it with the new data:
Medical data: old data does not obsolesce.
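
One way to read "retiring is similar to adding" is that a retired block's contribution is subtracted from the accumulated $E'E$ and $d$, just as a new block's contribution is added. The sketch below illustrates that reading. The names `empty_state` and `apply_block` are hypothetical, `labels` stands for the diagonal of $D_i$, and the sketch assumes the block being retired is still available (re-read from disk, or its summary stored), which the slides do not specify.

```python
def empty_state(n):
    """Return zeroed accumulators (E'E, d) for n-dimensional inputs."""
    size = n + 1
    return [[0.0] * size for _ in range(size)], [0.0] * size


def apply_block(EtE, d, A_block, labels, sign=1.0):
    """Add (sign=+1) or retire (sign=-1) one block, updating EtE and d in place.

    EtE accumulates E'E and d accumulates E'De, with E = [A  -e].
    """
    for row, label in zip(A_block, labels):
        e_row = list(row) + [-1.0]            # one row of E = [A  -e]
        for j, ej in enumerate(e_row):
            d[j] += sign * ej * label
            for k, ek in enumerate(e_row):
                EtE[j][k] += sign * ej * ek


# A two-day sliding window over three daily blocks (toy data).
day1 = ([[1.0, 2.0], [3.0, -1.0]], [1.0, -1.0])
day2 = ([[0.0, 1.0], [-2.0, 4.0]], [-1.0, 1.0])
day3 = ([[5.0, 0.0], [1.0, 1.0]], [1.0, 1.0])

EtE, d = empty_state(2)
apply_block(EtE, d, *day1)
apply_block(EtE, d, *day2)
apply_block(EtE, d, *day1, sign=-1.0)         # retire the oldest block
apply_block(EtE, d, *day3)                    # add the newest block
```

After these updates `EtE` and `d` describe only days 2 and 3, so solving $\left(\frac{I}{\nu} + E'E\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = d$ gives the classifier for the current window without revisiting the retained data.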
Numerical experiments: One-Billion Two-Class Dataset
Synthetic dataset consisting of 1 billion points in 10-dimensional input space.
Generated by the NDC (Normally Distributed Clustered) dataset generator.
Dataset divided into 500 blocks of 2 million points each.
Solution obtained in less than 2 hours and 26 minutes.
About 30% of the time was spent reading data from disk.
Testing set correctness: 90.79%.
Numerical Experiments: Simulation of Two-Month 60-Million Dataset
Synthetic dataset consisting of 60 million points (1 million per day) in 10-dimensional input space.
Generated using NDC.
At the beginning, we only have data corresponding to the first month.
Every day:
The oldest block of data is retired (1 million points).
A new block is added (1 million points).
A new linear classifier is calculated daily.
Only an 11-by-11 matrix is kept in memory at the end of each day. All other data is purged.
Numerical experiments: Separator changing through time
Numerical experiments: Normals to the separating hyperplanes
Corresponding to 5-day intervals
Conclusion
The proposed algorithm is an extremely simple procedure for generating linear classifiers in an incremental fashion for huge datasets.
The linear classifier is obtained by solving a single system of linear equations in the small dimensional input space.
The proposed algorithm has the ability to retire old data and add new data in a very simple manner.
Only a matrix of the size of the input space is kept in memory at any time.
Future Work
Extension to nonlinear classification
Parallel formulation and implementation on remotely located servers for massive datasets
Real time on-line application, e.g. fraud detection