Trading Convexity for Scalability
Marco A. Alvarez, CS7680
Department of Computer Science, Utah State University
Paper: Collobert, R., Sinz, F., Weston, J., and Bottou, L. 2006. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06, Pittsburgh, Pennsylvania, June 25-29, 2006), vol. 148. ACM Press, New York, NY, 201-208.
Introduction: Previously in Machine Learning
Non-convex cost functions (e.g., in MLPs): difficult to optimize, yet they work efficiently in practice
SVMs are defined by a convex cost function: easier optimization (well-understood algorithms), unique solution (we can prove theorems)
Goal of the paper: sometimes non-convexity has benefits
Faster training and testing (fewer support vectors): non-convex SVMs (faster and sparser), fast transductive SVMs
From SVM
Decision function: f(x) = w · x + b
Primal formulation:
    min_{w,b}  (1/2) ||w||² + C · Σ_i H_1[ y_i f(x_i) ]
where H_s(z) = max(0, s − z) is the hinge loss.
Minimize ||w|| so that the margin is maximized; w is a combination of a small number of examples (sparsity); the decision boundary is determined by the support vectors.
Dual formulation:
    min_α  G(α) = (1/2) Σ_{i,j} α_i α_j x_i · x_j − Σ_i y_i α_i
    s.t.  Σ_i α_i = 0,   0 ≤ y_i α_i ≤ C
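As a hedged sketch (NumPy; the function names are mine, not the paper's), the primal objective above can be evaluated directly from its definition:

```python
import numpy as np

def hinge(z, s=1.0):
    """Hinge loss H_s(z) = max(0, s - z), applied elementwise."""
    return np.maximum(0.0, s - z)

def primal_objective(w, b, X, y, C):
    """SVM primal: (1/2)||w||^2 + C * sum_i H_1(y_i * f(x_i))."""
    margins = y * (X @ w + b)                    # y_i * f(x_i)
    return 0.5 * np.dot(w, w) + C * hinge(margins).sum()

# Toy check: points classified with margin >= 1 incur zero hinge loss,
# so the objective reduces to the regularizer (1/2)||w||^2.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0
print(primal_objective(w, b, X, y, C=1.0))  # -> 0.5
```

This only evaluates the objective; solving the primal or dual would require a QP solver on top of it.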
SVM problem: the number of support vectors increases linearly with L, the number of training examples.
Cost attributed to one example (x, y):  C · H_1[ y f(x) ]
Ramp Loss Function
Given z = y f(x):  R_s(z) = H_1(z) − H_s(z)
[Figure: ramp loss plot; labeled regions: outliers (z < s) and non-SVs (z > 1)]
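A minimal sketch of the ramp loss as the difference of two hinges (NumPy; function names are mine). Note the loss is flat at 1 − s for z ≤ s, which is why outliers stop contributing and are no longer support vectors:

```python
import numpy as np

def H(z, s):
    """Hinge loss H_s(z) = max(0, s - z)."""
    return np.maximum(0.0, s - z)

def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z); constant (= 1 - s) for z <= s."""
    return H(z, 1.0) - H(z, s)

z = np.array([-3.0, -1.0, 0.0, 1.0, 2.0])
print(ramp(z, s=-1.0))  # -> [2. 2. 1. 0. 0.]
```

With s = −1, both z = −3 and z = −1 incur the same loss of 2: pushing an example further into the wrong side no longer increases its cost.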
Concave-Convex Procedure (CCCP)
Given a cost function J(θ), decompose it into a convex part and a concave part:
    J(θ) = J_vex(θ) + J_cav(θ)
Each iteration minimizes the convex part plus a linearization of the concave part at the current iterate; the procedure is guaranteed to decrease J at each iteration.
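The iteration can be illustrated on a toy one-dimensional objective (my own example, not from the paper): J(x) = x⁴/4 − x², split as J_vex(x) = x⁴/4 (convex) and J_cav(x) = −x² (concave). Each CCCP step minimizes J_vex(x) + J_cav'(x_t)·x = x⁴/4 − 2·x_t·x, whose closed-form minimizer is x = (2·x_t)^(1/3):

```python
import numpy as np

def J(x):
    # Toy objective: convex part x^4/4 plus concave part -x^2.
    return x**4 / 4 - x**2

x = 0.5                      # initial iterate
for _ in range(50):
    # Minimize the convexified objective x^4/4 - 2*x_t*x in closed form.
    x = np.cbrt(2.0 * x)

print(round(x, 4))           # -> 1.4142 (converges to sqrt(2), a minimizer of J)
print(round(J(x), 4))        # -> -1.0
```

Each step solves a convex problem exactly, and J decreases monotonically along the iterates, which is the guarantee the slide states.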
Using the Ramp Loss
CCCP for Ramp Loss
Results
Speedup
Time and Number of SVs
Transductive SVMs
Loss Function Cost to be minimized:
Balancing Constraint: necessary for TSVMs to avoid degenerate solutions that assign all unlabeled examples to the same class
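A hedged sketch of what the balancing constraint checks (assuming the common form, where the mean decision value over the U unlabeled points must match the mean label over the L labeled points; the helper name is mine):

```python
import numpy as np

def balancing_gap(f_unlabeled, y_labeled):
    """Residual of (1/U) * sum f(x_i) = (1/L) * sum y_i; zero when satisfied."""
    return np.mean(f_unlabeled) - np.mean(y_labeled)

y = np.array([1.0, 1.0, -1.0, -1.0])     # labeled targets, mean = 0
f_u = np.array([1.0, -0.5, -0.5])        # decision values on unlabeled points
print(balancing_gap(f_u, y))  # -> 0.0 (constraint satisfied)
```

A solver would enforce this as an equality constraint during training; without it, the TSVM objective can be trivially lowered by pushing every unlabeled point to one side of the boundary.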
Results
Training Time
Quadratic Fit