Liang Tian – Learning From Data Through Support Vector Machines


Learning From Data Through Support Vector Machines

Liang Tian

tian@csee.wvu.edu

Lane Department of Computer Science & Electrical Engineering

West Virginia University

November 16, 2004

CPE 520 – Neural Networks

Neural Networks – BP Learning

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

BP Learning Procedure


Classifier

A. Moore. Lecture Notes, School of Computer Science, CMU, http://www.cs.cmu.edu/~awm/tutorials.

Classifier

[Figure: decision boundaries compared – SVM with its maximum margin vs. an MLP boundary]

Classifier

- MLP stops training when all points are correctly classified
- The decision surface may not be optimized
- The generalization error may not be minimized

Local Minima

S. Bengio. An Introduction to Statistical Machine Learning – Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio, May 2003.

MLP – gradient descent learning – nonlinear optimization – local minima

SVM Classification

R. Collobert. An Introduction to Statistical Machine Learning – Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober, Jan. 2003.


Margin Maximization: minimize (1/2)||w||²

Correct Classification: subject to y_i (w · x_i + b) ≥ 1 for all i

- Classic non-linear optimization problem with inequality constraints
- Solved by maximizing the dual variables

Lagrange function

SVM Classification

Maximize the dual: L_d(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)

Subject to constraints: Σ_i α_i y_i = 0 and α_i ≥ 0

- Solutions of the Lagrange multipliers α_i determine the parameters w and b

- The final decision hyperplane is an indicator function: f(x) = sign(Σ_i α_i y_i (x_i · x) + b)

SVM Classification

Similar to the weighted sum in an MLP
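To make this parallel concrete, here is a minimal numpy sketch of the linear case, showing how the multipliers α_i determine w and b and yield the indicator function. The two-point toy data set is hypothetical (not from the slides); for it the maximum-margin solution α = (0.5, 0.5) is known in closed form:

```python
import numpy as np

# Hypothetical toy problem: two support vectors on the x-axis
X = np.array([[1.0, 0.0], [-1.0, 0.0]])   # training inputs
y = np.array([1.0, -1.0])                 # class labels
alpha = np.array([0.5, 0.5])              # Lagrange multipliers (known solution here)

# Parameters determined by the multipliers: w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ X
# b from a support vector, using y_k (w . x_k + b) = 1
b = y[0] - w @ X[0]

def indicator(x):
    """Final decision hyperplane as an indicator (sign) function."""
    return np.sign(w @ x + b)
```

Just as in an MLP's weighted sum, the output is a signed linear combination of the inputs; the difference lies in how the weights are obtained.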

SVM Classification


If the data is not linearly separable, it's easier to separate the two classes by projecting the data into a higher-dimensional space.

SVM Classification

[Diagram: the input space (x1, x2) is mapped by feature functions Φ_i(x), with weights w_i and bias b = w_0, into the higher-dimensional Z-space]

y = Σ_i w_i Φ_i(x) + b = w^T Φ(x)

SVM Classification

Problem? Computationally discouraging if the dimensionality of the Z-space is very large.

Introducing kernel functions simplifies the computation.

Common kernels are polynomial and Gaussian.

The kernel function is evaluated in the input space, bypassing the high dimensionality of the feature space.
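A small numpy sketch of this point, assuming a degree-2 polynomial kernel K(x, z) = (x·z)² and its explicit Z-space feature map (the Gaussian kernel is included for completeness; its feature space is infinite-dimensional). The kernel is computed entirely in the input space, yet it equals the dot product in Z-space:

```python
import numpy as np

def poly_kernel(x, z):
    # degree-2 polynomial kernel, evaluated in the input space
    return (x @ z) ** 2

def phi(x):
    # explicit feature map into Z-space for the same kernel (2-D input)
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def gauss_kernel(x, z, sigma=1.0):
    # Gaussian kernel; no finite feature map exists, only the kernel value
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma**2))

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
# kernel in input space == dot product in Z-space
assert np.isclose(poly_kernel(x, z), phi(x) @ phi(z))
```

The polynomial kernel costs one dot product regardless of how large the Z-space is; that is the computational shortcut the slides refer to.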

SVM Learning Example

Classic XOR Problem


Polynomial kernel function K

SVM Learning Procedure

Step 1: Select the kernel function
Step 2: Present inputs and desired outputs
Step 3: Solve for the Lagrange multipliers α_i through an optimization problem
Step 4: Obtain the decision indicator function
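The four steps can be sketched end-to-end on the classic XOR problem. This is an illustrative implementation, not code from the slides: it maximizes the dual by simple projected gradient ascent, and for XOR with the polynomial kernel (1 + x·z)² the known solution is α_i = 1/8 for all four points:

```python
import numpy as np

# Step 2 data: XOR with +/-1 encoding
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])

# Step 1: select the kernel function, here polynomial K(x, z) = (1 + x.z)^2
def kernel(A, B):
    return (1.0 + A @ B.T) ** 2

# Step 3: solve for the Lagrange multipliers alpha_i by maximizing the dual
K = kernel(X, X)
Q = np.outer(y, y) * K
alpha = np.zeros(len(y))
for _ in range(5000):
    alpha += 0.01 * (1.0 - Q @ alpha)   # gradient ascent on the dual objective
    alpha -= y * (alpha @ y) / len(y)   # project onto sum_i alpha_i y_i = 0
    alpha = np.maximum(alpha, 0.0)      # enforce alpha_i >= 0

# Step 4: obtain the decision indicator function
b = y[0] - (alpha * y) @ K[:, 0]        # recovered from any support vector
def indicator(x):
    return np.sign((alpha * y) @ kernel(X, x.reshape(1, -1)).ravel() + b)
```

The recovered decision function reduces to f(x) = sign(−x1·x2), which reproduces the XOR truth table; all four training points are support vectors.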

BP Learning Procedure


SVM vs. NN

V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

This is an NN

SVM vs. NN


This is an SVM

SVM vs. NN

There is NO difference in structure.

HOWEVER

There is an important difference in LEARNING!

[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.

- SVM is a novel type of machine learning algorithm developed by V. Vapnik.
- SVM minimizes an upper bound on the generalization error.
- Conventional neural networks only minimize the error on the training data.
- SVM yields a unique, global solution and avoids being trapped at local minima.

SVM vs. NN

SVM Applications

Muller et al., "An introduction to kernel-based learning algorithms," IEEE Trans. NN, 12(2), 2001, pp. 181-201.

OCR

0.6%

SVM Applications


DNA Data Analysis

SVM Applications

Tax and Duin, "Outliers and data descriptions," Proceedings of the 7th Annual Conference of the Advanced School for Computing and Imaging, 2001, pp. 234-241.

Single-Class Classification

Two Types of Problems

Regression and Classification


SVM Regression


SVM Regression

Approximating a set of l pairs of training patterns

The SVM model used for function approximation is:

f(x) = w^T Φ(x) + b

where Φ(x) is the high-dimensional feature space that is nonlinearly mapped from the input space x.

L. Tian and A. Noore, “A novel approach for short-term load forecasting using support vector machines,” International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.

SVM Regression

w and b can be estimated by minimizing the following regularized risk function:

R = (1/2)||w||² + C Σ_i |y_i − f(x_i)|_ε

using Vapnik's linear loss function with ε-insensitivity zone:

|y − f(x)|_ε = max(0, |y − f(x)| − ε)
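Vapnik's ε-insensitive loss and the regularized risk it enters can be written in a few lines (a sketch; the ε, C, and sample values are illustrative):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's linear loss: zero inside the eps-tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

def regularized_risk(w, y_true, y_pred, C=1.0, eps=0.1):
    # (1/2)||w||^2 constrains the model capacity; C weighs the empirical loss
    return 0.5 * w @ w + C * np.sum(eps_insensitive_loss(y_true, y_pred, eps))
```

Errors smaller than ε cost nothing, which is what later makes many multipliers vanish and the regression model sparse.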


SVM Regression

||w||² is the weights vector norm, which constrains the capacity of the model structure in order to obtain better generalization performance.

C is the regularization constant, representing the trade-off between the approximation error and the model structure.


SVM Regression

Minimizing the risk objective function R:

R = (1/2)||w||² + C Σ_i (ξ_i + ξ_i*)

subject to
y_i − w^T Φ(x_i) − b ≤ ε + ξ_i
w^T Φ(x_i) + b − y_i ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0


SVM Regression

Then, the solution is given in the form:

f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b

Training examples with (α_i − α_i*) ≠ 0 are support vectors.
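The sparsity this implies can be seen directly: only training points with (α_i − α_i*) ≠ 0 enter the sum. The multiplier values and the Gaussian kernel below are hypothetical, chosen purely for illustration:

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # training inputs
beta = np.array([0.5, 0.0, -0.2, 0.0])       # hypothetical (alpha_i - alpha_i*)
b = 0.1

def gauss(Xi, x, sigma=1.0):
    # Gaussian kernel values K(x_i, x) for all training points at once
    return np.exp(-np.sum((Xi - x) ** 2, axis=1) / (2 * sigma**2))

def f(x):
    # f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b
    return beta @ gauss(X, x) + b

support_vectors = X[beta != 0]               # only these points shape the model
```

Points whose targets fall inside the ε-tube receive α_i = α_i* = 0 and drop out of the model entirely; far from all support vectors the prediction decays toward b.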

SVM Regression


The Lagrange multipliers can be obtained by maximizing the form:

W(α, α*) = −ε Σ_i (α_i + α_i*) + Σ_i y_i (α_i − α_i*) − (1/2) Σ_i Σ_j (α_i − α_i*)(α_j − α_j*) K(x_i, x_j)

subject to Σ_i (α_i − α_i*) = 0 and 0 ≤ α_i, α_i* ≤ C.

Regression Application - 1: Short-Term Load Forecasting


Regression Application - 2: Software Reliability Prediction

Average Error = 1.20%

L. Tian and A. Noore, “On-line software reliability prediction: An approach based on support vector machines,” International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.



Parameter Selection

Cao and Tay, “Support vector machine with adaptive parameters in financial time series forecasting”, IEEE Trans. NN, 14(6), Nov. 2003, pp. 1506-1518.

- Both NN and SVM learn from experimental data
- Both NN and SVM are universal approximators
- After learning, both NN and SVM have the same mathematical model and graphical representation
- The only difference is the learning method:
  - NN – gradient descent
  - SVM – solving a quadratic programming problem

Summary

SVM Research Issues

- Speeding up learning time when the data set is large
- Chunking, using subsets of the data
- Improving optimization techniques
- Parameter selection and optimization
- Modified and adaptive SVMs and other variations

References and Further Reading

[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.

[2] S. Bengio. An Introduction to Statistical Machine Learning – Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio, May 2003.

[3] V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.


[4] R. Collobert. An Introduction to Statistical Machine Learning – Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober, Jan. 2003.

[5] L. Tian and A. Noore. A Novel Approach for Short-Term Load Forecasting Using Support Vector Machines. International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.

[6] L. Tian and A. Noore. On-line Software Reliability Prediction: An Approach Based on Support Vector Machines. International Journal of Reliability, Quality and Safety Engineering. Submitted and under revision.


[7] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998. ISBN: 0-471-03003-1.

[8] http://www.kernel-machines

[9] http://www.support-vector.ws

Thank You!

Questions and Comments?
