TRANSCRIPT
Basis Expansion and Regularization
Presenter: Hongliang Fei, Brian Quanz
Date: July 03, 2008
Contents
Introduction
Piecewise Polynomials and Splines
Filtering and Feature Extraction
Smoothing Splines
Automatic Smoothing Parameter Selection
1. Introduction
Basis: in linear algebra, a basis is a set of vectors satisfying:
Linear combinations of the basis vectors can represent every vector in the given vector space;
No element of the set can be represented as a linear combination of the others.
In a function space, the basis becomes a set of basis functions;
Each function in the function space can be represented as a linear combination of the basis functions.
Example: the quadratic polynomial basis $\{1, t, t^2\}$.
What is Basis Expansion?
Given data $X$ and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \ldots, M$, we model

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$$

as a linear basis expansion in $X$, where $h_m(X)$ is the $m$-th basis function.
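As a minimal sketch of this idea (the data and coefficients here are hypothetical), a model that is nonlinear in $x$ but linear in $\beta$ can be fit by ordinary least squares once the basis $\{1, x, x^2\}$ from the earlier example is expanded into a design matrix:

```python
import numpy as np

# Hypothetical data: y is a noisy quadratic function of a scalar x.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 3.0 * x + 0.5 * x**2 + 0.01 * rng.standard_normal(50)

# Basis expansion with h_1(x)=1, h_2(x)=x, h_3(x)=x^2:
# f(x) = sum_m beta_m h_m(x) is linear in beta, so least squares applies.
H = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
f_hat = H @ beta
```

The recovered `beta` should be close to the generating coefficients (2.0, -3.0, 0.5), since the model is linear in the expanded features.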
Why Basis Expansion?
In regression problems, $f(X)$ is typically nonlinear in $X$;
A linear model is convenient and easy to interpret;
When the sample size is small but the number of attributes is large, a linear model may be all we can do to avoid overfitting.
2. Piecewise Polynomials and Splines
Spline:
In mathematics, a spline is a special function defined piecewise by polynomials;
In computer science, the term spline more frequently refers to a piecewise polynomial (parametric) curve.
Splines are popular for their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
Assume a spline with four knots (two boundary knots and two interior knots), and that $X$ is one-dimensional.
Piecewise constant basis: $h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$.
Piecewise linear basis: add the functions $h_{m+3}(X) = h_m(X)\,X$, $m = 1, 2, 3$.
These six basis functions correspond to a six-dimensional linear space.
An order-$M$ spline with knots $\xi_j$, $j = 1, \ldots, K$ has continuous derivatives up to order $M-2$. The general form of the truncated-power basis set is:

$$h_j(X) = X^{j-1}, \quad j = 1, \ldots, M,$$
$$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \ldots, K.$$
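The truncated-power construction above can be sketched directly (a minimal helper, with the knot locations chosen arbitrarily for illustration):

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Truncated-power basis for an order-M spline:
    h_j(x) = x^(j-1) for j = 1..M, plus (x - xi_l)_+^(M-1) per knot xi_l."""
    cols = [x**j for j in range(M)]                          # 1, x, ..., x^(M-1)
    cols += [np.maximum(x - xi, 0.0)**(M - 1) for xi in knots]
    return np.column_stack(cols)

# Cubic spline (M = 4) with K = 2 interior knots: M + K = 6 basis functions.
x = np.linspace(0.0, 1.0, 20)
B = truncated_power_basis(x, [0.3, 0.7])
```

Each truncated-power column is zero to the left of its knot, which is what makes the pieces join with $M-2$ continuous derivatives.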
Natural Cubic Spline
A natural cubic spline adds additional constraints: the function is linear beyond the boundary knots.
A natural cubic spline with K knots is represented by K basis functions.
One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
Example of a Natural Cubic Spline
Starting from the truncated power series basis, we arrive at:

$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X),$$

where

$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}.$$
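These $K$ basis functions can be sketched as follows (a hypothetical helper; knot values are illustrative):

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """K basis functions N_1..N_K for a natural cubic spline with K knots,
    built from the reduced truncated-power form shown above."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        # d_k(x) = ((x - xi_k)_+^3 - (x - xi_K)_+^3) / (xi_K - xi_k)
        num = np.maximum(x - knots[k], 0.0)**3 - np.maximum(x - knots[K - 1], 0.0)**3
        return num / (knots[K - 1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 9)
N = natural_cubic_basis(x, [0.1, 0.4, 0.6, 0.9])   # K = 4 knots -> 4 basis functions
```

Beyond the rightmost knot, the quadratic and cubic terms of $d_k$ and $d_{K-1}$ cancel, so each $N_{k+2}$ is exactly linear there, which is the natural boundary constraint.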
Example: phoneme recognition.
Data: 1000 samples drawn from 695 "aa"s and 1022 "ao"s, with a feature vector of length 256.
Goal: use these data to classify the spoken phoneme.
The coefficients can be plotted as a function of frequency.
Fitting via maximum likelihood only, the coefficient curve is very rough;
Fitting through natural cubic splines:
Rewrite the coefficient function as an expansion in splines, $\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m$, that is, $\beta = H\theta$, where $H$ is a $p \times M$ basis matrix of natural cubic splines.
Since $x^T \beta = x^T H \theta = (H^T x)^T \theta$, we replace the input features $x$ by the filtered version $x^* = H^T x$.
Fit $\theta$ via linear logistic regression on $x^*$.
The final result is the smooth coefficient curve $\hat{\beta} = H\hat{\theta}$.
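The filtering step can be sketched as a simple projection (all names and sizes here are hypothetical, and a generic smooth polynomial basis stands in for the natural cubic spline matrix $H$ of the slides):

```python
import numpy as np

# Project p = 256 raw frequency features onto M = 12 smooth basis
# functions; the classifier is then trained on the filtered x*.
rng = np.random.default_rng(1)
p, M, n = 256, 12, 100
freq = np.linspace(0.0, 1.0, p)

# Stand-in smooth basis matrix H (p x M); in the slides, H holds
# natural cubic spline basis functions evaluated at the p frequencies.
H = np.column_stack([freq**m for m in range(M)])

X = rng.standard_normal((n, p))     # placeholder data, one sample per row
X_star = X @ H                      # filtered features: each row is x* = H^T x
```

The downstream model sees only M = 12 columns instead of 256, which is the dimension reduction that makes the fitted coefficient curve smooth.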
3. Filtering and Feature Extraction
Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm.
The previous example used $x^* = H^T x$, a filtering approach to transform the features;
Transformations need not be linear, but can have the general form $x^* = g(x)$.
Another example: the wavelet transform; see Section 5.9.
4. Smoothing Splines
Purpose: avoid the complexity of the knot-selection problem by using a maximal set of knots.
Complexity is controlled via regularization.
Consider this problem: among all functions with two continuous derivatives, minimize the penalized residual sum of squares

$$RSS(f, \lambda) = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \int f''(t)^2 \, dt.$$
Though RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \ldots, N$.
The penalty term translates to a penalty on the spline coefficients.
Rewrite the solution as $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$, where the $N_j(x)$ are an N-dimensional set of basis functions representing the family of natural splines.
The criterion in matrix form:

$$RSS(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta,$$

where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\, N_k''(t) \, dt$.
By the ridge regression result, the solution is

$$\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y.$$

The fitted smoothing spline is given by $\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\,\hat{\theta}_j$.
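The generalized ridge solve above is a one-liner in numpy (a minimal sketch: the basis matrix is random and an identity matrix stands in for the curvature penalty $\Omega_N$, so this shows the algebra, not a real spline fit):

```python
import numpy as np

# Solve theta_hat = (Nb^T Nb + lam * Omega)^{-1} Nb^T y for a toy problem.
rng = np.random.default_rng(2)
n, m, lam = 30, 5, 0.5
Nb = rng.standard_normal((n, m))        # stand-in for the basis matrix N
Omega = np.eye(m)                       # stand-in for the curvature penalty
y = rng.standard_normal(n)

theta_hat = np.linalg.solve(Nb.T @ Nb + lam * Omega, Nb.T @ y)
f_hat = Nb @ theta_hat                  # fitted values at the training points
```

Using `np.linalg.solve` on the penalized normal equations avoids forming an explicit inverse, which is the standard numerically preferable route.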
Degrees of Freedom and Smoother Matrix
A smoothing spline with prechosen $\lambda$ is a linear operator.
Let $\hat{f}$ be the N-vector of fitted values $\hat{f}(x_i)$ at the training predictors $x_i$:

$$\hat{f} = N(N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y.$$

Here $S_\lambda$ is called the smoother matrix. It depends only on the $x_i$ and $\lambda$.
Suppose $B_\xi$ is an $N \times M$ matrix of M cubic spline basis functions evaluated at the N training points $x_i$, with knot sequence $\xi$. The fitted spline values are given by:

$$\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi y.$$

Here the linear operator $H_\xi$ is a projection operator, known as the hat matrix in statistics.
Similarities and differences between $S_\lambda$ and $H_\xi$:
Both are symmetric and positive semi-definite.
$H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), while $S_\lambda S_\lambda \preceq S_\lambda$, a shrinking nature.
Rank($S_\lambda$) = N, Rank($H_\xi$) = M.
The trace of $H_\xi$ gives the dimension of the projection space (the number of basis functions).
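The contrast can be checked numerically (a toy sketch: random basis, identity penalty, so the `S` here is a low-rank ridge smoother illustrating the shrinking property, not a true full-rank smoothing-spline matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, lam = 20, 4, 1.0
B = rng.standard_normal((n, m))

# Hat matrix: projection onto the column space of B.
H = B @ np.linalg.solve(B.T @ B, B.T)

# Ridge-type smoother: same form but with a penalty added.
S = B @ np.linalg.solve(B.T @ B + lam * np.eye(m), B.T)
```

`H @ H` reproduces `H` exactly (a projection leaves components alone or zeros them out), whereas the eigenvalues of `S` lie strictly below 1, so `S @ S` is dominated by `S`: every component is shrunk rather than kept or killed.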
Define the effective degrees of freedom as $df_\lambda = \mathrm{trace}(S_\lambda)$.
By specifying $df_\lambda$, we can derive $\lambda$.
Since $S_\lambda$ is symmetric and positive semi-definite, we can rewrite it in the Reinsch form $S_\lambda = (I + \lambda K)^{-1}$, so that $\hat{f} = S_\lambda y$ is the solution of

$$\min_f \; (y - f)^T (y - f) + \lambda\, f^T K f.$$

K is known as the penalty matrix.
The eigen-decomposition of $S_\lambda$ is given by:

$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \quad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k},$$

where $d_k$ and $u_k$ are the eigenvalues and eigenvectors of K.
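The relation between the spectra of K and $S_\lambda$ is easy to verify numerically (a sketch with an arbitrary symmetric positive semi-definite K standing in for a real penalty matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 10, 2.0
A = rng.standard_normal((n, n))
K = A @ A.T                              # toy symmetric PSD penalty matrix

d, U = np.linalg.eigh(K)                 # eigenvalues/eigenvectors of K
S = np.linalg.inv(np.eye(n) + lam * K)   # Reinsch form: S_lam = (I + lam*K)^{-1}
rho = 1.0 / (1.0 + lam * d)              # predicted eigenvalues of S_lam
```

Because $S_\lambda$ and K share eigenvectors, `U @ np.diag(rho) @ U.T` reconstructs `S` exactly, and each $\rho_k(\lambda) \in (0, 1]$ shrinks toward 0 as $d_k$ (the "roughness" of eigenvector $u_k$) grows.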
Highlights of the eigen-decomposition:
The eigenvectors are not affected by changes in $\lambda$.
Shrinking nature: $S_\lambda y = \sum_k u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$ shrinks each component, in contrast to the hat matrix, which leaves components alone or sets them to zero.
The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity.
The first two eigenvalues are always 1, since $d_1 = d_2 = 0$, showing that linear functions are not penalized.
5. Automatic Selection of the Smoothing Parameters
Selecting the placement and number of knots for regression splines can be a combinatorially complex task;
For smoothing splines, only the penalty $\lambda$ must be selected.
Method: fix the degrees of freedom $df_\lambda = \mathrm{trace}(S_\lambda)$ and solve for $\lambda$.
Criterion: the bias-variance tradeoff.
The Bias-Variance Tradeoff
Integrated squared prediction error (EPE):

$$EPE(\hat{f}_\lambda) = E(Y - \hat{f}_\lambda(X))^2 = \sigma^2 + E\left[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\right].$$

Cross-validation:

$$CV(\hat{f}_\lambda) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{f}_\lambda^{(-i)}(x_i)\right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i, i)} \right)^2.$$
Figure: EPE, CV, and the effects of different degrees of freedom.
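The leave-one-out identity in the CV formula means the whole curve over $\lambda$ needs only one fit per $\lambda$, never N refits. A sketch with a toy ridge-type smoother (random basis, hypothetical sizes), for which the identity holds exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, lam = 25, 6, 0.8
B = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Linear smoother S and fitted values from a single fit.
S = B @ np.linalg.solve(B.T @ B + lam * np.eye(m), B.T)
f_hat = S @ y

# LOOCV via the shortcut: leave-one-out residual = (y_i - f_hat_i)/(1 - S_ii).
cv = np.mean(((y - f_hat) / (1.0 - np.diag(S)))**2)
```

Sweeping `lam` over a grid and picking the minimizer of `cv` is the standard automatic selection recipe this section describes.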