TRANSCRIPT
Basis Expansion and Regularization
Presenter: Hongliang Fei, Brian Quanz
Date: July 03, 2008
Contents
Introduction
Piecewise Polynomials and Splines
Filtering and Feature Extraction
Smoothing Splines
Automatic Smoothing Parameter Selection
1. Introduction
Basis: in linear algebra, a basis is a set of vectors satisfying:
Linear combinations of the basis vectors can represent every vector in the given vector space;
No element of the set can be represented as a linear combination of the others.
In a function space, the basis becomes a set of basis functions;
Each function in the function space can be represented as a linear combination of the basis functions.
Example: the quadratic polynomial basis $\{1, t, t^2\}$.
What is Basis Expansion?
Given data $X$ and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \ldots, M$, we model

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$$

as a linear basis expansion in $X$, where $h_m(X)$ is the $m$-th basis function.
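As a minimal sketch of this idea (the data and coefficients here are hypothetical), a model that is nonlinear in $x$ but linear in $\beta$ can be fit by ordinary least squares once the basis $\{1, x, x^2\}$ from the earlier example is expanded into a design matrix:

```python
import numpy as np

# Hypothetical data: y is a noisy quadratic function of a scalar x.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 3.0 * x + 0.5 * x**2 + 0.01 * rng.standard_normal(50)

# Basis expansion with h_1(x)=1, h_2(x)=x, h_3(x)=x^2:
# f(x) = sum_m beta_m h_m(x) is linear in beta, so least squares applies.
H = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
f_hat = H @ beta
```

The recovered `beta` should be close to the generating coefficients (2.0, -3.0, 0.5), since the model is linear in the expanded features.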
Why Basis Expansion?
In regression problems, $f(X)$ is typically nonlinear in $X$;
A linear model is convenient and easy to interpret;
When the sample size is small but the number of attributes is large, a linear model may be all we can do to avoid overfitting.
2. Piecewise Polynomials and Splines
Spline:
In mathematics, a spline is a special function defined piecewise by polynomials;
In computer science, the term spline more frequently refers to a piecewise polynomial (parametric) curve.
Splines are popular for their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
Assume a spline with four knots (two boundary knots and two interior knots), and that $X$ is one-dimensional.
Piecewise constant basis: $h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$.
Piecewise linear basis: add the functions $h_{m+3}(X) = h_m(X)\,X$, $m = 1, 2, 3$.
These six basis functions correspond to a six-dimensional linear space.
An order-$M$ spline with knots $\xi_j$, $j = 1, \ldots, K$ has continuous derivatives up to order $M-2$. The general form of the truncated-power basis set is:

$$h_j(X) = X^{j-1}, \quad j = 1, \ldots, M,$$
$$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \ldots, K.$$
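The truncated-power construction above can be sketched directly (a minimal helper, with the knot locations chosen arbitrarily for illustration):

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Truncated-power basis for an order-M spline:
    h_j(x) = x^(j-1) for j = 1..M, plus (x - xi_l)_+^(M-1) per knot xi_l."""
    cols = [x**j for j in range(M)]                          # 1, x, ..., x^(M-1)
    cols += [np.maximum(x - xi, 0.0)**(M - 1) for xi in knots]
    return np.column_stack(cols)

# Cubic spline (M = 4) with K = 2 interior knots: M + K = 6 basis functions.
x = np.linspace(0.0, 1.0, 20)
B = truncated_power_basis(x, [0.3, 0.7])
```

Each truncated-power column is zero to the left of its knot, which is what makes the pieces join with $M-2$ continuous derivatives.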
Natural Cubic Spline
A natural cubic spline adds additional constraints: the function is linear beyond the boundary knots.
A natural cubic spline with K knots is represented by K basis functions.
One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
Example of a Natural Cubic Spline
Starting from the truncated power series basis, we arrive at:

$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X),$$

where

$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}.$$
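These $K$ basis functions can be sketched as follows (a hypothetical helper; knot values are illustrative):

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """K basis functions N_1..N_K for a natural cubic spline with K knots,
    built from the reduced truncated-power form shown above."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        # d_k(x) = ((x - xi_k)_+^3 - (x - xi_K)_+^3) / (xi_K - xi_k)
        num = np.maximum(x - knots[k], 0.0)**3 - np.maximum(x - knots[K - 1], 0.0)**3
        return num / (knots[K - 1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 9)
N = natural_cubic_basis(x, [0.1, 0.4, 0.6, 0.9])   # K = 4 knots -> 4 basis functions
```

Beyond the rightmost knot, the quadratic and cubic terms of $d_k$ and $d_{K-1}$ cancel, so each $N_{k+2}$ is exactly linear there, which is the natural boundary constraint.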
Example: phoneme recognition.
Data: 1000 samples drawn from 695 "aa"s and 1022 "ao"s, with a feature vector of length 256.
Goal: use these data to classify the spoken phoneme.
The coefficients can be plotted as a function of frequency.
Fitting via maximum likelihood only, the coefficient curve is very rough;
Fitting through natural cubic splines:
Rewrite the coefficient function as an expansion in splines, $\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m$, that is, $\beta = H\theta$, where $H$ is a $p \times M$ basis matrix of natural cubic splines.
Since $x^T \beta = x^T H \theta = (H^T x)^T \theta$, we replace the input features $x$ by the filtered version $x^* = H^T x$.
Fit $\theta$ via linear logistic regression on $x^*$.
The final result is the smooth coefficient curve $\hat{\beta} = H\hat{\theta}$.
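The filtering step can be sketched as a simple projection (all names and sizes here are hypothetical, and a generic smooth polynomial basis stands in for the natural cubic spline matrix $H$ of the slides):

```python
import numpy as np

# Project p = 256 raw frequency features onto M = 12 smooth basis
# functions; the classifier is then trained on the filtered x*.
rng = np.random.default_rng(1)
p, M, n = 256, 12, 100
freq = np.linspace(0.0, 1.0, p)

# Stand-in smooth basis matrix H (p x M); in the slides, H holds
# natural cubic spline basis functions evaluated at the p frequencies.
H = np.column_stack([freq**m for m in range(M)])

X = rng.standard_normal((n, p))     # placeholder data, one sample per row
X_star = X @ H                      # filtered features: each row is x* = H^T x
```

The downstream model sees only M = 12 columns instead of 256, which is the dimension reduction that makes the fitted coefficient curve smooth.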
3. Filtering and Feature Extraction
Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm.
The previous example used $x^* = H^T x$, a filtering approach to transform the features;
Transformations need not be linear, but can have the general form $x^* = g(x)$.
Another example: the wavelet transform; see Section 5.9.
4. Smoothing Splines
Purpose: avoid the complexity of the knot-selection problem by using a maximal set of knots.
Complexity is controlled via regularization.
Consider this problem: among all functions with two continuous derivatives, minimize the penalized residual sum of squares

$$RSS(f, \lambda) = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \int f''(t)^2 \, dt.$$
Though RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \ldots, N$.
The penalty term translates to a penalty on the spline coefficients.
Rewrite the solution as $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$, where the $N_j(x)$ are an N-dimensional set of basis functions representing the family of natural splines.
The criterion in matrix form:

$$RSS(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta,$$

where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\, N_k''(t) \, dt$.
By the ridge regression result, the solution is

$$\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y.$$

The fitted smoothing spline is given by $\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\,\hat{\theta}_j$.
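The generalized ridge solve above is a one-liner in numpy (a minimal sketch: the basis matrix is random and an identity matrix stands in for the curvature penalty $\Omega_N$, so this shows the algebra, not a real spline fit):

```python
import numpy as np

# Solve theta_hat = (Nb^T Nb + lam * Omega)^{-1} Nb^T y for a toy problem.
rng = np.random.default_rng(2)
n, m, lam = 30, 5, 0.5
Nb = rng.standard_normal((n, m))        # stand-in for the basis matrix N
Omega = np.eye(m)                       # stand-in for the curvature penalty
y = rng.standard_normal(n)

theta_hat = np.linalg.solve(Nb.T @ Nb + lam * Omega, Nb.T @ y)
f_hat = Nb @ theta_hat                  # fitted values at the training points
```

Using `np.linalg.solve` on the penalized normal equations avoids forming an explicit inverse, which is the standard numerically preferable route.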
Degrees of Freedom and Smoother Matrix
A smoothing spline with prechosen $\lambda$ is a linear operator.
Let $\hat{f}$ be the N-vector of fitted values $\hat{f}(x_i)$ at the training predictors $x_i$:

$$\hat{f} = N(N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y.$$

Here $S_\lambda$ is called the smoother matrix. It depends only on the $x_i$ and $\lambda$.
Suppose $B_\xi$ is an $N \times M$ matrix of M cubic spline basis functions evaluated at the N training points $x_i$, with knot sequence $\xi$. The fitted spline values are given by:

$$\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi y.$$

Here the linear operator $H_\xi$ is a projection operator, known as the hat matrix in statistics.
Similarities and differences between $S_\lambda$ and $H_\xi$:
Both are symmetric and positive semi-definite.
$H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), while $S_\lambda S_\lambda \preceq S_\lambda$, a shrinking nature.
Rank($S_\lambda$) = N, Rank($H_\xi$) = M.
The trace of $H_\xi$ gives the dimension of the projection space (the number of basis functions).
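The contrast can be checked numerically (a toy sketch: random basis, identity penalty, so the `S` here is a low-rank ridge smoother illustrating the shrinking property, not a true full-rank smoothing-spline matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, lam = 20, 4, 1.0
B = rng.standard_normal((n, m))

# Hat matrix: projection onto the column space of B.
H = B @ np.linalg.solve(B.T @ B, B.T)

# Ridge-type smoother: same form but with a penalty added.
S = B @ np.linalg.solve(B.T @ B + lam * np.eye(m), B.T)
```

`H @ H` reproduces `H` exactly (a projection leaves components alone or zeros them out), whereas the eigenvalues of `S` lie strictly below 1, so `S @ S` is dominated by `S`: every component is shrunk rather than kept or killed.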
Define the effective degrees of freedom as $df_\lambda = \mathrm{trace}(S_\lambda)$.
By specifying $df_\lambda$, we can derive $\lambda$.
Since $S_\lambda$ is symmetric and positive semi-definite, we can rewrite it in the Reinsch form $S_\lambda = (I + \lambda K)^{-1}$, so that $\hat{f} = S_\lambda y$ is the solution of

$$\min_f \; (y - f)^T (y - f) + \lambda\, f^T K f.$$

K is known as the penalty matrix.
The eigen-decomposition of $S_\lambda$ is given by:

$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \quad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k},$$

where $d_k$ and $u_k$ are the eigenvalues and eigenvectors of K.
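The relation between the spectra of K and $S_\lambda$ is easy to verify numerically (a sketch with an arbitrary symmetric positive semi-definite K standing in for a real penalty matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 10, 2.0
A = rng.standard_normal((n, n))
K = A @ A.T                              # toy symmetric PSD penalty matrix

d, U = np.linalg.eigh(K)                 # eigenvalues/eigenvectors of K
S = np.linalg.inv(np.eye(n) + lam * K)   # Reinsch form: S_lam = (I + lam*K)^{-1}
rho = 1.0 / (1.0 + lam * d)              # predicted eigenvalues of S_lam
```

Because $S_\lambda$ and K share eigenvectors, `U @ np.diag(rho) @ U.T` reconstructs `S` exactly, and each $\rho_k(\lambda) \in (0, 1]$ shrinks toward 0 as $d_k$ (the "roughness" of eigenvector $u_k$) grows.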
Highlights of the eigen-decomposition:
The eigenvectors are not affected by changes in $\lambda$.
Shrinking nature: $S_\lambda y = \sum_k u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$ shrinks each component, in contrast to the hat matrix, which leaves components alone or sets them to zero.
The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity.
The first two eigenvalues are always 1, since $d_1 = d_2 = 0$, showing that linear functions are not penalized.
5. Automatic Selection of the Smoothing Parameters
Selecting the placement and number of knots for regression splines can be a combinatorially complex task;
For smoothing splines, only the penalty $\lambda$ must be selected.
Method: fix the degrees of freedom $df_\lambda = \mathrm{trace}(S_\lambda)$ and solve for $\lambda$.
Criterion: the bias-variance tradeoff.
The Bias-Variance Tradeoff
Integrated squared prediction error (EPE):

$$EPE(\hat{f}_\lambda) = E(Y - \hat{f}_\lambda(X))^2 = \sigma^2 + E\left[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\right].$$

Cross-validation:

$$CV(\hat{f}_\lambda) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{f}_\lambda^{(-i)}(x_i)\right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i, i)} \right)^2.$$
Figure: EPE, CV, and the effects of different degrees of freedom.
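The leave-one-out identity in the CV formula means the whole curve over $\lambda$ needs only one fit per $\lambda$, never N refits. A sketch with a toy ridge-type smoother (random basis, hypothetical sizes), for which the identity holds exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, lam = 25, 6, 0.8
B = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Linear smoother S and fitted values from a single fit.
S = B @ np.linalg.solve(B.T @ B + lam * np.eye(m), B.T)
f_hat = S @ y

# LOOCV via the shortcut: leave-one-out residual = (y_i - f_hat_i)/(1 - S_ii).
cv = np.mean(((y - f_hat) / (1.0 - np.diag(S)))**2)
```

Sweeping `lam` over a grid and picking the minimizer of `cv` is the standard automatic selection recipe this section describes.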