Functional Analytic Framework for Model Selection


Page 1: Functional Analytic Framework for Model Selection

Aug. 27, 2003, IFAC-SYSID2003

Functional Analytic Framework for Model Selection

Masashi Sugiyama

Tokyo Institute of Technology, Tokyo, Japan

Fraunhofer FIRST-IDA, Berlin, Germany

Page 2: Functional Analytic Framework for Model Selection

Regression Problem

$f(x)$: learning target function

$\hat{f}(x)$: learned function

$\{(x_i, y_i)\}_{i=1}^{n}$: training examples, where $y_i = f(x_i) + \epsilon_i$ ($\epsilon_i$: noise)

From $\{(x_i, y_i)\}_{i=1}^{n}$, obtain a good approximation $\hat{f}$ to $f$.
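As a concrete illustration of this setup, here is a minimal sketch; the sinc target, Gaussian noise, and all numerical values are my own illustrative choices, not prescribed by the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                                  # number of training examples
x = rng.uniform(-np.pi, np.pi, size=n)  # sample points x_i
f = np.sinc                             # hypothetical target function f
sigma = 0.1                             # noise standard deviation
y = f(x) + rng.normal(scale=sigma, size=n)  # y_i = f(x_i) + eps_i
```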

Page 3: Functional Analytic Framework for Model Selection

Model Selection

[Figure: target function and learned functions under three models: too simple (underfitting), appropriate, and too complex (overfitting).]

The choice of the model is extremely important for obtaining a good learned function!

(Here, "model" refers to, e.g., the regularization parameter.)

Page 4: Functional Analytic Framework for Model Selection

Aims of Our Research

A model is chosen such that an estimator of the generalization error is minimized. Therefore, model selection research essentially amounts to pursuing an accurate estimator of the generalization error.

We are interested in:

Having a novel method in a different framework.

Estimating the generalization error with small (finite) samples.

Page 5: Functional Analytic Framework for Model Selection

Formulating the Regression Problem as a Function Approximation Problem

$H$: a functional Hilbert space. We assume $f, \hat{f} \in H$.

We shall measure the "goodness" of the learned function (or the generalization error) by

$J = E_{\epsilon} \| \hat{f} - f \|_{H}^{2}$

$\| \cdot \|_{H}$: norm in $H$

$E_{\epsilon}$: expectation over the noise
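For intuition: when both functions happen to be finite kernel expansions in $H$, this squared norm can be evaluated from Gram matrices alone. A minimal sketch, where the Gaussian kernel and the helper names are my illustrative assumptions:

```python
import numpy as np

def gauss_kernel(X, Z, width=1.0):
    """Gaussian RKHS kernel K(x, z) = exp(-(x - z)^2 / (2 width^2))."""
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2 * width ** 2))

def rkhs_sq_error(a, u, b, v, width=1.0):
    """||sum_i a_i K(., u_i) - sum_j b_j K(., v_j)||_H^2 via Gram matrices:
    a'K_uu a - 2 a'K_uv b + b'K_vv b."""
    return (a @ gauss_kernel(u, u, width) @ a
            - 2 * a @ gauss_kernel(u, v, width) @ b
            + b @ gauss_kernel(v, v, width) @ b)
```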

Page 6: Functional Analytic Framework for Model Selection

Function Spaces for Learning

In learning problems, we sample values of the target function at sample points (e.g., $y_i = f(x_i) + \epsilon_i$). Therefore, values of the target function at the sample points should be specified.

This means that the usual $L_2$-space is not suitable for learning problems: two functions that differ only at the sample points have different values at $\{x_i\}_{i=1}^{n}$, but they are treated as the same function in $L_2$, since they differ only on a set of measure zero.

[Figure: two functions with different values at the sample points that are identified in $L_2$.]

Page 7: Functional Analytic Framework for Model Selection

Reproducing Kernel Hilbert Spaces

In a reproducing kernel Hilbert space (RKHS) $H$, the value of a function at an input point is always specified. Indeed, an RKHS has the reproducing kernel $K(\cdot, \cdot)$ with the reproducing property:

$f(x) = \langle f, K(\cdot, x) \rangle_{H}$ for all $f \in H$

$\langle \cdot, \cdot \rangle_{H}$: inner product in $H$
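A quick numerical check of the reproducing property for a function in the span of kernels, $f = \sum_i a_i K(\cdot, x_i)$: then $\langle f, K(\cdot, x) \rangle_{H} = \sum_i a_i K(x_i, x) = f(x)$. A sketch, with a Gaussian kernel assumed purely for illustration:

```python
import numpy as np

def K(x, z, w=1.0):
    return np.exp(-(x - z) ** 2 / (2 * w ** 2))

a  = np.array([0.5, -1.2, 2.0])   # expansion coefficients of f
xs = np.array([-1.0, 0.0, 1.5])   # expansion centres

f = lambda x: sum(ai * K(xi, x) for ai, xi in zip(a, xs))

# <f, K(., x)>_H = sum_i a_i <K(., x_i), K(., x)>_H = sum_i a_i K(x_i, x)
x = 0.3
inner = sum(ai * K(xi, x) for ai, xi in zip(a, xs))
assert np.isclose(inner, f(x))    # reproducing property recovers f(x)
```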

Page 8: Functional Analytic Framework for Model Selection

Sampling Operator

For any RKHS $H$, there exists a linear operator $A$ from $H$ to $\mathbb{R}^{n}$ that returns the sample values:

$Af = \big(f(x_1), f(x_2), \ldots, f(x_n)\big)^{\top}$

Indeed, $A = \sum_{i=1}^{n} e_i \otimes K(\cdot, x_i)$, since $\langle f, K(\cdot, x_i) \rangle_{H} = f(x_i)$.

$\otimes$: Neumann-Schatten product; for vectors, $(g \otimes h) f = \langle f, h \rangle\, g$

$e_i$: $i$-th standard basis vector in $\mathbb{R}^{n}$
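In coordinates the sampling operator is concrete: for a function in the kernel span, $f = \sum_j \theta_j K(\cdot, z_j)$, applying $A$ reduces to multiplication by a cross-Gram matrix. A sketch (function and parameter names are mine):

```python
import numpy as np

def gauss_kernel(X, Z, w=1.0):
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2 * w ** 2))

def sample(theta, z, x, w=1.0):
    """Apply A to f = sum_j theta_j K(., z_j): (Af)_i = f(x_i)."""
    return gauss_kernel(x, z, w) @ theta   # linear in f, as the slide states
```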

Page 9: Functional Analytic Framework for Model Selection

Our Framework

[Diagram: the learning target function $f$ lives in the RKHS $H$. The sampling operator $A$ (always linear) maps it to the sample value space $\mathbb{R}^{n}$, where noise is added: $y = Af + \epsilon$. The learning operator $X$ (generally non-linear) maps the sample values back into $H$, giving the learned function $\hat{f} = Xy$.]

Gen. error: $J = E_{\epsilon} \| \hat{f} - f \|_{H}^{2}$

$E_{\epsilon}$: expectation over the noise

Page 10: Functional Analytic Framework for Model Selection

Tricks for Estimating the Generalization Error

We want to estimate $J = E_{\epsilon}\|\hat{f} - f\|_{H}^{2}$. But it includes the unknown $f$, so estimating it is not straightforward. To cope with this problem, we expand

$J = \underbrace{E_{\epsilon}\|\hat{f}\|_{H}^{2} - 2 E_{\epsilon}\langle \hat{f}, f \rangle_{H}}_{\text{essential part}} + \underbrace{\|f\|_{H}^{2}}_{\text{constant}}$

and estimate only its essential part $J' = E_{\epsilon}\|\hat{f}\|_{H}^{2} - 2 E_{\epsilon}\langle \hat{f}, f \rangle_{H}$; the constant $\|f\|_{H}^{2}$ does not depend on the model.

We focus on the kernel regression model:

$\hat{f}(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i)$

$K(\cdot, \cdot)$: reproducing kernel of $H$

Page 11: Functional Analytic Framework for Model Selection

A Key Lemma

For the kernel regression model, the essential generalization error is expressed by

$J' = E_{\epsilon}\big[\langle K\alpha, \alpha \rangle - 2\langle \alpha, y - \epsilon \rangle\big]$

where $K$ also denotes the kernel Gram matrix, $K_{ij} = K(x_i, x_j)$. The unknown target function $f$ has been erased! By the reproducing property, $\langle \hat{f}, f \rangle_{H} = \sum_{i} \alpha_i f(x_i) = \langle \alpha, Af \rangle = \langle \alpha, y - \epsilon \rangle$.

$K^{\dagger}$: generalized inverse of $K$ (since $y - \epsilon = Af$ lies in the range of $K$, it equals $K K^{\dagger}(y - \epsilon)$)

$E_{\epsilon}$: expectation over the noise
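The erasure of $f$ can be checked numerically: if the target is itself a kernel expansion, $\langle \hat{f}, f \rangle_{H}$ computed from Gram matrices coincides with $\langle \alpha, y - \epsilon \rangle$. A sketch under those assumptions (Gaussian kernel, synthetic noise, all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
gk = lambda X, Z: np.exp(-(X[:, None] - Z[None, :]) ** 2 / 2)

x = rng.uniform(-2, 2, size=30)      # sample points
b = rng.normal(size=30)              # target f = sum_i b_i K(., x_i)
eps = 0.1 * rng.normal(size=30)      # noise vector
y = gk(x, x) @ b + eps               # y = Af + eps

alpha = rng.normal(size=30)          # arbitrary kernel regression coefficients
lhs = alpha @ gk(x, x) @ b           # <f_hat, f>_H via the Gram matrix
rhs = alpha @ (y - eps)              # <alpha, y - eps>
assert np.isclose(lhs, rhs)          # the unknown target never appears on the right
```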

Page 12: Functional Analytic Framework for Model Selection

Estimating the Essential Part

$\langle K\alpha, \alpha \rangle - 2\langle \alpha, y - \epsilon \rangle$ is an unbiased estimator of the essential generalization error $J'$. However, the noise vector $\epsilon$ is unknown. Let us therefore define

$\hat{J} = \langle K\alpha, \alpha \rangle - 2\langle \alpha, y \rangle + 2\langle \alpha, \epsilon \rangle$

Clearly, it is still unbiased: $E_{\epsilon}\hat{J} = J'$. We would like to handle the remaining term $\langle \alpha, \epsilon \rangle$ well.

Page 13: Functional Analytic Framework for Model Selection

How to Deal with $\langle \alpha, \epsilon \rangle$

Depending on the type of the learning operator $X$, we consider the following three cases.

A) $X$ is linear.

B) $X$ is non-linear but twice almost differentiable.

C) $X$ is general non-linear.

Page 14: Functional Analytic Framework for Model Selection

A) Examples of Linear Learning Operators

Kernel ridge regression: $\alpha = (K + \lambda I)^{-1} y$

A particular Gaussian process regression

Least-squares support vector machine

$\alpha$: parameters to be learned

$\lambda$: ridge parameter

Page 15: Functional Analytic Framework for Model Selection

A) Linear Learning

When the learning operator is linear, $\alpha$ depends linearly on the sample values: $\alpha = Ly$ (e.g., $L = (K + \lambda I)^{-1}$ for kernel ridge regression). For zero-mean i.i.d. noise with variance $\sigma^{2}$,

$E_{\epsilon}\langle \alpha, \epsilon \rangle = E_{\epsilon}\langle \epsilon, L^{*}\epsilon \rangle = \sigma^{2}\,\mathrm{tr}(L)$

This induces the subspace information criterion (SIC):

$\mathrm{SIC} = \langle K\alpha, \alpha \rangle - 2\langle \alpha, y \rangle + 2\sigma^{2}\,\mathrm{tr}(L)$

SIC is unbiased with finite samples: $E_{\epsilon}\,\mathrm{SIC} = J'$

M. Sugiyama & H. Ogawa (Neural Computation, 2001)
M. Sugiyama & K.-R. Müller (JMLR, 2002)

$L^{*}$: adjoint (transpose) of $L$
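A minimal sketch of SIC for kernel ridge regression, assuming the noise variance $\sigma^{2}$ is known; the function name and grid-search usage are my own framing, not the slide's:

```python
import numpy as np

def sic_kernel_ridge(Kmat, y, lam, sigma2):
    """SIC = <K a, a> - 2 <a, y> + 2 sigma^2 tr(L), with a = L y, L = (K + lam I)^-1."""
    n = len(y)
    L = np.linalg.inv(Kmat + lam * np.eye(n))   # linear learning matrix
    alpha = L @ y
    return alpha @ Kmat @ alpha - 2 * alpha @ y + 2 * sigma2 * np.trace(L)
```

Model selection then amounts to evaluating this over a grid of ridge parameters and picking the minimizer; since $\|f\|_{H}^{2}$ is a model-independent constant, minimizing SIC targets the same minimizer as $J$.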

Page 16: Functional Analytic Framework for Model Selection

How to Deal with $\langle \alpha, \epsilon \rangle$

Depending on the type of the learning operator $X$, we consider the following three cases.

A) $X$ is linear.

B) $X$ is non-linear but twice almost differentiable.

C) $X$ is general non-linear.

Page 17: Functional Analytic Framework for Model Selection

B) Examples of Twice Almost Differentiable Learning Operators

Support vector regression with Huber's loss:

$\min_{\alpha}\ \sum_{i=1}^{n} \rho_{h}\big(y_i - \hat{f}(x_i)\big) + \lambda \langle K\alpha, \alpha \rangle$, where $\rho_{h}(r) = r^{2}/2$ for $|r| \le h$ and $h|r| - h^{2}/2$ otherwise

$\lambda$: ridge parameter

$h$: threshold
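A minimal sketch of such a learner: gradient descent on a Huber-loss kernel regression objective. The objective form above, the step-size rule, and the iteration count are my illustrative assumptions, not the slide's prescription:

```python
import numpy as np

def huber_svr(Kmat, y, lam=0.1, h=0.5, iters=2000):
    """Minimize sum_i rho_h(y_i - (K alpha)_i) + lam * alpha @ K @ alpha,
    where rho_h is Huber's loss with threshold h."""
    s = np.linalg.norm(Kmat, 2)              # spectral norm, for a safe step size
    lr = 1.0 / (s ** 2 + 2 * lam * s)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        psi = np.clip(y - Kmat @ alpha, -h, h)   # derivative of Huber's loss
        alpha -= lr * (-Kmat @ psi + 2 * lam * Kmat @ alpha)
    return alpha
```

Because the loss is quadratic near zero and linear in the tails, the resulting map from $y$ to $\alpha$ is smooth almost everywhere, which is what case B exploits.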

Page 18: Functional Analytic Framework for Model Selection

B) Twice Differentiable Learning

For Gaussian noise, we have (by Stein's identity)

$E_{\epsilon}\langle \alpha, \epsilon \rangle = \sigma^{2}\, E_{\epsilon} \sum_{i=1}^{n} \frac{\partial \alpha_i}{\partial y_i}$

SIC for twice almost differentiable learning:

$\mathrm{SIC} = \langle K\alpha, \alpha \rangle - 2\langle \alpha, y \rangle + 2\sigma^{2} \sum_{i=1}^{n} \frac{\partial \alpha_i}{\partial y_i}$

It reduces to the original SIC if $\alpha(y)$ is linear. It is still unbiased with finite samples: $E_{\epsilon}\,\mathrm{SIC} = J'$

$\alpha(y)$: vector-valued function mapping the sample values $y$ to the learned parameters
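When $\partial \alpha_i / \partial y_i$ is awkward to derive analytically, the divergence term can be approximated by finite differences. This numerical stand-in is my own illustration, not the slide's derivation:

```python
import numpy as np

def sic_stein(Kmat, y, learner, sigma2, delta=1e-4):
    """SIC with the divergence term sum_i d alpha_i / d y_i estimated by
    forward finite differences. learner maps y -> alpha and should be
    smooth almost everywhere (case B)."""
    alpha = learner(y)
    div = 0.0
    for i in range(len(y)):
        y2 = y.copy()
        y2[i] += delta
        div += (learner(y2)[i] - alpha[i]) / delta   # d alpha_i / d y_i
    return alpha @ Kmat @ alpha - 2 * alpha @ y + 2 * sigma2 * div
```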

Page 19: Functional Analytic Framework for Model Selection

How to Deal with $\langle \alpha, \epsilon \rangle$

Depending on the type of the learning operator $X$, we consider the following three cases.

A) $X$ is linear.

B) $X$ is non-linear but twice almost differentiable.

C) $X$ is general non-linear.

Page 20: Functional Analytic Framework for Model Selection

C) Examples of General Non-Linear Learning Operators

Kernel sparse regression

Support vector regression with Vapnik's loss

Page 21: Functional Analytic Framework for Model Selection

C) General Non-Linear Learning

Approximate $E_{\epsilon}\langle \alpha, \epsilon \rangle$ by the bootstrap.

Bootstrap approximation of SIC (BASIC):

$\mathrm{BASIC} = \langle K\alpha, \alpha \rangle - 2\langle \alpha, y \rangle + 2\, E_{*}\langle \alpha^{*}, \epsilon^{*} \rangle$

BASIC is almost unbiased: $E_{\epsilon}\,\mathrm{BASIC} \approx J'$

$E_{*}$: expectation over bootstrap replications
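One way to realize this is a residual bootstrap: resample centred residuals to create pseudo noise $\epsilon^{*}$, retrain on $\hat{y} + \epsilon^{*}$, and average $\langle \alpha^{*}, \epsilon^{*} \rangle$. The slide does not specify the resampling scheme, so the following sketch is one plausible instantiation:

```python
import numpy as np

def basic(Kmat, y, learner, n_boot=100, seed=0):
    """BASIC = <K a, a> - 2 <a, y> + 2 E* <alpha*, eps*>, via residual bootstrap.
    learner maps sample values y -> kernel coefficients alpha."""
    rng = np.random.default_rng(seed)
    alpha = learner(y)
    y_hat = Kmat @ alpha                  # fitted sample values
    resid = y - y_hat
    resid = resid - resid.mean()          # centre the residuals
    term = 0.0
    for _ in range(n_boot):
        eps_star = rng.choice(resid, size=len(y), replace=True)
        alpha_star = learner(y_hat + eps_star)   # retrain on bootstrap sample
        term += alpha_star @ eps_star
    return alpha @ Kmat @ alpha - 2 * alpha @ y + 2 * term / n_boot
```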

Page 22: Functional Analytic Framework for Model Selection

Simulation: Learning the Sinc Function

$H$: Gaussian RKHS

Kernel ridge regression: $\alpha = (K + \lambda I)^{-1} y$

$\lambda$: ridge parameter

[Figure: simulation results for learning the sinc function from noisy samples.]
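A self-contained sketch of this experiment: sinc target, Gaussian kernel, kernel ridge regression, and SIC used to choose the ridge parameter. All concrete values (kernel width, noise level, grid) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
gk = lambda X, Z, w=0.5: np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2 * w ** 2))

n, sigma2 = 100, 0.01
x = rng.uniform(-np.pi, np.pi, size=n)
y = np.sinc(x) + rng.normal(scale=np.sqrt(sigma2), size=n)
Kmat = gk(x, x)

best = None
for lam in np.logspace(-4, 2, 30):            # candidate ridge parameters
    L = np.linalg.inv(Kmat + lam * np.eye(n))
    alpha = L @ y
    sic = alpha @ Kmat @ alpha - 2 * alpha @ y + 2 * sigma2 * np.trace(L)
    if best is None or sic < best[0]:
        best = (sic, lam)

print("SIC-chosen ridge parameter:", best[1])
```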

Page 23: Functional Analytic Framework for Model Selection

Simulation: DELVE Data Sets

[Table: normalized test errors on the DELVE data sets. Red: best or comparable method (95% t-test).]

Page 24: Functional Analytic Framework for Model Selection

Conclusions

We provided a functional analytic framework for regression, where the generalization error is measured using the RKHS norm:

$J = E_{\epsilon} \| \hat{f} - f \|_{H}^{2}$

Within this framework, we derived a generalization error estimator called SIC.

A) Linear learning (kernel ridge regression, Gaussian process regression, LS-SVM): SIC is exactly unbiased with finite samples.

B) Twice almost differentiable learning (SVR with Huber's loss): SIC is exactly unbiased with finite samples.

C) General non-linear learning (kernel sparse regression, SVR with Vapnik's loss): BASIC is almost unbiased.