
Page 1: CS 59000 Statistical Machine learning Lecture 13

CS 59000 Statistical Machine Learning, Lecture 13

Yuan (Alan) Qi, Purdue CS

Oct. 8, 2008

Page 2: CS 59000 Statistical Machine learning Lecture 13

Outline

• Review of the kernel trick, kernel ridge regression, and kernel Principal Component Analysis
• Gaussian processes (GPs)
• From linear regression to GPs
• GP for regression

Page 3: CS 59000 Statistical Machine learning Lecture 13

Kernel Trick

1. Reformulate an algorithm such that the input vector x enters only in the form of inner products x^T x'.
2. Replace the input x by its feature mapping: x → φ(x).
3. Replace the inner product by a kernel function: k(x, x') = φ(x)^T φ(x').

Examples: kernel PCA, kernel Fisher discriminant, support vector machines.
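A small illustrative check of the trick, assuming a simple polynomial kernel k(x, x') = (x^T x')^2 in two dimensions (the example and names here are hypothetical, not from the lecture): the kernel evaluates the feature-space inner product without ever forming φ(x) explicitly.

```python
import numpy as np

def phi(x):
    # Explicit feature map whose inner product reproduces k(x, x') = (x^T x')^2 for 2-D x.
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def k(x, xp):
    # Kernel: the same quantity computed without ever forming phi(x).
    return float(x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp))  # 1.0
print(k(x, xp))          # 1.0
```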

Page 4: CS 59000 Statistical Machine learning Lecture 13

Dual Representation for Ridge Regression

Dual variables:
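A sketch of the dual derivation, with notation assumed to follow Bishop's PRML (Φ is the design matrix with rows φ(x_n)^T, t the target vector, λ the regularization constant):

```latex
J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)-t_n\bigr)^{2}
              + \frac{\lambda}{2}\mathbf{w}^{\top}\mathbf{w},
\qquad
\nabla_{\mathbf{w}} J = 0
\;\Rightarrow\;
\mathbf{w} = \boldsymbol{\Phi}^{\top}\mathbf{a},
\quad
a_n = -\frac{1}{\lambda}\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)-t_n\bigr).
```

Substituting w = Φ^T a back into J gives an objective expressed entirely through the Gram matrix K = ΦΦ^T, which the next slide minimizes over the dual variables a.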

Page 5: CS 59000 Statistical Machine learning Lecture 13

Kernel Ridge Regression

Using the kernel trick:

Minimize over dual variables:
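A minimal runnable sketch of kernel ridge regression in this dual form (the Gaussian kernel, the synthetic data, and all names below are assumptions for illustration), using a = (K + λI)^{-1} t and y(x) = k(x)^T a:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit_kernel_ridge(X, t, lam=0.1, sigma=1.0):
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)  # dual variables a

def predict(X_train, a, X_test, sigma=1.0):
    # y(x) = sum_n a_n k(x_n, x) for each test point.
    return gaussian_kernel(X_test, X_train, sigma) @ a

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = fit_kernel_ridge(X, t)
print(predict(X, a, np.array([[0.0]])))  # prediction near sin(0) = 0
```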

Page 6: CS 59000 Statistical Machine learning Lecture 13

Generate Kernel Matrix

The kernel matrix must be positive semidefinite. Consider the Gaussian kernel: k(x, x') = exp(-||x - x'||^2 / (2σ^2)).
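A small numerical illustration (an assumed example, not from the slides) that the Gram matrix built from the Gaussian kernel is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))             # 50 points in 3-D
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 1.0**2))               # Gaussian kernel, sigma = 1
eigvals = np.linalg.eigvalsh(K)              # K is symmetric, so eigvalsh applies
print(eigvals.min() >= -1e-8)                # True: no (meaningfully) negative eigenvalues
```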

Page 7: CS 59000 Statistical Machine learning Lecture 13

Principal Component Analysis (PCA)

Assume the data have zero mean.

We have the sample covariance matrix S, and

u_i is a normalized eigenvector of S.
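The standard PCA eigen-problem being reviewed here, assuming zero-mean data (notation follows Bishop's PRML):

```latex
\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n\mathbf{x}_n^{\top},
\qquad
\mathbf{S}\mathbf{u}_i = \lambda_i\mathbf{u}_i,
\qquad
\mathbf{u}_i^{\top}\mathbf{u}_i = 1.
```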

Page 8: CS 59000 Statistical Machine learning Lecture 13

Feature Mapping

Eigen-problem in feature space
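The analogous quantities in feature space, assuming for the moment that the mapped data have zero mean (the non-zero-mean case is handled on a later slide):

```latex
\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\top},
\qquad
\mathbf{C}\mathbf{v}_i = \lambda_i\mathbf{v}_i.
```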

Page 9: CS 59000 Statistical Machine learning Lecture 13

Dual Variables

Suppose each eigenvector v_i can be written as a linear combination of the mapped data points; we then have an eigen-problem expressed in the dual variables a_in.
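In the standard kernel-PCA derivation (assumed here), that expansion is:

```latex
\mathbf{v}_i = \sum_{n=1}^{N} a_{in}\,\boldsymbol{\phi}(\mathbf{x}_n).
```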

Page 10: CS 59000 Statistical Machine learning Lecture 13

Eigen-problem in Feature Space (1)
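A sketch of the step this slide covers, following the standard derivation (assumed): substituting the dual expansion into C v_i = λ_i v_i and left-multiplying by φ(x_l)^T turns the feature-space eigen-problem into one involving only the kernel matrix K,

```latex
\mathbf{K}^{2}\mathbf{a}_i = \lambda_i N\,\mathbf{K}\mathbf{a}_i
\;\;\Longrightarrow\;\;
\mathbf{K}\mathbf{a}_i = \lambda_i N\,\mathbf{a}_i,
```

where solutions of the second equation differ from those of the first only by eigenvectors of K with zero eigenvalue, which do not affect the projections.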

Page 11: CS 59000 Statistical Machine learning Lecture 13

Eigen-problem in Feature Space (2)

Normalization condition:

Projection coefficient:
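The corresponding standard expressions (assumed): normalizing v_i fixes the scale of a_i, and projecting a new point x onto eigenvector i requires only kernel evaluations.

```latex
1 = \mathbf{v}_i^{\top}\mathbf{v}_i = \mathbf{a}_i^{\top}\mathbf{K}\mathbf{a}_i = \lambda_i N\,\mathbf{a}_i^{\top}\mathbf{a}_i,
\qquad
y_i(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^{\top}\mathbf{v}_i = \sum_{n=1}^{N} a_{in}\,k(\mathbf{x},\mathbf{x}_n).
```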

Page 12: CS 59000 Statistical Machine learning Lecture 13

General Case: Non-zero Mean

Kernel Matrix:
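When the mapped data are not zero-mean in feature space, the standard centered kernel matrix (assumed to be what this slide shows) replaces K, where 1_N denotes the N × N matrix with every element equal to 1/N:

```latex
\widetilde{\mathbf{K}} = \mathbf{K} - \mathbf{1}_N\mathbf{K} - \mathbf{K}\mathbf{1}_N + \mathbf{1}_N\mathbf{K}\mathbf{1}_N.
```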

Page 13: CS 59000 Statistical Machine learning Lecture 13

Gaussian Processes

How do kernels arise naturally in a Bayesian setting?

Instead of assigning a prior on the parameters w, we assign a prior on the function values y.

Infinite-dimensional in theory; finite in practice (we only need the function values at a finite number of training and test points).

Page 14: CS 59000 Statistical Machine learning Lecture 13

Linear Regression Revisited

Let y(x) = w^T φ(x), with a Gaussian prior on the weights w.

We then have an induced Gaussian distribution over the function values y.
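In the standard setup (notation assumed from Bishop's PRML), with y the vector of function values at the training inputs:

```latex
y(\mathbf{x}) = \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}),
\qquad
p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\mid\mathbf{0},\,\alpha^{-1}\mathbf{I}),
\qquad
\mathbf{y} = \boldsymbol{\Phi}\mathbf{w}.
```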

Page 15: CS 59000 Statistical Machine learning Lecture 13

From Prior on Parameters to Prior on Functions

The prior on the function values:
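Since y = Φw is a linear function of the Gaussian-distributed w, the induced prior is itself Gaussian (a standard result; the notation is assumed), and the kernel appears as the covariance of the prior over function values:

```latex
\mathbb{E}[\mathbf{y}] = \mathbf{0},
\qquad
\operatorname{cov}[\mathbf{y}] = \frac{1}{\alpha}\boldsymbol{\Phi}\boldsymbol{\Phi}^{\top} = \mathbf{K},
\quad
K_{nm} = \frac{1}{\alpha}\boldsymbol{\phi}(\mathbf{x}_n)^{\top}\boldsymbol{\phi}(\mathbf{x}_m) = k(\mathbf{x}_n,\mathbf{x}_m).
```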

Page 16: CS 59000 Statistical Machine learning Lecture 13

Stochastic Process

A stochastic process is specified by giving the joint distribution over the function values at any finite set of points, in a consistent manner. (Loosely speaking, consistency means that marginalizing a larger joint distribution over the extra variables gives back exactly the distribution defined directly on the smaller set.)

Page 17: CS 59000 Statistical Machine learning Lecture 13

Gaussian Processes

A Gaussian process is a stochastic process for which the joint distribution of the function values at any finite set of points is a multivariate Gaussian.

Without any prior knowledge, we often set the mean to zero. The GP is then specified entirely by the covariance:
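With a zero mean, the process is determined by the covariance function (standard form, assumed):

```latex
\mathbb{E}\bigl[y(\mathbf{x}_n)\,y(\mathbf{x}_m)\bigr] = k(\mathbf{x}_n,\mathbf{x}_m).
```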

Page 18: CS 59000 Statistical Machine learning Lecture 13

Impact of Kernel Function

The covariance matrix is given by the kernel function.

Application: economics & finance

Page 19: CS 59000 Statistical Machine learning Lecture 13

Gaussian Process for Regression

Likelihood:

Prior:

Marginal distribution:
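The standard GP-regression model these three pieces refer to, assuming additive Gaussian observation noise with precision β:

```latex
p(\mathbf{t}\mid\mathbf{y}) = \mathcal{N}(\mathbf{t}\mid\mathbf{y},\,\beta^{-1}\mathbf{I}_N),
\qquad
p(\mathbf{y}) = \mathcal{N}(\mathbf{y}\mid\mathbf{0},\,\mathbf{K}),
\qquad
p(\mathbf{t}) = \mathcal{N}(\mathbf{t}\mid\mathbf{0},\,\mathbf{C}),
\quad
\mathbf{C} = \mathbf{K} + \beta^{-1}\mathbf{I}_N.
```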

Page 20: CS 59000 Statistical Machine learning Lecture 13

Samples of GP Prior over Functions
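A short sketch of how such sample functions can be drawn (an assumed illustration, not the code used for the slide): sample from the multivariate Gaussian N(0, K) on a grid of inputs.

```python
import numpy as np

# Grid of input locations and the Gaussian-kernel covariance matrix on it.
x = np.linspace(-1.0, 1.0, 100)[:, None]
d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 0.3**2))                       # assumed length scale 0.3

# Draw 5 functions from N(0, K); the jitter keeps the Cholesky factor stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
samples = L @ np.random.default_rng(0).standard_normal((len(x), 5))
# Each column of `samples` is one sample function evaluated on the grid.
```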

Page 21: CS 59000 Statistical Machine learning Lecture 13

Samples of Data Points

Page 22: CS 59000 Statistical Machine learning Lecture 13

Predictive Distribution

The predictive distribution p(t_{N+1} | t) is a Gaussian distribution with mean and variance:
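A minimal sketch of computing this predictive mean and variance (the kernel, the noise precision, and the data below are assumptions for illustration), using m(x*) = k^T C^{-1} t and σ²(x*) = c − k^T C^{-1} k with C = K + β^{-1} I and c = k(x*, x*) + β^{-1}:

```python
import numpy as np

def kern(A, B, sigma=0.5):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (20, 1))                      # training inputs
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)  # noisy targets
beta = 100.0                                         # assumed noise precision
C = kern(X, X) + np.eye(20) / beta                   # C = K + (1/beta) I

x_star = np.array([[0.5]])
k_star = kern(X, x_star)                             # column vector k, shape (20, 1)
mean = k_star.T @ np.linalg.solve(C, t)                                       # predictive mean
var = kern(x_star, x_star) + 1/beta - k_star.T @ np.linalg.solve(C, k_star)  # predictive variance
print(mean.item(), var.item())
```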

Page 23: CS 59000 Statistical Machine learning Lecture 13

Predictive Mean

Note that the predictive mean has the same form as the predictions in kernel ridge regression and kernel PCA: a linear combination of kernel functions evaluated at the training points.
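Written out (standard form, assumed), with a = C_N^{-1} t playing the role of the dual variables:

```latex
m(\mathbf{x}_{N+1}) = \mathbf{k}^{\top}\mathbf{C}_N^{-1}\mathbf{t}
                    = \sum_{n=1}^{N} a_n\, k(\mathbf{x}_n, \mathbf{x}_{N+1}),
\qquad
\mathbf{a} = \mathbf{C}_N^{-1}\mathbf{t}.
```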

Page 24: CS 59000 Statistical Machine learning Lecture 13

GP Regression

Discussion: what is the difference between GP regression and Bayesian linear regression with Gaussian basis functions?

Page 25: CS 59000 Statistical Machine learning Lecture 13

Marginal Distribution of Target Values
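A sketch of the standard construction this slide refers to (assumed): the joint Gaussian over the N training targets and one test target, from which the predictive distribution on the previous slides follows by conditioning.

```latex
p(\mathbf{t}_{N+1}) = \mathcal{N}\!\bigl(\mathbf{t}_{N+1}\mid\mathbf{0},\,\mathbf{C}_{N+1}\bigr),
\qquad
\mathbf{C}_{N+1} =
\begin{pmatrix}
\mathbf{C}_N & \mathbf{k}\\
\mathbf{k}^{\top} & c
\end{pmatrix},
\quad
c = k(\mathbf{x}_{N+1},\mathbf{x}_{N+1}) + \beta^{-1}.
```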