LMS Algorithm in a Reproducing Kernel Hilbert Space
Weifeng Liu, P. P. Pokharel, J. C. Principe
Computational NeuroEngineering Laboratory,
University of Florida
Acknowledgment: This work was partially supported by NSF grant ECS-0300340 and ECS-0601271.
Outline
- Introduction
- Least mean square (LMS) algorithm (easy)
- Reproducing kernel Hilbert space (tricky)
- The convergence and regularization analysis (important)
- Learning from error models (interesting)
Introduction
- Puskal (2006): kernel LMS
- Kivinen, Smola (2004): online learning with kernels (more like leaky LMS)
- Moody, Platt (1990s): resource-allocating networks (growing and pruning)
- LMS (Widrow and Hoff, 1960)
Given a sequence of examples from $U \times R$:
$$((u_1, y_1), \ldots, (u_N, y_N))$$
$U$: a compact set of $R^L$. The model is assumed:
$$y_n = w_o^T u_n + v(n)$$
The cost function:
$$J(w) = \frac{1}{N} \sum_{i=1}^{N} (y_i - w^T u_i)^2$$
LMS
The LMS algorithm:
$$w_0 = 0$$
$$e_n^a = y_n - w_{n-1}^T u_n \quad (1)$$
$$w_n = w_{n-1} + \eta e_n^a u_n \quad (2)$$
The weight after $n$ iterations:
$$w_n = \eta \sum_{i=1}^{n} e_i^a u_i$$
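For concreteness, a minimal NumPy sketch of the recursion in equations (1)-(2); the function name and the default step size are illustrative choices, not part of the slides:

```python
import numpy as np

def lms(U, y, eta=0.2):
    """LMS: w_0 = 0; e_n = y_n - w_{n-1}^T u_n; w_n = w_{n-1} + eta*e_n*u_n."""
    N, L = U.shape
    w = np.zeros(L)
    errors = np.zeros(N)
    for n in range(N):
        errors[n] = y[n] - w @ U[n]      # a priori error, eq. (1)
        w = w + eta * errors[n] * U[n]   # weight update, eq. (2)
    return w, errors
```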
Reproducing kernel Hilbert space
A continuous, symmetric, positive-definite kernel $\kappa: U \times U \to R$, a mapping $\Phi$, and an inner product $\langle \cdot, \cdot \rangle_H$.
$H$ is the closure of the span of all $\Phi(u)$.
Reproducing property: $\langle f, \Phi(u) \rangle_H = f(u)$
Kernel trick: $\langle \Phi(u_1), \Phi(u_2) \rangle_H = \kappa(u_1, u_2)$
The induced norm: $\|f\|_H^2 = \langle f, f \rangle_H$
RKHS
Kernel trick:
- An inner product in the feature space
- The similarity measure you need
Mercer's theorem:
$$\Phi(u) = [\varphi_1(u), \varphi_2(u), \ldots, \varphi_M(u)]^T$$
Common kernels
Gaussian kernel:
$$\kappa(u_i, u_j) = \exp(-a \|u_i - u_j\|^2)$$
Polynomial kernel:
$$\kappa(u_i, u_j) = (u_i^T u_j + 1)^p$$
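The two kernels as short NumPy functions; the parameter names a and p follow the formulas above, and the defaults are illustrative:

```python
import numpy as np

def gaussian_kernel(ui, uj, a=1.0):
    """kappa(u_i, u_j) = exp(-a * ||u_i - u_j||^2)"""
    return np.exp(-a * np.sum((ui - uj) ** 2))

def polynomial_kernel(ui, uj, p=2):
    """kappa(u_i, u_j) = (u_i^T u_j + 1)^p"""
    return (ui @ uj + 1.0) ** p
```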
Kernel LMS
Transform the input $u_i$ to $\Phi(u_i)$:
$$((\Phi(u_1), y_1), \ldots, (\Phi(u_N), y_N))$$
Assume $\Phi(u_i) \in R^M$. The model is assumed:
$$y_n = \Omega_o^T \Phi(u_n) + v(n)$$
The cost function:
$$J(\Omega) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \Omega^T \Phi(u_i))^2$$
Kernel LMS
The KLMS algorithm:
$$\Omega_0 = 0$$
$$e_n^a = y_n - \Omega_{n-1}^T \Phi(u_n) \quad (3)$$
$$\Omega_n = \Omega_{n-1} + \eta e_n^a \Phi(u_n) \quad (4)$$
The weight after $n$ iterations:
$$\Omega_n = \eta \sum_{i=1}^{n} e_i^a \Phi(u_i)$$
Kernel LMS
By the kernel trick, the a priori error can be computed without ever forming $\Phi$ explicitly:
$$\Omega_{n-1}^T \Phi(u_n) = \eta \sum_{i=1}^{n-1} e_i^a \langle \Phi(u_i), \Phi(u_n) \rangle_H = \eta \sum_{i=1}^{n-1} e_i^a \kappa(u_i, u_n)$$
so
$$e_n^a = y_n - \eta \sum_{i=1}^{n-1} e_i^a \kappa(u_i, u_n) \quad (5)$$
Kernel LMS
After the learning, the input-output relation is:
$$y(u) = \Omega_N^T \Phi(u) = \eta \sum_{i=1}^{N} e_i^a \kappa(u_i, u) \quad (6)$$
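A minimal sketch of the full procedure, assuming the Gaussian kernel above: training stores only the inputs and the a priori errors via eq. (5), and prediction evaluates eq. (6). Function names and default parameters are illustrative:

```python
import numpy as np

def klms_train(U, y, eta=0.2, a=1.0):
    """Compute the a priori errors e_n via eq. (5); U has one input per row."""
    N = U.shape[0]
    e = np.zeros(N)
    for n in range(N):
        # Omega_{n-1}^T Phi(u_n) = eta * sum_{i<n} e_i * kappa(u_i, u_n)
        k = np.exp(-a * np.sum((U[:n] - U[n]) ** 2, axis=1))
        e[n] = y[n] - eta * e[:n] @ k
    return e

def klms_predict(U, e, u, eta=0.2, a=1.0):
    """Input-output relation, eq. (6): y(u) = eta * sum_i e_i * kappa(u_i, u)."""
    k = np.exp(-a * np.sum((U - u) ** 2, axis=1))
    return eta * e @ k
```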
KLMS vs. RBF
KLMS:
$$y(u) = \eta \sum_{i=1}^{N} e_i^a \kappa(u_i, u) \quad (7)$$
RBF:
$$y(u) = \sum_{i=1}^{N} \alpha_i \kappa(u_i, u) \quad (8)$$
where $\alpha$ satisfies
$$G \alpha = y$$
$G$ is the Gram matrix: $G(i,j) = \kappa(u_i, u_j)$.
RBF needs regularization. Does KLMS need regularization?
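For comparison, a sketch of the RBF solution of eq. (8), with an optional ridge term lam anticipating the regularized version discussed later; the parameter name and defaults are assumptions of this sketch:

```python
import numpy as np

def rbf_fit(U, y, lam=0.0, a=1.0):
    """Solve (G + lam*I) alpha = y; lam = 0 recovers eq. (8) exactly."""
    diff = U[:, None, :] - U[None, :, :]
    G = np.exp(-a * np.sum(diff ** 2, axis=2))  # G(i,j) = kappa(u_i, u_j)
    return np.linalg.solve(G + lam * np.eye(len(y)), y)
```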
KLMS vs. LMS
Kernel LMS is nothing but LMS in the feature space, a very high-dimensional reproducing kernel Hilbert space (M > N).
The eigenvalue spread is awful. Does it converge?
Example: MG signal prediction
Time embedding: 10
Learning rate: 0.2
500 training data, 100 test data points
Additive Gaussian noise, variance 0.04
[Figure: learning curves over 500 iterations; MSE (0 to 0.1) for the linear LMS and the kernel LMS.]
Example: MG signal prediction
MSE       Linear LMS   KLMS     RBF (λ=0)   RBF (λ=0.1)   RBF (λ=1)   RBF (λ=10)
training  0.021        0.0060   0           0.0026        0.0036      0.010
test      0.026        0.0066   0.019       0.0041        0.0050      0.014
Complexity Comparison
             RBF            KLMS     LMS
Computation  O(N^3)         O(N^2)   O(L)
Memory       O(N^2 + N·L)   O(N·L)   O(L)
The asymptotic analysis on convergence—small step-size theory
Denote $x_i = \Phi(u_i) \in R^M$. The correlation matrix
$$R_x = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$$
is singular. Assume the eigendecomposition
$$R_x = P \Lambda P^T$$
with
$$\lambda_1 \geq \ldots \geq \lambda_k > \lambda_{k+1} = \ldots = \lambda_M = 0$$
The asymptotic analysis on convergence—small step-size theory
Denote
$$\Omega_o - \Omega_n = \sum_{i=1}^{M} \varepsilon_i(n) P_i$$
we have
$$E[\varepsilon_i(n)] = (1 - \eta \lambda_i)^n \varepsilon_i(0)$$
$$E[|\varepsilon_i(n)|^2] = (1 - \eta \lambda_i)^{2n} \left( |\varepsilon_i(0)|^2 - \frac{\eta J_{\min}}{2 - \eta \lambda_i} \right) + \frac{\eta J_{\min}}{2 - \eta \lambda_i}$$
The weight stays at its initial value along the zero-eigenvalue directions. If
$$\lambda_i = 0$$
we have
$$E[\varepsilon_i(n)] = \varepsilon_i(0)$$
$$E[|\varepsilon_i(n)|^2] = |\varepsilon_i(0)|^2$$
The zero-eigenvalue directions do not affect the MSE. Denote
$$J(n) = E[|y - \Omega_n^T x|^2]$$
Then
$$J(n) = J_{\min} + \sum_{i=1}^{M} \frac{\eta \lambda_i J_{\min}}{2 - \eta \lambda_i} + \sum_{i=1}^{M} \lambda_i (1 - \eta \lambda_i)^{2n} \left( |\varepsilon_i(0)|^2 - \frac{\eta J_{\min}}{2 - \eta \lambda_i} \right)$$
It does not care about the null space! It only focuses on the data space!
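A small numerical sketch of this point (toy data with a deliberately rank-deficient input correlation matrix; all values are illustrative): the LMS weight component along the zero-eigenvalue direction never moves from its initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# The third input coordinate is always zero, so R_x is singular
# with a zero eigenvalue along that direction.
N, M = 200, 3
X = np.zeros((N, M))
X[:, :2] = rng.standard_normal((N, 2))
y = X @ np.array([1.0, -2.0, 0.0]) + 0.1 * rng.standard_normal(N)

eta = 0.1
w = np.zeros(M)
for n in range(N):
    w += eta * (y[n] - w @ X[n]) * X[n]  # LMS update

print(w)  # w[2] stays exactly at its initial value, 0
```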
The minimum norm initialization
The initialization
$$\Omega_0 = 0$$
gives the minimum-norm possible solution. Writing
$$\Omega_n = \sum_{i=1}^{M} \hat{w}_i P_i$$
the norm decomposes as
$$\|\Omega_n\|^2 = \sum_{i=1}^{M} \|\hat{w}_i\|^2 = \sum_{i=1}^{k} \|\hat{w}_i\|^2 + \sum_{i=k+1}^{M} \|\hat{w}_i\|^2$$
and the second sum, over the zero-eigenvalue directions, stays at zero.
Minimum norm solution
[Figure: illustration of the minimum-norm solution versus an over-learned fit; learning is ill-posed.]
Regularization Technique
Learning from finite data is ill-posed; a priori information, such as smoothness, is needed. The norm of the function, which indicates the 'slope' of the linear operator, is constrained. In statistical learning theory, the norm is associated with the confidence of uniform convergence.
Regularized RBF
The cost function:
$$J(\Omega) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \Omega^T \Phi(u_i))^2 + \lambda \|\Omega\|^2$$
or equivalently
$$J(\Omega) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \Omega^T \Phi(u_i))^2 \quad \text{subject to } \|\Omega\|^2 < C$$
KLMS as a learning algorithm
The model is $y_n = \Omega_o^T x_n + v(n)$ with $x_n = \Phi(u_n)$. The following inequalities hold:
$$\|e^a\|^2 \leq \eta^{-1} \|\Omega_o\|^2 + 2 \|v\|^2$$
$$\|e^a\| \leq 2 \|y\|$$
The proof uses H∞ robustness, the triangle inequality, matrix transformations, derivatives, etc.
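A quick numerical sanity check of the second bound on toy data; the kernel width, step size, and data are illustrative assumptions, and the bound is claimed for sufficiently small step sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.standard_normal((50, 2))
y = rng.standard_normal(50)

eta, a = 0.2, 1.0
e = np.zeros(50)
for n in range(50):
    k = np.exp(-a * np.sum((U[:n] - U[n]) ** 2, axis=1))
    e[n] = y[n] - eta * e[:n] @ k          # a priori errors, eq. (5)

print(np.linalg.norm(e) <= 2 * np.linalg.norm(y))  # expect True for small eta
```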
The numerical analysis
The solution of the regularized RBF is
$$y(u) = \sum_{i=1}^{N} \alpha_i \kappa(u_i, u), \quad \alpha = (G + \lambda I)^{-1} y$$
The reason for the ill-posedness is the inversion of the matrix $(G + \lambda I)$:
$$\|(G + \lambda I)^{-1}\| \to \infty \quad \text{as } \lambda \to 0$$
The numerical analysis
The solution of KLMS is
$$y(u) = \eta \sum_{i=1}^{N} e_i^a \kappa(u_i, u)$$
By the inequality above, we have
$$e^a = L y \quad \text{with } \|L\| \leq 2$$
so the map from the data $y$ to the coefficients is bounded, independently of the conditioning of $G$.
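The contrast can be seen numerically; a sketch on toy data (all parameters illustrative) comparing the coefficient norms of KLMS and of the regularized RBF solutions, in the spirit of the table below:

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * U[:, 0]) + 0.05 * rng.standard_normal(100)

a, eta = 1.0, 0.2
G = np.exp(-a * (U - U.T) ** 2)             # Gram matrix (1-D inputs)

e = np.zeros(100)                           # KLMS a priori errors
for n in range(100):
    e[n] = y[n] - eta * e[:n] @ G[:n, n]
print("KLMS coefficient norm:", np.linalg.norm(eta * e))

for lam in [0.0, 0.1, 1.0, 10.0]:           # RBF: alpha = (G + lam*I)^{-1} y
    # lam = 0 is nearly singular numerically, giving a huge-norm solution
    alpha = np.linalg.solve(G + lam * np.eye(100), y)
    print(f"RBF (lambda={lam}): {np.linalg.norm(alpha):.3g}")
```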
Example: MG signal predication
weight KLMS RBF (λ=0)
RBF (λ=.1)
RBF (λ=1)
RBF (λ=10)
norm 0.520 4.8e+3 10.90 1.37 0.231
The conclusion
The LMS algorithm can be readily used in an RKHS to derive nonlinear algorithms.
From the machine learning viewpoint, the LMS method is a simple way to obtain a regularized solution.
Demo
LMS learning model
An event happens, and a decision is made. If the decision is correct, nothing happens. If an error is incurred, a correction is made to the original model. If we do things right, everything is fine and life goes on. If we do something wrong, lessons are drawn and our abilities are honed.
Would we over-learn?
If we attempt to model the real world mathematically, what dimension is appropriate? Are we likely to over-learn? Are we using the LMS algorithm? Is it good to remember the past? Is it bad to be a perfectionist?
"If you shut your door to all errors, truth will be shut out."---Rabindranath Tagore