
Page 1: Robot, Learning From Data

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING, SEOUL NATIONAL UNIVERSITY

Robot, Learning From Data: Direct Policy Learning in RKHS

& Inverse Reinforcement Learning Methods

Presenter: Sungjoon Choi, Cyber-Physical Systems Laboratory (CPSLAB)

Seoul National University

Page 2: Robot, Learning From Data


https://canvas.northwestern.edu/courses/20122/assignments/syllabus

Page 3: Robot, Learning From Data


Page 4: Robot, Learning From Data


Contents

Learning from Demonstration

Direct Policy Learning

Reward Learning

Kernel Methods

Reproducing Kernel Hilbert Space

Learning Theory in RKHS

Inverse Reinforcement Learning Methods

Page 5: Robot, Learning From Data


Learning From Demonstration

Human Expert

http://villains.wikia.com/wiki/Chef_Skinner

http://www.filmspotting.net/forum/index.php?topic=12312.660

Learning from Demonstration

http://blogs.disney.com/oh-my-disney/2014/09/04/learn-to-love-cooking-with-ratatouille/

Execute in Unseen Environments

Page 6: Robot, Learning From Data


Learning From Demonstration

There are two approaches: direct policy learning and reward learning.

Direct policy learning

• Try to find a policy function which maps a state space to an action space: state space $\mathcal{S}$, action space $\mathcal{A}$, policy function $\pi: \mathcal{S} \rightarrow \mathcal{A}$.

• Cast the learning problem as a regression or multi-class classification problem.

• Standard learning theory or approximation theory is often used to analyze the performance of learning.

Reward learning

• Try to find a reward function indicating how 'good' each state-action pair is: joint state-action space $\mathcal{S} \times \mathcal{A}$, reward function $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$.

• "The reward function, rather than the policy, is the most succinct, robust, and transferable definition of the task." [1]

• Often referred to as an inverse reinforcement learning (IRL) problem.

[1] Ng, Andrew Y., and Stuart J. Russell. "Algorithms for inverse reinforcement learning." ICML, 2000.

Page 7: Robot, Learning From Data


Direct Policy Learning

Page 8: Robot, Learning From Data


Direct Policy Learning in Reproducing Kernel Hilbert Space (RKHS)

State space $\mathcal{S}$, action space $\mathcal{A}$, policy function $\pi: \mathcal{S} \rightarrow \mathcal{A}$

• We will view this problem as a nonlinear regression problem.

• In particular, we will use a kernel-based regression method.

• RKHS refers to the reproducing kernel Hilbert space that contains our policy function, i.e., the hypothesis space.

• We will also focus on how well this function generalizes, that is, how well it estimates the outputs for previously unseen inputs, based on learning theory.

Page 9: Robot, Learning From Data


Definition and Existence of the Reproducing Kernel Hilbert Space

Existence of the RKHS is shown by the Moore-Aronszajn theorem.

The following is the definition of the reproducing kernel Hilbert space [1].

[1] Rasmussen, Carl Edward. "Gaussian processes for machine learning." (2006).

Page 10: Robot, Learning From Data


Reproducing Kernel Hilbert Space

Existence of the RKHS is shown by the Moore-Aronszajn theorem.

The following is the definition of the reproducing kernel Hilbert space [1].

[1] Rasmussen, Carl Edward. "Gaussian processes for machine learning." (2006).

What…?

Page 11: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 1

The first thing that might come to mind when you hear about kernel methods is mapping the input data into a feature space.

What Mercer's theorem says is that every (positive semi-definite) kernel function can be expressed as an inner product of infinite-dimensional (finite, for degenerate kernels) feature vectors built from its eigenfunctions.
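For reference, the Mercer expansion can be written as follows (a standard statement of the theorem, with notation assumed here rather than taken from the slide):

$$
k(x, x') \;=\; \sum_{i=1}^{\infty} \lambda_i\, \phi_i(x)\, \phi_i(x') \;=\; \big\langle \Phi(x), \Phi(x') \big\rangle,
\qquad
\Phi(x) = \big( \sqrt{\lambda_1}\,\phi_1(x),\; \sqrt{\lambda_2}\,\phi_2(x),\; \dots \big),
$$

where the $\lambda_i$ and $\phi_i$ are the eigenvalues and eigenfunctions of the integral operator induced by $k$.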

Page 12: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 1

Suppose we are given a kernel function $k(x, x')$.

From Mercer's theorem, we have an infinite number of basis (eigen-)functions: $k(x, x') = \sum_i \lambda_i \phi_i(x)\,\phi_i(x')$.

Then, let's think about the vector space spanned by the eigenfunctions: $f(x) = \sum_i f_i \phi_i(x)$ and $g(x) = \sum_i g_i \phi_i(x)$.

If we define the inner product of this space as $\langle f, g \rangle_{\mathcal{H}} = \sum_i f_i g_i / \lambda_i$, this space satisfies the reproducing property! In other words, this space spanned by the eigenfunctions (features) is an RKHS!!

Reproducing property: $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i f_i \lambda_i \phi_i(x) / \lambda_i = f(x)$. Check! (By definition of the inner product.)

Page 13: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 1

Suppose we are given a kernel function $k(x, x')$.

From Mercer's theorem, we have an infinite number of basis (eigen-)functions: $k(x, x') = \sum_i \lambda_i \phi_i(x)\,\phi_i(x')$.

Then, let's think about the vector space spanned by the eigenfunctions: $f(x) = \sum_i f_i \phi_i(x)$ and $g(x) = \sum_i g_i \phi_i(x)$.

If we define the inner product of this space as $\langle f, g \rangle_{\mathcal{H}} = \sum_i f_i g_i / \lambda_i$, this space satisfies the reproducing property! In other words, this space spanned by the eigenfunctions (features) is an RKHS!!

Reproducing property: $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i f_i \lambda_i \phi_i(x) / \lambda_i = f(x)$. Check! (By definition of the inner product.)

What…?

Page 14: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 2

The RKHS $\mathcal{H}$ has two properties: 1. For every $x$, $k(\cdot, x)$, as a function of its first argument, belongs to $\mathcal{H}$. 2. $k$ has the reproducing property.

Suppose our kernel function is $k(x, x')$.

Then, from the Moore-Aronszajn theorem, there exists an RKHS $\mathcal{H}$. But what does this mean? We want to define a space of functions whose elements have the following form:

$f(\cdot) = \sum_i \alpha_i k(z_i, \cdot)$ for some $\alpha_i \in \mathbb{R}$ and $z_i \in \mathcal{X}$.

Page 15: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 2

We want to define a space of functions whose elements have the following form:

$f(\cdot) = \sum_i \alpha_i k(z_i, \cdot)$ for some $\alpha_i \in \mathbb{R}$ and $z_i \in \mathcal{X}$.

The RKHS has two properties: 1. For every $x$, $k(\cdot, x)$, as a function of its first argument, belongs to $\mathcal{H}$. 2. $k$ has the reproducing property.

If we define the inner product of the Hilbert space as $\langle f, g \rangle_{\mathcal{H}} = \sum_i \sum_j \alpha_i \beta_j\, k(z_i, z_j)$ for $g(\cdot) = \sum_j \beta_j k(z_j, \cdot)$,

then this space satisfies the reproducing property:

$\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \Big\langle \sum_i \alpha_i k(z_i, \cdot),\, k(\cdot, x) \Big\rangle_{\mathcal{H}} = \sum_i \alpha_i \langle k(z_i, \cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i \alpha_i k(z_i, x) = f(x).$

Page 16: Robot, Learning From Data


Reproducing Kernel Hilbert Space – Approach 2

We want to define a space of functions whose elements have the following form:

$f(\cdot) = \sum_i \alpha_i k(z_i, \cdot)$ for some $\alpha_i \in \mathbb{R}$ and $z_i \in \mathcal{X}$.

If we define the inner product of the Hilbert space as $\langle f, g \rangle_{\mathcal{H}} = \sum_i \sum_j \alpha_i \beta_j\, k(z_i, z_j)$,

the space defined as above is a reproducing kernel Hilbert space.

As $\langle f, f \rangle_{\mathcal{H}}$ must be greater than or equal to zero, the kernel function should be positive semi-definite!

A norm is defined as follows: $\| f \|_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$.
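Restating this requirement in matrix form (notation matching the next slide; added here only as an illustrative aside, not taken from the slide):

$$
\langle f, f \rangle_{\mathcal{H}} \;=\; \sum_i \sum_j \alpha_i \alpha_j\, k(z_i, z_j) \;=\; \alpha^{\top} K_{ZZ}\, \alpha \;\ge\; 0 \quad \text{for all } \alpha,
$$

which is exactly the statement that the Gram matrix $K_{ZZ}$, and hence the kernel $k$, is positive semi-definite.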

Page 17: Robot, Learning From Data


Practical Usage

Empirical risk minimization with Hilbert-norm regularization:

$\min_{f \in \mathcal{H}} \; \sum_{i=1}^{N} \big( y_i - f(x_i) \big)^2 + \lambda \| f \|_{\mathcal{H}}^2$

If we set our hypothesis space to radial basis function networks, $f(x) = \sum_{j=1}^{M} \alpha_j k(z_j, x)$,

then the optimization becomes

$\min_{\alpha} \; \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{M} \alpha_j k(z_j, x_i) \Big)^2 + \lambda \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j\, k(z_i, z_j).$

If we rewrite it in matrix form,

$\min_{\alpha} \; \| Y - K_{XZ}\, \alpha \|_2^2 + \lambda\, \alpha^{\top} K_{ZZ}\, \alpha,$

which is a quadratic program with respect to the $M$-dimensional vector $\alpha$.
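For reference, a closed-form minimizer of this quadratic program (assuming the squared loss written above) follows by setting the gradient with respect to $\alpha$ to zero:

$$
\big( K_{ZX} K_{XZ} + \lambda K_{ZZ} \big)\, \alpha^{\star} = K_{ZX}\, Y
\quad\Longrightarrow\quad
\alpha^{\star} = \big( K_{ZX} K_{XZ} + \lambda K_{ZZ} \big)^{-1} K_{ZX}\, Y,
$$

where $K_{ZX} = K_{XZ}^{\top}$.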

Page 18: Robot, Learning From Data


Practical Usage

$\min_{\alpha} \; \| Y - K_{XZ}\, \alpha \|_2^2 + \lambda\, \alpha^{\top} K_{ZZ}\, \alpha$

If $Z$ is identical to $X$, then the above equation is identical to Gaussian process regression or kernel ridge regression. In practice, we can add additional constraints on $\alpha$, which can greatly improve stability!

This is often referred to as sparse Gaussian process regression with inducing points.
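Below is a minimal, illustrative Python sketch of this regression; the helper names (rbf_kernel, fit_rbf_network) and the toy data are assumptions introduced here, not taken from the slides. It solves the quadratic program above in closed form and reduces to kernel ridge regression when the inducing points Z equal the training inputs X.

```python
# A minimal sketch of:  minimize ||Y - K_XZ a||^2 + lam * a^T K_ZZ a,
# whose closed-form solution is  a = (K_ZX K_XZ + lam K_ZZ)^{-1} K_ZX Y.
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """Squared-exponential kernel matrix between row-stacked point sets A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_rbf_network(X, Y, Z, lam=1e-3):
    """X: (N, d) inputs, Y: (N,) targets, Z: (M, d) basis centers (inducing points)."""
    K_xz = rbf_kernel(X, Z)                      # (N, M)
    K_zz = rbf_kernel(Z, Z)                      # (M, M)
    # Normal equations of the regularized least-squares problem.
    return np.linalg.solve(K_xz.T @ K_xz + lam * K_zz, K_xz.T @ Y)

def predict(X_query, Z, alpha):
    """f(x) = sum_j alpha_j k(z_j, x)."""
    return rbf_kernel(X_query, Z) @ alpha

# Toy 1-D regression; when Z equals X this reduces to kernel ridge regression.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
Y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)
Z = np.linspace(0.0, 1.0, 10)[:, None]           # M = 10 inducing points
alpha = fit_rbf_network(X, Y, Z)
print(predict(np.array([[0.25], [0.75]]), Z, alpha))
```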

Page 19: Robot, Learning From Data


Learning Theory

Suppose our training data $\{(x_i, y_i)\}_{i=1}^{N}$ are sampled from a distribution $P(x, y)$ on $X \times Y$.

The expected risk of a function $f$ is defined as $I[f] = \int_{X \times Y} \big( y - f(x) \big)^2 P(x, y)\, dx\, dy$.

Then, the expected risk can be decomposed into

$I[f] = \int_{X \times Y} \big( y - f_\rho(x) \big)^2 P(x, y)\, dx\, dy + \int_{X} \big( f(x) - f_\rho(x) \big)^2 P(x)\, dx,$

where $f_\rho(x) = \int_Y y\, P(y \mid x)\, dy$ is the regression function.

Page 20: Robot, Learning From Data


Learning Theory

Expected Risk = Intrinsic Error + Estimation Error

$I[f] = \int_{X \times Y} \big( y - f_\rho(x) \big)^2 P(x, y)\, dx\, dy + \int_{X} \big( f(x) - f_\rho(x) \big)^2 P(x)\, dx$

Intrinsic error (approximation error): $\int_{X \times Y} \big( y - f_\rho(x) \big)^2 P(x, y)\, dx\, dy$. We cannot handle this.

Estimation error: $\int_{X} \big( f(x) - f_\rho(x) \big)^2 P(x)\, dx$. We can handle only part of this.

Page 21: Robot, Learning From Data


Learning Theory

The goal of learning theory is to minimize the following functional, which is often called the generalization error: $\int_{X} \big( f(x) - f_\rho(x) \big)^2 P(x)\, dx$.

[1] Niyogi, Partha, and Federico Girosi. "On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions." Neural Computation 8.4 (1996): 819-842.

If we use a radial basis function network, $f_n(x) = \sum_{j=1}^{n} \alpha_j k(z_j, x)$, we can achieve the following bound on the generalization error [1], which holds with probability at least $1 - \delta$.

Page 22: Robot, Learning From Data


Learning Theory

[1] Niyogi, Partha, and Federico Girosi. "On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions." Neural Computation 8.4 (1996): 819-842.

There are two sources of error:

1. Approximation error: we are trying to approximate an infinite-dimensional object, the regression function $f_\rho$, with a finite number of parameters.

2. Estimation error: we minimize the empirical risk and obtain an empirical-risk minimizer, rather than minimizing the expected risk.
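One standard way to write these two terms (following the framing of [1]; the symbols $f_n$ for the best function in the $n$-parameter hypothesis space and $\hat{f}_{n,N}$ for the empirical-risk minimizer from $N$ samples are assumed notation, not taken from the slide) is:

$$
\underbrace{I[\hat{f}_{n,N}] - I[f_\rho]}_{\text{generalization error}}
= \underbrace{\big( I[f_n] - I[f_\rho] \big)}_{\text{approximation error}}
+ \underbrace{\big( I[\hat{f}_{n,N}] - I[f_n] \big)}_{\text{estimation error}}.
$$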

Page 23: Robot, Learning From Data


Reward Learning

Page 24: Robot, Learning From Data


Reward Learning

Understanding the basic concepts of solving a Markov decision process (MDP) or reinforcement learning (RL) is crucial in a reward learning problem.

The goal of RL is to find a policy function which maximizes the expected sum of (discounted) rewards:

$\max_{\pi} \; \mathbb{E}_{\pi} \Big[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \Big]$

If we define the discounted state-action visitation measure $\mu(s, a) = \sum_{t=0}^{\infty} \gamma^t\, P(s_t = s,\, a_t = a)$, then the optimization becomes

$\max_{\mu \ge 0} \; \langle \mu, R \rangle \quad \text{subject to} \quad \sum_{a} \mu(s, a) = p_0(s) + \gamma \sum_{s', a'} P(s \mid s', a')\, \mu(s', a') \;\; \forall s,$

where the equality constraint is often called the Bellman flow constraint.
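As a concrete, illustrative sketch of this occupancy-measure view (the names P, R, p0 and the helper solve_mdp_lp are assumptions introduced here, not taken from the slides), the small Python example below solves a toy tabular MDP as a linear program: maximize ⟨μ, R⟩ subject to the Bellman flow constraint and μ ≥ 0.

```python
# Occupancy-measure LP for a tiny tabular MDP.
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, R, p0, gamma=0.9):
    """P: (S, A, S) transition tensor, R: (S, A) rewards, p0: (S,) initial distribution."""
    S, A = R.shape
    c = -R.reshape(-1)                            # linprog minimizes, so negate rewards
    A_eq = np.zeros((S, S * A))
    for s in range(S):                            # one flow constraint per state s
        for sp in range(S):
            for a in range(A):
                j = sp * A + a                    # column index of mu(sp, a)
                A_eq[s, j] = (1.0 if sp == s else 0.0) - gamma * P[sp, a, s]
    res = linprog(c, A_eq=A_eq, b_eq=p0, bounds=[(0, None)] * (S * A))
    mu = res.x.reshape(S, A)
    policy = mu.argmax(axis=1)                    # greedy policy w.r.t. visitation measure
    return mu, policy

# Toy 2-state, 2-action MDP.
P = np.zeros((2, 2, 2))
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.8, 0.2]; P[1, 1] = [0.2, 0.8]
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])
mu, policy = solve_mdp_lp(P, R, p0=np.array([0.5, 0.5]))
print(mu, policy)
```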

Page 25: Robot, Learning From Data


Inverse Reinforcement Learning

(Diagram) Solving MDP: true reward → true density → state-action trajectories.

(Diagram) Solving IRL: state-action trajectories → $\max_{R \in \mathcal{H}(R)} \langle \mu, R \rangle$ → estimated reward.

http://users.eecs.northwestern.edu/~argall/learning.html

Page 26: Robot, Learning From Data


Inverse Reinforcement Learning Methods

NR [1], MMP [2], AN [3], MaxEnt [4], GPIRL [5], BIRL [6], RelEnt [7], StructIRL [8], DeepIRL [9]

[1] Ng, Andrew Y., and Stuart J. Russell. "Algorithms for inverse reinforcement learning." ICML, 2000.

[2] Ratliff, Nathan D., J. Andrew Bagnell, and Martin A. Zinkevich. "Maximum margin planning." ICML, 2006.

[3] Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." ICML, 2004.

[4] Ziebart, Brian D., Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum entropy inverse reinforcement learning." AAAI, 2008.

[5] Levine, Sergey, Zoran Popovic, and Vladlen Koltun. "Nonlinear inverse reinforcement learning with Gaussian processes." NIPS, 2011.

[6] Ramachandran, Deepak, and Eyal Amir. "Bayesian inverse reinforcement learning." AAAI, 2007.

[7] Boularias, Abdeslam, Jens Kober, and Jan R. Peters. "Relative entropy inverse reinforcement learning." AISTATS, 2011.

[8] Klein, Edouard, Matthieu Geist, Bilal Piot, and Olivier Pietquin. "Inverse reinforcement learning through structured classification." NIPS, 2012.

[9] Wulfmeier, Markus, Peter Ondruska, and Ingmar Posner. "Deep inverse reinforcement learning." arXiv, 2015.

Page 27: Robot, Learning From Data


Inverse Reinforcement Learning Methods

Objective

NR [1]: Maximize the discrepancy between the expert's and the sampled values.

MMP [2]: Maximize the margin between the expert's demonstrations and all other state-actions.

AN [3]: Minimize the value difference between the expert's and the sampled ones.

StructIRL [8]: Cast IRL as a multi-class classification problem.

Page 28: Robot, Learning From Data


Inverse Reinforcement Learning Methods

Objective

MaxEnt [4]: Define a likelihood of state-action trajectories and use MLE.

BIRL [6]: Define a posterior over the reward given state-action trajectories and use MH sampling.

RelEnt [7]: Minimize the relative entropy between the expert's and the learner's distributions.

GPIRL [5]: Define a likelihood using a sparse Gaussian process (SGP) and use a gradient ascent method.

DeepIRL [9]: Model the likelihood with neural networks.

Page 29: Robot, Learning From Data


Conclusion

I believe selecting a proper machine learning algorithm is more than selecting a chocolate from a chocolate box.

"Deep Learning is brute force learning. It is not intelligent learning." "Machine learning is not only about machines, but also about humans."

- Vladimir Vapnik @ NIPS15

http://www.forbes.com/forbes/welcome/
http://aboutintelligence.blogspot.kr/2009/01/vapniks-picture-explained.html

Robotics deals with humans!

Page 30: Robot, Learning From Data


Thank you for your attention!! Any questions?

[email protected]