Using OpenRDK to learn walk parameters for the Humanoid Robot NAO
A. Cherubini, L. Iocchi, F. Giannone, M. Lombardo, G. Oriolo
Overview: environment

Robotic Agent: NAO
• Humanoid Robot
• Produced by Aldebaran
Application: Robotic Soccer (SDK, Simulator)

The agent's processing pipeline:
• Vision Module: process raw data from the environment
• Modelling Module: elaborate raw data to obtain more reliable information
• Behaviour Control Module: decide the best behaviour to accomplish the agent goal
• Motion Control Module: actuate the robot motors accordingly
Overview: (sub)tasks

Make NAO walk… how?
NAO is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (NaoQi Motion Proxy) and partially customized by tuning some parameters.
• Main advantage: ready to use (…to be tuned)
• Drawback: based on an unknown walk model, so no flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.
SPQR Walking Library development workflow

1. Develop the walk model (SPQR Walk Model) using Matlab
2. Test the walk model on the Webots simulator
3. Design and implement a C++ library (SPQR Walking Library) for our RDK Soccer Agent
4. Test our walking RDK Agent: on the Webots simulator, then on the real NAO robot
5. Finally, tune the walk parameters (on the Webots simulator and on NAO)
A simple walking RAgent for NAO

• Simple Behaviour Module: switches between two states, walk and stand
• Motion Control Module: uses the SPQR Walking Library
• NaoQi Adaptor: connects to the real NAO (NaoQi)
• Webots Client: connects to WEBOTS over a TCP channel
The modules communicate through the shared memory (Smemy).
SPQR Walking Engine Model

NAO model characteristics:
• 21 degrees of freedom
• No actuated trunk
• No dynamic model available

We follow the "Static Walking Pattern": an a-priori definition of the desired trajectories, obtained by:
• Choosing a set of output variables: the 3D coordinates of selected points of the robot
• Choosing and parametrizing the desired trajectories for these variables at each phase of the gait

Velocity commands (v, ω): v is the linear velocity, ω is the angular velocity.
SPQR velocity commands

The Behavior Control Module sends a velocity command (v, ω) to the Motion Control Module, which produces the joints matrix. The command selects the gait phase: (v, 0) drives a Rectilinear Walk Swing, (v, ω) a Curvilinear Walk Swing, (0, ω) a Turn Step, and (0, 0) the Stand Position; an Initial Half Step and a Final Half Step connect the Stand Position to the walking phases.

[Slide diagram: state machine over these phases, with transitions labelled by the commands (v, ω), (v, 0), (0, ω), (0, 0).]
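As a sketch, the command-to-phase dispatch can be written as follows. The phase names come from the slide; the exact mapping of each (v, ω) combination to a phase is an assumption reconstructed from the transition labels:

```python
def gait_phase(v, omega):
    """Map a velocity command (v, omega) to a walking-engine phase.

    Phase names are from the slide; which command triggers which phase
    is assumed from the diagram's transition labels, and the half-step
    transitions in and out of Stand Position are not modelled here.
    """
    if v == 0 and omega == 0:
        return "Stand Position"
    if v == 0:
        return "Turn Step"                # rotate in place: (0, omega)
    if omega == 0:
        return "Rectilinear Walk Swing"   # straight walk: (v, 0)
    return "Curvilinear Walk Swing"       # curved walk: (v, omega)
```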
SPQR walking subtasks and parameters

• Foot trajectories in the xz plane: Xtot, Xsw0, Xds, Zst, Zsw
• Center of mass trajectory in the lateral direction: Yft, Yss, Yds, Kr
• Hip yaw/pitch control (turn): Hyp
• Arm control: Ks
Biped walking alternates a double support phase and a swing phase (swing fraction SS%).
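Collected in code, the tunable set might be represented as follows. The parameter names and grouping come from the slide; the container itself and the zero defaults are placeholders, not the SPQR library's actual types or values:

```python
from dataclasses import dataclass

@dataclass
class SPQRWalkParams:
    """Tunable parameters of the SPQR walk (names from the slide;
    defaults are placeholders, not the hand-tuned values)."""
    # Foot trajectories in the xz plane
    Xtot: float = 0.0
    Xsw0: float = 0.0
    Xds: float = 0.0
    Zst: float = 0.0
    Zsw: float = 0.0
    # Center-of-mass trajectory in the lateral direction
    Yft: float = 0.0
    Yss: float = 0.0
    Yds: float = 0.0
    Kr: float = 0.0
    # Hip yaw/pitch control (turn)
    Hyp: float = 0.0
    # Arm control
    Ks: float = 0.0
    # Swing-phase fraction of the gait (SS% on the slide)
    SS: float = 0.0
```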
Walk tuning: main issues

Possible choices: by hand, or by using machine learning techniques.
Machine learning seems the best solution: less human interaction, and it explores the search space in a more systematic way.
…but take care of some aspects:
• You need to define an effective fitness function
• You need to choose the right algorithm to explore the parameter space
• Only a limited amount of experiments can be done on a real robot
SPQR Learning System Architecture

• The Learner uses the learning library
• The RAgent uses the walking library and runs on the real NAO or on Webots
The Learner sends the experiments of each iteration to the RAgent; the RAgent sends back the data to evaluate the fitness (GPS), from which the fitness of the iteration is computed.
SPQR Learner

• First iteration? Yes: return the initial iteration and the iteration information.
• No: apply the chosen algorithm (strategy), then return the next iteration and the iteration information.
Available strategies: Policy Gradient (e.g., PGPR), Nelder-Mead Simplex Method, Genetic Algorithm.
Policy Gradient (PG) iteration

Given a point p in the parameter space ℝ^K:
1. Generate n (n = mK) policies from p, by setting each component p_k of p to one of p_k, p_k + ε_k, or p_k − ε_k
2. Evaluate the policies: for each k ∈ {1, …, K}, compute F_k^+, F_k^0, F_k^− (the average fitness of the policies whose k-th component was perturbed by +ε_k, 0, −ε_k)
3. For each k ∈ {1, …, K}: if F_k^0 > F_k^+ and F_k^0 > F_k^−, then Δ_k = 0; else Δ_k = F_k^+ − F_k^−
4. Δ* = η · normalized(Δ), where η is the step size; the new point is p' = p + Δ*
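One iteration of the steps above can be sketched as follows. This is a minimal illustration of the technique, not the SPQR learning library: the function names, the guard for under-sampled components, and the fixed step size `step` (η) are all assumptions:

```python
import random

def pg_iteration(p, eps, fitness, n_policies=16, step=0.1):
    """One policy-gradient iteration over parameter vector p (sketch).

    p: current parameters (list of K floats); eps: per-parameter
    perturbation sizes; fitness: callable, higher is better.
    """
    K = len(p)
    # 1. Generate random policies: each component is p[k]-eps[k], p[k], or p[k]+eps[k]
    policies = [[p[k] + random.choice((-1, 0, 1)) * eps[k] for k in range(K)]
                for _ in range(n_policies)]
    scores = [fitness(q) for q in policies]
    delta = [0.0] * K
    for k in range(K):
        # 2. Group the scores by the sign of the k-th component's perturbation
        groups = {-1: [], 0: [], 1: []}
        for q, s in zip(policies, scores):
            groups[round((q[k] - p[k]) / eps[k])].append(s)
        if not (groups[-1] and groups[0] and groups[1]):
            continue  # too few samples to estimate this component (assumed guard)
        avg = {g: sum(v) / len(v) for g, v in groups.items()}
        # 3. Zero step if the unperturbed value looks best, else signed difference
        if avg[0] > avg[1] and avg[0] > avg[-1]:
            delta[k] = 0.0
        else:
            delta[k] = avg[1] - avg[-1]
    # 4. Normalize the gradient estimate and take a step of size `step` (eta)
    norm = sum(d * d for d in delta) ** 0.5
    if norm > 0:
        delta = [step * d / norm for d in delta]
    return [pk + dk for pk, dk in zip(p, delta)]
```

On a toy quadratic fitness, repeated calls climb toward the optimum, which is all the method needs to do for walk tuning.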
Enhancing PG: PGPR

At each iteration i, the gradient estimate Δ(i) can be used to obtain a metric for measuring the relevance of the parameters, accumulated over iterations with a forgetting factor.
Given the relevance and a threshold T, PGPR prunes the less relevant parameters in the next iterations.
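The relevance bookkeeping might look like the sketch below. The slide only names the ingredients (gradient estimate, forgetting factor, threshold T), so the exponential-smoothing form and the names `forgetting` and `threshold` are assumptions, not PGPR's actual definition:

```python
def update_relevance(relevance, delta, forgetting=0.9):
    # Exponentially-smoothed magnitude of each gradient component
    # (assumed form; the slide only names a "forgetting factor").
    return [forgetting * r + (1 - forgetting) * abs(d)
            for r, d in zip(relevance, delta)]

def active_parameters(relevance, threshold):
    # Indices of parameters whose relevance is still above the threshold T;
    # the others are pruned (frozen) in the next PG iterations.
    return [k for k, r in enumerate(relevance) if r >= threshold]
```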
Curvilinear biped walking experiment

The robot moves along a curve of radius R for a time t.
Fitness function: [formula lost in transcription], in which the two terms are the radial error and the path length.
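Since the slide's exact formula is not recoverable, the following is only one plausible shape for such a fitness, rewarding distance covered and penalizing deviation from the reference circle; the combination, the weight `w`, and the centring of the circle at the origin are all assumptions:

```python
import math

def fitness(trajectory, R, w=1.0):
    """Hypothetical curvilinear-walk fitness (not the authors' formula):
    path length minus a weighted radial error.

    trajectory: list of (x, y) robot positions; the reference circle of
    radius R is assumed centred at the origin.
    """
    # Path length actually covered by the robot
    length = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
    # Mean absolute deviation from the reference circle (radial error)
    radial_error = sum(abs(math.hypot(x, y) - R) for x, y in trajectory) / len(trajectory)
    return length - w * radial_error
```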
Simulators in learning tasks

Advantages: you can test the gait model and the learning algorithm without being biased by noise.
Limits: the results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may not be as effective on the real robot (e.g., the simulation does not take asymmetries into account, and the models are not very accurate).
Results (1)

• Five sessions of PG, 20 iterations each, all starting from the same initial configuration
• SS%, Ks, Yft have been set to hand-tuned values
• 16 policies for each iteration
Observations: the fitness increases in a regular way, with low variance among the five simulations.
Results (2)

• Five runs of PGPR
• Final parameter sets for the five PG runs
[Slide plots: evolution of Zsw, Xs, Kr, Xsw0 across the runs.]
Bibliography

• A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo, "Policy Gradient Learning for a Humanoid Soccer Robot". Accepted for the Journal of Robotics and Autonomous Systems.
• A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara, "An extended policy gradient algorithm for robot task learning", Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
• A. Cherubini, F. Giannone, and L. Iocchi, "Layered learning for a soccer legged robot helped with a 3D simulator", Proc. of the 11th International RoboCup Symposium, 2007.

Links:
• http://openrdk.sourceforge.net
• http://www.aldebaran-robotics.com/
• http://spqr.dis.uniroma1.it
Any questions?