Using OpenRDK to learn walk parameters for the Humanoid Robot NAO
A. Cherubini, L. Iocchi, F. Giannone, M. Lombardo, G. Oriolo
Overview: environment

Robotic Agent: NAO
• Humanoid Robot
• Produced by Aldebaran
Application: Robotic Soccer (SDK, Simulator)

The agent's processing pipeline:
• Vision Module: process raw data from the environment
• Modelling Module: elaborate raw data to obtain more reliable information
• Behaviour Control Module: decide the best behaviour to accomplish the agent goal
• Motion Control Module: actuate the robot motors accordingly
Overview: (sub)tasks

Make NAO walk… how?
NAO is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (NaoQi Motion Proxy) and partially customized by tuning some parameters.
• Main advantage: ready to use (…to be tuned)
• Drawback: based on an unknown walk model, so no flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.
SPQR Walking Library development workflow

1. Develop the walk model (SPQR Walk Model) using Matlab
2. Test the walk model on the Webots simulator
3. Design and implement a C++ library (SPQR Walking Library) for our RDK Soccer Agent
4. Test our walking RDK Agent: on the Webots simulator, then on the real NAO robot
5. Finally, tune the walk parameters (on the Webots simulator and on NAO)
A simple walking RAgent for NAO

• Simple Behaviour Module: switches between two states, walk and stand
• Motion Control Module: uses the SPQR Walking Library
• NaoQi Adaptor: connects to the real NAO (NaoQi)
• Webots Client: connects to WEBOTS over a TCP channel
The modules communicate through the shared memory (Smemy).
SPQR Walking Engine Model

NAO model characteristics:
• 21 degrees of freedom
• No actuated trunk
• No dynamic model available

We follow the "Static Walking Pattern": an a-priori definition of the desired trajectories, obtained by:
• Choosing a set of output variables: the 3D coordinates of selected points of the robot
• Choosing and parametrizing the desired trajectories for these variables at each phase of the gait

Velocity commands (v, ω): v is the linear velocity, ω is the angular velocity.
SPQR velocity commands

The Behavior Control Module sends a velocity command (v, ω) to the Motion Control Module, which produces the joints matrix. The command selects the gait phase: (v, 0) drives a Rectilinear Walk Swing, (v, ω) a Curvilinear Walk Swing, (0, ω) a Turn Step, and (0, 0) the Stand Position; an Initial Half Step and a Final Half Step connect the Stand Position to the walking phases.

[Slide diagram: state machine over these phases, with transitions labelled by the commands (v, ω), (v, 0), (0, ω), (0, 0).]
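As a sketch, the command-to-phase dispatch can be written as follows. The phase names come from the slide; the exact mapping of each (v, ω) combination to a phase is an assumption reconstructed from the transition labels:

```python
def gait_phase(v, omega):
    """Map a velocity command (v, omega) to a walking-engine phase.

    Phase names are from the slide; which command triggers which phase
    is assumed from the diagram's transition labels, and the half-step
    transitions in and out of Stand Position are not modelled here.
    """
    if v == 0 and omega == 0:
        return "Stand Position"
    if v == 0:
        return "Turn Step"                # rotate in place: (0, omega)
    if omega == 0:
        return "Rectilinear Walk Swing"   # straight walk: (v, 0)
    return "Curvilinear Walk Swing"       # curved walk: (v, omega)
```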
SPQR walking subtasks and parameters

• Foot trajectories in the xz plane: Xtot, Xsw0, Xds, Zst, Zsw
• Center of mass trajectory in the lateral direction: Yft, Yss, Yds, Kr
• Hip yaw/pitch control (turn): Hyp
• Arm control: Ks
Biped walking alternates a double support phase and a swing phase (swing fraction SS%).
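Collected in code, the tunable set might be represented as follows. The parameter names and grouping come from the slide; the container itself and the zero defaults are placeholders, not the SPQR library's actual types or values:

```python
from dataclasses import dataclass

@dataclass
class SPQRWalkParams:
    """Tunable parameters of the SPQR walk (names from the slide;
    defaults are placeholders, not the hand-tuned values)."""
    # Foot trajectories in the xz plane
    Xtot: float = 0.0
    Xsw0: float = 0.0
    Xds: float = 0.0
    Zst: float = 0.0
    Zsw: float = 0.0
    # Center-of-mass trajectory in the lateral direction
    Yft: float = 0.0
    Yss: float = 0.0
    Yds: float = 0.0
    Kr: float = 0.0
    # Hip yaw/pitch control (turn)
    Hyp: float = 0.0
    # Arm control
    Ks: float = 0.0
    # Swing-phase fraction of the gait (SS% on the slide)
    SS: float = 0.0
```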
Walk tuning: main issues

Possible choices: by hand, or by using machine learning techniques.
Machine learning seems the best solution: less human interaction, and it explores the search space in a more systematic way.
…but take care of some aspects:
• You need to define an effective fitness function
• You need to choose the right algorithm to explore the parameter space
• Only a limited amount of experiments can be done on a real robot
SPQR Learning System Architecture

• The Learner uses the learning library
• The RAgent uses the walking library and runs on the real NAO or on Webots
The Learner sends the experiments of each iteration to the RAgent; the RAgent sends back the data to evaluate the fitness (GPS), from which the fitness of the iteration is computed.
SPQR Learner

• First iteration? Yes: return the initial iteration and the iteration information.
• No: apply the chosen algorithm (strategy), then return the next iteration and the iteration information.
Available strategies: Policy Gradient (e.g., PGPR), Nelder-Mead Simplex Method, Genetic Algorithm.
Policy Gradient (PG) iteration

Given a point p in the parameter space ℝ^K:
1. Generate n (n = mK) policies from p, by setting each component p_k of p to one of p_k, p_k + ε_k, or p_k − ε_k
2. Evaluate the policies: for each k ∈ {1, …, K}, compute F_k^+, F_k^0, F_k^− (the average fitness of the policies whose k-th component was perturbed by +ε_k, 0, −ε_k)
3. For each k ∈ {1, …, K}: if F_k^0 > F_k^+ and F_k^0 > F_k^−, then Δ_k = 0; else Δ_k = F_k^+ − F_k^−
4. Δ* = η · normalized(Δ), where η is the step size; the new point is p' = p + Δ*
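One iteration of the steps above can be sketched as follows. This is a minimal illustration of the technique, not the SPQR learning library: the function names, the guard for under-sampled components, and the fixed step size `step` (η) are all assumptions:

```python
import random

def pg_iteration(p, eps, fitness, n_policies=16, step=0.1):
    """One policy-gradient iteration over parameter vector p (sketch).

    p: current parameters (list of K floats); eps: per-parameter
    perturbation sizes; fitness: callable, higher is better.
    """
    K = len(p)
    # 1. Generate random policies: each component is p[k]-eps[k], p[k], or p[k]+eps[k]
    policies = [[p[k] + random.choice((-1, 0, 1)) * eps[k] for k in range(K)]
                for _ in range(n_policies)]
    scores = [fitness(q) for q in policies]
    delta = [0.0] * K
    for k in range(K):
        # 2. Group the scores by the sign of the k-th component's perturbation
        groups = {-1: [], 0: [], 1: []}
        for q, s in zip(policies, scores):
            groups[round((q[k] - p[k]) / eps[k])].append(s)
        if not (groups[-1] and groups[0] and groups[1]):
            continue  # too few samples to estimate this component (assumed guard)
        avg = {g: sum(v) / len(v) for g, v in groups.items()}
        # 3. Zero step if the unperturbed value looks best, else signed difference
        if avg[0] > avg[1] and avg[0] > avg[-1]:
            delta[k] = 0.0
        else:
            delta[k] = avg[1] - avg[-1]
    # 4. Normalize the gradient estimate and take a step of size `step` (eta)
    norm = sum(d * d for d in delta) ** 0.5
    if norm > 0:
        delta = [step * d / norm for d in delta]
    return [pk + dk for pk, dk in zip(p, delta)]
```

On a toy quadratic fitness, repeated calls climb toward the optimum, which is all the method needs to do for walk tuning.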
Enhancing PG: PGPR

At each iteration i, the gradient estimate Δ(i) can be used to obtain a metric for measuring the relevance of the parameters, accumulated over iterations with a forgetting factor.
Given the relevance and a threshold T, PGPR prunes the less relevant parameters in the next iterations.
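The relevance bookkeeping might look like the sketch below. The slide only names the ingredients (gradient estimate, forgetting factor, threshold T), so the exponential-smoothing form and the names `forgetting` and `threshold` are assumptions, not PGPR's actual definition:

```python
def update_relevance(relevance, delta, forgetting=0.9):
    # Exponentially-smoothed magnitude of each gradient component
    # (assumed form; the slide only names a "forgetting factor").
    return [forgetting * r + (1 - forgetting) * abs(d)
            for r, d in zip(relevance, delta)]

def active_parameters(relevance, threshold):
    # Indices of parameters whose relevance is still above the threshold T;
    # the others are pruned (frozen) in the next PG iterations.
    return [k for k, r in enumerate(relevance) if r >= threshold]
```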
Curvilinear biped walking experiment

The robot moves along a curve of radius R for a time t.
Fitness function: [formula lost in transcription], in which the two terms are the radial error and the path length.
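Since the slide's exact formula is not recoverable, the following is only one plausible shape for such a fitness, rewarding distance covered and penalizing deviation from the reference circle; the combination, the weight `w`, and the centring of the circle at the origin are all assumptions:

```python
import math

def fitness(trajectory, R, w=1.0):
    """Hypothetical curvilinear-walk fitness (not the authors' formula):
    path length minus a weighted radial error.

    trajectory: list of (x, y) robot positions; the reference circle of
    radius R is assumed centred at the origin.
    """
    # Path length actually covered by the robot
    length = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
    # Mean absolute deviation from the reference circle (radial error)
    radial_error = sum(abs(math.hypot(x, y) - R) for x, y in trajectory) / len(trajectory)
    return length - w * radial_error
```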
Simulators in learning tasks

Advantages: you can test the gait model and the learning algorithm without being biased by noise.
Limits: the results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may not be as effective on the real robot (e.g., the simulation does not take asymmetries into account, and the models are not very accurate).
Results (1)

• Five sessions of PG, 20 iterations each, all starting from the same initial configuration
• SS%, Ks, Yft have been set to hand-tuned values
• 16 policies for each iteration
Observations: the fitness increases in a regular way, with low variance among the five simulations.
Results (2)

• Five runs of PGPR
• Final parameter sets for the five PG runs
[Slide plots: evolution of Zsw, Xs, Kr, Xsw0 across the runs.]
Bibliography

• A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo, "Policy Gradient Learning for a Humanoid Soccer Robot". Accepted for the Journal of Robotics and Autonomous Systems.
• A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara, "An extended policy gradient algorithm for robot task learning", Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
• A. Cherubini, F. Giannone, and L. Iocchi, "Layered learning for a soccer legged robot helped with a 3D simulator", Proc. of the 11th International RoboCup Symposium, 2007.

Links:
• http://openrdk.sourceforge.net
• http://www.aldebaran-robotics.com/
• http://spqr.dis.uniroma1.it
Any questions?