Developmental Mechanisms for Life-Long Autonomous Learning in Robots
Pierre-Yves Oudeyer
Project-Team INRIA-ENSTA-ParisTech FLOWERS
http://www.pyoudeyer.com
http://flowers.inria.fr
Sensorimotor and social learning:
• Autonomous
• Open, « life-long learning »
• Real world, physical and social
Experimental validation
Developmental robotics
• Intrinsic Motivation
• Maturation
• Imitation, Social guidance
Fundamental understanding of the mechanisms of development
Application to assistive robotics
“Engineered” robot learning
• Engineer shows, with fixed interaction protocol in the lab:
• Target:
Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression)
State/context → Action
Action policy
• Engineer provides a reward/fitness function:
• Target:
Optimization algorithms (e.g. NAC, non-linear Nelder-Mead, …)
OR
« Real » world
Developmental approach
Which generic reward function for spontaneous curiosity-driven learning?
Axis 2
?
Behaviour of human (non-engineer)
?
Axis 1
Learning from interactions with non-engineers
Non-engineer human behaviour
?
1. Intuitive multimodal interfaces
• Synthesis and recognition of emotion in speech (IJHCS, 2001, 5 patents)
• Clicker-training (RAS, 2002; 1 patent)
• Physical human-robot interfaces (Humanoids 2011)
• User studies (Humanoids 2009, HRI 2011)
• Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006, ICDL 2011, IROS 2010)
Spontaneous active exploration, artificial curiosity
in the vicinity of
Non-stationary function, difficult to model
Algorithms for empirical evaluation of de/dt with statistical regression
IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010)
McSAGG-RIAC (2011), SGIM (2011)
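The idea of empirically evaluating de/dt with statistical regression can be illustrated with a minimal sketch. It assumes that learning progress is taken as the negative slope of a least-squares line fitted to the most recent prediction errors. The function name `learning_progress` and the `window` parameter are hypothetical and are not taken from the IAC family of algorithms listed above.

```python
import numpy as np

def learning_progress(errors, window=20):
    """Estimate -de/dt from the last `window` errors with a least-squares line fit.

    Assumption: errors are recorded at regular time steps, so the sample
    index can stand in for time.
    """
    recent = np.asarray(errors[-window:], dtype=float)
    if recent.size < 2:
        return 0.0  # not enough data to estimate a slope
    t = np.arange(recent.size)
    slope, _intercept = np.polyfit(t, recent, deg=1)
    return -slope  # positive when the error is decreasing

# Usage: an error curve that decays versus one that stays flat.
decaying = [0.9 ** k for k in range(20)]
flat = [0.5] * 20
print(learning_progress(decaying), learning_progress(flat))
```

A flat but high error curve yields a progress close to zero, which is what distinguishes this measure from exploring wherever the error is largest.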
No!
Intrinsic Motivation
Berlyne (1960), Csikszentmihalyi (1996), Dayan and Balleine (2002)
Which generic reward function?
simple
complex
complex
Explore zones where:
• Uncertainty/errors maximal
• Least explored
Assume:
• Spatial or temporal stationarity
• Everything is learnable within lifetime
Which experiment ?
Developmental approach
Explore zones where the empirically measured learning progress is maximal
Active learning of models
Inputs: sensory state, action state, context state
Classic machine learner M (e.g. neural net, SVM, Gaussian process)
Prediction of the sensory state at t+1
Error feedback
Meta machine learner metaM
Progressive categorization
Local model of learning progress
Intrinsic reward
Action selection system
IAC, IEEE Trans. EC (2007); R-IAC, IEEE Trans. AMD (2009)
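The architecture above (a learner M, a meta-learner tracking error per region, and an action selection system rewarded by learning progress) can be sketched as a toy loop. This is an illustration under simplifying assumptions, not the published IAC or R-IAC algorithm: the regions are fixed instead of being created by progressive categorization, the learner is a k-nearest-neighbour average, and selection is epsilon-greedy. The names `toy_world`, `Region` and `run` are hypothetical.

```python
import numpy as np

def toy_world(action, rng):
    """Hypothetical environment: sensory outcome of a 1-D action in [0, 3)."""
    if action < 1.0:
        return 0.0                       # trivially predictable
    if action < 2.0:
        return float(np.sin(6.0 * action))  # learnable, non-trivial
    return float(rng.uniform(-1.0, 1.0))    # pure noise, unlearnable

class Region:
    """One region of the action space with its own samples and error history."""
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.samples = []  # (action, outcome) pairs
        self.errors = []   # prediction errors in time order

    def predict(self, action, k=3):
        # Learner M: average of the k nearest stored outcomes.
        if not self.samples:
            return 0.0
        xs, ys = map(np.array, zip(*self.samples))
        nearest = np.argsort(np.abs(xs - action))[:k]
        return float(ys[nearest].mean())

    def progress(self, window=10):
        # Meta-learner: drop in mean error between two consecutive windows.
        if len(self.errors) < 2 * window:
            return np.inf  # assumption: force initial sampling of each region
        older = np.mean(self.errors[-2 * window:-window])
        newer = np.mean(self.errors[-window:])
        return float(older - newer)

def run(n_steps=600, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    regions = [Region(0, 1), Region(1, 2), Region(2, 3)]
    visits = [0, 0, 0]
    for _ in range(n_steps):
        if rng.random() < epsilon:   # occasional random exploration
            i = int(rng.integers(len(regions)))
        else:                        # otherwise follow the intrinsic reward
            i = int(np.argmax([r.progress() for r in regions]))
        r = regions[i]
        action = rng.uniform(r.low, r.high)
        predicted = r.predict(action)
        outcome = toy_world(action, rng)
        r.errors.append(abs(outcome - predicted))  # error feedback
        r.samples.append((action, outcome))
        visits[i] += 1
    return visits

print(run())
```

Windowed progress estimates are themselves noisy, so the window length matters in a sketch like this one.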
The Playground Experiments
(IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)
Experiments on Open Learning in the Real World
Playground Experiments
• Autonomous learning of novel affordances and skills, e.g. object manipulation
IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011
simple
complex
complex
• Self-organization of developmental trajectories, bootstrapping of communication
New hypotheses for understanding infant development
Front. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006
Active learning of inverse models: SAGG-RIAC (RAS, 2012)
(Context, Movement)
Effect
Redundancy of sensorimotor spaces
From the active choice of action, followed by observation of effect …
… to the active choice of effect, followed by the search of a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, …)
self-defined RL problem
Spontaneous active exploration of a space of fitness functions parameterized by where one iteratively chooses the which maximizes the empirical evaluation of:
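The shift from choosing actions to choosing effects can be illustrated with a toy goal-exploration loop. This is a sketch under stated assumptions, not the SAGG-RIAC algorithm itself: the robot is a hypothetical 2-joint planar arm, the goal space is a fixed grid instead of recursively split regions, the interest of a cell is assumed to be the absolute change in reaching error between two windows, and the low-level goal-directed optimization is replaced by a simple random perturbation search. All function names are illustrative.

```python
import numpy as np

def hand_position(joints, lengths=(0.5, 0.5)):
    """Forward kinematics of a hypothetical 2-joint planar arm (movement -> effect)."""
    a1, a2 = joints
    x = lengths[0] * np.cos(a1) + lengths[1] * np.cos(a1 + a2)
    y = lengths[0] * np.sin(a1) + lengths[1] * np.sin(a1 + a2)
    return np.array([x, y])

def reach(goal, memory, rng, n_tries=10, sigma=0.3):
    """Stand-in for goal-directed optimization: perturb the best known movement."""
    best_joints, best_effect = min(memory, key=lambda m: np.linalg.norm(m[1] - goal))
    for _ in range(n_tries):
        joints = best_joints + rng.normal(0.0, sigma, size=2)
        effect = hand_position(joints)
        memory.append((joints, effect))  # every trial enriches the inverse model
        if np.linalg.norm(effect - goal) < np.linalg.norm(best_effect - goal):
            best_joints, best_effect = joints, effect
    return float(np.linalg.norm(best_effect - goal))  # final reaching error

def interest(errors, window=5):
    """Assumed interest measure: absolute change of mean reaching error."""
    if len(errors) < 2 * window:
        return np.inf  # assumption: try every cell a few times first
    older = np.mean(errors[-2 * window:-window])
    newer = np.mean(errors[-window:])
    return float(abs(older - newer))

def explore_goals(n_goals=300, grid=4, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    start = np.zeros(2)
    memory = [(start, hand_position(start))]
    # Goal space [-1, 1]^2 cut into grid x grid cells, each with its error history.
    history = {(i, j): [] for i in range(grid) for j in range(grid)}
    size = 2.0 / grid
    for _ in range(n_goals):
        if rng.random() < epsilon:
            cell = (int(rng.integers(grid)), int(rng.integers(grid)))
        else:
            cell = max(history, key=lambda c: interest(history[c]))
        low = -1.0 + size * np.array(cell)
        goal = low + rng.uniform(0.0, size, size=2)  # self-generated goal
        history[cell].append(reach(goal, memory, rng))
    return memory, history

memory, history = explore_goals()
print(len(memory), "movements stored")
```

Because many joint configurations produce the same hand position, sampling goals in the lower-dimensional effect space avoids spending trials on redundant movements.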
Learning omnidirectional locomotion
Higher performance than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non-homogeneous) (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)
Experimental evaluation of active learning efficiency
Control Space: Task Space:
Maturational constraints
• Progressive growth of the number of DOFs and of the spatio-temporal resolution
• Adaptive maturational schedule controlled by active learning/learning progress
(Bjorklund, 1997; Turkewitz and Kenny, 1985)
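One way to picture an adaptive maturational schedule is a clock value that gates how many degrees of freedom are released and how fine the resolution is. The sketch below is purely illustrative: the update rule (the clock advances only while measured learning progress is positive), the linear release of DOFs, and all names and default values are assumptions, not the schedule used in McSAGG-RIAC.

```python
def update_clock(psi, progress, rate=0.1):
    """Advance the maturational clock psi in [0, 1].

    Assumed rule: the clock moves forward in proportion to positive
    learning progress and never moves backward.
    """
    return min(1.0, psi + rate * max(progress, 0.0))

def released_dofs(psi, total_dofs):
    """Number of degrees of freedom unlocked at clock value psi (at least one)."""
    return 1 + int(psi * (total_dofs - 1))

def resolution(psi, coarse=0.2, fine=0.01):
    """Resolution interpolated linearly from coarse (psi=0) to fine (psi=1)."""
    return coarse + psi * (fine - coarse)

# Usage: feed a hypothetical sequence of progress measurements to the clock.
psi = 0.0
for progress in [0.5, 0.8, -0.2, 0.0, 1.0]:
    psi = update_clock(psi, progress)
    print(round(psi, 3), released_dofs(psi, total_dofs=7), round(resolution(psi), 3))
```

Starting with few DOFs and a coarse resolution keeps the early search space small, and the space only grows once there is evidence that learning is under way.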
« Life-long » Experimentation
Acroban (Siggraph 2010, IROS 2011, World Expo, South Korea, 2012)
• Experimentation of algorithms for « life-long » learning in the real world
Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap
Ergo-Robots (Exhibition « Mathematics, a beautiful elsewhere », Fond. Cartier, 2011-2012)
Ergo-Robots
Mid-term: open-source distribution of the platform to the scientific community
« Life-long » Experimentation
Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems. http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf
Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/BaranesOudeyerICDL11.pdf
Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009. http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf
Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf
Oudeyer, P-Y., Kaplan, F., Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265-286. http://www.pyoudeyer.com/ims.pdf
Baranes, A., Oudeyer, P-Y. (2009) R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155-169.
Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.
Lopes, M., Lang, T., Toussaint, M., Oudeyer, P-Y. (2012) Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf
Lopes, M., Oudeyer, P-Y. (2012) The Strategic Student Approach for Life-Long Exploration and Learning, in Proceedings of IEEE ICDL-Epirob 2012. http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf