Tangible User Interfaces and Reinforcement Learning
(Smart Toys)
An honours thesis presentation by…
Trent Apted <[email protected]>
Supervised by A/Prof Bob Kummerfeld
Smart Internet Technology Research Group
Tangible User Interfaces
• Not just a mouse
  – Although he can advance my slides
• Facilitate a more intimate interaction with the user
  – Mainly targeted towards children
  – Huggable, cute and cuddly
  – Develop a relationship with the user
  – Play games
Toys - Motivation
• Plush (soft and furry) toys account for around 25% of toy store sales
• Over 17 million Furby toys were sold between October 1998 and December 1999
  – They had primitive learning capabilities
  – Mostly robot-like in appearance
  – They were also relatively cheap (unlike Sony’s Aibo ~$2,000+)
Toys - Challenges
• Want to (cheaply) make a Smart Toy, derived from a plush doll
• Don’t want to adversely affect the original function
  – Namely, being soft, cute and cuddly
• Also want to be able to detect the usual ‘plush toy’ interactions
  – E.g. squeeze, carry, lie down with
• I am not an engineer…
Reinforcement Learning
• Like training a dog with a ‘clicker’
• Need to associate the reward (click) with behaviour in a nearby temporal window
  – How to represent the behaviour
  – How to determine the window
• Apply learning that attempts to maximise all future possible rewards
• Many techniques
  – Q-learning, TD(λ), Bayesian models, Markov models, neural networks, actor-critic, hierarchical
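The ‘clicker’ idea above can be sketched in code. This is a minimal illustration rather than the thesis implementation: it uses a standard tabular Q-learning update and credits a click to every behaviour recorded inside a fixed time window. The class name `ClickerQLearner`, the state and action labels, and the 3-second window are all assumptions made for the example.

```python
from collections import defaultdict, deque


class ClickerQLearner:
    """Tabular Q-learning where a reward (click) is credited to recent behaviour."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, window=3.0):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount applied to future reward
        self.window = window          # seconds a click may reach back (assumed value)
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.recent = deque()         # (timestamp, state, action, next_state)

    def record(self, state, action, next_state, now):
        """Remember a behaviour so that a later click can be credited to it."""
        self.recent.append((now, state, action, next_state))

    def click(self, now, reward=1.0):
        """Apply the Q-learning update to every behaviour inside the window."""
        while self.recent and now - self.recent[0][0] > self.window:
            self.recent.popleft()     # too old to have earned this click
        for _, s, a, s2 in self.recent:
            best_next = max(self.q[(s2, b)] for b in self.actions)
            target = reward + self.gamma * best_next
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        self.recent.clear()
```

A fixed window is only one possible answer to "how to determine the window"; it is used here because it is the simplest to show.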
Reinforcement Learning - Challenges
• Not all techniques can be applied to this scenario
  – Infinite: no end to training examples
  – Interactive: need to wait for the user to determine the reward
  – Discrete: few training examples
  – Future use: a (cheap) toy cannot hold a lot of state
  – Sensors are unsophisticated (Boolean)
• Also needs to be fun
  – Non-determinism
  – Anticipate possible actions without stimuli
• It may also not be possible to punish the model
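Two of the constraints above, Boolean sensors and the wish for non-determinism, can be illustrated with a short sketch. It is not taken from the thesis: packing the sensor readings into a small integer and choosing actions with a softmax over learned values are assumptions picked for illustration, and the function names are hypothetical.

```python
import math
import random


def encode_state(sensors):
    """Pack a sequence of Boolean sensor readings into one small integer."""
    state = 0
    for i, on in enumerate(sensors):
        if on:
            state |= 1 << i
    return state


def softmax_choice(values, temperature=1.0, rng=random):
    """Pick an index at random, favouring higher values.

    A high temperature makes the toy less predictable; a low one makes it
    almost always pick the best-valued action.
    """
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights, k=1)[0]
```

With n Boolean sensors there are at most 2^n states, which keeps a value table small enough for a device that cannot hold much state.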
My Contributions – Hardware / Systems
• Design and implementation of the circuitry and sensors
• Integration into a plush toy
• A hardware/software interface (via parallel port) and event model
• Many lessons learnt
  – E.g. limitations of high-level hardware (PDA)
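One way an event model can sit on top of Boolean sensors is sketched below. The slides do not describe the actual design, so everything here is an assumption: the sensor names and bit positions in `SENSOR_BITS` are hypothetical, and `read_status` stands in for whatever routine reads the status byte from the port.

```python
SENSOR_BITS = {"squeeze": 0, "tilt": 1, "lift": 2}  # hypothetical wiring


def diff_events(previous, current, bits=SENSOR_BITS):
    """Compare two status bytes and return (sensor, is_on) for each changed bit."""
    events = []
    changed = previous ^ current
    for name, bit in bits.items():
        mask = 1 << bit
        if changed & mask:
            events.append((name, bool(current & mask)))
    return events


def poll(read_status, handler, samples):
    """Read the status byte repeatedly and call handler(sensor, is_on) on changes."""
    previous = read_status()
    for _ in range(samples):
        current = read_status()
        for name, is_on in diff_events(previous, current):
            handler(name, is_on)
        previous = current
```

Reporting only changes, rather than every raw reading, keeps the learner's input to discrete interaction events such as a squeeze starting or ending.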
My Contributions – Software
• Reinforcement learning in the context of a Smart Toy
• Flexible learning architecture for further research and exploration (in other contexts)
• Evaluation of the reinforcement learning techniques implemented
• Implementation of a number of simple games to motivate learning of the toy (fun?)
Some Results and Analysis
• Increasing the state space and re-presenting examples does not help interactive learning
• ‘Snapshot’ environments perform poorly and do not benefit from increasing the learner complexity
• Q-learning combined with Markov models performs well
Future Work
• Improve the abilities of the toy
  – There are spare wires, so a speaker would be easy to add
  – Speech recognition would be harder
• Wireless
  – Remove the tether for more natural interaction
  – Needs a power source and adds expense
• Collaboration
  – ‘Talking’ to other Smart Toys, collaborating in games
  – Collaborative learning
• Examine more learning models
• Psychological / Sociological aspects