embodied learning of qualitative models

Embodied Learning of Qualitative Models. Jure Žabkar. Exploration and Curiosity in Robot Learning and Inference, DAGSTUHL, March 2011. Joint work with xpero partners.

Embodied Learning of Qualitative Models

Jure Žabkar

Exploration and Curiosity in Robot Learning and Inference, DAGSTUHL, March 2011

joint work with xpero partners

problem

“How should a robot choose its actions and experiences so as to maximize the effectiveness of its learning?”

goals

• to learn comprehensible models

• no extrinsic reward

• intrinsic reward: improved prediction model about the environment

our way

• learning from scratch (no explicit background knowledge, but given a learning algorithm)

• real robots, real-time learning

learning loop

1. observe the environment (collect data)

2. learn a model

3. use the model to predict the effect of each action

4. choose the best action (w.r.t. active learning strategy)

5. observe the environment and check whether the predictions match new observations
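The five steps of the loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the XPERO implementation: the toy environment, the count-based "model", the Laplace-style confidence and the "try the least certain action" strategy are all hypothetical stand-ins chosen to keep the example short.

```python
import random

ACTIONS = ["forward", "backward", "left", "right"]  # hypothetical action set


def toy_environment(action, rng):
    """Hypothetical world: returns the qualitative effect of an action."""
    if action == "forward":
        return "inc"
    if action == "backward":
        return "dec"
    return rng.choice(["inc", "dec"])  # turning is noisy in this toy world


def learn(observations):
    """Step 2: a trivial model, counts of observed effects per action."""
    model = {}
    for action, effect in observations:
        model.setdefault(action, {}).setdefault(effect, 0)
        model[action][effect] += 1
    return model


def predict(model, action):
    """Step 3: most frequent effect plus a Laplace-style confidence."""
    counts = model.get(action, {})
    total = sum(counts.values())
    if total == 0:
        return None, 0.5  # nothing known yet
    effect = max(counts, key=counts.get)
    return effect, (counts[effect] + 1) / (total + 2)


def choose_action(model):
    """Step 4: assumed strategy, pick the least confident prediction."""
    return min(ACTIONS, key=lambda a: predict(model, a)[1])


def run_loop(steps=20, seed=0):
    rng = random.Random(seed)
    observations = []
    hits = 0
    for _ in range(steps):
        model = learn(observations)              # 2. learn a model
        action = choose_action(model)            # 3. + 4. predict, choose
        predicted, _ = predict(model, action)
        effect = toy_environment(action, rng)    # 5. observe the outcome
        hits += predicted == effect              # did the prediction match?
        observations.append((action, effect))    # 1. collect data
    return learn(observations), hits


model, hits = run_loop()
print(predict(model, "forward"), hits)
```

In this sketch the deterministic actions quickly become confident, so the loop keeps returning to the noisy ones; a real strategy would need to separate learnable uncertainty from noise.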

starting scenario

Q: how does the area of the ball (as observed by the robot) change w.r.t. robot's actions?

area := #pixels of the red blob in the image from robot's camera

actions: sL, sR (the distance of the L/R wheel)

area = area(sL, sR)

task: find the appropriate model
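The definition of area as a pixel count can be illustrated as follows. This is a minimal sketch, assuming an RGB image stored as nested lists and a simple colour threshold; the threshold values and the helper names `is_red` and `blob_area` are hypothetical, and the sketch counts every red pixel rather than isolating one connected blob.

```python
def is_red(pixel, min_red=150, max_other=100):
    """Assumed colour test: strong red channel, weak green and blue."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def blob_area(image):
    """area := number of red pixels in the camera image."""
    return sum(is_red(pixel) for row in image for pixel in row)


RED, GREY = (220, 30, 30), (120, 120, 120)
image = [
    [GREY, GREY, GREY, GREY],
    [GREY, RED, RED, GREY],
    [GREY, RED, RED, GREY],
]
print(blob_area(image))  # 4 red pixels in this toy image
```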

equation discovery?

we tried several algorithms, no success

motivation

why learning qualitative relations?

• people most often reason qualitatively

• AI: robots should mimic human intelligence

the area problem, qualitatively

qualitative rules

if action=forward then the area increases until it becomes constant (blob occupies the whole image)

if orientation<0 and action=left (increasing the absolute value of the angle) then the area decreases until it becomes constant (zero)

...

prediction model gets much more accurate, but the predictions are not that precise.
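The two rules quoted above can be written as a small predictor, which also shows why such a model is accurate but not precise: it predicts only the direction of change, never its size. This is a sketch under assumptions: the label names, the `image_size` parameter and both function names are hypothetical, and the rules elided in the source ("...") are deliberately left out.

```python
def predict_area_change(action, orientation, area, image_size):
    """Return 'inc', 'dec', 'const', or None when no rule applies."""
    if action == "forward":
        # area grows until the blob fills the whole image
        return "const" if area >= image_size else "inc"
    if action == "left" and orientation < 0:
        # turning further away: area shrinks until it reaches zero
        return "const" if area == 0 else "dec"
    return None  # rules not given in the source are not guessed


def qualitative_change(before, after):
    """Map two numeric area readings to a qualitative label."""
    if after > before:
        return "inc"
    if after < before:
        return "dec"
    return "const"


predicted = predict_area_change("forward", 0.0, 1200, 76800)
observed = qualitative_change(1200, 1350)
print(predicted == observed)  # direction matches; the 150-pixel size is not predicted
```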

methods

• active learning + planning

• learning methods:

– Padé: Žabkar, Možina, Bratko, Demšar. Learning Qualitative Models from Numerical Data. AIJ, 2011

– STRUDEL: Košmerlj, Bratko, Žabkar. Embodied Concept Discovery through Qualitative Action Models. IJUFKS, 2011

– Qube: Žabkar et al. Preference Learning from Qualitative Partial Derivatives. ECML Preference Learning Workshop, 2010

– Hyper (with predicate invention mechanism): Leban, Žabkar, Bratko. An experiment in robot discovery with ILP. Proc. ILP 2008

• tested on simulated (billiards) and real data (medical application, robotics)

ceteris paribus

"all other things being equal"

• e.g. partial differentiation

• observe a qualitative relation between two selected features, other features held constant

• qualitative relations of 3 types:

– x increases ⇒ f(x) increases (Padé)

– preference relation between x and y, and between f(x) and f(y)

– structural: on(A,B,t1), on(A,C,t2)

qualitative models

[diagram: data → (Padé, Qube, STRUDEL) → qualitative changes → (machine learning, statistics) → qualitative models]
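The first step of this pipeline, turning numerical data into qualitative changes under the ceteris paribus condition, can be illustrated with a deliberately naive sketch. It is not the Padé algorithm: it only compares pairs of samples in which all other features are equal and votes on the sign of the change. The function name, the tolerance and the example function y = x1 - x2 are assumptions made for illustration.

```python
from itertools import combinations


def qualitative_derivative(samples, feature, tol=1e-9):
    """Sign of the change of y w.r.t. one feature, others held constant.

    samples: list of (features_tuple, y). Returns '+', '-', or '?'.
    """
    votes = {"+": 0, "-": 0}
    for (xa, ya), (xb, yb) in combinations(samples, 2):
        # ceteris paribus: every other feature must be (nearly) equal
        others_equal = all(
            abs(u - v) <= tol
            for i, (u, v) in enumerate(zip(xa, xb))
            if i != feature
        )
        dx = xb[feature] - xa[feature]
        dy = yb - ya
        if not others_equal or abs(dx) <= tol or abs(dy) <= tol:
            continue
        votes["+" if dx * dy > 0 else "-"] += 1
    if votes["+"] == votes["-"]:
        return "?"
    return "+" if votes["+"] > votes["-"] else "-"


# Hypothetical data sampled on a grid from y = x1 - x2.
samples = [((x1, x2), x1 - x2) for x1 in (1, 2, 3) for x2 in (1, 2, 3)]
print(qualitative_derivative(samples, 0), qualitative_derivative(samples, 1))
```

The resulting signs are the kind of qualitative label that a standard learning algorithm could then use as a class value; real data rarely has exactly equal feature values, which is one reason a dedicated method is needed.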

learning with structured data

• ILP with predicate invention too complex for real-time learning

• we use ILP to learn smaller subtasks – structural qualitative changes

www.ailab.si/xpero

the concept "movable"

the discovered condition which distinguishes different effects of actions:

    p1(Obj) :-
        at(T1, Obj, Pos1),
        at(T2, Obj, Pos2),
        neq_pos(Pos1, Pos2).

p1 is true if the object was observed at two different positions

the discovered effects of actions:

    move(T, Obj) :-
        p1(Obj),
        f1(T, Obj).

    move(T, Obj) :-
        not p1(Obj),
        f2(T, Obj).

    f1(T1, Obj) :-
        at(T1, Obj, Pos1),
        at(T2, Obj, Pos2),
        Pos1 \== Pos2,
        {T2 = T1+1}.

    f2(T, Obj) :-
        not f1(T, Obj).
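For readers less familiar with Prolog, the learned clauses can be mirrored in Python. This is a sketch of one reading of the clauses, not part of the original system: it assumes observations are (time, object, position) triples with at most one position per object and time step, and it treats `not` as negation as failure.

```python
def p1(observations, obj):
    """Invented predicate: obj was observed at two different positions."""
    positions = {pos for _, o, pos in observations if o == obj}
    return len(positions) >= 2


def f1(observations, t, obj):
    """Position of obj at time t differs from its position at t + 1."""
    at = {(time, o): pos for time, o, pos in observations}
    return (
        (t, obj) in at
        and (t + 1, obj) in at
        and at[(t, obj)] != at[(t + 1, obj)]
    )


def move(observations, t, obj):
    """Effect of the move action, split by the invented predicate p1."""
    if p1(observations, obj):
        return f1(observations, t, obj)      # movable: position changes
    return not f1(observations, t, obj)      # f2: position does not change


# Hypothetical observations: a ball that moves and a wall that does not.
obs = [
    (0, "ball", (0, 0)), (1, "ball", (1, 0)),
    (0, "wall", (5, 5)), (1, "wall", (5, 5)),
]
print(p1(obs, "ball"), p1(obs, "wall"))
```

Here p1 plays the role of the concept "movable": it is true for the ball and false for the wall, and the two move clauses describe a different effect in each case.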