Matthias Hölzl's case study from AWASS 2013


Robot Swarms as Ensembles of Cooperating Components

Matthias Hölzl
With contributions from Martin Wirsing, Annabelle Klarl

AWASS, Lucca, June 24, 2013

www.ascens-ist.eu

The Task

Robots cleaning an exhibition area


marXbot

Miniature mobile robot developed by EPFL
Rough-terrain mobility
Robots can dock to other robots
Many sensors:
  Proximity sensors
  Gyroscope
  3D accelerometer
  RFID reader
  Cameras
  . . .
ARM-based Linux system
Gripper for picking up items


Swarm Robotics


Problems

Noise, sensor resolution
Extracting information from sensor data
Unforeseen situations
Uncertainty about the environment
Performing complex actions when intermediate results are uncertain
. . .


Action Logics

Logics that can represent change over time
Probabilistic behavior can be modeled (but is cumbersome)
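As an illustrative example (not from the slides), a situation-calculus-style successor state axiom is one way such change over time is axiomatized; the predicate and function names here are hypothetical:

At(r, \ell, do(a, s)) \leftrightarrow a = move(r, \ell) \lor \bigl( At(r, \ell, s) \land \neg \exists \ell'.\; a = move(r, \ell') \bigr)

Encoding the slip probabilities of the grid world on the next slide in such a logic is possible but, as noted, cumbersome.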


Markov Decision Processes

[Diagram: grid-world MDP. States are positions pos = (x, y), pos = (x+1, y), pos = (x, y+1), pos = (x+1, y+1); actions are n, s, e, w. The intended move succeeds with probability 0.9, the two perpendicular moves occur with probability 0.025 each, and with probability 0.05 the position stays unchanged; every transition has reward −0.1.]
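To make the transition structure explicit, here is a minimal sketch of the grid-world model described above (illustrative only, not ASCENS or Iliad code; walls are ignored and the function names are my own):

;; Grid-world MDP sketch: states are (x . y) positions, actions are N, E, S, W.
;; The intended move succeeds with probability 0.9, slips to each perpendicular
;; direction with probability 0.025, and leaves the position unchanged with
;; probability 0.05; every transition has reward -0.1.

(defun offset (action)
  (ecase action (N '(0 . 1)) (S '(0 . -1)) (E '(1 . 0)) (W '(-1 . 0))))

(defun move (pos action)
  (let ((off (offset action)))
    (cons (+ (car pos) (car off)) (+ (cdr pos) (cdr off)))))

(defun perpendicular (action)
  (if (member action '(N S)) '(E W) '(N S)))

(defun transitions (pos action)
  "List of (next-state probability reward) triples for POS and ACTION."
  (list* (list (move pos action) 0.9 -0.1)
         (list pos 0.05 -0.1)
         (mapcar (lambda (slip) (list (move pos slip) 0.025 -0.1))
                 (perpendicular action))))

;; (transitions '(3 . 2) 'N) yields four weighted successors:
;; ((3 . 3) with 0.9, (3 . 2) with 0.05, (4 . 2) and (2 . 2) with 0.025 each).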


Markov Decision Processes

[Diagram: MDP for choosing an evening activity. From DecideActivity the actions watchTV and goToClub lead to Watching TV and In Club. In Club offers drinkBeer (leading to Drinking) and dance, which ends in Dancing Alone or Dancing With Partner with probability 0.5 each. From Dancing With Partner, flirt leads to the outcome states "Oh, oh" and "Oh, no" with probabilities 0.05 and 0.95.]


MDPs: Strategies

[The same MDP diagram as on the previous slide, now evaluated under the three strategies below.]

State     TV          CDB          CDF
DA        watchTV     goToClub     goToClub
IC        drinkBeer   drinkBeer    dance
DWP       flirt       flirt        flirt
Utility   0.1         −0.05 (∗)    −1.975 (∗∗)

(∗)  0.05 + (−0.1)
(∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))
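The starred calculations are instances of the usual expected-utility recursion for a fixed strategy π (a hedged reading of the slide; the numeric rewards themselves come from the diagram):

V^\pi(s) = R(s, \pi(s)) + \sum_{s'} P(s' \mid s, \pi(s))\, V^\pi(s')

Reading the (∗∗) line this way, the 0.5-weighted terms correspond to the two dance outcomes and the 0.05/0.95-weighted terms to the two flirt outcomes.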


Reinforcement Learning

General idea:
  Figure out the expected value of each action in each state
  Pick the action with the highest expected value (most of the time)
  Update the expectations according to the actual rewards
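A minimal sketch of this idea as tabular Q-learning (illustrative only; the hash-table representation, exploration rate, and learning parameters are assumptions, not Iliad internals):

;; Expected values ("Q-values") for (state . action) pairs.
(defvar *q* (make-hash-table :test #'equal))

(defun q-value (state action)
  (gethash (cons state action) *q* 0.0))

(defun pick-action (state actions)
  "Pick the action with the highest expected value, most of the time."
  (if (< (random 1.0) 0.1)                      ; explore occasionally
      (nth (random (length actions)) actions)
      (first (sort (copy-list actions) #'>
                   :key (lambda (a) (q-value state a))))))

(defun update-expectation (state action reward next-state actions
                           &key (alpha 0.1) (gamma 0.95))
  "Move the expectation for (STATE, ACTION) toward the actual reward plus
the discounted value of the best action in NEXT-STATE."
  (let* ((best-next (reduce #'max (mapcar (lambda (a) (q-value next-state a))
                                          actions)))
         (target (+ reward (* gamma best-next))))
    (setf (gethash (cons state action) *q*)
          (+ (q-value state action)
             (* alpha (- target (q-value state action)))))))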


How well does this work?

Rather well for small problems
But: state explosion


Solutions

Decomposition
Hierarchy
Partial programs


POEM

Action language
First-order reasoning
Hierarchical reinforcement learning
  Learns completions for partial programs
Concurrency
Reflection / meta-object protocols
· · ·


Iliad: A POEM Implementation

Common Lisp-based programming language
Full first-order reasoning
  Operations on logical theories: U(C)NA, domain closure, . . .
  Resolution, hyperresolution, DPLL, etc.
  Conditional answer extraction
  Procedural attachment, constraint solving
Hierarchical reinforcement learning
  Based on Concurrent ALisp
  Partial programs
  Threadwise and temporal state abstraction
  Hierarchically Optimal Yet Local (HOLY) Q-learning
  Hierarchically Optimal Recursively Decomposed (HORD) Q-learning


Planned Contents

Introduction to CALisp/Poem
Simple TD-learning: bandits
Flat reinforcement learning: navigation
Hierarchical reinforcement learning: collecting items individually
Threadwise decomposition for hierarchical reinforcement learning: learning collaboratively


n-armed Bandits

[Diagram: a single state S with two arms: search, taken with probability 1.0 and reward drawn from N(0.1, 1.0), and coll-known, taken with probability 1.0 and reward drawn from N(0.3, 3.0).]

Choice between n actions
Reward depends probabilistically on the action choice
No long-term consequences
Simplest form of TD-learning
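A minimal sketch of ε-greedy action selection with sample-average reward estimates for the two arms above (the arm names follow the diagram; everything else is illustrative, not Iliad code):

;; Running reward estimates and pull counts for the two arms.
(defvar *estimates* (list (cons 'search 0.0) (cons 'coll-known 0.0)))
(defvar *counts*    (list (cons 'search 0)   (cons 'coll-known 0)))

(defun choose-arm (&optional (epsilon 0.1))
  "Mostly pick the arm with the best estimate; explore with probability EPSILON."
  (if (< (random 1.0) epsilon)
      (car (nth (random (length *estimates*)) *estimates*))
      (car (first (sort (copy-list *estimates*) #'> :key #'cdr)))))

(defun update-estimate (arm reward)
  "Incremental sample average: move the estimate toward REWARD by 1/n."
  (let ((n (incf (cdr (assoc arm *counts*)))))
    (incf (cdr (assoc arm *estimates*))
          (/ (- reward (cdr (assoc arm *estimates*))) n))))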


Flat Learning

[Grid-world display: walls XX, target TT, robot RR.]

Target: (0 0)
Choices: (N E S W)
Q-values:
#((N (Q -1.8))
  (E (Q -1.8))
  (S (Q -2.25))
  (W (Q 2.76)))
Recommended choice is W
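A small sketch (assuming the #((N (Q ...)) ...) structure printed above; this is not Iliad's own API) of how the recommended choice falls out of such a Q-value table:

(defun recommended-choice (q-values)
  "Return the direction whose Q entry has the largest value."
  (first (first (sort (coerce q-values 'list) #'>
                      :key (lambda (entry) (second (second entry)))))))

;; (recommended-choice #((N (Q -1.8)) (E (Q -1.8)) (S (Q -2.25)) (W (Q 2.76))))
;; => W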


Flat Learning

;; Partial program for the navigation task: the robot repeatedly chooses a
;; direction (a learned choice point) until it reaches the target location.
(defun simple-robot ()
  (call (nav (target-loc (robot-env)))))

(defun nav (loc)
  (until (equal (robot-loc) loc)
    (with-choice navigate-choice (dir '(N E S W))
      (action navigate-move dir))))


Hierarchical Learning

;; Partial program for waste removal: at the top level the robot repeatedly
;; chooses (another learned choice point) between sleeping, picking up waste,
;; and dropping it off.
(defun waste-removal ()
  (loop
    (choose choose-waste-removal-action
      (action sleep 'SLEEP)
      (call (pickup-waste))
      (call (drop-waste)))))

(defun pickup-waste ()
  (call (nav (waste-source)))
  (action pickup-waste 'PICKUP))
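drop-waste is called above but not shown on this slide; a plausible counterpart to pickup-waste might look like the following (the waste-target accessor and the DROP action name are hypothetical):

(defun drop-waste ()
  (call (nav (waste-target)))   ; waste-target is a hypothetical accessor
  (action drop-waste 'DROP))    ; 'DROP is a hypothetical action name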


Multi-Robot Learning

[Grid-world display for the multi-robot task: walls (XX), a target area (TT), waste (WW), and several robots (RR, RW).]


Thank you!

Any Questions?

matthias.hoelzl@ifi.lmu.de

