control


Upload: benjiro-fujii

Post on 30-Dec-2015


DESCRIPTION

Environment. Actuator Motions. Raw Sensor Input. Robot states. Actuators. Sensing. State Info. Conditioned Sensor Input. Observation. Learning Trigger. Control. Learning. Commands. Behavior Controller. Sensor/action pairs. A-D on sound card. microphone. - PowerPoint PPT Presentation

TRANSCRIPT

Control Learning

Sensing

Actuators

Environment

Environment

Observation

Robot states

Learning Trigger

State Info

Behavior Controller

Commands

Actuator Motions

Raw Sensor Input

Conditioned Sensor Input

Sensor/action pairs

microphone

FFT

/* Sum up power in chirp. */
Sum = 0
For j = start freq, end freq
    Sum = FFTpower[j] + Sum
End j

Average Power = Sum / chirp freq range

¼ second of data

A-D on sound card

Zero pad

Note: This process runs for each microphone four times a second. To smooth out the amplitudes associated with intermittent chirps, the average power is kept as a running average over 1 second.
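The per-microphone processing above (quarter-second frame, zero pad, FFT, band sum, one-second running average) could be sketched roughly as follows. This is an illustration only, not the presenter's code: a direct DFT over just the chirp bins stands in for the FFT, the divisor is assumed to be the number of bins in the chirp range, and the names `band_average_power`, `ChirpPowerTracker`, `padded_len`, `start_bin`, and `end_bin` are hypothetical.

```python
import cmath
import math
from collections import deque


def band_average_power(samples, padded_len, start_bin, end_bin):
    """Average spectral power over bins start_bin..end_bin (inclusive)."""
    total = 0.0
    for k in range(start_bin, end_bin + 1):
        # Direct DFT of the zero-padded frame; the padded zeros add nothing,
        # so only the real samples are summed. (Assumption: stands in for FFT.)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / padded_len)
                    for n, x in enumerate(samples))
        total += abs(coeff) ** 2
    # Assumption: "chirp freq range" means the number of bins summed.
    return total / (end_bin - start_bin + 1)


class ChirpPowerTracker:
    """Running average of band power over the last `frames` frames.

    With four quarter-second frames this covers the one-second window
    mentioned in the note.
    """

    def __init__(self, frames=4):
        self.history = deque(maxlen=frames)

    def update(self, power):
        self.history.append(power)
        return sum(self.history) / len(self.history)
```

One tracker per microphone would be updated four times a second with the latest `band_average_power` result, giving the smoothed left and right sensor values.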

Left Sensor Right Sensor

Are changes in sensor values extreme?

Reactive Sub-system

Non-Reactive Sub-system

Sensor/Action pairs

Actuators

Single Commands

Sensor Values

Yes No

Sensor/Action FIFO
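The routing shown in this diagram could be sketched as below. Several details are assumptions, since the slide gives only the boxes: "extreme" is taken to mean a change larger than a fixed threshold on either sensor, the Yes branch is assumed to lead to the reactive sub-system, and `is_extreme`, `control_step`, and `threshold` are hypothetical names.

```python
from collections import deque


def is_extreme(previous, current, threshold):
    """True when either sensor changed by more than `threshold`."""
    return any(abs(c - p) > threshold for p, c in zip(previous, current))


def control_step(previous, current, fifo, reactive, non_reactive, threshold=0.5):
    """Route one (left, right) reading and log the sensor/action pair."""
    # Assumption: extreme changes go to the reactive sub-system.
    subsystem = reactive if is_extreme(previous, current, threshold) else non_reactive
    command = subsystem(current)
    # A deque with maxlen drops the oldest pair once the FIFO is full.
    fifo.append((current, command))
    return command


# Hypothetical usage: a bounded FIFO of recent sensor/action pairs.
fifo = deque(maxlen=100)
```

The two sub-systems are passed in as plain callables here so that either a hand-written rule or a trained network could fill the role.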

Left Sensor Right Sensor

Do sensor values indicate that a reactive action is needed?

Reactive Sub-system

Non-Reactive Sub-system

Sensor/Action pairs

Actuators

Single Commands

Sensor Values

Yes No

Sensor/Action FIFO

Inhibit

Left Sensor Right Sensor

Are changes in sensor values extreme?

Reactive Sub-system

Non-Reactive Sub-system

Sensor/Action pairs

Actuators

Single Commands

Inhibit

Sensor Values

Yes No

Sensor/Action FIFO

Action Sequence Library

Maxnet

Left microphone value T(0)

Right microphone value T(0)

Left analog

Center analog

Right analog

Left indicator

Center indicator

Right indicator

Action T(0)
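A Maxnet is a winner-take-all layer: each unit is repeatedly inhibited by the others until only one stays active. The sketch below shows how the left, center, and right analog outputs might be reduced to a single indicator. It is an illustration under assumptions: the inhibition weight, the iteration cap, and the names `maxnet` and `ACTIONS` are not taken from the slides.

```python
ACTIONS = ("left", "center", "right")  # hypothetical labels for the indicators


def maxnet(activations, epsilon=None, max_iters=1000):
    """Return the index of the winning unit, or None if no single winner emerges."""
    values = [max(0.0, a) for a in activations]
    if epsilon is None:
        # The inhibition weight must stay below 1 / (number of units).
        epsilon = 1.0 / (2 * len(values))
    for _ in range(max_iters):
        alive = [i for i, v in enumerate(values) if v > 0.0]
        if len(alive) == 1:
            return alive[0]
        if not alive:
            return None
        total = sum(values)
        # Each unit is inhibited by epsilon times the sum of the other units.
        values = [max(0.0, v - epsilon * (total - v)) for v in values]
    return None  # e.g. an exact tie between the strongest units


winner = maxnet([0.2, 0.9, 0.5])
action = ACTIONS[winner] if winner is not None else None
```

Returning `None` on a tie leaves the caller free to pick a default action rather than forcing an arbitrary winner.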

Maxnet

Left microphone trend from times

T(0) – T(-n)

Left analog

Center analog

Right analog

Left indicator

Center indicator

Right indicator

Action T(0)

Right microphone trend from times

T(0) – T(-n)

Left microphone trend from times

T(0) – T(-n)

Heading Adjustment spread over times

T(0) – T(n)

Right microphone trend from times

T(0) – T(-n)

Heading Adjustment

Left microphone value T(0)

Right microphone value T(0)

Left microphone values

T(0) – T(-n)

Right microphone values

T(0) – T(-n)

Heading Adjustment spread over times

T(0) – T(n)

Left microphone values

T(0) – T(-n)

Right microphone values

T(0) – T(-n)

Heading Adjustment T(0)

Heading Adjustment T(1)

Heading Adjustment T(2)

Left microphone values

T(0) – T(-n)

Right microphone values

T(0) – T(-n)

Actions

T(-1) – T(-n)

Heading Adjustment for times

T(0) – T(n)

Maxnet

Left microphone values

T(0) – T(-n)

Left analog

Center analog

Right analog

Left indicator

Center indicator

Right indicator

Right microphone values

T(0) – T(-n)

Actions

T(-1) – T(-n)

Action T(0)

Maxnet

Left microphone values

T(0) – T(-n)

Left indicator

Center indicator

Right indicator

Right microphone values

T(0) – T(-n)

Actions

T(-1) – T(-n)

Action T(0)

LCR

LCR

LCR

Action T(1)

Action T(2)

Action T(3)

Maxnet

Maxnet

Maxnet

Robot states

Is the reactive system being called often?

Are goal states being met?

yes

Trigger reactive and non-reactive learning systems

no

Is the core routine running and is there enough memory to learn from?

Trigger reactive and non-reactive learning systems

yes

Reactive Memory Preparation

Non-Reactive Memory Preparation

Sensor/Action FIFO

Reactive Behavior Generation

Non-reactive Behavior Generation

Non-reactive Memory

Reactive Memory

Output of process is a trained reactive NN

Output of process is a trained non-reactive NN

Intensity Filter

Negative and Positive Example Set Creation

Correct Action Marking

GA trains feed-forward NN to select the correct action given the sensor input.

Output of process is a trained NN

Reactive Memory

Kohonen SOFM

Non-reactive Memory

Codebook

Codebook mirror process

GA trains feed-forward NN using the codebook and a fitness function that maximizes sensor energy and minimizes changes in direction.

Output of process is a trained NN
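The slide names the two goals of the fitness function but not its form. One simple possibility, shown purely as an assumption, is total sensor energy minus a weighted sum of heading changes; `fitness`, `energies`, `headings`, and `turn_penalty` are hypothetical names.

```python
def fitness(energies, headings, turn_penalty=1.0):
    """Score a run: total sensor energy minus a penalty on heading changes.

    Assumed form only; the slides do not give the actual formula.
    """
    energy = sum(energies)
    turning = sum(abs(b - a) for a, b in zip(headings, headings[1:]))
    return energy - turn_penalty * turning


steady = fitness([1, 2, 3], [0, 0, 0])
zigzag = fitness([1, 2, 3], [0, 1, 0])
```

Under this form a candidate network that reaches the same sensor energy with fewer direction changes scores higher, which matches the stated intent.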

Mirror Process

Sensor/Action Memory

Reactive Memory

Correlation Cluster Set Creation

Sensor/Action Memory pairs

Is recent memory a member of the Cluster Set?

Recent past memory from T(0) to T(-n)

Add recent memory to Non-reactive Memory, mirror, and update miss count

Update hit count

Yes No
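The membership test and hit/miss bookkeeping in this flowchart could look roughly like the sketch below. It assumes membership means a Pearson correlation above a fixed threshold with some stored memory, and it leaves out the mirror step. `correlation`, `ClusterSet`, and `threshold` are hypothetical names.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


class ClusterSet:
    """Tracks stored memories plus hit and miss counts."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed membership criterion
        self.members = []           # stands in for the non-reactive memory
        self.hits = 0
        self.misses = 0

    def observe(self, recent):
        """Count a hit if `recent` matches a stored memory, else store it as a miss."""
        if any(correlation(recent, m) >= self.threshold for m in self.members):
            self.hits += 1
            return True
        self.members.append(list(recent))
        self.misses += 1
        return False
```

A rising miss count would indicate that the robot keeps meeting situations its memory does not cover.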

Recent memory

Reactive Memory

Sensor FIFO

Non-reactive Memory

Cluster Set

Recent Memory

SOFM Codebook

0)LIRA

1)LIRA

2)LIRA

0)LIRA

1)LIRA

2)LIRA

0)LIRA

1)LIRA

2)LIRA

0)LIRA

1)LIRA

2)LIRA

1)LIR

2)LIRA

Action

1)LIRA

2)LIRA

Feed forward NN

SOFM Codebook Entry

New action replaces old action. New situation made in scratch area.

0)LIRA

1)LIRA

2)LIRA

Close match found in SOFM codebook

Intensity value from LIRA(0) used in fitness function of GA

Kohonen

Recent Memory

RBF Codebook

0)LIRA

1)LIRA

2)LIRA

Feed forward NN

RBF Network

RBF entry (j)

Target for RBF entry (j) is RBF entry (m)

0) A

1) A

2) A

Input to NN is past values from RBF codebook entries.

0)LIRA

1)LIRA

2)LIRA

GA uses actions from target entry to train NN.