1
Learning of Mediation Strategies for Heterogeneous Agents Cooperation
R. Charton, A. Boyer and F. Charpillet
Maia Team - LORIA – France
ICTAI'03 – Sacramento, CA, USA – November 4th, 2003
2
Context of our work
Industrial collaboration on the design of adaptive, interactive multimedia services for the general public. Focus on Information Retrieval assistance.
Constraints:
• User: occasional, novice
• Information Source: ownership, costs
Goal: to enhance the service quality
3
Cooperation in heterogeneous multi-agent systems
Agents of different nature (human, software, robots, ...) inhabit physical and virtual environments and are tied by interaction links. They fall into three classes:
• Controllable agents (C)
• Partially controllable agents (P)
• Non-controllable agents (N)
How to make these agents cooperate, i.e. achieve together applicative goals that satisfy a subset of the agents?
[Figure: C, P, and N agents distributed over the physical and virtual environments, connected by interaction links.]
4
Presentation Overview
• Typical example of interaction
• Mediator and Mediation Strategies
• Towards an MDP-based Mediation
• Our prototype of Mediator
• Experiments and results
5
An example of problem: a flight-booking system
The Customer (goal: book a flight from Paris to Sacramento) interacts with a Mediator; the Mediator sends a Query to the Information Source and gets Results back.
• Customer side: doesn't know how to formulate a request
• Source side: too many / raw results...
[Figure: Customer, Mediator, and Information Source linked by the interaction, query, and results flows.]
6
Role of the mediator agent
The mediator has to perform a useful task:
• Build a query that best matches the user's goal
• Provide relevant results to the user
• Maximize a utility approximation:
– User satisfaction, to be maximized
– Information Source cost, to be minimized
At any time, the mediator can:
• Ask the user a question about the query
• Send the query to the information source
• Propose a limited number of results to the user
In return, it perceives the other agents' answers: values, results, selections, rejections, ...
7
Mediation and Mediation Strategies
A mediation is a sequence of question-and-answer interactions between the agents, directed by the mediator. It is successful if the user gets the relevant results, or if the mediator discovers that the source can't give any result.
A mediation strategy specifies which action the mediator must select to control the mediation, according to the current state of the interactions.
Now, the question is how to:
• produce the mediator's behavior?
• optimize the service quality?
This requires finding an optimal mediation strategy.
8
Progressive query building
[Figure: query precision vs. number of interactions. The query goes from totally unknown, to partially specified, to sufficiently specified, to fully specified. "Sufficiently specified" is a good compromise: useful answers have been obtained; pushing on to "fully specified" only adds unuseful interactions.]
9
Requirements for the Mediator
Management of uncertainty and of imperfect knowledge:
• agents:
– users may misunderstand the questions
– users may have only partial knowledge of their needs
• environment:
– noise during communication
– imperfect sensors (for instance: speech recognition)
This requires an adaptive behavior.
We propose to model the mediation problem as an MDP and to compute a stochastic behavior for the mediator.
10
Markov Decision Process (MDP)
• Stochastic model <S, A, T, R>:
– States S = {s0, s1, s2}
– Actions A = {a0, a1}
– Transition T : S × A × S → [0, 1], with T(s, a, s') = P(s' | s, a)
– Reward R : S × A × S → ℝ
• Take a decision according to a policy π : S × A → [0, 1]
• Optimize the expected discounted reward R = Σ_{i ≥ 0} γ^i r_i, where γ is the discount factor
[Figure: transition graph over states s0, s1, s2 with actions a0, a1 and their transition probabilities.]
Computing a mediation strategy thus amounts to computing a stochastic policy.
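To make the <S, A, T, R> notation concrete, here is a minimal tabular MDP sketch with value iteration. The states, transition probabilities, and rewards are illustrative (loosely echoing the three-state example on the slide), not taken from the paper:

```python
# Minimal tabular MDP <S, A, T, R> with value iteration.
# All probabilities and rewards below are made up for illustration.

S = ["s0", "s1", "s2"]
A = ["a0", "a1"]

# T[(s, a)] -> list of (next state, probability); each list sums to 1.
T = {
    ("s0", "a0"): [("s0", 0.3), ("s1", 0.7)],
    ("s0", "a1"): [("s2", 1.0)],
    ("s1", "a0"): [("s1", 0.2), ("s2", 0.8)],
    ("s1", "a1"): [("s0", 1.0)],
    ("s2", "a0"): [("s2", 1.0)],
    ("s2", "a1"): [("s0", 0.5), ("s1", 0.5)],
}

# R[(s, a)] -> immediate expected reward (illustrative values).
R = {(s, a): 0.0 for s in S for a in A}
R[("s1", "a0")] = 1.0  # e.g. reaching the applicative goal pays off

GAMMA = 0.9  # discount factor

def value_iteration(eps=1e-6):
    """Iterate V(s) = max_a [R(s,a) + gamma * sum_s' T(s,a,s') V(s')]."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            best = max(
                R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in T[(s, a)])
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

values = value_iteration()
```

An optimal (here deterministic) policy then just picks, in each state, the action realizing the max.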
11
Modeling of the flight booking example
Define the model:
• S : State Space
• A : Mediator's actions
• T : Transitions
• R : Rewards
12
States : How to describe goals and objects ?
Form filling approach (Goddeau et al. 1996) :
Queries and source objects are described within a referential. The referential is built on a set of attributes :
Ref = { At1, ..., Atm }
Example of referential :
• Departure : { London, Geneva, Paris, Berlin, … }
• Arrival: { Sacramento, Beijing, Moscow, … }
• Class : { Business, Normal, Economic, ... }
13
State space
The mediator's state combines the user side and the source side: S = U × R.
R is the power set of all the objects of the information source:
R = { flight1, ..., flightr } is the set of objects that match the current query.
U is the set of partial queries:
u = { (ea1, val1), ..., (eam, valm) }
The state of the attribute Ati is a couple (eai, vali):
• Open: ea = '?', val is free
• Closed: ea = 'F', val cannot be specified
• Assigned: ea = 'A', val is already instantiated
14
State abstraction
The size of the state space is (2^n + 1) · (2 + i)^m, where:
– n: total count of objects of the information source
– m: number of attributes
– i: average number of values per attribute
An idea: use a state abstraction for the MDP and only keep:
• the binding state {?, A, F} of each attribute from U
• the response quality qr ∈ {?, 0, +, *} from R:
– qr = '?': unknown (not yet queried)
– qr = 0: no responses
– qr = '+': between 1 and nrmax responses
– qr = '*': more than nrmax responses
The size of the abstract state space: 4 · 3^m
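The gap between the two formulas is easy to check numerically; this sketch just evaluates the slide's size formulas (the values n = 20, m = 3, i = 10 are illustrative, not from the paper):

```python
# State-space sizes from the slide's formulas:
#   concrete space: (2^n + 1) * (2 + i)^m
#   abstract space:  4 * 3^m

def full_space_size(n, m, i):
    """(2^n + 1) * (2 + i)^m: the power set of the n source objects
    (plus an 'unknown' response) times (2 + i) states per attribute."""
    return (2**n + 1) * (2 + i)**m

def abstract_space_size(m):
    """4 * 3^m: 3 binding states {?, A, F} per attribute, times
    4 response qualities {?, 0, +, *}."""
    return 4 * 3**m

# Even a tiny source (n = 20 objects, m = 3 attributes, i = 10 values
# per attribute) gives a huge concrete space but only 108 abstract states.
print(full_space_size(20, 3, 10))   # 1811941056 (about 1.8e9)
print(abstract_space_size(3))       # 108
```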
15
Actions of the Mediator
• Ask the user a question about an attribute. Examples for the travel class:
– "In which class do you want to travel?"
– "Do you want to travel in business class?"
– "Are you sure you want to travel in economic class?"
• Send the current query to the information source
• Ask the user to select a response among the propositions
[Figure: the Mediator exchanges questions with the User and queries/propositions with the Source.]
16
Rewards
Rewards can be obtained:
• from the user interaction part (selection, refusal, ...):
+ R_selection: the user selects a proposition
– R_noselect: the user refuses all the propositions
– R_timeout: too long an interaction (user disconnection)
• from the information source interaction part (results):
+ R_noresp: no results for a fully specified task
– R_overnum: too many results (response quality is '*')
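The signs above can be sketched as a simple event-to-reinforcement map; the magnitudes here are invented for illustration, only the signs follow the slide:

```python
# Reward signal sketch for the mediator.
# Signs follow the slide; the magnitudes are illustrative assumptions.

REWARDS = {
    "selection": +10.0,  # user selects a proposition
    "noselect":   -5.0,  # user refuses all the propositions
    "timeout":   -10.0,  # interaction too long / user disconnects
    "noresp":     +1.0,  # no results for a fully specified task: still a
                         # success, the mediator proved the source has
                         # no matching object
    "overnum":    -2.0,  # too many results (response quality '*')
}

def reward(event):
    """Map an interaction event to a scalar reinforcement (0 otherwise)."""
    return REWARDS.get(event, 0.0)
```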
17
Example of mediation with the flight booking service
Columns: concrete State / Abstract state / Mediator Action / Answer / Reward
<?, ?, ? | ?> <?, ?, ? | ?> Ask user for departure Paris 0
<Paris, ?, ? | ?> <A, ?, ? | ?> Ask source for results 1700 flights - R Overnum
<Paris, ?, ? | {nr Max first flights} > <A, ?, ? | *> Ask user for destination Sacramento 0
<Paris, Sacramento, ? | ?> <A, A, ? | ?> Ask user for flight class I don't know 0
<Paris, Sacramento, F | ?> <A, A, F | ?> Ask source for results 4 flights 0
<Paris, Sacramento, F | {4 flights}> <A, A, F | +> Ask user for selection Selection #2 + R Selection
18
Compute the Mediation Strategy
Problem: two parts of the model are unknown!
• T = f (user, information source)
• R = f (user, information source)
Learn the Mediation Strategy by reinforcement
19
Reinforcement Learning
[Figure: the learner selects an Action, the dynamic system makes a Transition, and the learner receives back an Observation and a Reinforcement (Reward).]
20
Q-Learning (Watkins 89)
• A Reinforcement Learning method
• Can be used online
Q-Value Q : S × A → ℝ
Update rule (Bellman 57):
Q_{t+1}(s, a) = (1 − α) Q_t(s, a) + α [ R(s, a) + γ max_{a'∈A} Q_t(s', a') ]
where α is the learning rate.
[Figure: the Q-Learner observes state s and reward r from the environment and chooses action a; Q(s, a0) and Q(s, a1) relate V(s) to the successor values V(s'0), ..., V(s'n).]
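The update rule above can be sketched in a few lines; the toy chain environment below is an illustrative stand-in, not the mediation task. Since Q-learning is off-policy, the sketch learns while behaving completely at random:

```python
import random

# Tabular Q-learning (Watkins 1989):
#   Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]
# Toy chain: states 0..4, reaching state 4 pays +1 and ends the episode.

ALPHA, GAMMA = 0.1, 0.9
N_STATES = 5
ACTIONS = ("left", "right")
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic chain dynamics: 'right' moves toward the goal."""
    s2 = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

def update(s, a, r, s2):
    """One Bellman backup of the Q-value."""
    best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

random.seed(0)
for _ in range(5000):          # random behavior policy; Q-learning is
    s, steps = 0, 0            # off-policy, so Q still converges to Q*
    while s != N_STATES - 1 and steps < 100:
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        update(s, a, r, s2)
        s, steps = s2, steps + 1
```

After training, the greedy policy (argmax over Q in each state) heads right toward the goal, matching the optimal strategy for this chain.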
21
Mediator Architecture
[Figure: inside the Mediator Agent, the Interaction Manager exchanges requests and answers/selections with the User/Client Agent, and requests and results with the Information Source Agent. It maintains the Task Manager (real state) and a User Profile (store & retrieve preferences), and feeds the Decision Module (Q-Learning) with the abstract state and rewards; the Decision Module returns the selected actions and updates its Q-values.]
22
Experimentation
on the flight-booking application. We trained the mediator task with:
• 3 attributes (cities of departure/arrival and flight class)
• 4 attributes (+ the time of day for taking off)
• 5 attributes (+ the airline)

# of Attributes (m) | # of Abstract states (4·3^m) | # of Actions (3m+2) | # of Q-Values ((12m+8)·3^m)
3 | 108 | 11 | 1,188
4 | 324 | 14 | 4,536
5 | 972 | 17 | 16,524

Complexity growth as a function of the number of attributes.
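The figures in the table follow directly from the formulas in the column headers; a quick mechanical check:

```python
# Reproduce the complexity table rows from the column-header formulas.

def table_row(m):
    abstract_states = 4 * 3**m            # 4 . 3^m
    actions = 3 * m + 2                   # 3m + 2
    q_values = abstract_states * actions  # = (12m + 8) . 3^m
    return abstract_states, actions, q_values

for m in (3, 4, 5):
    print(m, table_row(m))
# 3 (108, 11, 1188)
# 4 (324, 14, 4536)
# 5 (972, 17, 16524)
```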
23
Learning results for 3-5 attributes (% of hits)
• 3 and 4 attributes: 99% of selection (close to optimal)
• 5 attributes: 90% of selection (more time required to converge)
[Figure: % of successful mediations (selection / no response) vs. number of iterations, from 0 to 200,000, for 3, 4, and 5 attributes.]
24
Learning results for 3-5 attributes (avg. length)
• 3 and 4 attributes: the minimal length of the mediation is reached
• 5 attributes: longer mediations
[Figure: average mediation length (number of actions per mediation) vs. number of iterations, from 0 to 200,000, for 3, 4, and 5 attributes.]
25
Conclusion
Advantages
• MDP + RL makes it possible to learn mediation strategies
• Answers the needs of a majority of users (profiles)
• Designer-oriented → user-oriented
• Incremental approach
• Implemented solution
Limits
• The user is only partially observable, especially through imperfect sensors like speech recognition
• Performance degrades for more complex tasks
26
Future works
• Use other probabilistic models and methods:
– Learn from a pre-established policy
– Learn the model (Sutton's Dyna-Q, classifiers)
– POMDP approach (modified Q-learning, Baxter's gradient)
• For more generic / complex tasks:
– Abstraction & scalability: change the abstract state space for a better guidance of the process in the real state space
– Hierarchical decomposition (H-MDP & H-POMDP) with attribute-dependency management (e.g.: City → possible Company → specific options)
27
Thank you for your attention
Any questions ?
28
References
(Allen et al. 2000) Allen J., Byron D., Dzikovska M., Ferguson G., Galescu L., Stent A., An Architecture for a Generic Dialogue Shell. In Natural Language Engineering, Cambridge University Press, vol. 6, 2000.
(Young 1999) Young S., Probabilistic Methods in Spoken Dialog Systems. In Royal Society, London, September 1999.
(Levin et al. 1998) Levin E., Pieraccini R. and Eckert W., Using Markov Decision Process for Learning Dialogue Strategies. In Proceedings of ICASSP'98, Seattle, USA, 1998.
(Goddeau et al. 1996) Goddeau D., Meng H., Polifroni J., Seneff S., Busayapongchai S., A Form-Based Dialogue Manager For Spoken Language Applications. In Proceedings of ICSLP'96, Philadelphia, 1996.
(Sutton & Barto 1998) Sutton R. S. and Barto A. G., Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
(Watkins 1989) Watkins C., Learning from Delayed Rewards. PhD thesis, King's College, University of Cambridge, England, 1989.
(Shardanand & Maes 1995) Shardanand U. and Maes P., Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of ACM CHI'95, vol. 1, pp. 210-217, 1995.
29
A trace in the Abstract State Space
[Figure: the 36 abstract states for a 2-attribute referential, <ea1, ea2 | qr> with ea ∈ {?, A, F} and qr ∈ {?, 0, +, *}, with the trace below highlighted.]
0 - State initialization
1 - Mediator: asks the user for Attribute 1 / User: don't know
2 - Mediator: sends the query to the source / Source: 25 answers ...
3 - Mediator: asks the user for Attribute 2 / User: gives a correct value
4 - Mediator: sends the query to the source / Source: 3 answers ...