Reinforcement Learning of Context Models for Ubiquitous Computing Sofia ZAIDENBERG Laboratoire d’Informatique de Grenoble PRIMA Group 17/11/2010 1 PULSAR Research Meeting Under the supervision of Patrick REIGNIER and James L. CROWLEY

Page 1

Reinforcement Learning of Context Models for Ubiquitous Computing

Sofia ZAIDENBERG
Laboratoire d’Informatique de Grenoble, PRIMA Group

17/11/2010, PULSAR Research Meeting

Under the supervision of Patrick REIGNIER and James L. CROWLEY

Page 2

Ambient Computing

Ubiquitous Computing (ubicomp) [Weiser, 1991] [Weiser, 1994] [Weiser and Brown, 1996]

Page 5

Ambient Computing
- “Autistic” devices
  - Independent
  - Heterogeneous
  - Unaware
- Ubiquitous systems
  - Accompany without imposing
  - In the periphery of attention
  - Invisible
  - Calm computing

Page 6

15 years later…
- Ubiquitous computing is already here [Bell and Dourish, 2007]
- It is not exactly what we expected
  - “Singapore, the intelligent island”
  - “U-Korea”
- Not seamless but messy
- Not invisible but flashy
- Characterized by improvisation and appropriation
- Engaging user experiences [Rogers, 2006]

Page 7

New directions
- Study people's habits and ways of living, and create technologies based on them (and not the opposite) [Barton and Pierce, 2006; Pascoe et al., 2007; Taylor et al., 2007; Jose, 2008]
- Redefine smart technologies [Rogers, 2006]

Our goal: provide an AmI application assisting the user in his everyday activities

Page 8

Our goal
- Context-aware computing + personalization
- Situation + user action

[Figure: Alice and Bob; steps 1. Perception and 2. Decision]

Page 9

Proposed solution: personalization by learning

Page 10

Outline
- Problem statement
- Learning in ubiquitous systems
- User study
- Ubiquitous system
- Reinforcement learning of a context model
- Experiments and results
- Conclusion

Page 11

Proposed system
- A virtual assistant embodying the ubiquitous system
- The assistant
  - Perceives the context using its sensors
  - Executes actions using its actuators
  - Receives user feedback for training
  - Adapts its behavior to this feedback (learning)

Page 12

Constraints
- Simple training
- Fast learning
- Initial behavior consistency
- Life-long learning
- User trust
  - Transparency [Bellotti and Edwards, 2001]
    - Intelligibility: system behavior understood by humans
    - Accountability: system able to explain itself

The system adapts to changes in the environment and in preferences

Page 13

Example

[Figure: J109, J120, hyperion, “Reminder!”]

Page 15

User study
- Why a general public user study?
  - Weiser's original vision of calm computing turned out to be unsuited to current user needs
- Objective
  - Evaluate expectations and needs regarding “ambient computing” and its uses

Page 16

Terms of the study
- 26 interviewed subjects
- Non-experts
- Distributed as follows:

[Charts: distribution of subjects by sex (women, men, total) and by age group (18-25, 26-40, 40-60, 60+)]

Page 17

Terms of the study
- 1-hour interviews with open discussion, supported by an interactive mock-up
- Questions about advantages and drawbacks of a ubiquitous assistant
- User ideas about interesting and useful uses

Page 18

Results
- 44% of subjects interested, 13% won over
- Profiles of interested subjects:
  - Very busy people
  - Experiencing cognitive overload
- Learning considered as a plus
  - More reliable system
  - Gradual training vs. heavy configuration
  - Simple and pleasant training (“one click”)

Page 19

Results
- Short learning phase
- Explanations are essential
- Interactions
  - Depending on the subject
  - Optional debriefing phase
- Mistakes accepted as long as the consequences are not critical
- Use control
- Reveals subconscious habits
- Worry of dependence

Page 20

Conclusions
- Constraints
  - Not a black box [Bellotti and Edwards, 2001]: intelligibility and accountability
  - Simple, non-intrusive training
  - Short training period, fast re-adaptation to preference changes
  - Coherent initial behavior
- Build a ubiquitous assistant based on these constraints

Page 22

Ubiquitous system
- Uses the existing devices
  - Heterogeneous
  - Scattered
- System needs
  - Multiplatform system
  - Distributed system
  - Communication protocol
  - Dynamic service discovery
  - Easily deployable
- Technologies: OMiSCID [Emonet et al., 2006], OSGi

[Figure: J109, J120, hyperion, “Reminder!”]

Page 23

Modules interconnection

[Figure: modules grouped under Sensors and Actuators: KEYBOARD ACTIVITY, EMAILS, LOCALIZATION, APPLICATIONS, TEXT TO SPEECH, REMOTE CONTROL, PRESENCE, APPLICATIONS, EMAILS]

Page 24

Example of message exchange

[Figure: J109, J120, hyperion, protee, OMiSCID, bundle repository; messages: “Text2Speech on protee?”, “Yes!”, “No…”, “Install and start Text2Speech”]

Page 25

Database
- Contains
  - Static knowledge
  - History of events and actions
- To provide explanations
- Centralized
  - Queried by all modules on all devices
  - Fed by all modules on all devices
  - Simplifies queries

Page 26

Outline
- Problem statement
- Learning in ubiquitous systems
- User study
- Ubiquitous system
- Reinforcement learning of a context model
  - Reinforcement learning
  - Applying reinforcement learning
- Experiments and results
- Conclusion

Page 27

Reminder: our constraints
- Simple training
- Fast learning
- Initial behavior consistency
- Life-long learning
- Explanations

Supervised [Brdiczka et al., 2007]

Page 28

Reinforcement learning (RL)
- Markov property: the state at time t depends only on the state at time t-1
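In a decision process, where the chosen action also conditions the transition, the Markov property is usually written as below. The notation (states s_t, actions a_t) is standard textbook notation and an assumption here, not something taken from the slides.

```latex
P(s_t \mid s_{t-1}, a_{t-1}, s_{t-2}, a_{t-2}, \dots, s_0, a_0) = P(s_t \mid s_{t-1}, a_{t-1})
```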

Page 29

Standard algorithm: Q-Learning [Watkins, 1989]
- Updates Q-values on each new experience {state, action, next state, reward}
- Slow, because the values evolve only when something happens
- Needs a lot of examples to learn a behavior
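The update applied on each experience can be sketched as follows. This is a minimal tabular version of the standard Q-learning rule; the learning rate, the discount factor, and the state and action names are illustrative assumptions, not values from the talk.

```python
from collections import defaultdict

# Hypothetical hyperparameters; the slides do not give values.
ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor


def q_update(q, state, action, next_state, reward, actions):
    """Apply one tabular Q-learning update for a single experience."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]


# Illustrative state and action names (assumptions).
actions = ["notify", "doNothing"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q, "newEmail", "notify", "idle", reward=10.0, actions=actions)
```

Because a Q-value changes only when a real experience arrives, many interactions are needed before the table reflects the user's preferences, which is the slowness noted on the slide.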

Page 30

DYNA architecture [Sutton, 1991]

[Figure: Agent, World, World model, DYNA switch; signals: Action, Reward, State]

Page 31

DYNA architecture

[Figure: Environment, Real interactions, World model, Policy; arrows labeled Use and Update]
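A minimal Dyna-Q style sketch of this architecture, assuming a deterministic lookup table as the world model (the world model described in the talk is richer). Each real interaction updates both the Q-values and the model, and the model is then used to generate extra simulated updates. All names and constants are hypothetical.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9  # assumed values, not from the slides
ACTIONS = ["notify", "doNothing"]  # illustrative action names


def q_update(q, s, a, r, s2):
    """Standard Q-learning update on one (s, a, r, s2) experience."""
    best_next = max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])


def dyna_step(q, model, s, a, r, s2, planning_steps, rng):
    """One real interaction followed by simulated updates from the model."""
    q_update(q, s, a, r, s2)         # learn directly from the real experience
    model[(s, a)] = (r, s2)          # update the (deterministic) world model
    for _ in range(planning_steps):  # planning: replay remembered pairs
        ps, pa = rng.choice(sorted(model))
        pr, ps2 = model[(ps, pa)]
        q_update(q, ps, pa, pr, ps2)


rng = random.Random(0)
q, model = defaultdict(float), {}
dyna_step(q, model, "newEmail", "notify", 10.0, "idle", planning_steps=5, rng=rng)
```

With a single remembered experience, the five planning steps replay it five more times, so the Q-value moves much further than one real update alone would move it.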

Page 32

Global system

[Figure: Environment, Database, World model, Policy; flows labeled Perception, State, Action, Reward?, Example, Real interactions, Use, Update]

Page 33

Modeling of the problem
- Components: States, Actions
- World model components: Transition model, Reward model

Page 34

State space
- States defined by predicates
  - Understandable by humans (explanations)
- Examples: newEmail(from = Marc, to = Bob), isInOffice(John)
- State-action: entrance( ) → Pause music
- Predicates
  - System predicates
  - Environment predicates

[Figure labels: Karl, <+>]
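One possible way to represent such predicate-based states is sketched below. The class, the helper, and the argument name `user` for isInOffice are hypothetical; only the predicate names and values come from the slide. Keeping predicates as named facts makes it straightforward to print a state in a form a user can read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A named fact with named arguments, e.g. newEmail(from = Marc, to = Bob)."""
    name: str
    args: tuple  # tuple of (argument name, value) pairs, so the object is hashable

    def __str__(self):
        inner = ", ".join(f"{key} = {value}" for key, value in self.args)
        return f"{self.name}({inner})"


def make(name, args):
    """Build a Predicate from a plain dict of arguments (hypothetical helper)."""
    return Predicate(name, tuple(args.items()))


# A state is a set of predicates that currently hold.
state = frozenset({
    make("newEmail", {"from": "Marc", "to": "Bob"}),
    make("isInOffice", {"user": "John"}),  # argument name "user" is assumed
})

# Human-readable form, usable as the basis of an explanation.
explanation = sorted(str(p) for p in state)
```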

Page 35

State space: state split
- newEmail(from = director, to = <+>) → Notify
- newEmail(from = newsletter, to = <+>) → Do not notify
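The split states can be read as patterns that are matched against an observed fact. The sketch below assumes that `<+>` stands for "any non-empty value"; that reading, the function names, and the default action are assumptions made for illustration.

```python
NON_EMPTY = "<+>"  # assumed meaning: matches any non-empty value


def matches(pattern, fact):
    """Return True if every argument of `pattern` is satisfied by `fact`."""
    for key, wanted in pattern.items():
        value = fact.get(key)
        if wanted == NON_EMPTY:
            if value in (None, ""):
                return False
        elif value != wanted:
            return False
    return True


# The two split states from the slide, each paired with its action.
SPLIT_STATES = [
    ({"from": "director", "to": NON_EMPTY}, "notify"),
    ({"from": "newsletter", "to": NON_EMPTY}, "doNotNotify"),
]


def choose_action(fact, default="doNothing"):
    """Return the action of the first split state that matches the fact."""
    for pattern, action in SPLIT_STATES:
        if matches(pattern, fact):
            return action
    return default
```

For example, `choose_action({"from": "newsletter", "to": "Bob"})` selects the second entry, while an email from an unlisted sender falls back to the default.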

Page 36

Action space
- Possible actions combine:
  - Forward a reminder to the user
  - Notify of a new email
  - Lock a computer screen
  - Unlock a computer screen
  - Pause the music playing on a computer
  - Un-pause the music playing on a computer
  - Do nothing

Page 37

Reward
- Explicit reward
  - Through a non-intrusive user interface
  - Problems with user rewards
- Implicit reward
  - Gathered from clues (numerical value of lower amplitude)
- Smoothing of the model

Page 38

World model
- Built using supervised learning
  - From real examples
- Initialized using common sense
  - Functional system from the beginning
  - Initial model vs. initial Q-values [Kaelbling, 2004]
  - Extensibility
- Made of a transition model and a reward model

Page 39

Transition model

[Figure: a transition from s1 to s2, described by starting states, an action or event, modifications, and a probability]

Page 40

Supervised learning of the transition model
- Database of examples {previous state, action, next state}

[Figure: s, s’; t1, t2, t3, …, tn+1]
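As an illustration of learning a transition model from stored {previous state, action, next state} examples, the sketch below uses plain relative-frequency counting. This is a simple stand-in technique, not the method of the talk, and the state and action names are made up for the example.

```python
from collections import Counter, defaultdict


def learn_transition_model(examples):
    """Estimate P(next_state | state, action) by relative frequency."""
    counts = defaultdict(Counter)
    for state, action, next_state in examples:
        counts[(state, action)][next_state] += 1
    model = {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        model[key] = {s2: n / total for s2, n in nexts.items()}
    return model


# Hypothetical examples, as they might be read back from the database.
examples = [
    ("musicPlaying", "pauseMusic", "musicPaused"),
    ("musicPlaying", "pauseMusic", "musicPaused"),
    ("musicPlaying", "pauseMusic", "musicPlaying"),  # e.g. the command failed
    ("musicPaused", "unpauseMusic", "musicPlaying"),
]
model = learn_transition_model(examples)
```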

Page 42

Episode
- Episode steps have 2 stages:
  - Select an event that modifies the state
  - Select an action to react to that event
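The two stages of an episode step can be sketched as below. Events are drawn uniformly at random and the policy is a fixed hand-written rule, both purely for illustration; in the system described here the action choice would come from the learned Q-values, and each step would also trigger a learning update, which this sketch omits. All names are hypothetical.

```python
import random

# Hypothetical events: each one maps the current state to a modified state.
EVENTS = {
    "entrance": lambda s: s | {"someoneEntered"},
    "newEmail": lambda s: s | {"unreadMail"},
}


def policy(state):
    """Hand-written stand-in for the learned policy."""
    if "someoneEntered" in state:
        return "pauseMusic"
    if "unreadMail" in state:
        return "notify"
    return "doNothing"


def run_episode(state, steps, rng):
    """Each step: (1) an event modifies the state, (2) an action reacts to it."""
    trace = []
    for _ in range(steps):
        name = rng.choice(sorted(EVENTS))  # stage 1: select an event
        state = EVENTS[name](state)        # the event modifies the state
        action = policy(state)             # stage 2: select an action
        trace.append((name, state, action))
    return trace


trace = run_episode(frozenset(), steps=3, rng=random.Random(0))
```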

Page 43

Episode

[Figure: RL agent, Environment, World model, Database, Policy, Experience; labels: “Learned from real interactions”, “or”, “Q-Learning updates”]

Page 45

Experiments
- General public survey: qualitative evaluation
- Quantitative evaluations in 2 steps:
  - Evaluation of the initial phase
  - Evaluation of the system during normal functioning

Page 46

Evaluation 1: “about initial learning”

Page 47

Evaluation 1: “about initial learning”

[Plot: grade (axis from 0 to 60) versus episodes (1 to 100), for initial episodes with events and initial states randomly chosen from the database; one curve per number of iterations per episode: 10, 25, 50, 100]

Page 48

Evaluation 2: “interactions and learning”

[Plot: grade (axis from 100 to 125) versus episodes (1 to 109)]

Page 49

Evaluation 2: “interactions and learning”

[Plot: grades (axis from 88 to 97) versus episodes (60 to 186)]

Page 51

Contributions
- Personalization of a ubiquitous system
  - Without explicit specification
  - Easy to evolve
- Adaptation of indirect reinforcement learning to a real-world problem
  - Construction of a world model
  - Injection of initial knowledge
- Deployment of a prototype

Page 52

Perspectives
- Non-interactive data analysis
- User interactions
  - Debriefing

Page 53

Conclusion
- The assistant is a means of creating an ambient intelligence application
- The user is the one making it smart

Page 54

Thanks for your attention

Questions?

Page 55

Bibliography

[Bellotti and Edwards, 2001] Victoria BELLOTTI and Keith EDWARDS. « Intelligibility and accountability: human considerations in context-aware systems ». In Human-Computer Interaction, 2001.

[Brdiczka et al., 2007] Oliver BRDICZKA, James L. CROWLEY and Patrick REIGNIER. « Learning Situation Models for Providing Context-Aware Services ». In Proceedings of HCI International, 2007.

[Buffet, 2003] Olivier BUFFET. « Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs ». PhD thesis, Université Henri Poincaré, 2003.

[Emonet et al., 2006] Rémi EMONET, Dominique VAUFREYDAZ, Patrick REIGNIER and Julien LETESSIER. « O3MiSCID: an Object Oriented Opensource Middleware for Service Connection, Introspection and Discovery ». In IEEE International Workshop on Services Integration in Pervasive Environments, 2006.

[Kaelbling, 2004] Leslie Pack KAELBLING. « Life-Sized Learning ». Lecture at CSE Colloquia, 2004.

[Maes, 1994] Pattie MAES. « Agents that reduce work and information overload ». In Commun. ACM, 1994.

[Maisonnasse et al., 2007] Jerome MAISONNASSE, Nicolas GOURIER, Patrick REIGNIER and James L. CROWLEY. « Machine awareness of attention for non-disruptive services ». In HCI International, 2007.

[Moore, 1975] Gordon E. MOORE. « Progress in digital integrated electronics ». In Proc. IEEE International Electron Devices Meeting, 1975.

Page 56

Bibliography

[Nonogaki and Ueda, 1991] Hajime NONOGAKI and Hirotada UEDA. « FRIEND21 project: a construction of 21st century human interface ». In CHI '91: Proceedings of the SIGCHI conference on Human factors in computing systems, 1991.

[Roman et al., 2002] Manuel ROMAN, Christopher K. HESS, Renato CERQUEIRA, Anand RANGANATHAN, Roy H. CAMPBELL and Klara NAHRSTEDT. « Gaia: A Middleware Infrastructure to Enable Active Spaces ». In IEEE Pervasive Computing, 2002.

[Sutton, 1991] Richard S. SUTTON. « Dyna, an integrated architecture for learning, planning, and reacting ». In SIGART Bull., 1991.

[Weiser, 1991] Mark WEISER. « The computer for the 21st century ». In Scientific American, 1991.

[Weiser, 1994] Mark WEISER. « Some computer science issues in ubiquitous computing ». In Commun. ACM, 1993.

[Weiser and Brown, 1996] Mark WEISER and John Seely BROWN. « The coming age of calm technology ». http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm, 1996.

[Watkins, 1989] C. J. C. H. WATKINS. « Learning from Delayed Rewards ». PhD thesis, University of Cambridge, 1989.

Page 57

Modules interconnection

[Figure: modules grouped under Sensors and Actuators: KEYBOARD ACTIVITY, EMAILS, LOCALIZATION, PRESENCE, APPLICATIONS, APPLICATIONS, EMAILS, TEXT TO SPEECH, REMOTE CONTROL]

Page 58

OMiSCID service

Page 59

Definition of a state

Predicate      Arguments
alarm          title, hour, minute
xActivity      machine, isActive
inOffice       user, office
absent         user
hasUnreadMail  from, to, subject, body
entrance       isAlone, friendlyName, btAddress
exit           isAlone, friendlyName, btAddress
task           taskName
user           login
userOffice     office, login
userMachine    machine, login
computerState  machine, isScreenLocked, isMusicPaused

Page 60

Environment model

[Figure: the model takes a state and an action or event as input, and outputs E[next state] and E[reinforcement]]

Page 61

State space reduction
- Speeds up learning
- State factorization

  State | Action | Q-value
  …entrance(isAlone=true, friendlyName=<+>, btAddress=<+>)… | pauseMusic | 125.3

- State split

  State | Action | Q-value
  …hasUnreadMail(from=boss, to=<+>, subject=<+>, body=<+>)… | inform | 144.02
  …hasUnreadMail(from=newsletter, to=<+>, subject=<+>, body=<+>)… | notInform | 105

- Wildcards <*> and <+>

Page 62

The environment simulator

Page 63

Evaluation criterion: the grade
- Result of RL: a Q-table
- How do we know whether it is “good”?
- Learning is successful if
  - The behavior matches the user's wishes
  - And it is better if a lot has been explored and the behavior has been estimated in many states

Page 64

“The dashboard”
- Sends, with one click, the same events as the sensors

Page 65

Reward model
- Set of entries, each specifying
  - Constraints on some arguments of the state
  - An action
  - The reward
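A reward model of this shape can be sketched as a list of (constraints, action, reward) entries that is searched for the first match. The entries, the reward values, and the default of 0.0 are illustrative assumptions; constraints are simplified here to exact matches on named arguments.

```python
def constraints_hold(constraints, state):
    """True if every constrained argument has the required value in the state."""
    return all(state.get(arg) == value for arg, value in constraints.items())


# Each entry: (constraints on state arguments, action, reward). Illustrative values.
REWARD_MODEL = [
    ({"from": "newsletter"}, "notify", -50.0),
    ({"from": "boss"}, "notify", 50.0),
]


def predicted_reward(state, action, default=0.0):
    """Return the reward of the first entry matching the state and the action."""
    for constraints, entry_action, reward in REWARD_MODEL:
        if entry_action == action and constraints_hold(constraints, state):
            return reward
    return default
```

Only the constrained arguments are inspected, so a single entry covers every state that agrees on them, whatever the other arguments are.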

Page 66

Reward model

[Figure: a reward model entry with starting states (s1), an action, and a reward (-50); shown alongside the transition model]

Page 67

Supervised learning of the reward model
- The database contains examples {previous state, action, reward}

[Figure: examples (s, a, r); entries e1, …, en+1]