
Cognitive User Interfaces: An Engineering Approach

Machine Intelligence Laboratory
Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK

Steve Young


ICASSP Plenary April 2009 © Steve Young

Outline of Talk

Introduction: what is a cognitive user interface?

Example: a simple gesture-driven interface.

Human decision-making and planning.

Partially Observable MDPs – an intractable solution?

Scaling up: statistical spoken dialogue systems.

Conclusions and future work.


What is a cognitive user interface?

Capable of reasoning and inference

Able to optimize communicative goals

Able to adapt to changing environments

Able to learn from experience

An interface which supports intelligent, efficient and robust interaction between a human and a machine.


Example: A Simple Gesture-Driven User Interface

[Figure: a photo sorter — swipe gestures map to ScrollForward, ScrollBackward and DeletePhoto]


Interpreting the Input

[Figure: P(angle) distributions for the Backwards, Delete and Forwards gestures along the angle axis, separated by decision boundaries]


Pattern Classification

[Figure: class-conditional distributions P(angle) for G=forwards, G=delete and G=backwards, illustrating the confidence Conf(G=backwards)]
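The classification step can be sketched with class-conditional Gaussians over the swipe angle. This is a minimal sketch: the means, standard deviations and equal class priors below are illustrative assumptions, not values from the talk.

```python
import math

# Hypothetical class-conditional Gaussians over the swipe angle (degrees).
CLASSES = {
    "backwards": (30.0, 15.0),   # (mean, std) — illustrative
    "delete":    (90.0, 15.0),
    "forwards":  (150.0, 15.0),
}

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify(angle):
    """Return the MAP gesture class and its posterior confidence,
    assuming equal priors over the three gestures."""
    likelihoods = {g: gaussian_pdf(angle, m, s) for g, (m, s) in CLASSES.items()}
    total = sum(likelihoods.values())
    posteriors = {g: p / total for g, p in likelihoods.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]
```

The returned posterior is exactly the Conf(G=...) value the slide refers to: a probability, not a raw classifier score.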


Flowchart-based Decision Making

[Flowchart: test Gesture; for backwards, test Confidence — if >= Threshold, Move back; if < Threshold, Do Nothing]
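As a sketch, the flowchart amounts to a hand-crafted threshold rule. The slide shows only the backwards branch; the other branches and the 0.8 threshold below are illustrative assumptions.

```python
THRESHOLD = 0.8  # illustrative confidence threshold

def flowchart_policy(gesture, confidence):
    # Hand-crafted rule: act only when the classifier is confident,
    # otherwise do nothing. Only the backwards branch appears on the
    # slide; the others are assumed analogous.
    if confidence >= THRESHOLD:
        if gesture == "backwards":
            return "move-back"
        elif gesture == "forwards":
            return "move-forward"
        elif gesture == "delete":
            return "delete-photo"
    return "do-nothing"
```

Note the fixed threshold is exactly what the next slide criticises: it encodes no beliefs, no rewards, and no notion of optimality.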


What is missing?

No modeling of uncertainty

No tracking of belief in the user’s required goal

No quantifiable objectives, hence sub-optimal decision making


Modeling Uncertainty and Inference – Bayes’ Rule

Reverend Thomas Bayes (1702-1761)

P(belief | data) = P(data | belief) P(belief) / P(data)

[Figure: a Bayesian network over time — the old belief over state s_{t-1} and the action a_{t-1} predict the new state s_t; the observation o_t is combined with this prediction via Bayes’ rule to give the new belief b’(s), e.g. over whether the user wants to move back]
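A minimal sketch of this belief update, b’(s’) ∝ P(o | s’) Σ_s P(s’ | s, a) b(s), in plain Python. The photo-sorter goals, the static-goal transition model and the 80%-correct gesture confusion model are illustrative assumptions.

```python
def belief_update(belief, action, obs, trans_model, obs_model):
    """One step of Bayesian belief tracking:
    b'(s') ∝ P(o | s') * Σ_s P(s' | s, a) b(s), then normalise."""
    states = list(belief)
    new_belief = {}
    for s2 in states:
        predicted = sum(trans_model[(s, action)][s2] * belief[s] for s in states)
        new_belief[s2] = obs_model[s2][obs] * predicted
    z = sum(new_belief.values())
    return {s: p / z for s, p in new_belief.items()}

GOALS = ["scroll-forward", "scroll-backward", "delete-photo"]
# Static user goal: the hidden state persists regardless of the action.
trans = {(s, "do-nothing"): {s2: 1.0 if s2 == s else 0.0 for s2 in GOALS}
         for s in GOALS}
# Illustrative gesture confusion: the right gesture is seen 80% of the time.
obs = {s: {s2: 0.8 if s2 == s else 0.1 for s2 in GOALS} for s in GOALS}

b0 = {s: 1.0 / 3.0 for s in GOALS}
b1 = belief_update(b0, "do-nothing", "scroll-backward", trans, obs)
# → belief in scroll-backward rises from 1/3 to 0.8
```

Unlike the flowchart, the system keeps a full distribution over the user's goal, so a second noisy observation simply sharpens or corrects the belief rather than triggering an irreversible action.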


Optimizing Decisions – Bellman’s Equation

V*(b) = max_a [ r(b,a) + Σ_{o’} P(o’ | b,a) V*(b’) ]

Richard E Bellman (1920-1984)

Reward = r(b_1,a_1) + r(b_2,a_2) + … + r(b_{T-1},a_{T-1}) + r(b_T,a_T)

Policy: π(b_1) = a_1, π(b_2) = a_2, …, π(b_{T-1}) = a_{T-1}, π(b_T) = a_T

[Figure: the POMDP unrolled over time — hidden states s_1 … s_T, actions a_1 … a_T, observations o_1 … o_T and beliefs b_1 … b_T]

Reinforcement Learning
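Bellman's equation can be illustrated by value iteration on a tiny fully observable MDP — over states rather than beliefs for simplicity, and with a discount factor added so the infinite-horizon sum converges. The two-state model below is purely illustrative.

```python
# Value iteration: V*(s) = max_a [ r(s,a) + γ Σ_s' P(s'|s,a) V*(s') ]
states = ["A", "B"]
actions = ["stay", "go"]
P = {("A", "stay"): {"A": 1.0, "B": 0.0}, ("A", "go"): {"A": 0.0, "B": 1.0},
     ("B", "stay"): {"A": 0.0, "B": 1.0}, ("B", "go"): {"A": 1.0, "B": 0.0}}
R = {("A", "stay"): 0.0, ("A", "go"): 1.0, ("B", "stay"): 2.0, ("B", "go"): 0.0}
gamma = 0.9  # discount factor (an addition for convergence)

def backup(V, s, a):
    # One Bellman backup for state s and action a.
    return R[(s, a)] + gamma * sum(P[(s, a)][s2] * V[s2] for s2 in states)

V = {s: 0.0 for s in states}
for _ in range(300):  # iterate the backup to (near) convergence
    V = {s: max(backup(V, s, a) for a in actions) for s in states}

policy = {s: max(actions, key=lambda a: backup(V, s, a)) for s in states}
# → policy = {"A": "go", "B": "stay"}: move to B, then collect +2 forever
```

The POMDP case replaces the discrete state s with the continuous belief b, which is exactly why exact optimization becomes intractable, as the talk discusses later.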


Optimizing the Photo-Sorter

[Figure: the photo sorter again — swipe gestures for ScrollForward, ScrollBackward and DeletePhoto]

User’s Goal (states): { scroll-forward, scroll-backward, delete-photo }

System Action: { go-forward, go-back, do-delete, do-nothing }

Rewards: +1 (go-forward), +1 (go-back), +5 (do-delete) when the action matches the goal, 0 for do-nothing; -20 for an incorrect delete; all other: -1

Iteratively optimize policy to maximize rewards …
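A sketch of that optimization loop. The reward numbers follow the slide; everything else — the 10% gesture error rate, the ε-greedy sample-average learning, and treating the observed gesture as the state — is an illustrative simplification rather than the full POMDP treatment.

```python
import random

random.seed(0)
GOALS = ["scroll-forward", "scroll-backward", "delete-photo"]
ACTIONS = ["go-forward", "go-back", "do-delete", "do-nothing"]
CORRECT = {"scroll-forward": "go-forward",
           "scroll-backward": "go-back",
           "delete-photo": "do-delete"}
ERROR_RATE = 0.1  # illustrative gesture-recognition error rate

def reward(goal, action):
    # Slide's scheme: +1 correct scroll, +5 correct delete, 0 do-nothing,
    # -20 for a wrong delete, -1 for any other wrong action.
    if action == CORRECT[goal]:
        return 5 if action == "do-delete" else 1
    if action == "do-nothing":
        return 0
    return -20 if action == "do-delete" else -1

def observe(goal):
    # Noisy gesture channel: misread with probability ERROR_RATE.
    if random.random() < ERROR_RATE:
        return random.choice([g for g in GOALS if g != goal])
    return goal

Q = {(o, a): 0.0 for o in GOALS for a in ACTIONS}
N = {(o, a): 0 for o in GOALS for a in ACTIONS}
epsilon = 0.1
for _ in range(30000):  # one-step episodes: sample goal, observe, act
    goal = random.choice(GOALS)
    o = observe(goal)
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(o, x)])
    N[(o, a)] += 1
    Q[(o, a)] += (reward(goal, a) - Q[(o, a)]) / N[(o, a)]  # sample average

learned = {o: max(ACTIONS, key=lambda x: Q[(o, x)]) for o in GOALS}
```

At low error rates the learned policy simply acts on the observed gesture; as the -20 penalty and the error rate grow, the expected value of do-delete drops toward that of do-nothing, which is how the optimized policy becomes risk-sensitive where the flowchart cannot.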


Performance on the Photo-Sorting Task

[Plot: Reward vs Effective Error Rate (0%–50%) for the flow-charted policy, a fixed policy and model, and an adapted policy and model; the training point is marked]


Is Human Decision Making Bayesian?

Humans have brains so that they can move.

So how do humans plan movement? ….


A Simple Planning Task

[Figure: a reaching task with a prior over lateral shift and a noisy observation — Kording and Wolpert (Nature, 427, 2004)]


Models for Estimating Target Location

[Figure: probability vs lateral shift (cm), showing the prior, the observation and the posterior; panels plot deviation from target against true lateral shift for three candidate models — prior ignored, Bayesian, and minimum-error mapping — Kording and Wolpert (Nature, 427, 2004)]


Practice makes perfect


Bayesian Model Selection in Human Vision

[Figure: subjects are trained (“Watch these!”) on scenes composed from an inventory of shape combinations not visible to them, then tested (“Which is more familiar?”) — Orban, Fiser, Aslin, Lengyel (Proc Nat. Academy Science, 105, 2008)]


Partially Observable Markov Decision Processes

Belief represented by distributions over states and updated from observations by Bayesian inference

Objectives defined by the accumulation of rewards

Policy which maps beliefs into actions and which can be optimized by reinforcement learning

[Figure: POMDP components — hidden state s, observation o, action a, reward r(b,a), belief b and policy π(b)]

• Principled approach to handling uncertainty and planning

• Humans appear to use similar mechanisms


So what is the problem?


Scaling-up

Applying the POMDP framework in real-world user interfaces is not straightforward:

The state and action sets are often very large. Real-time belief update is intractable.

The mapping π(b) → a is extremely complex. Exact policy optimization is intractable.


Spoken Dialog Systems (SDS)

[Figure: SDS architecture — the user’s waveforms pass through a Recognizer and Semantic Decoder, yielding words and dialog acts for Dialog Control (backed by a Database); responses return via a Message Generator and Synthesizer. E.g. “Is that near the tower?” → confirm(near=tower); negate(near=castle) → “No, it is near the castle.”]


Architecture of the Hidden Information State System

[Figure: HIS architecture — speech understanding yields observation o; belief update maintains the POMDP belief b(s) over user goals g; a heuristic mapping projects b into a summary space belief b̂, where the dialog policy chooses a summary action â that is expanded back into a full system action a for speech generation]

Williams and Young (CSL 2007); Young et al (ICASSP 2007)

Two key ideas:

States are grouped into equivalence classes called partitions, and belief updating is applied to partitions rather than states.

Belief space is mapped into a much simpler summary space for policy implementation and optimization.


The HIS Belief Space

Each state s = (g, u, h) is composed of three factors: User Goal g, User Act u, Dialog History h.

Young et al (CSL 2009)

User goals are grouped into partitions, e.g.:

find(venue)
find(venue(hotel,area=east))
find(venue(bar,area=east))
find(venue(hotel,area=west))
….

The user act ranges over the N-best hypotheses ũ_1, ũ_2, …, ũ_N, and the dialog history tracks grounding states such as Initial, UserRequest, UserInformed, SystemInformed, Grounded, Queried and Denied.

The HIS belief space is the cross-product of these three factors. Belief update is limited to the most likely members of this set.
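The partition idea can be sketched as follows. This is a toy version: the split rule, the observation likelihoods and the assumption that prior mass is shared uniformly among a partition's goals are all illustrative, not the HIS system's actual models.

```python
def split_partitions(partitions, matching, p_match=0.9, p_other=0.1):
    """Split each partition (a frozenset of goals carrying one shared
    belief mass) on whether its goals match the latest evidence,
    reweight by an illustrative observation likelihood, renormalise."""
    new = {}
    for goals, belief in partitions.items():
        for subset, lik in ((goals & matching, p_match),
                            (goals - matching, p_other)):
            if subset:
                # prior mass is shared uniformly among a partition's goals
                new[frozenset(subset)] = belief * len(subset) / len(goals) * lik
    z = sum(new.values())
    return {g: b / z for g, b in new.items()}

goals = {"find(hotel,east)", "find(bar,east)", "find(hotel,west)"}
partitions = {frozenset(goals): 1.0}           # one root partition
east = {"find(hotel,east)", "find(bar,east)"}  # evidence: area=east
partitions = split_partitions(partitions, east)
```

The point of the scheme is visible even in the toy: only two belief numbers are tracked after the split, however many concrete goals each partition contains, and a partition only splits when evidence actually distinguishes its members.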


Master <-> Summary State Mapping

Master space is mapped into a reduced summary space:

find(venue(hotel,area=east,near=Museum))
find(venue(bar,area=east,near=Museum))
find(venue(hotel,area=east))
find(venue(hotel,area=west))
find(venue(hotel))
....etc

[Figure: the master belief b is reduced to summary features — P(top), P(Nxt), T12Same, TPStatus, THStatus, TUserAct, LastSA; the policy selects a summary act type â (Greet, Bold Request, Tentative Request, Confirm, Offer, Inform, .... etc), and a heuristic mapping expands it into a full action a, e.g. confirm( ) → confirm(area=east)]


Learning with a simulated User

Learning by interaction with real users is expensive/impractical. A solution is to use a simulated user, trained on real data.

[Figure: a user simulator (including an ASR error model, trained on a dialog corpus) exchanges observations o and actions a with the system; belief update produces b, the heuristic mapping gives the summary-space belief b̂, and the dialog policy is trained by Q-learning, taking random exploratory actions with probability P(random)]

Schatzmann et al (Knowledge Eng Review 2006)


HIS System Demo


HIS Performance in Noise

[Plot: Success Rate (%) (55–95) vs Error Rate (%) (0–45) with a simulated user, comparing the HIS system against MDP and hand-crafted (HDC) baselines]


Representing beliefs

Beliefs in a spoken dialog system entail a large number of so-called slot variables. Eg for tourist information:

P(venue, location, pricerange, foodtype, music, …)

Cardinality is huge and we cannot handle the full joint distribution.

In the HIS system, we threshold the joint distribution and just record the high probability values; the partitions marginalize out all the unknowns:

P(venue=bar, location=central, music=jazz) = 0.32
P(venue=bar, location=central, music=blues) = 0.27
P(venue=bar, location=east, music=jazz) = 0.11
etc

But this is approximate, and belief update now depends on the assumption that the underlying user goal does not change.

An alternative is to model beliefs directly using dynamic Bayesian nets …


Modeling Belief with Dynamic Bayesian Networks (DBNs)

Decompose state into DBN, retaining only essential conditional dependencies

[Figure: a two-slice DBN for slots “type” (eg restaurant) and “food” (eg chinese) — at time t the goal g, user act u and history h factor into per-slot variables (g_type, g_food, u_type, u_food, h_type, h_food), conditioned on the system action a and generating the observation o; primed copies repeat the structure at time t+1]

Thomson et al (ICASSP, 2008)
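As a sketch of what the factorisation buys: each slot keeps its own marginal, and under the DBN's independence assumptions a joint probability is just a product of slot marginals. The slot values and numbers below are illustrative.

```python
# One marginal distribution per slot instead of the full joint belief.
slot_beliefs = {
    "type": {"restaurant": 0.7, "bar": 0.3},
    "food": {"chinese": 0.6, "french": 0.4},
}

def joint_prob(assignment, slot_beliefs):
    """P(type=x, food=y, ...) as a product of slot marginals — valid
    only under the factored (conditional independence) assumption."""
    p = 1.0
    for slot, value in assignment.items():
        p *= slot_beliefs[slot][value]
    return p

p = joint_prob({"type": "restaurant", "food": "chinese"}, slot_beliefs)
# → 0.7 * 0.6 = 0.42
```

With n slots of K values each, the factored belief stores n·K numbers instead of K^n — the difference that makes a full tourist-information network feasible to track in real time.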


Factor Graph for the Full Tourist Information System

Factor graphs are very large, even with minimal dependency modeling. Hence we need very efficient belief updating, and we need to define policies directly on full belief networks.


Bayesian Update of Dialog State (BUDS) System

Thomson et al (CSL 2009)

Belief update depends on message passing

μ_{f→x}(x) = Σ_{x_1 … x_M} f(x, x_1, …, x_M) Π_m μ_{x_m→f}(x_m)

(the sum ranges over all combinations of the other variables’ values)

[Figure: the distribution P(food) over values Fr, It, …. grouped into partitions Z1, Z2, Z3]

Grouping possible values into partitions greatly simplifies these summations.
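The factor-to-variable message above can be sketched directly. The binary factor and the incoming message values are illustrative.

```python
import itertools

def factor_to_var(variables, factor, target, domains, incoming):
    """Sum-product message μ_{f→x}(x): sum over all combinations of the
    other variables of f(...) times the product of incoming
    variable-to-factor messages μ_{x_m→f}(x_m)."""
    others = [v for v in variables if v != target]
    msg = {}
    for x in domains[target]:
        total = 0.0
        for combo in itertools.product(*(domains[v] for v in others)):
            assign = dict(zip(others, combo))
            assign[target] = x
            w = factor(assign)
            for v in others:
                w *= incoming[v][assign[v]]
            total += w
        msg[x] = total
    return msg

# Illustrative binary factor preferring equal goal and user-act values.
f = lambda a: 0.9 if a["g"] == a["u"] else 0.1
msg = factor_to_var(["g", "u"], f, "g",
                    {"g": [0, 1], "u": [0, 1]},
                    {"u": {0: 0.8, 1: 0.2}})
# → {0: 0.9*0.8 + 0.1*0.2, 1: 0.1*0.8 + 0.9*0.2} = {0: 0.74, 1: 0.26}
```

The inner `itertools.product` loop is the summation the slide is worried about: its size is exponential in the factor's arity and domain sizes, which is exactly what grouping values into partitions cuts down.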


Belief Propagation Times

[Plot: belief propagation time vs network branching factor for standard LBP, LBP with grouping, and LBP with grouping and a constant probability of change]


Policy Optimization in the BUDS System

Thomson et al (ICASSP, 2008)

Summary space now depends on forming a simple characterization of each individual slot.

Define the policy as a parametric function

π(a | b; θ) = e^{θ·φ_a(b)} / Σ_{a’} e^{θ·φ_{a’}(b)}

and optimize w.r.t. θ using the Natural Actor Critic algorithm.

Each action-dependent basis function is separated out into slot-based components, eg

φ_a(b) = [ φ_{a,*}(b)^T, φ_{a,food}(b)^T, φ_{a,location}(b)^T, … ]^T

[Figure: each slot component combines a quantization of the slot belief — e.g. the probability mass of the 1st, 2nd and remaining values — with an action indicator function]
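The parametric policy π(a | b; θ) ∝ exp(θ·φ_a(b)) can be sketched as a softmax over action scores. The basis function below is an illustrative stand-in for the slot-based components; in the BUDS system θ is learned by Natural Actor Critic, which is not reproduced here.

```python
import math

def policy(theta, basis, actions, belief):
    """π(a | b; θ) = exp(θ·φ_a(b)) / Σ_{a'} exp(θ·φ_{a'}(b))."""
    scores = {a: math.exp(sum(t * f for t, f in zip(theta, basis(a, belief))))
              for a in actions}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

# Illustrative basis: one indicator-style feature per action, scaled by
# the belief in the top hypothesis (a hypothetical summary feature).
def basis(action, belief):
    top = belief["top"]
    return [top, 0.0] if action == "confirm" else [0.0, top]

theta = [2.0, 1.0]  # in practice, optimized by Natural Actor Critic
pi = policy(theta, basis, ["confirm", "inform"], {"top": 0.9})
```

Because the policy is differentiable in θ, its gradient can be estimated from sampled dialogs, which is what makes policy-gradient methods like Natural Actor Critic applicable where exact POMDP optimization is not.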


BUDS Performance in Noise

[Plot: Average Reward vs Error Rate (%) with a simulated user, comparing the BUDS and MDP systems]


Conclusions

Future generations of intelligent systems and agents will need robust, adaptive, cognitive human-computer interfaces.

Bayesian belief tracking and automatic strategy optimization provide the mathematical foundations.

Human evolution seems to have come to the same conclusion.

Early results are promising, but research is needed

a) to develop scalable solutions which can handle very large networks in real time

b) to incorporate more detailed linguistic capabilities

c) to understand how to integrate different modalities: speech, gesture, emotion, etc

d) to understand how to migrate these approaches into industrial systems.


Credits

EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation

Spoken Dialogue Management using Partially Observable Markov Decision Processes

Past and Present Members of the CUED Dialogue Systems Group

Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre, Francois Mairesse, Jost Schatzmann, Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams, Hui Ye, Kai Yu

Colleagues in the CUED Information Engineering Division

Bill Byrne, Mark Gales, Zoubin Ghahramani, Mate Lengyel, Daniel Wolpert, Phil Woodland