Still Talking to Machines (Cognitively Speaking)

Steve Young
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK


Page 1: Still Talking to Machines (Cognitively Speaking) Machine Intelligence Laboratory Information Engineering Division Cambridge University Engineering Department


Page 2

Interspeech Plenary September 2010 © Steve Young

Outline of Talk

A brief historical perspective

Cognitive User Interfaces

Statistical Dialogue Modelling

Scaling to the Real World

System Architecture

Some Examples and Results

Conclusions and future work.

Page 3

Why Talk to Machines?

it should be an easy and efficient way of finding out information and controlling behaviour

sometimes it is the only way:

hands-busy, e.g. surgeon, driver, package handler

no internet and no call-centres, e.g. parts of the third world

very small devices

one day it might be fun, cf. Project Natal's Milo

Page 4

VODIS - circa 1985

Natural language / mixed-initiative train-timetable inquiry service
150-word DTW connected speech recognition

[System diagram] The Logos speech recogniser, driven by recognition grammars, passes recognised words to a frame-based dialogue manager, whose text responses are spoken by a DECtalk synthesiser. Hardware: PDP11/45 plus 8 x 8086 processors, 128k memory, 2 x 5Mb disks. (Demo)

Collaboration between BT, Logica and Cambridge U.

Page 5

Some desirable properties of a Spoken Dialogue System

able to support reasoning and inference

o interpret noisy inputs and resolve ambiguities in context

able to plan under uncertainty

o clearly defined communicative goals
o performance quantified as rewards
o plans optimized to maximize rewards

able to adapt on-line

o robust to speaker (accent, vocabulary, behaviour, ...)
o robust to environment (noise, location, ...)

able to learn from experience

o progressively optimize models and plans over time

=> Cognitive User Interface

S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3)

Page 6

Essential Ingredients of a Cognitive User Interface (CUI)

• Explicit representation of uncertainty using a probability model over dialogue states e.g. using Bayesian networks

• Inputs regarded as observations used to update the posterior state probabilities via inference

• Responses defined by plans which map internal states to actions

• The system’s design objectives defined by rewards associated with specific state/action pairs

• Plans optimized via reinforcement learning

• Model parameters estimated via supervised learning and/or optimized via reinforcement learning

=> Partially Observable Markov Decision Process (POMDP)

Page 7

A Framework for Statistical Dialogue Management

[Framework diagram] The user's speech is decoded by speech understanding into an observation o_t. A model with distribution parameters λ maintains a distribution over dialogue states s_t; after each observation the belief is updated as

    b_t = P(s_t | o_t, b_t-1; λ)

A policy π(a_t | b_t, θ) with parameters θ maps the belief into an action a_t, which response generation renders back to the user. A reward function r scores each belief/action pair, giving the total reward

    R = Σ_t r(b_t, a_t)

Page 8

Belief Tracking aka Belief Monitoring

Belief is updated following each new user input:

    b_t+1(s') = k · P(o_t+1 | s') Σ_s P(s' | s, a_t) b_t(s)

However, the state space is huge and this equation is intractable for practical systems. So we approximate:

Track just the N most likely states => Hidden Information State system (HIS)

Factorise the state space and ignore all but the major conditional dependencies => Graphical Model system (GMS, aka BUDS)

S. Young (2010). "The Hidden Information State Model" Computer Speech and Language 24(2)

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
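As a minimal sketch of the first approximation (the function and state names are mine, not the HIS implementation), pruning a belief to its N most likely states and renormalising:

```python
import heapq

def prune_belief(belief, n):
    """Keep only the n most likely states and renormalise (HIS-style pruning sketch)."""
    top = heapq.nlargest(n, belief.items(), key=lambda kv: kv[1])
    total = sum(p for _, p in top)
    return {state: p / total for state, p in top}

# Example: a belief over four joint dialogue states
b = {"bar+french": 0.5, "restaurant+french": 0.3,
     "bar+chinese": 0.15, "restaurant+chinese": 0.05}
print(prune_belief(b, 2))
# -> {'bar+french': 0.625, 'restaurant+french': 0.375}
```

The pruned mass is simply redistributed over the surviving hypotheses; the real HIS system additionally groups states into partitions before pruning.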

Page 9

Dialogue State

[Dynamic Bayesian network diagram] Each concept has a goal node g, a user-act node u, a history node h (memory), and an observation node o at time t, with the goal and history carried over to the next time slice t+1. User behaviour links goal to user act (g -> u); recognition/understanding errors link the user act to the observation (u -> o).

Tourist Information Domain: type = bar, restaurant; food = French, Chinese, none

J. Williams (2007). ”POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)

Page 10

Dialogue Model Parameters (ignoring history nodes for simplicity)

[Network fragment for times t and t+1, with observation, goal and user-act nodes for the type and food concepts]

p(u|g):

  u \ g       French  Chinese  None
  French       0.7     0        0
  Chinese      0       0.7      0
  NoMention    0.3     0.3      1.0

p(o|u):

  o \ u       French  Chinese  NoMention
  French       0.8     0.2      0
  Chinese      0.2     0.8      0
  NoMention    0       0        1.0
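To make the update concrete, here is a minimal single-slot sketch (variable and function names are mine) that applies these two tables to track the food goal, assuming for simplicity that the goal stays fixed across turns:

```python
GOALS = ["French", "Chinese", "None"]
ACTS = ["French", "Chinese", "NoMention"]

# p(u|g): keyed (user act u, goal g), values from the table above
p_u_given_g = {
    ("French", "French"): 0.7, ("French", "Chinese"): 0.0, ("French", "None"): 0.0,
    ("Chinese", "French"): 0.0, ("Chinese", "Chinese"): 0.7, ("Chinese", "None"): 0.0,
    ("NoMention", "French"): 0.3, ("NoMention", "Chinese"): 0.3, ("NoMention", "None"): 1.0,
}

# p(o|u): keyed (observation o, true user act u)
p_o_given_u = {
    ("French", "French"): 0.8, ("French", "Chinese"): 0.2, ("French", "NoMention"): 0.0,
    ("Chinese", "French"): 0.2, ("Chinese", "Chinese"): 0.8, ("Chinese", "NoMention"): 0.0,
    ("NoMention", "French"): 0.0, ("NoMention", "Chinese"): 0.0, ("NoMention", "NoMention"): 1.0,
}

def update_belief(belief, obs):
    """b'(g) proportional to b(g) * sum_u p(o|u) p(u|g), static-goal assumption."""
    new = {}
    for g in GOALS:
        likelihood = sum(p_o_given_u[(obs, u)] * p_u_given_g[(u, g)] for u in ACTS)
        new[g] = belief[g] * likelihood
    z = sum(new.values())
    return {g: v / z for g, v in new.items()}

b = {g: 1.0 / 3 for g in GOALS}   # uniform prior over food goals
b = update_belief(b, "French")    # recogniser reports "French"
# belief shifts to roughly French 0.8, Chinese 0.2, None 0.0
```

Note how a single noisy observation leaves residual mass on "Chinese" (since p(o=French | u=Chinese) = 0.2), which is exactly the uncertainty the belief histograms on the following slides display.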

Page 11

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(food=french) {0.9} -> system: confirm(food=french)
t=2: user: affirm() {0.9}

Page 12

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(type=bar, food=french) {0.6} / inform(type=restaurant, food=french) {0.3} -> system: confirm(type=restaurant, food=french)
t=2: user: affirm() {0.9}

Page 13

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(type=bar) {0.4} -> system: select(type=bar, type=restaurant)
t=2: user: inform(type=bar) {0.4}

Page 14

Choosing the next action – the Policy

[Policy diagram] The full belief state (e.g. the distributions over gtype and gfood, plus the last dialogue acts such as inform(type=bar) {0.4} and select(type=bar, type=restaurant)) is quantized into a compact binary summary state. For each summary state, the policy vector gives a distribution over all possible summary actions (inform, select, confirm, etc.). A summary action is sampled from this distribution (here a = select) and then mapped back into a full action in the master space.
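A toy sketch of this summary-space pipeline (the quantisation scheme, state encoding and policy table are mine, and far cruder than the real system):

```python
import random

SUMMARY_ACTIONS = ["inform", "select", "confirm"]

def summarise(slot_belief):
    """Quantise one slot's belief into a coarse label based on its top hypothesis."""
    top = max(slot_belief.values())
    return "sure" if top > 0.8 else "unsure" if top > 0.4 else "unknown"

def summary_state(beliefs):
    """Map the full belief (one distribution per slot) into a small summary state."""
    return tuple(summarise(b) for b in beliefs)

# Stochastic policy: summary state -> distribution over summary actions
policy = {
    ("unsure", "sure"): {"inform": 0.1, "select": 0.8, "confirm": 0.1},
}

def choose_action(beliefs, rng):
    s = summary_state(beliefs)
    dist = policy[s]
    # Sample a summary action; a real system would then map it back into a
    # full action, e.g. select(type=bar, type=restaurant).
    return rng.choices(SUMMARY_ACTIONS, weights=[dist[a] for a in SUMMARY_ACTIONS])[0]

beliefs = [{"bar": 0.45, "restaurant": 0.4, "other": 0.15},  # type: unsure
           {"french": 0.9, "chinese": 0.1}]                  # food: sure
a = choose_action(beliefs, random.Random(0))
```

The point of the summary mapping is that the policy table is indexed by a handful of coarse labels rather than the astronomically large master belief space.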

Page 15

Policy Optimization

Policy parameters θ are chosen to maximize the expected reward

    J(θ) = E[ R | θ ]

Natural gradient ascent works well:

    θ <- θ + α F(θ)^-1 ∇θ J(θ)

where F(θ) is the Fisher Information Matrix. The gradient is estimated by sampling dialogues, and in practice the Fisher Information Matrix does not need to be explicitly computed.

This is the Natural Actor-Critic algorithm.

J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)
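A toy numeric illustration of why the natural gradient helps (entirely my example, not the talk's algorithm): on a badly scaled quadratic objective, preconditioning the gradient with a hand-picked Fisher-like matrix F removes the scaling.

```python
import numpy as np

# Toy objective J(theta) = -0.5 * theta^T A theta, maximised at theta = 0
A = np.array([[10.0, 0.0],
              [0.0, 0.1]])      # badly scaled curvature

def grad_J(theta):
    return -A @ theta           # gradient of J

theta = np.array([1.0, 1.0])
F = A                           # stand-in for the Fisher information matrix

plain = theta + 0.1 * grad_J(theta)                        # ordinary ascent step
natural = theta + 0.1 * np.linalg.solve(F, grad_J(theta))  # F^-1 grad: natural step

# plain   -> [0.0, 0.99]: the step is dominated by the steep coordinate
# natural -> [0.9, 0.9]:  both coordinates shrink at the same relative rate
```

The same effect is what makes natural gradient ascent well-behaved for policy parameters whose dimensions have very different sensitivities.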

Page 16

Dialogue Model Parameter Optimization

Approximating the belief distribution via feature vectors prevents differentiation of the policy with respect to the dialogue model parameters λ.

However, a trick can be used: assume that λ is drawn from a prior p(λ; α) which is differentiable with respect to α. Then optimize the reward with respect to α and sample p(λ; α) to get λ. This is the Natural Belief-Critic algorithm.

It is also possible to do maximum-likelihood model parameter estimation using Expectation Propagation.
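In symbols, this is the standard likelihood-ratio (score-function) trick, writing α for the prior parameters (a name assumed here rather than taken from the slide):

```latex
J(\alpha) = \mathbb{E}_{\lambda \sim p(\lambda;\alpha)}\big[\,\mathbb{E}[R \mid \lambda]\,\big],
\qquad
\nabla_\alpha J(\alpha)
  = \mathbb{E}_{\lambda \sim p(\lambda;\alpha)}\big[\,\mathbb{E}[R \mid \lambda]\,
      \nabla_\alpha \log p(\lambda;\alpha)\big]
```

so the gradient can be estimated by sampling λ from the prior, running dialogues, and weighting the observed reward by the score ∇α log p(λ; α), with no need to differentiate the policy with respect to λ itself.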

F. Jurcicek (2010). "Natural Belief-Critic." Interspeech 2010

B. Thomson (2010). "Parameter learning for POMDP spoken dialogue models." SLT 2010

Page 17

Performance Comparison in Simulated TownInfo Domain

[Bar chart comparing four configurations: handcrafted model + handcrafted policy, handcrafted model + trained policy, handcrafted policy + trained model, and trained model + trained policy]

Reward = 100 for success, minus 1 for each turn taken

Page 18

Scaling up to Real World Problems

Several of the key ideas have already been covered:

compact representation of dialogue state, e.g. HIS, BUDS

mapping belief states into summary states via quantisation, feature vectors, etc.

mapping actions in summary space back into full space

But inference itself is also a problem ...

Page 19

CamInfo Ontology

[Ontology diagram] Many concepts, many values per concept, and multiple network nodes per concept => a complex dialogue state

Page 20

Belief Propagation Times

[Chart: belief propagation time vs network branching factor, comparing standard loopy belief propagation (LBP), LBP with grouping, and LBP with grouping plus a constant probability of change]

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)

Page 21

Architecture of the Cambridge Statistical SDS

[Run-time architecture] User speech y -> Speech Recognition -> word hypotheses p(w|y) -> Semantic Decoder -> dialogue-act hypotheses p(v|y) -> Dialogue Manager (HIS or BUDS) -> dialogue act a -> Message Generator -> words p(m|a) -> Speech Synthesiser -> speech p(x|a) back to the user. The statistical components are trained from corpus data.

Run-time mode

Page 22

Architecture of the Cambridge Statistical SDS

[Training-mode architecture] In training mode the user and the speech front end are replaced by a User Simulator and an Error Model, both estimated from corpus data. The simulator exchanges dialogue acts directly with the Dialogue Manager (HIS or BUDS): it receives the system action a and returns noisy dialogue-act hypotheses p(v|y).

Training mode

Page 23

CMU Let's Go Spoken Dialogue Challenge

Organised by the Dialog Research Center, CMU. See http://www.dialrc.org/sdc/

• Telephone-based spoken dialog system to provide bus schedule information for the City of Pittsburgh, PA (USA).

• Based on existing system with real users.

• Two-stage evaluation process:

1. Control Test with recruited subjects given specific known tasks

2. Live Test with competing implementations switched according to a daily schedule

• Full results to be presented at a special session at SLT

Page 24

Let's Go 2010 Control Test Results (all qualifying systems)

[Scatter plot: predicted success rate vs word error rate (WER) for all qualifying systems]

Average success = 64.8%; average WER = 42.4%

System X: 65% success, 42% WER
System Y: 75% success, 34% WER
System Z: 89% success, 33% WER

B. Thomson "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge.” SLT 2010.

Page 25

CamInfo Demo

Page 26

Conclusions

End-to-end statistical dialogue systems can be built and are competitive.

The core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty, with the following benefits:

o robust to recognition errors

o objective measure of goodness via reward function

o ability to optimize performance against objectives

o reduced development costs – no hand-tuning, no complex design processes, easily ported to new applications

o natural dialogue – say anything, any time

Still much to do

o faster learning, off-policy learning, long term adaptation, dynamic ontologies, multi-modal input/output

Perhaps talking to machines is within reach ...

Page 27

Credits

EU FP7 Project CLASSiC: Computational Learning in Adaptive Systems for Spoken Conversation

Spoken Dialogue Management using Partially Observable Markov Decision Processes

Past and Present Members of the CUED Dialogue Systems Group

Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre,

Francois Mairesse, Jorge Prombonas, Jost Schatzmann,

Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams,

Hui Ye, Kai Yu