Still Talking to Machines (Cognitively Speaking)

Steve Young
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK


Page 1: Still Talking to Machines (Cognitively Speaking) Machine Intelligence Laboratory Information Engineering Division Cambridge University Engineering Department


Page 2

Interspeech Plenary September 2010 © Steve Young

Outline of Talk

A brief historical perspective

Cognitive User Interfaces

Statistical Dialogue Modelling

Scaling to the Real World

System Architecture

Some Examples and Results

Conclusions and future work.

Page 3

Why Talk to Machines?

it should be an easy and efficient way of finding out information and controlling behaviour

sometimes it is the only way:

hands-busy, e.g. surgeon, driver, package handler

no internet and no call-centres, e.g. parts of the third world

very small devices

one day it might be fun, cf. Project Natal's Milo

Page 4

VODIS - circa 1985

Natural language / mixed-initiative train-timetable inquiry service
150-word DTW connected speech recognition

[System diagram] The Logos speech recogniser, driven by recognition grammars, passes recognised words to a frame-based dialogue manager, whose text responses are spoken by a DECtalk synthesiser. Hardware: PDP11/45 plus 8 x 8086 processors, 128k memory, 2 x 5Mb disks. (Demo)

Collaboration between BT, Logica and Cambridge U.

Page 5

Some desirable properties of a Spoken Dialogue System

able to support reasoning and inference

o interpret noisy inputs and resolve ambiguities in context

able to plan under uncertainty

o clearly defined communicative goals
o performance quantified as rewards
o plans optimized to maximize rewards

able to adapt on-line

o robust to speaker (accent, vocabulary, behaviour, ...)
o robust to environment (noise, location, ...)

able to learn from experience

o progressively optimize models and plans over time

=> Cognitive User Interface

S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3)

Page 6

Essential Ingredients of a Cognitive User Interface (CUI)

• Explicit representation of uncertainty using a probability model over dialogue states e.g. using Bayesian networks

• Inputs regarded as observations used to update the posterior state probabilities via inference

• Responses defined by plans which map internal states to actions

• The system’s design objectives defined by rewards associated with specific state/action pairs

• Plans optimized via reinforcement learning

• Model parameters estimated via supervised learning and/or optimized via reinforcement learning

=> Partially Observable Markov Decision Process (POMDP)

Page 7

A Framework for Statistical Dialogue Management

[Framework diagram] The user's speech is decoded by speech understanding into an observation o_t. A model with distribution parameters λ maintains a distribution over dialogue states s_t; after each observation the belief is updated as

    b_t = P(s_t | o_t, b_t-1; λ)

A policy π(a_t | b_t, θ) with parameters θ maps the belief into an action a_t, which response generation renders back to the user. A reward function r scores each belief/action pair, giving the total reward

    R = Σ_t r(b_t, a_t)

Page 8

Belief Tracking aka Belief Monitoring

Belief is updated following each new user input:

    b_t+1(s') = k · P(o_t+1 | s') Σ_s P(s' | s, a_t) b_t(s)

However, the state space is huge and this equation is intractable for practical systems. So we approximate:

Track just the N most likely states => Hidden Information State system (HIS)

Factorise the state space and ignore all but the major conditional dependencies => Graphical Model system (GMS, aka BUDS)

S. Young (2010). "The Hidden Information State Model" Computer Speech and Language 24(2)

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
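As a minimal sketch of the first approximation (the function and state names are mine, not the HIS implementation), pruning a belief to its N most likely states and renormalising:

```python
import heapq

def prune_belief(belief, n):
    """Keep only the n most likely states and renormalise (HIS-style pruning sketch)."""
    top = heapq.nlargest(n, belief.items(), key=lambda kv: kv[1])
    total = sum(p for _, p in top)
    return {state: p / total for state, p in top}

# Example: a belief over four joint dialogue states
b = {"bar+french": 0.5, "restaurant+french": 0.3,
     "bar+chinese": 0.15, "restaurant+chinese": 0.05}
print(prune_belief(b, 2))
# -> {'bar+french': 0.625, 'restaurant+french': 0.375}
```

The pruned mass is simply redistributed over the surviving hypotheses; the real HIS system additionally groups states into partitions before pruning.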

Page 9

Dialogue State

[Dynamic Bayesian network diagram] Each concept has a goal node g, a user-act node u, a history node h (memory), and an observation node o at time t, with the goal and history carried over to the next time slice t+1. User behaviour links goal to user act (g -> u); recognition/understanding errors link the user act to the observation (u -> o).

Tourist Information Domain: type = bar, restaurant; food = French, Chinese, none

J. Williams (2007). ”POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)

Page 10

Dialogue Model Parameters (ignoring history nodes for simplicity)

[Network fragment for times t and t+1, with observation, goal and user-act nodes for the type and food concepts]

p(u|g):

  u \ g       French  Chinese  None
  French       0.7     0        0
  Chinese      0       0.7      0
  NoMention    0.3     0.3      1.0

p(o|u):

  o \ u       French  Chinese  NoMention
  French       0.8     0.2      0
  Chinese      0.2     0.8      0
  NoMention    0       0        1.0
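To make the update concrete, here is a minimal single-slot sketch (variable and function names are mine) that applies these two tables to track the food goal, assuming for simplicity that the goal stays fixed across turns:

```python
GOALS = ["French", "Chinese", "None"]
ACTS = ["French", "Chinese", "NoMention"]

# p(u|g): keyed (user act u, goal g), values from the table above
p_u_given_g = {
    ("French", "French"): 0.7, ("French", "Chinese"): 0.0, ("French", "None"): 0.0,
    ("Chinese", "French"): 0.0, ("Chinese", "Chinese"): 0.7, ("Chinese", "None"): 0.0,
    ("NoMention", "French"): 0.3, ("NoMention", "Chinese"): 0.3, ("NoMention", "None"): 1.0,
}

# p(o|u): keyed (observation o, true user act u)
p_o_given_u = {
    ("French", "French"): 0.8, ("French", "Chinese"): 0.2, ("French", "NoMention"): 0.0,
    ("Chinese", "French"): 0.2, ("Chinese", "Chinese"): 0.8, ("Chinese", "NoMention"): 0.0,
    ("NoMention", "French"): 0.0, ("NoMention", "Chinese"): 0.0, ("NoMention", "NoMention"): 1.0,
}

def update_belief(belief, obs):
    """b'(g) proportional to b(g) * sum_u p(o|u) p(u|g), static-goal assumption."""
    new = {}
    for g in GOALS:
        likelihood = sum(p_o_given_u[(obs, u)] * p_u_given_g[(u, g)] for u in ACTS)
        new[g] = belief[g] * likelihood
    z = sum(new.values())
    return {g: v / z for g, v in new.items()}

b = {g: 1.0 / 3 for g in GOALS}   # uniform prior over food goals
b = update_belief(b, "French")    # recogniser reports "French"
# belief shifts to roughly French 0.8, Chinese 0.2, None 0.0
```

Note how a single noisy observation leaves residual mass on "Chinese" (since p(o=French | u=Chinese) = 0.2), which is exactly the uncertainty the belief histograms on the following slides display.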

Page 11

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(food=french) {0.9} -> system: confirm(food=french)
t=2: user: affirm() {0.9}

Page 12

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(type=bar, food=french) {0.6} / inform(type=restaurant, food=french) {0.3} -> system: confirm(type=restaurant, food=french)
t=2: user: affirm() {0.9}

Page 13

Belief Monitoring (Tracking)

[Belief histograms over type (B, R) and food (F, C, -) at t=1 and t=2]

t=1: user: inform(type=bar) {0.4} -> system: select(type=bar, type=restaurant)
t=2: user: inform(type=bar) {0.4}

Page 14

Choosing the next action – the Policy

[Policy diagram] The full belief state (e.g. the distributions over gtype and gfood, plus the last dialogue acts such as inform(type=bar) {0.4} and select(type=bar, type=restaurant)) is quantized into a compact binary summary state. For each summary state, the policy vector gives a distribution over all possible summary actions (inform, select, confirm, etc.). A summary action is sampled from this distribution (here a = select) and then mapped back into a full action in the master space.
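A toy sketch of this summary-space pipeline (the quantisation scheme, state encoding and policy table are mine, and far cruder than the real system):

```python
import random

SUMMARY_ACTIONS = ["inform", "select", "confirm"]

def summarise(slot_belief):
    """Quantise one slot's belief into a coarse label based on its top hypothesis."""
    top = max(slot_belief.values())
    return "sure" if top > 0.8 else "unsure" if top > 0.4 else "unknown"

def summary_state(beliefs):
    """Map the full belief (one distribution per slot) into a small summary state."""
    return tuple(summarise(b) for b in beliefs)

# Stochastic policy: summary state -> distribution over summary actions
policy = {
    ("unsure", "sure"): {"inform": 0.1, "select": 0.8, "confirm": 0.1},
}

def choose_action(beliefs, rng):
    s = summary_state(beliefs)
    dist = policy[s]
    # Sample a summary action; a real system would then map it back into a
    # full action, e.g. select(type=bar, type=restaurant).
    return rng.choices(SUMMARY_ACTIONS, weights=[dist[a] for a in SUMMARY_ACTIONS])[0]

beliefs = [{"bar": 0.45, "restaurant": 0.4, "other": 0.15},  # type: unsure
           {"french": 0.9, "chinese": 0.1}]                  # food: sure
a = choose_action(beliefs, random.Random(0))
```

The point of the summary mapping is that the policy table is indexed by a handful of coarse labels rather than the astronomically large master belief space.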

Page 15

Policy Optimization

Policy parameters θ are chosen to maximize the expected reward

    J(θ) = E[ R | θ ]

Natural gradient ascent works well:

    θ <- θ + α F(θ)^-1 ∇θ J(θ)

where F(θ) is the Fisher Information Matrix. The gradient is estimated by sampling dialogues, and in practice the Fisher Information Matrix does not need to be explicitly computed.

This is the Natural Actor-Critic algorithm.

J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)
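A toy numeric illustration of why the natural gradient helps (entirely my example, not the talk's algorithm): on a badly scaled quadratic objective, preconditioning the gradient with a hand-picked Fisher-like matrix F removes the scaling.

```python
import numpy as np

# Toy objective J(theta) = -0.5 * theta^T A theta, maximised at theta = 0
A = np.array([[10.0, 0.0],
              [0.0, 0.1]])      # badly scaled curvature

def grad_J(theta):
    return -A @ theta           # gradient of J

theta = np.array([1.0, 1.0])
F = A                           # stand-in for the Fisher information matrix

plain = theta + 0.1 * grad_J(theta)                        # ordinary ascent step
natural = theta + 0.1 * np.linalg.solve(F, grad_J(theta))  # F^-1 grad: natural step

# plain   -> [0.0, 0.99]: the step is dominated by the steep coordinate
# natural -> [0.9, 0.9]:  both coordinates shrink at the same relative rate
```

The same effect is what makes natural gradient ascent well-behaved for policy parameters whose dimensions have very different sensitivities.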

Page 16

Dialogue Model Parameter Optimization

Approximating the belief distribution via feature vectors prevents differentiation of the policy with respect to the dialogue model parameters λ.

However, a trick can be used: assume that λ is drawn from a prior p(λ; α) which is differentiable with respect to α. Then optimize the reward with respect to α and sample p(λ; α) to get λ. This is the Natural Belief-Critic algorithm.

It is also possible to do maximum-likelihood model parameter estimation using Expectation Propagation.
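In symbols, this is the standard likelihood-ratio (score-function) trick, writing α for the prior parameters (a name assumed here rather than taken from the slide):

```latex
J(\alpha) = \mathbb{E}_{\lambda \sim p(\lambda;\alpha)}\big[\,\mathbb{E}[R \mid \lambda]\,\big],
\qquad
\nabla_\alpha J(\alpha)
  = \mathbb{E}_{\lambda \sim p(\lambda;\alpha)}\big[\,\mathbb{E}[R \mid \lambda]\,
      \nabla_\alpha \log p(\lambda;\alpha)\big]
```

so the gradient can be estimated by sampling λ from the prior, running dialogues, and weighting the observed reward by the score ∇α log p(λ; α), with no need to differentiate the policy with respect to λ itself.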

F. Jurcicek (2010). "Natural Belief-Critic." Interspeech 2010

B. Thomson (2010). "Parameter learning for POMDP spoken dialogue models." SLT 2010

Page 17

Performance Comparison in Simulated TownInfo Domain

[Bar chart comparing four configurations: handcrafted model + handcrafted policy, handcrafted model + trained policy, handcrafted policy + trained model, and trained model + trained policy]

Reward = 100 for success, minus 1 for each turn taken

Page 18

Scaling up to Real World Problems

Several of the key ideas have already been covered:

compact representation of dialogue state, e.g. HIS, BUDS

mapping belief states into summary states via quantisation, feature vectors, etc.

mapping actions in summary space back into full space

But inference itself is also a problem ...

Page 19

CamInfo Ontology

[Ontology diagram] Many concepts, many values per concept, and multiple network nodes per concept => a complex dialogue state

Page 20

Belief Propagation Times

[Chart: belief propagation time vs network branching factor, comparing standard loopy belief propagation (LBP), LBP with grouping, and LBP with grouping plus a constant probability of change]

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)

Page 21

Architecture of the Cambridge Statistical SDS

[Run-time architecture] User speech y -> Speech Recognition -> word hypotheses p(w|y) -> Semantic Decoder -> dialogue-act hypotheses p(v|y) -> Dialogue Manager (HIS or BUDS) -> dialogue act a -> Message Generator -> words p(m|a) -> Speech Synthesiser -> speech p(x|a) back to the user. The statistical components are trained from corpus data.

Run-time mode

Page 22

Architecture of the Cambridge Statistical SDS

[Training-mode architecture] In training mode the user and the speech front end are replaced by a User Simulator and an Error Model, both estimated from corpus data. The simulator exchanges dialogue acts directly with the Dialogue Manager (HIS or BUDS): it receives the system action a and returns noisy dialogue-act hypotheses p(v|y).

Training mode

Page 23

CMU Let's Go Spoken Dialogue Challenge

Organised by the Dialog Research Center, CMU. See http://www.dialrc.org/sdc/

• Telephone-based spoken dialog system to provide bus schedule information for the City of Pittsburgh, PA (USA).

• Based on existing system with real users.

• Two-stage evaluation process:

1. Control Test with recruited subjects given specific known tasks

2. Live Test with competing implementations switched according to a daily schedule

• Full results to be presented at a special session at SLT

Page 24

Let's Go 2010 Control Test Results (all qualifying systems)

[Scatter plot: predicted success rate vs word error rate (WER) for all qualifying systems]

Average success = 64.8%; average WER = 42.4%

System X: 65% success, 42% WER
System Y: 75% success, 34% WER
System Z: 89% success, 33% WER

B. Thomson "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge.” SLT 2010.

Page 25

CamInfo Demo

Page 26

Conclusions

End-to-end statistical dialogue systems can be built and are competitive.

The core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty, with the following benefits:

o robust to recognition errors

o objective measure of goodness via reward function

o ability to optimize performance against objectives

o reduced development costs – no hand-tuning, no complex design processes, easily ported to new applications

o natural dialogue – say anything, any time

Still much to do

o faster learning, off-policy learning, long term adaptation, dynamic ontologies, multi-modal input/output

Perhaps talking to machines is within reach ...

Page 27

Credits

EU FP7 Project CLASSiC: Computational Learning in Adaptive Systems for Spoken Conversation

Spoken Dialogue Management using Partially Observable Markov Decision Processes

Past and Present Members of the CUED Dialogue Systems Group

Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre,

Francois Mairesse, Jorge Prombonas, Jost Schatzmann,

Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams,

Hui Ye, Kai Yu