TRANSCRIPT

Page 1:

Towards Intelligent Conversational Systems:

Informativeness, Diversity and Controllability

Jiafeng Guo (郭嘉丰), Professor

Institute of Computing Technology, Chinese Academy of Sciences

University of Chinese Academy of Sciences

Homepage: www.bigdatalab.ac.cn/~gjf


Page 2:

Collaborators


Hainan Zhang Ruqing Zhang Yixing Fan Yanyan Lan Xueqi Cheng

Page 3:


The Turing test, developed by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses.

Page 4:

ELIZA (1966): The Earliest Social Bot


• Created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum

• Features:
  • Hand-crafted scripts
  • Keyword spotting
  • Template matching
  • Substitution

Weizenbaum, Joseph (1966). "ELIZA—a computer program for the study of natural language communication between man and machine". Communications of the ACM. 9: 36–45.

Page 5:

Conversation is a Big Trend

From Yun-Nung Chen

Page 6:

Conversational Systems

• Task-Oriented: personal assistants that achieve a certain task

• Chit-Chat: no specific goal, focus on conversation flow

Example bots: XiaoIce (小冰), Mitsuku, 笨笨 (BenBen)

Page 7:

Methodology in Conversational Systems

Pipeline-based (1960s)

Four main modules:
• Natural Language Understanding
• Dialogue Management (dialogue tracking and policy learning)
• Natural Language Generation

[Goddeau et al., 1996; Lemon and Pietquin, 2007; Young et al., 2010; Williams, 2012; Su et al., 2016]

➢ Large human effort
➢ Accumulative errors
➢ Difficult to optimize

Page 8:

Methodology in Conversational Systems

Pipeline-based (1960s)

Four main modules:
• Natural Language Understanding
• Dialogue Tracker
• Policy Learning
• Natural Language Generation

[Goddeau et al., 1996; Lemon and Pietquin, 2007; Young et al., 2010; Williams, 2012; Su et al., 2016]

➢ Large human effort
➢ Accumulative errors
➢ Difficult to optimize

Retrieval-based (2000s)

Two steps:
• Retrieve <utterance, response> pairs from a database as candidate responses
• Rerank the candidate responses with matching functions

[Leuski et al., 2009; Hu et al., 2014; Lu and Li, 2013; Wu et al., 2016; Lowe et al., 2015; Zhou et al., 2016b]

➢ Simple and effective
➢ Requires a large database
➢ Limited language flexibility

Generative-based (2010s), end-to-end methods

Neural encoder-decoder:
• Encode the post into a fixed vector with an RNN/LSTM
• Decode the response step by step based on that vector

[Sutskever et al., 2014; Cho et al., 2014; Bahdanau et al., 2015; Shang et al., 2015b; Serban et al., 2016; Serban et al., 2017a,b]

➢ More like the human process
➢ Fully learnable

Page 9:

Methodology in Conversational Systems

Neural Encoder-Decoder Model

Grey box (statistical memory) or black box?

Similar to sequence models in Neural Machine Translation (NMT), summarization, etc.; built with RNN, LSTM, or GRU units.

Data-driven: the responses are in general fluent and relevant.
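As a concrete reference, here is a minimal sketch of such an encoder-decoder in PyTorch; this is our illustration of the generic architecture, not code from the talk, and all names and sizes are arbitrary.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the post into a fixed vector,
    then decode the response one step at a time from that vector."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, post, response):
        _, h = self.encoder(self.emb(post))    # h: the fixed summary vector
        # Teacher forcing: condition each step on the gold previous token.
        dec_out, _ = self.decoder(self.emb(response[:, :-1]), h)
        return self.out(dec_out)               # per-step vocabulary logits

# MLE training minimizes the per-token cross-entropy:
model = Seq2Seq(vocab_size=10000)
post = torch.randint(0, 10000, (2, 7))         # toy batch of posts
resp = torch.randint(0, 10000, (2, 9))         # toy batch of responses
logits = model(post, resp)
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000),
                                   resp[:, 1:].reshape(-1))
```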

Page 10:

Challenge: The blandness problem

General, meaningless responses.

Image From Jianfeng Gao

Page 11:

Challenge: The collapse problem

X: Can you recommend me a tourist city?
Y1: Yes, Beijing is a beautiful city.
Y2: Yes, Beijing is a very beautiful city!
Y3: Yes, Beijing is a beautiful city ~!

X: The Shenzhou spacecraft is about to dock with Tiangong.
Y1: Wow, our motherland is really strong.
Y2: Wow, our motherland is very strong!
Y3: Wow, our motherland is so strong ~!

Responses follow a similar pattern.

Page 12:

Challenge: The consistency problem

Poor response consistency.

Image From Jianfeng Gao

Page 13:

Challenge: The no-grounding problem

Not knowledge-grounded: the models only learn the general shape of conversations.

Page 14:

Towards Intelligent Conversational Systems: Beyond Fluency and Relevance

• Fluency, Relevance
• Informativeness, Diversity
• Consistency, Grounded, Personalized

Page 15:

Towards Intelligent Conversational Systems

• Fluency, Relevance
• Informativeness, Diversity
• Consistency, Grounded, Personalized
• Engagement

Page 16:

Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation (IJCAI 2018)

Informativeness

Page 17:

The Many-to-One Characteristic of Conversation: The Cause of the Blandness Problem

Posts:
• Shenzhou 8 spacecraft is going to launch at 5:58:10 pm.
• Today I have got my PhD degree finally.
• The new TV show time has been determined.
• The NBA team is coming to Shanghai next month.

Generic responses that fit all of them: "Support! Cheers!" / "Glad to know."

Page 18:

The Cause of the Blandness Problem

Traditional Seq2Seq (Sutskever et al., 2014)

Observation: Seq2Seq models trained with MLE are likely to generate common responses (patterns), lacking specific information.

Page 19:

The Cause of the Blandness Problem

Seq2Seq with MLE is equivalent to optimizing the Kullback–Leibler (KL) divergence:

$\mathcal{L} = -\int_Y P_r(Y|X)\,\log P_g(Y|X)\,dY = \int_Y P_r(Y|X)\,\log\frac{P_r(Y|X)}{P_g(Y|X)}\,dY - \int_Y P_r(Y|X)\,\log P_r(Y|X)\,dY = \mathrm{KL}\big(P_r(Y|X)\,\|\,P_g(Y|X)\big) + H\big(P_r(Y|X)\big)$

where $P_g$ denotes the generation probability and $P_r$ the true probability.

Analysis: this objective does not penalize the case where the generation probability is high while the true probability is low.
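A toy numeric illustration of this analysis (the probabilities are invented): each outcome contributes $P_r \log(P_r/P_g)$ to the KL term, so an outcome with low true probability is barely penalized directly, no matter how much generation probability it soaks up.

```python
import numpy as np

p = np.array([0.49, 0.49, 0.02])  # true probabilities: two good responses, one bland
q = np.array([0.30, 0.30, 0.40])  # generator over-weights the bland response
terms = p * np.log(p / q)         # per-outcome contributions to KL(p || q)
print(terms)                      # [ 0.2404  0.2404 -0.0599]
# The mismatch at index 2 (q = 0.40 vs p = 0.02) even lowers the loss on its
# own; KL only "sees" it through the mass drained from the good responses.
```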

Page 20:

Key Idea

Use the true data probability to rectify the generation probability; but the true data probability is difficult to estimate.

Figure: (a) the Seq2Seq model acts as an agent whose generated response (action) is scored by a coherence model (the environment) to produce a reward; (b) a dual setting, where a post→response agent and a response→post agent provide rewards for each other.

Page 21:

Key Idea: Statistical Analysis

Figure: average cosine similarity (%) between post and generation, grouped by human score (1 to 5): 60.19, 57.39, 60.30, 62.06, 66.04.

Human criteria:

1. Nonfluent or logically wrong

2. Not related

3. Common response

4. Strongly related

5. Like a real person’s tone

Observation: The coherence between a post and its generated response is consistent with the human evaluation. In other words, the true probability of a response is highly likely to be proportional to the coherence score between the post and the response.

Manual annotation over 300 randomly sampled posts and their corresponding generated responses.

Page 22:

Model: Unlearned Similarity Function

Figure: the Seq2Seq agent's generated response (action) is scored against the post by an embedding-based similarity, giving the coherence reward

$r_{cos} = \cos\big(h(X), h(G)\big)$

where $h(X)$ and $h(G)$ are the mean word-embedding vectors of the post $X$ and the generation $G$.
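A minimal sketch of this unlearned reward; `emb`, a dict from token to pre-trained word vector, and the dimension are assumptions for illustration.

```python
import numpy as np

def mean_embedding(tokens, emb, dim=100):
    """h(.): average the pre-trained vectors of the known tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def coherence_reward(post_tokens, gen_tokens, emb):
    """r_cos = cos(h(X), h(G))."""
    hx, hg = mean_embedding(post_tokens, emb), mean_embedding(gen_tokens, emb)
    return float(hx @ hg) / (np.linalg.norm(hx) * np.linalg.norm(hg) + 1e-8)
```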

Page 23:

Model: Pre-trained Matching Function

Beyond the unlearned similarity function, the coherence reward can also come from a matching model pre-trained on post-response pairs, e.g., a GRU-based bilinear model or MatchPyramid.

Page 24:

Model: Dual Learning Architecture

Two Seq2Seq agents reward each other: a post→response model $P_1$ and a response→post model $P_2$.

$G_1 = \arg\max P_1(G_1|X)$, with reward $r_{dual}^{1}(X, G_1) = \log P_2(X|G_1)$

$G_2 = \arg\max P_2(G_2|Y)$, with reward $r_{dual}^{2}(Y, G_2) = \log P_1(Y|G_2)$
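In sketch form, one REINFORCE step for the forward agent could look like the following; `sample` (returning tokens plus their total log-probability) and `log_prob` are assumed interfaces, not the paper's API, and the symmetric update for the backward agent is analogous.

```python
def dual_learning_step(forward_model, backward_model, post, optimizer):
    gen, logp_gen = forward_model.sample(post)         # G1 ~ P1(. | X)
    reward = backward_model.log_prob(post, given=gen)  # r_dual = log P2(X | G1)
    loss = -reward.detach() * logp_gen                 # REINFORCE estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```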

Page 25:

Model: Dual Learning

Motivation: mutual reciprocity. A coherent response should make its post easy to recover.

Forward (post→response): posts such as "What do you think of the team today?", "Do you think the team is doing well today?", or "How is the game of the team today?" all map well to the response "I think the team had a great game today!"

Backward (response→post): from a common response like "I don't know!", the original post is reconstructed poorly.

Reinforce the coherence of post and generation based on the post-response mutual reciprocity.

Page 26:

Experiments

Experiment Datasets

• Chinese Weibo Dataset (STC): post-review pairs extracted from the Chinese Weibo website
  • 3 million training pairs
  • 0.39 million development pairs
  • 0.4 million testing pairs

• English OpenSubtitles Dataset (OSDb): a large, open-domain dataset containing roughly 60M-70M scripted lines spoken by movie characters
  • 3 million training pairs
  • 0.4 million development pairs
  • 0.4 million testing pairs
  • https://github.com/jiweil/Neural-Dialogue-Generation

Page 27:

Experiments

• Baseline Methods
  • Seq2Seq model [Sutskever et al., 2014]
  • RNN-encdec model [Cho et al., 2014]
  • Seq2Seq with attention model [Bahdanau et al., 2015]
  • MMI-based models: MMI [Li et al., 2016b], back-MMI [Li et al., 2016b]
  • GAN-based model: Adver-REGS [Li et al., 2017]

• Evaluation Metrics
  • Automatic evaluation: PPL, BLEU, distinct-1, distinct-2 (see the sketch below)
  • Human evaluation
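For reference, distinct-n is typically computed as the number of distinct n-grams over the total number of n-grams in the generated responses; a small sketch:

```python
def distinct_n(responses, n):
    """responses: list of token lists; higher values indicate more diverse output."""
    ngrams = [tuple(r[i:i + n]) for r in responses for i in range(len(r) - n + 1)]
    return len(set(ngrams)) / max(1, len(ngrams))

resps = [["i", "don't", "know"], ["i", "don't", "know"], ["beijing", "is", "nice"]]
print(distinct_n(resps, 1), distinct_n(resps, 2))  # low values signal bland output
```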

Page 28:

Experiments

Our coherence models produce more fluent and specific results than the baseline methods.

Metric-based Evaluation

Page 29:

Experiments

The end-to-end dual learning approach improves distinct-2 by 2.8%.

Metric-based Evaluation

Page 30:

Experiments

Human Evaluation

The percentage of strongly related sentences from Seq2SeqCo-dual is significantly higher.

Seq2SeqCo-dual improves the human evaluation score by 14.5%.

Page 31:

Experiments

Case Study

Page 32:

Summary

• Neural generation models with the MLE objective are likely to generate common, meaningless responses, because MLE does not penalize responses whose generation probability is high while the true probability is low.

• By reinforcing the coherence between posts and responses, we are able to remedy the blandness problem, leading to informative responses.

• The dual learning architecture, which uses the mutual reciprocity of conversation, can significantly improve the informativeness of responses.

Page 33:

Tailored Sequence to Sequence Models to Different Conversation Scenarios (ACL 2018)

Diversity

Page 34:

Motivation

Teacher: Please describe the winter. (大家描述一下冬天)

Student 1: Winter is so beautiful. (冬天真美丽)

Student 2: Winter is as white as snow. (冬天洁白如雪)

Student 3: Suddenly, as if a spring breeze came overnight, thousands of trees bloom like pear blossoms! (忽如一夜春风来,千树万树梨花开)

Student 4: In winter, a thin layer of snow, like a huge soft wool blanket, covered the vast wilderness, shining with cold silver. (冬天,一层薄薄的白雪,像巨大的轻软的羊毛毯子,覆盖在这广漠的荒原上,闪着寒冷的银光)

People are able to generate diverse responses given the same input utterance: the one-to-many characteristic of conversation.

Page 35:

Motivation

Teacher: Describe the winter?

Machine:
Bot 1: So beautiful.
Bot 2: Nice view.
Bot 3: Nice view!
Bot 4: So beautiful!

Top-k results with beam search are highly similar to each other, leading to boring responses (the collapse problem).

Page 36:

The Cause of the Collapse Problem

MLE learning under the independence assumption:

Teacher: Please describe the winter. (大家描述一下冬天)
Student 1: Winter is so beautiful. (冬天真美丽)
Student 2: Winter is as white as snow. (冬天洁白如雪)
Student 3: Suddenly, as if a spring breeze came overnight, thousands of trees bloom like pear blossoms! (忽如一夜春风来,千树万树梨花开)
Student 4: In winter, a thin layer of snow, like a huge soft wool blanket, covered the vast wilderness, shining with cold silver. (冬天,一层薄薄的白雪,像巨大的轻软的羊毛毯子,覆盖在这广漠的荒原上,闪着寒冷的银光)

① MLE ignores the underlying one-to-many structure
② It learns what is easy to learn, without caring about the worst cases

Our idea: recover the structure and consider the cost in the worst cases.

Page 37:

Cost-Sensitive Loss

• Value at Risk (VaR): a prominent risk measure used extensively in finance
  • VaR is the maximum cost that might be incurred with probability at least α
  • VaR is the α-quantile of the distribution of X
  • VaR is the smallest cost in the (1 − α) × 100% worst cases
  • VaR is the highest cost in the α × 100% best cases

$VaR_\alpha(X) := \min\{c : P(X \le c) \ge \alpha\}$

Page 38:

Cost-Sensitive Loss

Coherent risk measures:

• Conditional Value at Risk (CVaR), also known as averaged VaR, expected shortfall, or tail conditional expectation:

$CVaR_\alpha(X) := E[X \mid X \ge VaR_\alpha(X)]$

• Entropic Value at Risk (EVaR), where $M_X(z)$ is the moment-generating function of X at z:

$EVaR_\alpha(X) := \inf_{z>0}\left\{ z^{-1}\,\ln\big(M_X(z)/(1-\alpha)\big) \right\}$

• We have $VaR_\alpha(X) \le CVaR_\alpha(X) \le EVaR_\alpha(X)$.

Page 39:

Robust Model I

Solution: optimize the conditional value-at-risk (CVaR) instead of the traditional likelihood.

Given the post X and its ground-truth responses $\{Y_X^{(1)}, Y_X^{(2)}, \ldots, Y_X^{(m_X)}\}$, where $m_X$ is the number of ground-truth responses for post X, we use $-\log P(Y|X)$ as the cost.

The CVaR objective is to minimize

$\mathcal{L} = -\sum_X \frac{1}{1-\alpha} \sum_{Y_X^{(k)} \in y_{1-\alpha}} \log P(Y_X^{(k)}|X)$

where $y_{1-\alpha}$ is a collection of ground-truth responses such that

$\sup\big\{ P(Y_X^{(i)}|X) : Y_X^{(i)} \in y_{1-\alpha} \big\} \le 1-\alpha$

CVaR pays more attention to the post-response pairs that have not been generated well so far.
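A small sketch of this objective for a single post; choosing the worst-case set as the top-k highest-cost responses with k ≈ (1 - α) · m_X is our reading of the selection rule.

```python
import torch

def cvar_loss(nll_per_response, alpha=2/3):
    """nll_per_response: -log P(Y_k | X) for one post's m_X ground-truth responses."""
    m = nll_per_response.numel()
    k = max(1, int(round((1 - alpha) * m)))      # size of the worst-case set
    worst, _ = torch.topk(nll_per_response, k)   # highest costs = worst cases
    return worst.sum() / (1 - alpha)

# With P = (0.5, 0.2, 0.3) and 1 - alpha = 1/3, only the least likely
# response (P = 0.2) enters the loss, pushing its probability up.
nll = -torch.log(torch.tensor([0.5, 0.2, 0.3]))
print(cvar_loss(nll))
```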

Page 40:

Model Illustration

Illustration: suppose 1 - α = 1/3.

Post: 总决赛继续等待韦德 (Waiting for Wade in the final games.)

Response 1: 每个人都有每个人的喜爱 (Everyone has his favorite stars.), P(Response 1|X) = 0.5
Response 2: 比新浪分析的好多了 (The analysis is much better than Sina's.), P(Response 2|X) = 0.2
Response 3: 等待闪电侠彻底爆发 (Waiting for the explosion of Mr. Flash.), P(Response 3|X) = 0.3

Only the worst-case set (here, the least likely response) enters the CVaR loss.

Page 41:

Model Illustration

Illustration: suppose 1 - α = 1/3. After optimizing the worst case, the distribution shifts:

Post: 总决赛继续等待韦德 (Waiting for Wade in the final games.)

Response 1: 每个人都有每个人的喜爱 (Everyone has his favorite stars.), P(Response 1|X) = 0.3
Response 2: 比新浪分析的好多了 (The analysis is much better than Sina's.), P(Response 2|X) = 0.3
Response 3: 等待闪电侠彻底爆发 (Waiting for the explosion of Mr. Flash.), P(Response 3|X) = 0.4

Page 42:

Model Illustration

Illustration: suppose 1 - α = 1/3. The probabilities eventually balance:

Post: 总决赛继续等待韦德 (Waiting for Wade in the final games.)

Response 1: 每个人都有每个人的喜爱 (Everyone has his favorite stars.), P(Response 1|X) = 0.33
Response 2: 比新浪分析的好多了 (The analysis is much better than Sina's.), P(Response 2|X) = 0.33
Response 3: 等待闪电侠彻底爆发 (Waiting for the explosion of Mr. Flash.), P(Response 3|X) = 0.34

The balanced distribution improves the ability to generate diverse responses under beam search.

Page 43:

Robust Model II

Solution: optimize the entropic value-at-risk (EVaR) instead of the traditional likelihood.

Given the post X and its ground-truth responses $\{Y_X^{(1)}, \ldots, Y_X^{(m_X)}\}$, where $m_X$ is the number of ground-truth responses for post X, the EVaR objective is to minimize

$\mathcal{L} = \sum_X \frac{1}{z}\left[\log\left(\frac{1}{m_X}\sum_{i=1}^{m_X} e^{z\,(-\log P(Y_i|X))}\right) - \log(1-\alpha)\right]$

EVaR adjusts the loss of each training sample so that every sample achieves a balanced generation probability:

➢ If $P(Y_i|X)$ is large, the change in loss is small
➢ If $P(Y_i|X)$ is small, the change in loss is large
➢ The larger α is, the more emphasis on the change of the data
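A companion sketch for one post, using a log-sum-exp for numerical stability; z and α are hyperparameters, and the code is our reading of the formula above rather than the authors' implementation.

```python
import torch

def evar_loss(nll_per_response, z=1.0, alpha=2/3):
    """nll_per_response: -log P(Y_i | X) for one post's m_X responses."""
    m = float(nll_per_response.numel())
    # log( (1/m) * sum_i exp(z * cost_i) ), computed stably.
    lse = torch.logsumexp(z * nll_per_response, dim=0) - torch.log(torch.tensor(m))
    return (lse - torch.log(torch.tensor(1.0 - alpha))) / z

# exp(z * cost) grows fastest for the worst-fit responses, so samples with
# small P(Y_i | X) dominate the loss, balancing the per-response probabilities.
print(evar_loss(-torch.log(torch.tensor([0.5, 0.2, 0.3]))))
```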

Page 44:

Experimental Settings

Experiment Dataset
• Chinese Weibo Dataset (STC): post-review pairs extracted from the Chinese Weibo website
  • 3 million training pairs
  • 0.38 million development pairs
  • 0.4 million testing pairs

Baseline methods:
• Seq2Seq model [Sutskever et al., 2014]
• RNN-encdec model [Cho et al., 2014]
• Seq2Seq with attention model [Bahdanau et al., 2015]
• GAN-based method: Adver-REGS [Li et al., 2017]
• Style-based method: Mechanism [Zhou et al., 2017]

Page 45:

Experiments

Metric-based Evaluation (diversity measures: overlap and divrs)

Both risk-aware models obtain lower overlap and divrs scores than the baselines. CVaR obtains a larger improvement on diversity than EVaR, as it can change the response distribution more efficiently.

Page 46:

Experiments

Manual Evaluation

Both EVaR and CVaR obtain higher scores in the manual evaluation than the baselines. EVaR achieves a higher score than CVaR: EVaR is the upper bound of CVaR, and it learns over all the data, while CVaR may ignore some data it considers well trained.

Scoring criteria:

1. The response is nonfluent or logically wrong, or it is fluent but not related to the post;

2. The response is fluent but weakly related: a common reply that could answer many other posts;

3. The response is fluent and strongly related to its post, as if following a real person's tone.

Page 47:

Experiments

Case Study

Our model produces both fluent and diverse results

Page 48:

Summary

• Neural generation models learned with MLE objectives under the independent-sample assumption are likely to face the collapse problem.

• We are the first to use robust distribution models (the CVaR and EVaR models) to enhance the diversity of dialogue.

• Both the CVaR and EVaR models can be used in many other tasks to improve diversity or robustness.

Page 49:

Learning to Control the Specificity in Neural Response Generation (ACL 2018)

Controllability

Page 50:

Conversation Systems

Neural generation models map an input x to an output y. They are fully data-driven, modelling correlation, and they lack an intervention mechanism.

Page 51:

Human Conversation Process

Input: Do you know a good eating place for Australian special food?

The response purpose depends on latent factors such as the current mood, the knowledge state ("I'm familiar with the topic" vs. "I don't know"), and the dialogue partner:

• General response: I don't know.
• Specific response: Good Australian eating places include steak, seafood, cake, etc. What do you want to choose?
• Positive response: Haha, I know several wonderful places in the downtown.
• Negative response: No, to be frank, I do not like Australian food.

Page 52:

Key Idea

• Introduce a causal generation model with an explicit control variable that represents the response purpose:

- It summarizes many latent factors (current mood, knowledge state, dialogue partner)

- It has an explicit meaning, e.g., specificity or emotion

- It actively controls the generation, handling the blandness/collapse problems in another way

Figure: the explicit control variable, standing in for the latent factors, is fed into the neural generation model together with the input x to produce the output y.

Page 53:

Model Architecture

• The control variable s is introduced into the Seq2Seq model
  • From a one-fits-all model to multiple causal models: different <utterance, response> pairs, different s, different models

• Dual word representations
  • Semantic representations: relate to the semantic meaning
  • Usage representations: relate to the usage preference

Page 54:

Model - Encoder

• Bi-RNN: model the utterance from both the forward and backward directions,
  ➢ forward states $\{h_1^{\rightarrow}, \ldots, h_T^{\rightarrow}\}$ and backward states $\{h_T^{\leftarrow}, \ldots, h_1^{\leftarrow}\}$
  ➢ concatenated per position as $h_t = [h_t^{\rightarrow}, h_{T-t+1}^{\leftarrow}]$
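The same concatenation falls out of a standard bidirectional GRU, as in this small PyTorch sketch (dimensions are arbitrary):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, T, B = 128, 256, 7, 2
bi_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
x = torch.randn(B, T, emb_dim)   # embedded utterance
h, _ = bi_rnn(x)                 # h: (B, T, 2 * hid_dim)
# h[:, t] concatenates the forward state at position t with the backward
# state that has read positions T-1 down to t, i.e. the slide's h_t.
```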

Page 55:

Model - Decoder

• Predict the target word based on a mixture of two probabilities, the semantic-based and the usage-based generation probability:

$p(y_t) = \beta\, p_M(y_t) + \gamma\, p_S(y_t)$

➢ Semantic-based probability: decides what to say next given the input,

$p_M(y_t = w) = \boldsymbol{w}^T\big(\boldsymbol{W}_M^{h} \cdot \boldsymbol{h}_{y_t} + \boldsymbol{W}_M^{e} \cdot \boldsymbol{e}_{t-1} + \boldsymbol{b}_M\big)$

where $\boldsymbol{h}_{y_t}$ is the decoder hidden state and $\boldsymbol{e}_{t-1}$ is the semantic representation of the previous word.

Page 56:

Model – Decoder (Continuous control variable)

➢ Specificity-based probability: decides how specific the reply should be

• Specificity control variable $s \in [0, 1]$
  ✓ 0 denotes the most general response
  ✓ 1 denotes the most specific response

• Gaussian kernel layer
  ✓ the specificity control variable interacts with the usage representation of words
  ✓ lets the word usage representation regress to the variable s

$p_S(y_t = w) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(\Psi_S(\boldsymbol{U}, \boldsymbol{w}) - s)^2}{2\sigma^2}\right), \qquad \Psi_S(\boldsymbol{U}, \boldsymbol{w}) = \sigma\big(\boldsymbol{w}^T(\boldsymbol{U} \cdot \boldsymbol{W}_U + \boldsymbol{b}_U)\big)$

where $\boldsymbol{U}$ is the usage representation, $\sigma$ in the kernel is the variance, and $\sigma(\cdot)$ in $\Psi_S$ is the sigmoid function.
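A sketch of this branch in PyTorch; the module name, shapes, and the kernel width are our choices. Every word's usage representation is squashed to a scalar in [0, 1], and a Gaussian kernel centered at the control variable s gives the highest usage-based scores to words whose usage level matches the requested specificity.

```python
import torch
import torch.nn as nn

class SpecificityScore(nn.Module):
    def __init__(self, vocab_size, usage_dim=64, sigma=0.1):
        super().__init__()
        self.usage = nn.Embedding(vocab_size, usage_dim)  # usage representations U
        self.proj = nn.Linear(usage_dim, 1)               # inside Psi_S
        self.sigma = sigma

    def forward(self, s):
        # Psi_S(U, w) in [0, 1] for every word w in the vocabulary.
        psi = torch.sigmoid(self.proj(self.usage.weight)).squeeze(-1)
        # Gaussian kernel centered at the specificity control variable s.
        return torch.exp(-(psi - s) ** 2 / (2 * self.sigma ** 2))

# The decoder then mixes this with the semantic probability as
# p(y_t) = beta * p_M(y_t) + gamma * p_S(y_t).
```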

Page 57:

Model – Decoder (Discrete control variable)

➢ Emotion-based probability: decides which emotion to express

• Emotion control variable $s \in \{1, \ldots, N\}$
  ✓ N denotes the number of emotion categories
  ✓ the emotion categories may include angry, disgust, happy, like, sad, and others

• Softmax layer
  ✓ the emotion control variable interacts with the usage representation of words
  ✓ lets the word usage representation be classified to the variable s

$p_S(y_t = w) = \mathrm{softmax}\big(\Phi_E(\boldsymbol{U}, \boldsymbol{w})\big)_s, \qquad \Phi_E(\boldsymbol{U}, \boldsymbol{w}) = \boldsymbol{w}^T(\boldsymbol{U} \cdot \boldsymbol{W}_U + \boldsymbol{b}_U)$
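A matching sketch for the discrete variant; reading Φ_E as producing a per-word vector over the N emotion categories (so that the softmax subscript s selects one category) is our interpretation of the formula.

```python
import torch
import torch.nn as nn

class EmotionScore(nn.Module):
    def __init__(self, vocab_size, usage_dim=64, n_emotions=6):
        super().__init__()
        self.usage = nn.Embedding(vocab_size, usage_dim)  # usage representations U
        self.proj = nn.Linear(usage_dim, n_emotions)      # Phi_E, one score per emotion

    def forward(self, s):
        # Softmax over emotion categories; keep the column for emotion s.
        probs = torch.softmax(self.proj(self.usage.weight), dim=-1)
        return probs[:, s]  # usage-based score of every word under emotion s
```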

Page 58:

Model Training

• Objective function (log likelihood):

$\mathcal{L} = \sum_{(X, Y) \in \mathcal{D}} \log P(Y \mid X, s; \theta)$

• Training data: triples (X, Y, s)

• Sometimes we only observe (X, Y), while the control variable s is not directly available in the raw conversation corpus. How can we obtain s to learn our model? We propose to acquire distant labels for s.

Page 59:

Distant Supervision (Specificity)

• Normalized Inverse Response Frequency (NIRF)
  ➢ A response is more general if it corresponds to more input utterances
  ➢ Normalizes the Inverse Response Frequency (IRF) of a response in the conversation corpus

• Normalized Inverse Word Frequency (NIWF)
  ➢ A response is more specific if it contains more specific words
  ➢ Normalizes the maximum Inverse Word Frequency (IWF) over all the words in a response; see the sketch below
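A sketch of the NIWF label; the exact IWF formula and smoothing are our assumptions, and the min-max normalization maps the label into [0, 1].

```python
import math
from collections import Counter

def niwf_labels(responses):
    """responses: list of (non-empty) token lists; returns one specificity label each."""
    n = len(responses)
    df = Counter(w for r in responses for w in set(r))  # response frequency per word
    raw = [max(math.log(n / (1 + df[w])) for w in r) for r in responses]  # max IWF
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo + 1e-8) for x in raw]   # normalize to [0, 1]
```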

Page 60:

Specificity Controlled Response Generation

• Given a new input utterance, we can generate responses at different specificity levels by varying the control variable s

• Different s, different models, different responses
  ➢ s = 1: the most specific response
  ➢ s ∈ (0, 1): dynamic styles between the two extremes
  ➢ s = 0: the most general response

Page 61:

Emotion Controlled Response Generation

• Given a new input utterance, we can generate responses in different emotional states by selecting the control variable s

• Different s, different emotions, different responses
  ➢ s = 1: angry
  ➢ s = 2: disgust
  ➢ s = 3: happy
  ➢ s = 4: like
  ➢ s = 5: sad
  ➢ s = 6: others

Page 62:

Experiments - Dataset

• Short Text Conversation (STC) dataset
  ➢ released in NTCIR-13
  ➢ a large repository of post-comment pairs from Sina Weibo
  ➢ 3.8 million post-comment pairs
  ➢ segmented with the Jieba Chinese word segmenter

• Emotional STC (ESTC) dataset
  ➢ based on the emotion classification data released in NLPCC-2017
  ➢ a Bi-LSTM classifier annotates the STC dataset with six emotion categories
  ➢ segmented with the Jieba Chinese word segmenter

Tables: STC statistics; ESTC statistics.

Page 63:

Experiments - Model Analysis

1. We vary the control variable s over five values (0, 0.2, 0.5, 0.8, 1)
2. As s goes from 0 to 1, the generated responses turn from general to specific
3. NIWF (word-based) is a good distant label for response specificity

Table: model analysis of our SC-Seq2Seq under automatic evaluation for specificity-controlled generation.

Page 64:

Experiments - Comparisons

Automatic evaluation:

1. When s = 1, our SC-Seq2Seq-NIWF model achieves the best specificity performance
2. When s = 0.5, our SC-Seq2Seq-NIWF model achieves the best performance compared with all the baseline methods

Table: comparisons between our SC-Seq2Seq and the baselines under automatic evaluation for specificity-controlled generation.

Page 65:

Experiments - Comparisons

Human evaluation:

1. SC-Seq2Seq-NIWF with s = 1 generates the most informative and interesting responses compared with all the baseline models
2. The biggest kappa value is achieved by SC-Seq2Seq-NIWF with s = 0

Table: results of the human evaluation for specificity-controlled generation.

Rating scale:

+2: the response is not only semantically relevant and grammatical, but also informative and interesting;

+1: the response is grammatically correct and can be used as a response to the utterance, but is too trivial (e.g., "I don't know");

+0: the response is semantically irrelevant or ungrammatical (e.g., grammatical errors or UNK).

Page 66:

Experiments - Analysis

1. Neighbors based on semantic representations are semantically related
2. Neighbors based on usage representations are not as semantically related, but have similar specificity levels

Table: target words and their top-5 similar words under usage and semantic representations, respectively, for specificity-controlled generation.

Figure: t-SNE embeddings of usage and semantic vectors for specificity-controlled generation.

Page 67:

Experiments - Case Study

As s decreases from 1 to 0, SC-Seq2Seq-NIWF moves from very long, specific responses to shorter, more general ones.

Table: examples of response generation from the STC test data; s = 1, 0.8, 0.5, 0.2, 0 are the outputs of our SC-Seq2Seq-NIWF with different s values for specificity-controlled generation.

Page 68:

Experiments - Case Study

With s sampled from {0, 1, 2, 3, 4, 5}, SC-Seq2Seq generates a response corresponding to the given emotion category.

Table: examples of response generation from the ESTC test data; s = 0, 1, 2, 3, 4, 5 are the outputs of our SC-Seq2Seq with different s values for emotion-controlled generation.

Page 69:

Conclusion

• Modern neural-generation-based conversation systems lack an intervention mechanism to actively control the style of the output responses.

• We propose a causal generation model with an explicit control variable to handle different post-response relationships in terms of specificity and emotion.

➢ Such a control mechanism also alleviates the blandness and collapse problems in a different way.

Page 70:

Future Work

• No grounding → grounded
• Short-term/immediate reward → long-term/user feedback
• Correlation → causal/interpretable
• Neural → neural-symbolic
• Supervised learning (SL) → reinforcement learning (RL)
• Data-driven/statistics → causal/cognitive theory

Page 71:

Jiafeng Guo (郭嘉丰), [email protected]

Thank you