interleaving language and rl from language generation to ... · interleaving language and rl...

85
Interleaving Language and RL Language Generation Florian Strub Reinforcement Learning Summer School 7 July 2019

Upload: others

Post on 22-Jun-2020

27 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RLLanguage Generation

Florian StrubReinforcement Learning Summer School

7 July 2019

Page 2: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Overview

● Kind introduction to NLP

● Policy gradient for Translation

● Goal-oriented dialogue systems○ Dialogue setting, ○ GuessWhat?!○ Self-play for language generation

● Other linguistic grounded tasks:○ Language as goal representation: Instruction Following○ Language as state representation: Text Games○ Language as policy compositionality: Emergence of Language

~15min

~talk to me!

~20min

~15min

Page 3: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPWhat is NLP?

Natural Language Processing (NLP) aims to extract representations of textual information to read and make sense of human languages in a valuable manner.

Page 4: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPWhat is NLP?

Natural Language Processing (NLP) aims to extract representations of textual information to read and make sense of human languages in a valuable manner.

Page 5: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPLanguage is Hard :)

Georgetown–IBM experiment 1954 Translate Sixty Russian sentences into English

ARPA report 1966: “there is no immediate or predictable prospect of useful machine translation”

Page 6: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPSmall Historical Note

Rationalism

Language possesses an underlying structure that must be discovered, conjecturing that human language is innate and can be modelled like physic laws.

Empiricism

Language is a cognitive process that can be learned through experimentation, advocating to explore learning mechanisms rather than linguistic models

Chomsky, Noam. Syntactic structures.1957.

Transformational generative grammars

J. R. Firth. A synopsis of linguistic theory 1930-55. Studies in Linguistic Analysis

“You shall know a word by the company it keeps”

“The validity of statistical (information theoretic)

approach to MT has indeed been recognized …

as early as 1949. And was universally

recognized as mistaken [sic] by 1950. … The

crude force of computers is not science.”

Review of Brown et al. (1990)

Page 7: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPLanguage Modelling

Goal of statistical approach: How likely is a sentence?

- “The cherry on the cake” - “The cake on the cherry ”

but the cake is a lie...

Page 8: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPLanguage Modelling

A sentence is represented as a sequence of words:

http://videolectures.net/deeplearning2016_cho_language_understanding/

We compute the probability of the word sequence:

Page 9: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind Introduction to NLPLanguage Modelling

How to estimate the conditional probabilities?

N-gram Modelling:

We can simply count words!

Do not trust the cake!

Several problems (memory footprint, do not generalize etc.)

Page 10: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPNeural Language Modelling

How to estimate the conditional probabilities?

Learn a representation of word history:

Use neural networks !!!

The cherry on the

cherry on the cake

Page 11: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPNeural Language Modelling

Hash Table:

RNN RNN:

MLP

softmax

Classifier:

Basic RNN, LSTM, GRU etc.

Required indexed vocabulary:V = { “Is”:1, … , “cherry”:23 , ... }

Word embedding!

Greedy, stochastic, beam search etc.

Sampling procedure:

Page 12: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPTraining procedure

Given a corpora:

Goal is to maximize the joint probability:

Cake and grief counseling will be available at the conclusion of the test.

Key NLP equation

Page 13: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPTraining Procedure

The cherry on the

cherry over the table

Training Time Test Time

The cherry over the

cherry over the table

*Ground truth

Predicted

Teacher Forcing

** Ground truth

Predicted

Exposure bias

Page 14: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPIt is only the begining!

We can generate language :)

But wait… it is pretty useless! (so is the cake, what a lame running joke...)

Can we use it as a translation system?

Well...

Page 15: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPIt is only the begining!

We can generate language :)

But wait… it is pretty useless! (so is the cake, what a lame running joke...)

The cherry on the cake

La cerise sur le gateau

Can we use it as a translation system?

Well...

He looked for the cake

Il est allé chercher .. le gateau

Bad word alignment

Page 16: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

Idea: Decompose encoding and decoding!

DecoderEncoder

input

output

sta

te

Page 17: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

Idea: Decompose encoding and decoding!

DecoderEncoder

sta

te

Le gateau est un mensonge

The cake is a lie

Page 18: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

sta

te

Idea: Decompose encoding and decoding!

RNNRNN RNNRNN RNN

Encoder

the cake is a

RNN RNN RNN RNN RNN

Decoder

lie

le gateau est un mensonge

Page 19: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

sta

te

Idea: Decompose encoding and decoding!

RNN RNNRNN RNNRNN

le gateau est un mensonge

the cake … lie <stop>

RNN RNN RNN RNN RNN

<Ignore output>

<start> the cake … lie

Similar training as language modelling (teacher forcing)

Sequence-to-Sequence

Page 20: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

sta

te

Idea: Decompose encoding and decoding!

the cake … lie <stop>

RNN RNN RNN RNN RNN

<start> the cake … lieR

esN

et

Image captioning

Page 21: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPSeq2Seq Models

Seq2Seq is an Encoder/Decoder architecture

1) Encode language representation

2) Decode vector representation

Model is trained with Cross-Entropy (Teacher Forcing)

Decoder(RNN)

Encoder

input

output

sta

te

You cannot cram the

meaning of a whole

%&!$# sentence into a

single $&!#* vector.

Generated tokensinput tokens

Page 22: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPTranslation

WMT dataset:

- 12M sentences French/English

- Vocabulary 80K words

- Assessed on BLEU score

BLEU is the geometric average of overlapping n-grams in a

set of targets sentences from n=1 to 4 .

Page 23: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPTranslation

WMT dataset:

- 12M sentences French/English

- Vocabulary 80K words

- Assessed on BLEU score

Input le gateau est un mensonge

Predicted the lie is the cake

Target the cake is a lie

When several targets exists, n-gram can be count as many time as they exist in any of the targets

BLEU is the geometric average of overlapping n-grams in a

set of targets sentences from n=1 to 4 .

N=1

4

--

5

Page 24: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPTranslation

WMT dataset:

- 12M sentences French/English

- Vocabulary 80K words

- Assessed on BLEU score

Input le gateau est un mensonge

Predicted the lie is the cake

Target the cake is a lie

When several targets exists, n-gram are count as many time as they may exist in one of the targets

N=2

1

--

4

N=3

0

--

3

N=1 N=4

0

--

2

4

--

5

BLEU is the geometric average of overlapping n-grams in a

set of targets sentences from n=1 to 4 .

Page 25: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Kind introduction to MLPBLEU

BLEU: Order of magnitude

BLEU

Sota 2014 (Durrani 2014) 37.0

Sequence-to-sequence (K. Cho 2014) 34.5

Sequence-to-Sequence (Wu 2016) 38.95

Transformer (Vaswani 2017) 41.8

BLEU is a lie!?

Estimated Human BLEU (Papineni 2002) 34.7

Page 26: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

RL is the carrot on the cake

Page 27: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationSupervised limitation

Supervised in good… but:

● We enforce a mismatch between training and testing: Teacher forcing vs Exposure Bias

● We optimize cross-entropy… but we care about BLEU !

● We optimize for sentence generation… but CE is at the word level (can be criticized)

sta

tethe cake … lie <stop>

RNN RNN RNN RNN RNN

<start> the cake … lie

En

co

de

r

Le

ga

tea

u e

st

un

m

en

so

ng

e

Page 28: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationLanguage as a MDP

Idea: Turn language translation into a MDP and BLEU as a reward

• 𝑠𝑡 = 𝒙, 𝑦1, ⋯ , 𝑦𝑡

• 𝒙 = 𝑥1,⋯ , 𝑥𝑇 where 𝑥𝑡 ∈ 𝑉𝑖𝑛 and 𝒙 is the input sentence

• 𝑦1, ⋯ , 𝑦𝑡′ where 𝑦𝑡 ∈ 𝑉𝑜𝑢𝑡 are the generated token

• 𝑎𝑡+1 ∼ 𝑉𝑜𝑢𝑡𝑝𝑢𝑡

• 𝑠𝑡+1 = 𝑠𝑡 ∪ 𝑎𝑡+1

• 𝑟𝑡+1 = 𝐵𝐿𝐸𝑈 if 𝑎𝑡+1 = < 𝑠𝑡𝑜𝑝 >

0 Otherwise

sta

te𝑦1 𝑦2 … 𝑦𝑇′−1 <stop>

RNN RNN RNN RNN RNN

<start> 𝑦1 𝑦2 𝑦𝑇′−2 𝑦𝑇′−1

En

co

de

r

𝒙

Reward: BLEU

Page 29: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationPolicy Gradient

The goal is to optimize the score function:

𝐽𝜽 = න𝑑𝜋𝜽𝑉𝜋𝜽𝑑𝑥

Expected BLEU according the translation policy over all the potential language pair

The score gradient is estimated by:

∇J𝜽=𝜽ℎ =

𝑡′=1

𝑇′

𝑦=1

𝑉𝑜𝑢𝑡

∇𝜽ℎlog(𝜋𝜽ℎ(𝑦𝑡|𝒙, 𝑦1, ⋯ , 𝑦𝑡′−1))(𝑄𝜋 𝜽ℎ − 𝑏)

Policy Gradient (Sutton 1999) improves the policy by following the score gradient:

𝜽ℎ+1 = 𝜽ℎ + 𝛼ℎ𝛁𝐉𝜽=𝜽ℎ𝛼 learning rateℎ training step

𝑏 baselineQ state-action function

𝑑 probability state distribution𝑉 Value function

Page 30: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationPolicy Gradient

As 𝑉𝑜𝑢𝑡 may be very big, RL is not straightforward!

For example, WMT has 80k words ; Atari has 18 actions!

∇J𝜽=𝜽ℎ =

𝑡′=1

𝑇′

𝑦=1

𝑉𝑜𝑢𝑡

∇𝜽ℎlog(𝜋𝜽ℎ(𝑦𝑡|𝒙, 𝑦1, ⋯ , 𝑦𝑡′−1))(𝑄𝜋 𝜽ℎ − 𝑏)

𝑄𝜋 𝜽ℎ is hard to parametrize: Overestimation, memory footprint (Bahdanau 2016)

σ𝑦=1𝑉𝑜𝑢𝑡 can be intractable. Potential

subsampling etc. (Liu 2018)

Impossible to start from random policy. Required warm start (Ranzato 2015)

Should be parametrized

Page 31: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationPolicy Gradient

Monte-Carlo Variant (REINFORCE-like)

∇J𝜽=𝜽ℎ =

𝑡′=1

𝑇′

∇𝜽ℎlog(𝜋𝜽ℎ(𝑦𝑡|𝒙, 𝑦1, ⋯ , 𝑦𝑡′−1))(𝑄𝜋 − 𝑏)

Where

𝑄𝜋 =

𝜏

𝑡′

𝛾𝜏𝑟𝜏

Intuitively, the full trajectory is equally rewarded. It is either all good or all bad!

𝛾 is often set to 1: No need to search for shortest word trajectory

Page 32: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationSL vs RL

sta

te

𝑦1 𝑦2 … <stop>

RNN RNN RNN RNN

<start> 𝑦1 𝑦2 𝑦𝑇′−1

En

co

de

r

𝒙

BLEU

sta

te

𝑦1 𝑦2 … <stop>

RNN RNN RNN RNN

<start> 𝑦1∗ 𝑦2

∗ 𝑦𝑇′−1∗

En

co

de

r

𝒙

Reinforcement Learning:

𝐽𝜽 = න𝑑𝜋𝜽𝑉𝜋𝜽𝑑𝑥

High-variance, require warm-start

Optimize true score

Signal over trajectories

Supervised learning:

𝑡′=1

𝑇′

log 𝑝𝜃(𝑦𝑡′|𝑥, 𝑦1, ⋯ , 𝑦𝑡′)

Low-variance, sample efficient

Optimize surrogate

Signal word after words

Page 33: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationResults, finally!

Policy Gradient

BLEU score for summarization / image captioning ROUGE score for image captioning (Ranzato 2015)

Does it work ?

Page 34: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationDamn!

Does it really work ? Well…

RED is a lie!

Button was denied his 100th race for McLaren after an ERS prevented him from making it to the start-line. It capped a miserable weekend for the Briton. Button has out-qualified. Finished ahead of Nico Rosberg at Bahrain. Lewis Hamilton has. In 11 races. . The race. To lead 2,000 laps. . In. . . And.

(Paulus 2017)

Reward hacking….

Page 35: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Policy Gradient for Language GenerationTricks, heart of Deep RL ;)

A few trick to alleviate RL issues :

• Parametrize and train your baseline correctly

• Increase batch size!

• Perform an extensive parameter RL sweep parameters

• Adding softmax temperature + check your SL baseline (overconfident)

• Slowly transition from SL to RL

• Check qualitative results!

Recommended slides: http://www.phontron.com/slides/neubig19structured.pdf

Page 36: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue System with RL

Page 37: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue SystemsClassic pipeline

Open the pad day door, Hal

I'm sorry, Dave. I'm afraid I can't do that

Natural Language Generator

Natural Language

Understanding

Dialogue utterance 𝑢𝑡:{request “from open door”, From: “Dave”}

Dialogue State Tracker N

ew

sta

te 𝑠𝑡 :

{“Da

ve s

tatu

s”: th

rea

t}

Policy Learning

Action 𝑎𝑡{Inform “Request Denied”, action: “None”}

POMDP setting

Later on… Dave disconnect HAL…

Page 38: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue Systems

POMDP setting:

• Observation: {request “from open door”, From: “Dave”}

• State: {“Dave status”: threat}

• Policy: Dialogue Manager

• Action: {Inform “Request Denied”, action: “None”}

• Reward: Later on… Dave disconnect HAL…

Given a good NLU / NLG… Good new… it works!

(Young 2013) (DSTC)

Hand-Crafted

Page 39: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue SystemsClassic pipeline

Open the pad day door, Hal

I'm sorry, Dave. I'm afraid I can't do that

Natural Language Generator

Natural Language

Understanding

Dialogue utterance 𝑢𝑡:{request “from open door”, From: “Dave”}

Dialogue State Tracker N

ew

sta

te 𝑠𝑡 :

{“Da

ve s

tatu

s”: th

rea

t}

Policy Learning

Action 𝑎𝑡{Inform “Request Denied”, action: “None”}

POMDP setting

Later on… Dave disconnect HAL…

Page 40: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue SystemsClassic pipeline

Open the pad day door, Hal

I'm sorry, Dave. I'm afraid I can't do that

End-2-End Model

Page 41: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue SystemsClassic pipeline

Open the pad day door, Hal

I'm sorry, Dave. I'm afraid I can't do that

Decoder(RNN)

Encoder(RNN)

sta

te

Idea : Turn into (natural) translation systems! (Vinyals et V. Le 2015) (When et l, 2015)

Page 42: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Dialogue SystemsTaxonomy

Chatbot

Open discussion!

Numerous dataset (Lowe 2015)

Numerous models (Gao 2019)

No reward signal…

Goal-oriented dialogue

Dialogue to solve a task: book plane ticket, find restaurant etc.

No large-scale goal oriented dataset with natural language (10k dialogue)

Clear reward signal!

Page 43: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Wait !! What is that!

Visually grounded goal-oriented natural dialogues

Page 44: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Symbol grounding problem

Symbol Grounding Problem!

(Harnad 1990)

Page 45: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Symbol grounding problem

How to ground symbol?

ケーキ

小麦粉、脂肪、卵、砂糖、およびその他の成分の混合物から作られた、焼きたての、時にはアイスまたは装飾された柔らかい甘い食べ物。

Page 46: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!

Visually grounded dialogues with self-play

Game features:

● Dialogue

● Visually grounded

● Collaborative

● Goal-oriented with a clear reward

Come play… it will be fun.

Page 47: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

GuessWhat?! Game

The game consists in locating a hidden object into a natural scene representation by asking a sequence of questions.

Page 48: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Questionner

Oracle

GuessWhat?!Let’s play

Page 49: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Questionner

GuessWhat?!Let’s play

Questionner

Oracle

Page 50: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ?

GuessWhat?!Let’s play

Questionner

Oracle

Page 51: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

GuessWhat?!Let’s play

Questionner

Oracle

Page 52: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row?

GuessWhat?!Let’s play

Questionner

Oracle

Page 53: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

GuessWhat?!Let’s play

Questionner

Oracle

Page 54: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it?

GuessWhat?!Let’s play

Questionner

Oracle

Page 55: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it? No

GuessWhat?!Let’s play

Questionner

Oracle

Page 56: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it? No

Is it the second vase from the right?

GuessWhat?!Let’s play

Questionner

Oracle

Page 57: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it? No

Is it the second vase from the right? Yes

GuessWhat?!Let’s play

Questionner

Oracle

Page 58: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it? No

Is it the second vase from the right? Yes

I found it!

GuessWhat?!Let’s play

Questionner

Oracle

Page 59: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Is it a vase ? Yes

Is it in the front row? No

Does it have some red on it? No

Is it the second vase from the right? Yes

Correct!

GuessWhat?!Let’s play

Questionner

Oracle

Page 60: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

https://guesswhat.ai/explore

GuessWhat?!

Page 61: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Dataset Statistics – Language metrics

Average: 5+ questions Average: ~5 words

Page 62: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Dataset Statistics – Success ratio

The more object there are, the lower is the success ratio The bigger the object is, the higher is the success ratio

Page 63: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Optimal Policy

GuessWhat?!

Page 64: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?! DatasetPotential Language Policy

- Word Taxonomy:

◦ Is it a vehicle? A car? A motorbike?

- Spatial reasoning

◦ Is it on the left? In the background?

◦ Is it on the right of the blue car? Is it between the two zebra?

◦ Is it on the table?

- Counting

◦ Is it the second man of the left?

- Others:

◦ Can it fly? Do you eat with it?

◦ Is it big? Is it square?

Page 65: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Cloud of Words

Page 66: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Questioner Oracle

Repeat until <stop_dialogue>

question

yes/no answer

Find object?

Questioner evaluation procedureGuesseur

dialogue

GuessWhat?!Game Loop

Page 67: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Game Notation

Page 68: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Oracle

79.5%

accuracy

GuessWhat?!Models

Page 69: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Gu

esse

r

63.8%

accuracy

GuessWhat?!Models

Page 70: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Training (Minimize cross-entropy):

GuessWhat?!Models

QG

en

Policy warm-up

Page 71: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Accuracy: The higher, the better!

- New Objects : Image from training set + pick random obje

- New images : Images from the testing set (never seen at training time)

GuessWhat?!Quantitative Results

QG

en

Page 72: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Qualitative Results

Lack of generalization

Poor grounding

Language Imitation pitfall

Page 73: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Limitation of supervised learning

Observation:

● Space of action/state dialogue is too large to generalize

● Supervised learning miss planning aspect

● Supervised learning does not care solving the task! Wrong metric

● (Side issue) Grounding seems imperfect…

Page 74: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!What if…

Is it the best question

What happen if the answer is no

Can we phrase it differently?

Page 75: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Questioner Oracle

Repeat until <stop_dialogue>

question

yes/no answer

Find object?Guesseur

dialogue

GuessWhat?!Game Loop

Reward

Update agent

Page 76: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!RL Notation

Page 77: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

For each question For each word

GuessWhat?!

Policy Gradient!

Policy Gradient

Conditioned on the image

Page 78: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Generate dialogue

Generate game

Find object

Update model

Initialization

GuessWhat?!RL Algorithm

Page 79: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

QGen

Accuracy: The higher, the better!

- New Objects : Image from training set + pick random object

- New images : Images from the testing set (never seen at training time)

GuessWhat?!

Page 80: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!

Page 81: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!What happened?

Good

Optimize the metric

Language strategy is consistent

Bad

Optimize the metric

Language strategy is poor limited

Language quality is bad

Page 82: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Vocabulary Evolution

Human(3000k words)

Beam-search~500 words

RL>100 words

Vocabulary drop:

- Model quickly reduce the action space

- Supervised model are overly confident ( p > 50%)

Are current RL algorithms really ok for NLG?

Better warm-up ?

Page 83: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

GuessWhat?!Language Drift

Language drift…

How to enforce language quality while optimizing for the goal?Reward shaping ? HRL?

(Lewis 2017)

Page 84: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Overview

● Kind introduction to NLP

● Policy gradient for Translation

● Goal-oriented dialogue systems○ Dialogue setting○ GuessWhat?!○ Self-play for language generation

● Other linguistic grounded tasks:○ Language as goal representation: Instruction Following○ Language as state representation: Text Games○ Language as policy compositionality: Emergence of Language

~15min

~talk to me!

~20min

~15min

Page 85: Interleaving Language and RL From Language Generation to ... · Interleaving Language and RL —Florian Strub Kind introduction to MLP Translation WMT dataset: - 12M sentences French/English

Interleaving Language and RL — Florian Strub

Question?The cherry on the cake is a lie !