Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra
IEEE ICCV 2017
Presented by: Nalin Chhibber
CS 885: Reinforcement Learning (Pascal Poupart)
Outline
● Introduction
● Paper overview
● Contribution and key takeaways
● Critique
● Class discussion
Introduction
Problem Space: Intersection of Vision and Language
● Image Captioning
○ Predict a one-sentence description of an image.
● Visual Question Answering
○ Predict a natural-language answer given an image and a question.
● Visual Dialog
○ Predict a free-form natural-language answer given an image, a dialog history, and a follow-up question.
Paper Overview
The paper focuses on building visually grounded conversational artificial intelligence (AI).
Goal: develop AI agents that can
● See (understand the contents of an image)
● Communicate (understand and hold a dialog in natural language)
Applications:
● Helping visually impaired users understand their surroundings
● Enabling analysts to sift through large quantities of surveillance data
Most prior work treats this as a static supervised learning problem, which causes two issues:
● Problem 1: The model cannot steer the conversation and never sees the future consequences of its utterances during training.
● Problem 2: Utterances that do not appear in the dataset cannot be evaluated.
Guess Which: an image-guessing game between Q-Bot and A-Bot
● Q-Bot: the questioning agent; blindfolded (it cannot see the secret image).
● A-Bot: the answering agent; has access to the secret image.
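The game protocol can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' code: the agents are passed in as plain callables (the names `ask`, `answer`, `predict` and `play_guess_which` are hypothetical), Q-Bot sees only the caption and dialog history, A-Bot closes over the secret image, and the default of 10 rounds is an assumption. Q-Bot's final output is a predicted image-feature vector.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs so far


def play_guess_which(
    caption: str,
    ask: Callable[[str, History], str],              # Q-Bot: sees caption + history only
    answer: Callable[[str, History, str], str],      # A-Bot: also sees the secret image (closed over)
    predict: Callable[[str, History], List[float]],  # Q-Bot's guess of the image features
    num_rounds: int = 10,                            # assumed default
) -> Tuple[History, List[float]]:
    history: History = []
    for _ in range(num_rounds):
        question = ask(caption, history)
        reply = answer(caption, history, question)
        history.append((question, reply))
    # After the last round, Q-Bot commits to a guess of the secret image's representation.
    return history, predict(caption, history)


# Toy usage with stub agents (purely illustrative, no learning involved).
secret_image = [1.0, 0.0]
history, guess = play_guess_which(
    caption="a dog on a couch",
    ask=lambda c, h: f"question {len(h) + 1}?",
    answer=lambda c, h, q: "yes" if secret_image[0] > 0.5 else "no",
    predict=lambda c, h: [len(h) / 10, 0.0],
)
print(len(history), history[0], guess)
```

Keeping the agents as callables separates the game protocol from the models, so the same loop could host SL-pretrained or RL-fine-tuned agents.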
Training types
The authors conducted two types of demonstration:
● A completely ungrounded synthetic world (RL from scratch)
○ Agents communicate via symbols with no pre-specified meanings.
● A large-scale experiment on real images using the VisDial dataset
○ Pretrain on dialog data with supervised learning (SL), then fine-tune with RL.
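Reinforcement Learning Framework

| | Q-Bot | A-Bot |
|---|---|---|
| Environment | Other agent + set of all images | Other agent + secret image |
| Action | Set of all possible questions | Set of valid answers |
| State | $s_t^Q = [c, q_1 a_1, \dots, q_{t-1} a_{t-1}]$ | $s_t^A = [I, c, q_1 a_1, \dots, q_{t-1} a_{t-1}, q_t]$ |

Here $I$ is the secret image, $c$ its caption, and $q_t$, $a_t$ the question and answer in round $t$.
Reward: the change in distance between Q-Bot's predicted image representation and the true representation, before vs. after a round of dialog. Both agents share this reward.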
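A minimal sketch of the per-round reward, assuming (the slide does not say) that "distance" means Euclidean distance between Q-Bot's predicted feature vector and the secret image's true feature vector. The reward is positive when a round of dialog moves the guess closer to the target and negative when it moves it away. The function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def round_reward(prev_guess: Sequence[float], new_guess: Sequence[float],
                 target: Sequence[float]) -> float:
    """Distance before the round minus distance after it (Euclidean assumed)."""
    return euclidean(prev_guess, target) - euclidean(new_guess, target)


# Toy 2-D vectors standing in for image features.
target = [0.0, 0.0]
print(round_reward([3.0, 4.0], [0.0, 3.0], target))  # distance 5 -> 3, so reward 2.0
```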
Training Details
1. Pretrained with supervised learning on the Visual Dialog dataset (VisDial).
2. Fine-tuned with REINFORCE.
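To illustrate the REINFORCE update in isolation, here is a toy single-step sketch: a softmax policy over three hypothetical candidate questions with made-up fixed rewards. It is not the paper's model (there, the policies generate whole utterances and the reward comes from the game); it only shows the core update θ ← θ + α · r · ∇θ log π(a), without a baseline.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards: Sequence[float], steps: int = 2000,
              lr: float = 0.1, seed: int = 0) -> List[float]:
    """Vanilla REINFORCE (no baseline) on a one-step softmax policy.

    rewards[i] is the (hypothetical, fixed) reward for choosing action i.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]  # sample a ~ pi
        reward = rewards[action]
        # d/d logit_i of log softmax(logits)[action] = 1[i == action] - probs[i]
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad_log_prob
    return softmax(logits)


probs = reinforce([0.0, 1.0, 0.2])
print([round(p, 3) for p in probs])  # probability mass concentrates on action 1
```

Sampled actions are reinforced in proportion to their reward, so in expectation the highest-reward action gains probability at the others' expense.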
Curriculum Learning
● Problem: switching abruptly from SL to RL is a discrete change in the learning landscape.
● Solution: gently hand control over to reinforcement learning.
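One way to "gently hand over control" is to split each dialog so that the first K rounds are still trained with the supervised loss and the remaining rounds with REINFORCE, shrinking K over epochs. The sketch below assumes 10-round dialogs and a K that starts at 9 and drops by one per epoch; treat these numbers as illustrative rather than as the authors' exact settings. `curriculum_split` is a hypothetical helper.

```python
from typing import List, Tuple


def curriculum_split(epoch: int, num_rounds: int = 10,
                     start_k: int = 9) -> Tuple[List[int], List[int]]:
    """Return (supervised_rounds, rl_rounds) as 1-based round indices for an epoch.

    K, the number of leading rounds kept under supervised learning, starts at
    start_k and is reduced by one each epoch until it reaches 0 (pure RL).
    """
    k = max(start_k - epoch, 0)
    supervised_rounds = list(range(1, k + 1))
    rl_rounds = list(range(k + 1, num_rounds + 1))
    return supervised_rounds, rl_rounds


for epoch in (0, 4, 9):
    sl, rl = curriculum_split(epoch)
    print(f"epoch {epoch}: SL rounds {sl}, RL rounds {rl}")
```

Early epochs leave almost the whole dialog under supervision; by the final stage every round is optimized for the game reward.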
Reward Shaping
● Problem: delayed reward (the outcome of the game is only known after the final round).
● Solution: improvement-based intermediate rewards after every round.
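A short derivation shows how the improvement-based rewards relate to the end-of-game outcome. Writing $\ell$ for the distance used in the reward, $\hat{y}_t$ for Q-Bot's prediction after round $t$, $y^{gt}$ for the true image representation, and $T$ for the number of rounds (notation assumed here, not taken from the slides), the per-round rewards telescope:

```latex
r_t = \ell(\hat{y}_{t-1}, y^{gt}) - \ell(\hat{y}_t, y^{gt})

\sum_{t=1}^{T} r_t
  = \sum_{t=1}^{T} \left[ \ell(\hat{y}_{t-1}, y^{gt}) - \ell(\hat{y}_t, y^{gt}) \right]
  = \ell(\hat{y}_0, y^{gt}) - \ell(\hat{y}_T, y^{gt})
```

The undiscounted sum of the intermediate rewards equals the total improvement from the initial guess to the final one, so shaping spreads the end-of-game signal across rounds rather than replacing it.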
Model Internals
Model Evaluation
The agents are evaluated along three axes:
1. Comparison with a few natural ablations of the full model (RL-full-QAf):
○ SL-pretrained
○ Frozen-A
○ Frozen-Q
○ Frozen-F (regression network)
2. How well the agents perform at the guessing game.
3. How closely they emulate human dialogs.
Evaluation 1: Comparison with a few natural ablations of the full model (RL-full-QAf)
Evaluation 2: How well the agents perform at the guessing game
Candidate images are sorted by the distance between Q-Bot's predicted representation and each image's fc7 feature vector.
Rank of the ground-truth image in the example shown: 2
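The ranking step can be sketched as follows, assuming Euclidean distance and using tiny made-up 2-D vectors in place of real fc7 features; `ground_truth_rank` and the image ids are hypothetical. Candidates are sorted from closest to farthest from Q-Bot's prediction, and the rank is the 1-based position of the secret image in that order.

```python
import math
from typing import Dict, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ground_truth_rank(prediction: Sequence[float],
                      pool: Dict[str, Sequence[float]], gt_id: str) -> int:
    """1-based rank of gt_id when pool images are sorted by distance to prediction."""
    ordered = sorted(pool, key=lambda img_id: euclidean(prediction, pool[img_id]))
    return ordered.index(gt_id) + 1


pool = {"img_a": [0.0, 0.0], "img_b": [1.0, 0.0], "img_c": [5.0, 5.0]}
print(ground_truth_rank([0.2, 0.0], pool, gt_id="img_b"))  # 2: img_a is closer
```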
Evaluation 3: How closely the agents emulate human dialogs
A human interpretability study measures:
● whether humans can easily understand the Q-Bot/A-Bot dialog;
● how image-discriminative the interactions are.

| Metric | SL | RL |
|---|---|---|
| Mean rank of the ground-truth image (lower is better) | 3.70 | 2.73 |
| Mean reciprocal rank (higher is better) | 0.518 | 0.622 |
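Both metrics are simple functions of the per-game ranks of the ground-truth image. The sketch below uses made-up ranks (not the study's data) to show how they are computed; the function names are hypothetical.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average 1-based rank of the ground-truth image (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank (higher is better; 1.0 means always ranked first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [1, 2, 4]  # hypothetical ranks from three games
print(round(mean_rank(ranks), 3), round(mean_reciprocal_rank(ranks), 3))
```

Here the mean rank is 7/3 ≈ 2.33 while MRR is (1 + 1/2 + 1/4)/3 ≈ 0.58, not 3/7 ≈ 0.43: MRR is not the reciprocal of the mean rank, because it weights top positions more heavily.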
Results
SL vs SL+RL
The supervised Q-Bot appears to mimic how humans ask questions.
The RL-trained Q-Bot appears to shift strategies, asking questions that the A-Bot is better at answering.
The dialog between the agents was NOT hand-engineered to be image-discriminative; this behavior emerged as a strategy for succeeding at the image-guessing game.
● Emergence of Grounding (RL from scratch)
The two bots invented their own communication protocol without any human supervision.
More details in the follow-up paper: Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog (Kottur et al., EMNLP 2017).
Contributions
● Goal-driven training of visual question answering and dialog agents.
○ Self-talk = infinite data
○ Goal-based = evaluation on a downstream task
○ Agent-driven = agents learn to deal with the consequences of their actions
● End-to-end learning from pixels to multi-agent, multi-round dialog to game reward.
○ Move from SL on static datasets to RL in an actual environment.
Class Discussions
● Do you think this approach is limited to goal-driven tasks in dialog systems?
○ If not, how can it be extended to open-ended conversations?
● What other reward models can be used to make SL-RL dialog systems more successful?