Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Zhou Ren¹, Xiaoyu Wang¹, Ning Zhang¹, Xutao Lv¹, Li-Jia Li²
¹ Snap Research, ² Google
Image captioning
Deep Reinforcement Learning-based Image Captioning with Embedding Reward. CVPR 2017.
[Farhadi et al. ECCV 2010] [Kulkarni et al. CVPR 2011] [Yang et al. EMNLP 2011] [Fang et al. CVPR 2015] [Lebret et al. ICLR 2015] [Mao et al. ICLR 2015] [Vinyals et al. CVPR 2015] [Karpathy et al. CVPR 2015] [Chen et al. CVPR 2015] [Xu et al. ICML 2015] [Johnson et al. CVPR 2016] [You et al. CVPR 2016] …
Previous work
[Farhadi et al. 2010; Kulkarni et al. 2011; Yang et al. 2011]
example: [figure labels: indoor, dark, furry, sit, eat; cat, bat, chair]
Previous work
semantic attention [You et al. CVPR 2016]
word detection [Fang et al. CVPR 2015]
spatial attention [Xu et al. ICML 2015]
[Lebret et al. 2015; Mao et al. 2015; Vinyals et al. 2015; Karpathy et al. 2015]
Motivation
● Limitations of current mainstream framework (encoder-decoder)
○ only local information is utilized
○ prone to accumulating generation errors during inference
○ sensitive to beam sizes during beam search
● Our target
○ better at utilizing the global information
○ able to compensate for errors
○ less sensitive to beam sizes
Decision-making framework with reinforcement learning
Why use decision-making?
Human-level gaming control [Mnih et al. Nature 2015]
Visual navigation [Zhu et al. ICRA 2017]
AlphaGo [Silver et al. Nature 2016]
[Diagram labels: Agent, Goal, Environment, state, actions, reward]
Image captioning reformulation in decision-making
- Goal: to generate a visual description given an image
- Agent: the image captioning model to learn
- Environment: the given image I + the words predicted so far
- State: representation of the environment at step t, s_t = {I, w_1, …, w_t}
- Action: the word to generate at step t + 1, a_t = w_{t+1}
- Reward: the feedback for reinforcement learning
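The reformulation above can be sketched in a few lines of Python. This is only an illustration, assuming the state is the image together with the words generated so far and that an action appends one word. The names `State`, `step`, `rollout`, `toy_policy`, and the `image_id` stand-in for the image are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class State:
    """State at step t: the image plus the words generated so far."""
    image_id: str                 # hypothetical stand-in for the image I
    words: Tuple[str, ...] = ()   # words w_1 .. w_t


def step(state: State, action: str) -> State:
    """Taking an action (the next word) yields the next state."""
    return State(state.image_id, state.words + (action,))


def rollout(policy: Callable[[State], str], image_id: str,
            max_len: int = 20, eos: str = "<eos>") -> State:
    """Let the agent (policy) act until it emits the end token."""
    state = State(image_id)
    for _ in range(max_len):
        word = policy(state)
        state = step(state, word)
        if word == eos:
            break
    return state


def toy_policy(state: State) -> str:
    """Hard-coded policy that recites a fixed caption (illustration only)."""
    caption = ("a", "cat", "on", "a", "chair", "<eos>")
    return caption[len(state.words)]


final = rollout(toy_policy, "img_0")
print(final.words)
```

In a real system the policy would be a learned network and the reward would be computed on the finished caption; neither is modeled here.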
Overview of our approach
● We propose a decision-making framework for image captioning
❏ An agent model containing a policy network, to capture the local information, and a value network, to capture the global information
❏ Training using reinforcement learning with embedding reward
❏ Testing using lookahead inference
Our approach - agent architecture
Policy network (local guidance)
example: [figure; values shown: 0.82, 0.56, 0.03]
Value network (global guidance)
vθ(st) tries to regress the final reward
example: [figure; values shown: 0.89, 0.03]
Our approach - train our agent
● Pretrain policy network p with cross entropy loss
● Pretrain value network vθ with the mean squared loss
● Train p and vθ jointly using deep Reinforcement Learning
○ an Actor-Critic RL model
○ MIXER [Ranzato et al. ICLR 2016]
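The two pretraining losses named above can be illustrated with plain Python. This is a minimal sketch under assumptions: the policy loss is taken as the summed per-step cross entropy against the ground-truth words, and the value loss as the mean squared error between each predicted value and the reward obtained at the end of the caption. The function names and toy numbers are hypothetical, and the joint actor-critic stage is not shown.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the ground-truth next word."""
    return -math.log(probs[target_index])


def policy_pretrain_loss(step_probs, target_indices):
    """Sum of per-step cross entropy over one ground-truth caption."""
    return sum(cross_entropy(p, t) for p, t in zip(step_probs, target_indices))


def value_pretrain_loss(predicted_values, final_reward):
    """Mean squared error between each predicted value and the final reward."""
    errors = [(v - final_reward) ** 2 for v in predicted_values]
    return sum(errors) / len(errors)


# Toy data: two decoding steps over a two-word vocabulary.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
policy_loss = policy_pretrain_loss(step_probs, [0, 1])
value_loss = value_pretrain_loss([0.5, 1.0], 1.0)
print(policy_loss, value_loss)
```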
Reinforcement learning - reward definition
● Literature: metric-driven [Ranzato et al. ICLR 2016]
● Limitations:
○ metrics in image captioning are not perfectly defined.
○ it needs to be retrained for each metric in isolation.
○ it has no value network (no global guidance).
Reinforcement learning - reward definition
● Visual-Semantic Embedding [Frome et al. NIPS 2013; Kiros et al. TACL 2015]
example: [figure: embedding space]
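One way to turn a visual-semantic embedding into a reward is to score how close a caption lies to its image in the shared space. The sketch below assumes cosine similarity as that score, which the transcript does not state. The embeddings are hard-coded toy vectors rather than outputs of a trained model, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_reward(image_embedding, caption_embedding):
    """Assumed reward: similarity of image and caption in the shared space."""
    return cosine_similarity(image_embedding, caption_embedding)


# Toy embeddings: one caption aligned with the image, one orthogonal to it.
image = [1.0, 0.0]
matching_caption = [2.0, 0.0]
unrelated_caption = [0.0, 3.0]

reward_match = embedding_reward(image, matching_caption)
reward_unrelated = embedding_reward(image, unrelated_caption)
print(reward_match, reward_unrelated)
```

Because such a reward does not depend on any particular evaluation metric, the same trained agent can be scored under several metrics without retraining, which addresses one of the limitations listed for metric-driven rewards.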
Our approach - inference with our agent
Lookahead inference
global guidance
local guidance
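A beam search that mixes the two kinds of guidance might look like the sketch below. The transcript does not give the scoring rule, so the weighted sum of the policy's log-probability (local) and the value of the resulting state (global), controlled by a hypothetical weight `lam`, is an assumption. The toy policy and value functions are invented for illustration.

```python
import math


def lookahead_beam_search(policy, value, beam_size=2, max_len=10,
                          lam=0.5, eos="<eos>"):
    """Beam search scoring each extension with
    lam * log p(word | words) + (1 - lam) * value(words + word)."""
    beams = [(0.0, ())]  # (accumulated score, words so far)
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            if words and words[-1] == eos:
                candidates.append((score, words))  # finished captions carry over
                continue
            for word, prob in policy(words).items():
                nxt = words + (word,)
                local = math.log(prob)   # local guidance from the policy
                global_ = value(nxt)     # global guidance from the value function
                candidates.append((score + lam * local + (1 - lam) * global_, nxt))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(w and w[-1] == eos for _, w in beams):
            break
    return beams[0][1]


def toy_policy(words):
    """Toy next-word distribution that slightly prefers the wrong noun."""
    if not words:
        return {"a": 1.0}
    if words == ("a",):
        return {"dog": 0.6, "cat": 0.4}
    return {"<eos>": 1.0}


def toy_value(words):
    """Toy value: captions mentioning the cat match the image."""
    return 1.0 if "cat" in words else 0.0


policy_only = lookahead_beam_search(toy_policy, toy_value, beam_size=1, lam=1.0)
with_value = lookahead_beam_search(toy_policy, toy_value, beam_size=1, lam=0.5)
print(policy_only)  # ('a', 'dog', '<eos>')
print(with_value)   # ('a', 'cat', '<eos>')
```

With `lam=1.0` the search follows the policy alone and keeps the locally likelier word. With `lam=0.5` the value term overrides that choice even at beam size 1, which illustrates how global guidance can compensate for local errors.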
Experimental Results
Results on MS-COCO
[Results table not recovered; slide annotations for the four numbered groups:]
① simple policy net; simple net structure
② simple policy net; advanced net structure (semantic attention, spatial attention, object detection)
③ embedding-driven RL; metric-driven RL
④ w/o external training data; with external training data
Qualitative results
Take Home
● We proposed a novel decision-making framework for image captioning.
○ An agent model → a policy network + a value network
○ A training method → Reinforcement Learning with embedding reward
○ An inference method → lookahead inference
● Utilizing both global and local information is important for sequential generation tasks.
● Embeddings can capture global information and can serve as very good global guidance.