TRANSCRIPT
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan, Licheng Yu, Mohit Bansal
{haotan, licheng, mbansal}@cs.unc.edu
UNC Chapel Hill
Vision-and-Language Navigation Task
Instruction: “Go to the bedroom, and go through the door, continue forward until you can climb three steps to your right…”
[Figure: bird's-eye view of the environment, shown next to the instruction. Starting from the agent's start location, the agent executes a sequence of actions, step by step, until it reaches the agent's target location.]
Given the instruction, the agent must produce the actions that lead from the start location to the target location.
Back Translation
1. Want to learn: En → Fr.
2. Have an unpaired Fr corpus.
3. Train Fr → En and use it to translate the unpaired Fr corpus.
4. Use the reversed pairs as additional data for En → Fr.
[Figure: paired English–French data trains En → Fr; the Fr → En model converts the unpaired French corpus into synthetic English–French pairs.]
References: Improving Neural Machine Translation Models with Monolingual Data, Sennrich et al., 2015; Iterative Back-Translation for Neural Machine Translation, Hoang et al., 2018; Style Transfer Through Back-Translation, Prabhumoye et al., 2018; Understanding Back-Translation at Scale, Edunov et al., 2018.
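The four-step recipe can be sketched in a few lines of Python. The dictionary-lookup “model” below is a stand-in for a trained Fr → En translator, so everything here is illustrative rather than an actual NMT implementation.

```python
# Minimal sketch of steps 2-4 above: back-translate an unpaired Fr
# corpus into synthetic (En, Fr) pairs and add them to the paired
# training data for the En -> Fr model.

def back_translate(paired, unpaired_fr, fr_to_en):
    """Augment (En, Fr) training pairs with back-translated data."""
    synthetic = [(fr_to_en(fr), fr) for fr in unpaired_fr]  # Fr -> En
    return paired + synthetic  # extra supervision for En -> Fr

# Stand-in Fr -> En "model" (assumption: toy word-by-word lookup).
toy_fr_en = {"bonjour": "hello", "merci": "thanks"}
fr_to_en = lambda s: " ".join(toy_fr_en.get(w, w) for w in s.split())

paired = [("good morning", "bonjour le matin")]
augmented = back_translate(paired, ["bonjour", "merci"], fr_to_en)
print(len(augmented))  # 1 original pair + 2 synthetic pairs -> 3
```

The synthetic pairs are noisier than human translations, which is exactly the “weaker than” assumption revisited later in the talk.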
Back Translation: Preliminary Setup
Environment: a set of routes. Some routes have instructions; some do not.
Speaker: a pre-trained neural model that generates instructions from routes (e.g., “Walk past the hall, turn left, ...”).
Reference: Speaker-Follower Models for Vision-and-Language Navigation, Fried et al., 2018.
Back Translation: Steps 1–3
1. New routes from existing environments.
2. New instructions by the pre-trained speaker (e.g., “Walk past the hall, turn left, ...”).
3. Train the agent on the new routes and new instructions in the existing environments.
But the set of environments is limited. Can we get new environments?
Back Translation with New Environments
1. New routes from new environments.
2. New instructions by the pre-trained speaker.
3. Train the agent on the new routes and new instructions in the new environments.
How to get new environments?
Capturing new houses is very expensive. Generating new environments is not so easy either.
So: let's modify the existing environments!!
Illustration: Random Removal
[Figure: a grid of RGB-image views (viewpoints × views) at timesteps t and t+1, with randomly chosen objects (marked in blue) removed.]
Two issues:
Incomplete removal: the chair is still visible from other views.
Inconsistent removal: the same chair disappears at the next viewpoint.
Illustration: Environmental Removal / Dropout
Solution: remove all the chairs!!
[Figure: the same viewpoint/view grid at timesteps t and t+1, now with every chair removed consistently from all views.]
Environmental Dropout: Image-Level
[Figure: an RGB image alongside its semantic view; objects would be dropped in the semantic view and the image re-rendered.]
Two problems: object-level annotation is noisy, and rendering is too slow for training agents.
Environmental Dropout: Feature-Level
[Figure: two feature tensors (viewpoints × views × feature dims). Random feature dropout samples an independent mask for every view; environmental dropout samples one mask over feature dimensions and shares it across all viewpoints and views.]
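The contrast between the two schemes fits in a short NumPy sketch; shapes, names, and the inverted-dropout scaling are illustrative assumptions, not the released code.

```python
# Sketch: random feature dropout vs. environmental dropout on a
# (viewpoints, views, feature_dims) tensor of visual features.
import numpy as np

def random_feature_dropout(feats, p, rng):
    """Independent mask per entry: each (viewpoint, view, dim) is
    dropped on its own, so views become mutually inconsistent."""
    mask = rng.random(feats.shape) >= p
    return feats * mask / (1.0 - p)

def environmental_dropout(feats, p, rng):
    """One mask over feature dimensions, shared by every viewpoint
    and view -- dropping a dimension everywhere mimics removing an
    object (e.g., all the chairs) from the whole environment."""
    mask = rng.random(feats.shape[-1]) >= p  # sampled once per environment
    return feats * mask / (1.0 - p)          # broadcast over all views

rng = np.random.default_rng(0)
feats = np.ones((5, 36, 8))  # (viewpoints, views, feature dims)
env_out = environmental_dropout(feats, 0.5, rng)
# Every view shares the same zeroed dimensions:
print(np.array_equal(env_out[0, 0] == 0, env_out[3, 17] == 0))  # True
```

Sharing the mask across views is what fixes the incomplete/inconsistent-removal issues from the earlier slides.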
Environmental Dropout: Full Pipeline
[Figure: environmental dropout turns the existing environment into a “new” environment; the speaker generates an instruction (e.g., “Walk past the hall, turn left, ...”) for a route in it, and the agent trains on the result.]
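The whole pipeline reduces to one augmentation step; every component below (dropout, route sampler, speaker, agent trainer) is a placeholder standing in for the real models.

```python
# One augmentation step of the pipeline above, with placeholder
# components: env_dropout builds a "new" environment, a route is
# sampled in it, the speaker labels the route, and the agent is
# trained on the resulting (environment, route, instruction) triple.

def augmentation_step(env, env_dropout, sample_route, speaker, train_agent):
    new_env = env_dropout(env)              # existing env -> "new" env
    route = sample_route(new_env)           # new route
    instruction = speaker(new_env, route)   # pseudo-instruction
    train_agent(new_env, route, instruction)
    return new_env, route, instruction
```

With identity dropout this degenerates to plain back translation, which is why environmental dropout is the piece that adds “new” environments.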
Results Comparison
Metric: success rate, evaluated in unseen environments (testing environments are disjoint from training environments).
Agent training only: 46.5%
+ Back translation (existing environments): 48.2% (+1.7%)
+ Back translation w/ random dropout: 48.4% (+1.9%)
+ Back translation w/ environmental dropout: 52.2% (+5.7%)
Leaderboard Results
Greedy decoding: previous best 48.0%; ours 51.5% (+3.5%).
Beam search: previous best 63.0%; ours 68.9% (+5.9%).
References: Self-Monitoring Navigation Agent via Auxiliary Progress Estimation, Ma et al., 2019; Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation, Wang et al., 2019.
Sufficient and Necessary
Sufficient: if we use the “new” environments, the result is better.
Necessary: if we do not use the “new” environments, the result would not be better.
Upper Bound of Back Translation (on Existing Envs)
Back translation turns [labeled data + unlabeled data] into [labeled data + pseudo-labeled data].
Assumption: pseudo-labeled data is “weaker than” truly labeled data.
Under this assumption, training on [labeled data + labeled data] gives an upper bound on what back translation can achieve in the existing environments.
How do we calculate (approximate) this upper bound? With a “result extrapolation” approximation.
“Result Extrapolation” Approximation
26% training data → 42% success rate
73% training data → 45% success rate
100% training data → 46% success rate
Extrapolating to 200% ([labeled + labeled] data) predicts roughly 52%.
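One way such an extrapolation could be computed is a plain least-squares linear fit of success rate against training-data fraction, evaluated at 200%; the exact fitting procedure is an assumption, since the talk does not specify it.

```python
# Least-squares line through the (data %, success rate %) points
# from the slide, extrapolated to doubled ("labeled + labeled") data.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs, ys = [26, 73, 100], [42, 45, 46]
slope, intercept = fit_line(xs, ys)
predicted = slope * 200 + intercept  # success rate at 200% data
print(round(predicted, 1))  # 51.7, close to the quoted ~52%
```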
Reinforcement Learning + Imitation Learning
Instruction: “Walk past the shelves and out of the garage. Stop in ...”
IL: the agent is unrolled from <BOS> and supervised with teacher actions.
RL: the agent samples its own actions from <BOS> and is trained with rewards.
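A toy sketch of combining the two objectives: a cross-entropy (IL) term on teacher actions plus a REINFORCE-style (RL) term on sampled actions. Reusing one set of per-step logits for both rollouts, and the simple additive mixing weight, are simplifications for illustration, not the paper's exact training loop.

```python
# Toy mixed objective: IL cross-entropy on teacher actions plus a
# REINFORCE-style RL term on sampled actions. For brevity the same
# per-step logits serve both terms; in practice the teacher-forced
# and sampled rollouts are separate unrollings of the agent.
import math

def log_softmax(logits):
    m = max(logits)
    z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - z for l in logits]

def mixed_loss(logits_per_step, teacher_actions, sampled_actions,
               rewards, rl_weight=1.0):
    logps = [log_softmax(step) for step in logits_per_step]
    il = -sum(lp[a] for lp, a in zip(logps, teacher_actions))   # IL term
    rl = -sum(r * lp[a]                                         # RL term
              for lp, a, r in zip(logps, sampled_actions, rewards))
    return il + rl_weight * rl

# Uniform two-action logits, one step: each term equals ln 2.
loss = mixed_loss([[0.0, 0.0]], [0], [1], [1.0])
print(abs(loss - 2 * math.log(2)) < 1e-9)  # True
```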
Code released at: https://github.com/airsplay/R2R-EnvDrop
Hao Tan, Licheng Yu, Mohit Bansal
Thank you!
UNC Chapel Hill
Supported by ARO-YIP, ONR, Google, Facebook, Adobe, Baidu, and Salesforce.