TRANSCRIPT
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan, Licheng Yu, Mohit Bansal
{haotan, licheng, mbansal}@cs.unc.edu
UNC Chapel Hill
Vision-and-Language Navigation Task
Instruction: “Go to the bedroom, and go through the door, continue forward until you can climb three steps to your right…”
[Figure: bird's-eye view of the environment, shown next to the instruction. Starting from the agent's start location, the agent executes a sequence of actions, step by step, until it reaches the agent's target location.]
Given the instruction, the agent must produce the actions that lead from the start location to the target location.
Back Translation
1. Want to learn: En → Fr.
2. Have an unpaired Fr corpus.
3. Train Fr → En and use it to translate the unpaired Fr corpus.
4. Use the reversed pairs as additional data for En → Fr.
[Figure: paired English–French data trains En → Fr; the Fr → En model converts the unpaired French corpus into synthetic English–French pairs.]
References: Improving Neural Machine Translation Models with Monolingual Data, Sennrich et al., 2015; Iterative Back-Translation for Neural Machine Translation, Hoang et al., 2018; Style Transfer Through Back-Translation, Prabhumoye et al., 2018; Understanding Back-Translation at Scale, Edunov et al., 2018.
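The four-step recipe can be sketched in a few lines of Python. The dictionary-lookup “model” below is a stand-in for a trained Fr → En translator, so everything here is illustrative rather than an actual NMT implementation.

```python
# Minimal sketch of steps 2-4 above: back-translate an unpaired Fr
# corpus into synthetic (En, Fr) pairs and add them to the paired
# training data for the En -> Fr model.

def back_translate(paired, unpaired_fr, fr_to_en):
    """Augment (En, Fr) training pairs with back-translated data."""
    synthetic = [(fr_to_en(fr), fr) for fr in unpaired_fr]  # Fr -> En
    return paired + synthetic  # extra supervision for En -> Fr

# Stand-in Fr -> En "model" (assumption: toy word-by-word lookup).
toy_fr_en = {"bonjour": "hello", "merci": "thanks"}
fr_to_en = lambda s: " ".join(toy_fr_en.get(w, w) for w in s.split())

paired = [("good morning", "bonjour le matin")]
augmented = back_translate(paired, ["bonjour", "merci"], fr_to_en)
print(len(augmented))  # 1 original pair + 2 synthetic pairs -> 3
```

The synthetic pairs are noisier than human translations, which is exactly the “weaker than” assumption revisited later in the talk.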
Back Translation: Preliminary Setup
Environment: a set of routes. Some routes have instructions; some do not.
Speaker: a pre-trained neural model that generates instructions from routes (e.g., “Walk past the hall, turn left, ...”).
Reference: Speaker-Follower Models for Vision-and-Language Navigation, Fried et al., 2018.
Back Translation: Steps 1–3
1. New routes from existing environments.
2. New instructions by the pre-trained speaker (e.g., “Walk past the hall, turn left, ...”).
3. Train the agent on the new routes and new instructions in the existing environments.
But the set of environments is limited. Can we get new environments?
Back Translation with New Environments
1. New routes from new environments.
2. New instructions by the pre-trained speaker.
3. Train the agent on the new routes and new instructions in the new environments.
How to get new environments?
Capturing new houses is very expensive. Generating new environments is not so easy either.
So: let's modify the existing environments!!
Illustration: Random Removal
[Figure: a grid of RGB-image views (viewpoints × views) at timesteps t and t+1, with randomly chosen objects (marked in blue) removed.]
Two issues:
Incomplete removal: the chair is still visible from other views.
Inconsistent removal: the same chair disappears at the next viewpoint.
Illustration: Environmental Removal / Dropout
Solution: remove all the chairs!!
[Figure: the same viewpoint/view grid at timesteps t and t+1, now with every chair removed consistently from all views.]
Environmental Dropout: Image-Level
[Figure: an RGB image alongside its semantic view; objects would be dropped in the semantic view and the image re-rendered.]
Two problems: object-level annotation is noisy, and rendering is too slow for training agents.
Environmental Dropout: Feature-Level
[Figure: two feature tensors (viewpoints × views × feature dims). Random feature dropout samples an independent mask for every view; environmental dropout samples one mask over feature dimensions and shares it across all viewpoints and views.]
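The contrast between the two schemes fits in a short NumPy sketch; shapes, names, and the inverted-dropout scaling are illustrative assumptions, not the released code.

```python
# Sketch: random feature dropout vs. environmental dropout on a
# (viewpoints, views, feature_dims) tensor of visual features.
import numpy as np

def random_feature_dropout(feats, p, rng):
    """Independent mask per entry: each (viewpoint, view, dim) is
    dropped on its own, so views become mutually inconsistent."""
    mask = rng.random(feats.shape) >= p
    return feats * mask / (1.0 - p)

def environmental_dropout(feats, p, rng):
    """One mask over feature dimensions, shared by every viewpoint
    and view -- dropping a dimension everywhere mimics removing an
    object (e.g., all the chairs) from the whole environment."""
    mask = rng.random(feats.shape[-1]) >= p  # sampled once per environment
    return feats * mask / (1.0 - p)          # broadcast over all views

rng = np.random.default_rng(0)
feats = np.ones((5, 36, 8))  # (viewpoints, views, feature dims)
env_out = environmental_dropout(feats, 0.5, rng)
# Every view shares the same zeroed dimensions:
print(np.array_equal(env_out[0, 0] == 0, env_out[3, 17] == 0))  # True
```

Sharing the mask across views is what fixes the incomplete/inconsistent-removal issues from the earlier slides.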
Environmental Dropout: Full Pipeline
[Figure: environmental dropout turns the existing environment into a “new” environment; the speaker generates an instruction (e.g., “Walk past the hall, turn left, ...”) for a route in it, and the agent trains on the result.]
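The whole pipeline reduces to one augmentation step; every component below (dropout, route sampler, speaker, agent trainer) is a placeholder standing in for the real models.

```python
# One augmentation step of the pipeline above, with placeholder
# components: env_dropout builds a "new" environment, a route is
# sampled in it, the speaker labels the route, and the agent is
# trained on the resulting (environment, route, instruction) triple.

def augmentation_step(env, env_dropout, sample_route, speaker, train_agent):
    new_env = env_dropout(env)              # existing env -> "new" env
    route = sample_route(new_env)           # new route
    instruction = speaker(new_env, route)   # pseudo-instruction
    train_agent(new_env, route, instruction)
    return new_env, route, instruction
```

With identity dropout this degenerates to plain back translation, which is why environmental dropout is the piece that adds “new” environments.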
Results Comparison
Metric: success rate, evaluated in unseen environments (testing environments are disjoint from training environments).
Agent training only: 46.5%
+ Back translation (existing environments): 48.2% (+1.7%)
+ Back translation w/ random dropout: 48.4% (+1.9%)
+ Back translation w/ environmental dropout: 52.2% (+5.7%)
Leaderboard Results
Greedy decoding: previous best 48.0%; ours 51.5% (+3.5%).
Beam search: previous best 63.0%; ours 68.9% (+5.9%).
References: Self-Monitoring Navigation Agent via Auxiliary Progress Estimation, Ma et al., 2019; Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation, Wang et al., 2019.
Sufficient and Necessary
Sufficient: if we use the “new” environments, the result is better.
Necessary: if we do not use the “new” environments, the result would not be better.
Upper Bound of Back Translation (on Existing Envs)
Back translation turns [labeled data + unlabeled data] into [labeled data + pseudo-labeled data].
Assumption: pseudo-labeled data is “weaker than” truly labeled data.
Under this assumption, training on [labeled data + labeled data] gives an upper bound on what back translation can achieve in the existing environments.
How do we calculate (approximate) this upper bound? With a “result extrapolation” approximation.
“Result Extrapolation” Approximation
26% training data → 42% success rate
73% training data → 45% success rate
100% training data → 46% success rate
Extrapolating to 200% ([labeled + labeled] data) predicts roughly 52%.
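One way such an extrapolation could be computed is a plain least-squares linear fit of success rate against training-data fraction, evaluated at 200%; the exact fitting procedure is an assumption, since the talk does not specify it.

```python
# Least-squares line through the (data %, success rate %) points
# from the slide, extrapolated to doubled ("labeled + labeled") data.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs, ys = [26, 73, 100], [42, 45, 46]
slope, intercept = fit_line(xs, ys)
predicted = slope * 200 + intercept  # success rate at 200% data
print(round(predicted, 1))  # 51.7, close to the quoted ~52%
```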
Reinforcement Learning + Imitation Learning
Instruction: “Walk past the shelves and out of the garage. Stop in ...”
IL: the agent is unrolled from <BOS> and supervised with teacher actions.
RL: the agent samples its own actions from <BOS> and is trained with rewards.
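A toy sketch of combining the two objectives: a cross-entropy (IL) term on teacher actions plus a REINFORCE-style (RL) term on sampled actions. Reusing one set of per-step logits for both rollouts, and the simple additive mixing weight, are simplifications for illustration, not the paper's exact training loop.

```python
# Toy mixed objective: IL cross-entropy on teacher actions plus a
# REINFORCE-style RL term on sampled actions. For brevity the same
# per-step logits serve both terms; in practice the teacher-forced
# and sampled rollouts are separate unrollings of the agent.
import math

def log_softmax(logits):
    m = max(logits)
    z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - z for l in logits]

def mixed_loss(logits_per_step, teacher_actions, sampled_actions,
               rewards, rl_weight=1.0):
    logps = [log_softmax(step) for step in logits_per_step]
    il = -sum(lp[a] for lp, a in zip(logps, teacher_actions))   # IL term
    rl = -sum(r * lp[a]                                         # RL term
              for lp, a, r in zip(logps, sampled_actions, rewards))
    return il + rl_weight * rl

# Uniform two-action logits, one step: each term equals ln 2.
loss = mixed_loss([[0.0, 0.0]], [0], [1], [1.0])
print(abs(loss - 2 * math.log(2)) < 1e-9)  # True
```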
Code released at: https://github.com/airsplay/R2R-EnvDrop
Hao Tan, Licheng Yu, Mohit Bansal
Thank you!
UNC Chapel Hill
Supported by ARO-YIP, ONR, Google, Facebook, Adobe, Baidu, and Salesforce.