Learning to Interpret Natural Language Navigation Instructions from Observation

Learning to Interpret Natural Language Navigation Instructions from Observation. Ray Mooney, Department of Computer Science, University of Texas at Austin. Joint work with David Chen, Joohyun Kim, Lu Guo.........


Post on 23-Feb-2016



TRANSCRIPT

Page 1: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning to Interpret Natural Language Navigation Instructions from Observation

Ray Mooney
Department of Computer Science
University of Texas at Austin

Joint work with David Chen, Joohyun Kim, Lu Guo.........

Page 2: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Challenge Problem: Learning to Follow Directions in a Virtual World

• Learn to interpret navigation instructions in a virtual environment by simply observing humans giving and following such directions (Chen & Mooney, AAAI-11).

• Eventual goal: Virtual agents in video games and educational software that automatically learn to take and give instructions in natural language.

Page 3: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Sample Environment (MacMahon et al., AAAI-06)

[Figure: map of the virtual environment with object locations marked by the letters below]

H – Hat Rack
L – Lamp
E – Easel
S – Sofa
B – Barstool
C – Chair

Page 4: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Sample Instructions

• Take your first left. Go all the way down until you hit a dead end.

• Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4.

• Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4.

• Walk forward once. Turn left. Walk forward twice.

[Figure: map with Start, End, the hat rack (H), and positions 3 and 4 marked]

Page 5: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Sample Instructions


• Take your first left. Go all the way down until you hit a dead end.

• Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4.

• Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4.

• Walk forward once. Turn left. Walk forward twice.

Observed primitive actions: Forward, Left, Forward, Forward

[Figure: map with Start and End marked]
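As a rough illustration of what an observed action sequence encodes, the sketch below replays primitive actions on a plain grid. The coordinate system, the heading encoding, and the `execute` helper are assumptions made for this example; the slides do not say how the simulator represents position.

```python
# Hypothetical grid replay of primitive actions (not the actual simulator).
# Headings are indexed clockwise: 0 = north, 1 = east, 2 = south, 3 = west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def execute(actions, position=(0, 0), heading=0):
    """Apply Forward/Left/Right actions and return (final position, final heading)."""
    x, y = position
    for action in actions:
        if action == "Forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
        elif action == "Left":
            heading = (heading - 1) % 4
        elif action == "Right":
            heading = (heading + 1) % 4
        else:
            raise ValueError(f"unknown action: {action}")
    return (x, y), heading

final_position, final_heading = execute(["Forward", "Left", "Forward", "Forward"])
print(final_position, final_heading)
```

All four sample instructions on this slide describe the same path, so they would all be paired with this one action sequence during training.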

Page 7: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Executing Test Instance in English

Page 8: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Formal Problem Definition

Given: $\{(e_1, w_1, a_1), (e_2, w_2, a_2), \ldots, (e_n, w_n, a_n)\}$

$e_i$ – A natural language instruction
$w_i$ – A world state
$a_i$ – An observed action sequence

Goal: Build a system that produces the correct $a_j$ given a previously unseen $(e_j, w_j)$.
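The problem definition can be written down as a small data model. This is a sketch only: the `Example` class, the dictionary-based `WorldState`, and the exact-match `accuracy` function are hypothetical names and simplifications, not the representation used in the original system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a world state is reduced to a flat dictionary for illustration.
WorldState = Dict[str, str]

@dataclass(frozen=True)
class Example:
    instruction: str        # e_i: natural language instruction
    world_state: WorldState # w_i: world state
    actions: List[str]      # a_i: observed action sequence

# A learned system maps an unseen (e_j, w_j) to a predicted action sequence.
Follower = Callable[[str, WorldState], List[str]]

def accuracy(follower: Follower, test_set: List[Example]) -> float:
    """Fraction of test examples whose predicted actions equal the observed ones."""
    hits = sum(
        follower(ex.instruction, ex.world_state) == ex.actions for ex in test_set
    )
    return hits / len(test_set)
```

Comparing whole action sequences follows the wording of the goal above; the evaluation slides later use a different criterion (final position).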

Page 9: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

[Diagram: the training input is an Observation consisting of an Instruction, a World State, and an Action Trace]

Page 10: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace) and Navigation Plan Constructor]

Page 11: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Navigation Plan Constructor, and Semantic Parser Learner]

Page 12: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Navigation Plan Constructor, and Semantic Parser Learner; testing inputs: Instruction and World State]

Page 13: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Navigation Plan Constructor, and Semantic Parser Learner; testing components: Instruction, World State, and Semantic Parser]

Page 14: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Navigation Plan Constructor, and Semantic Parser Learner; testing components: Instruction, World State, Semantic Parser, Execution Module (MARCO), and the resulting Action Trace]

Page 15: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Representing Linguistic Context

Context is represented by the sequence of observed actions each followed by verifying all observable aspects of the resulting world state.

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Page 16: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Possible Plans

An instruction can refer to a combinatorial number of possible plans, each composed of some subset of this full contextual description.

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]
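To see why the number of candidate plans grows combinatorially, the sketch below enumerates every non-empty subset of the five context components shown on this slide. Treating each subset as a candidate plan is an assumption for illustration; the slides do not spell out which subsets are considered well formed.

```python
from itertools import combinations

# The five components of the full context shown on the slide.
CONTEXT = [
    "Turn: LEFT",
    "Verify: front BLUEHALL",
    "Verify: front SOFA",
    "Travel: steps 2",
    "Verify: at SOFA",
]

def candidate_plans(components):
    """Yield every non-empty subset of the context components."""
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            yield subset

plans = list(candidate_plans(CONTEXT))
print(len(plans))  # 2**5 - 1 non-empty subsets
```

Each additional component doubles the count, which is the growth that makes enumeration impractical for longer instructions.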

Page 17: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Possible Plan # 1

Turn and walk to the couch

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Page 18: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Possible Plan # 2

Face the blue hall and walk 2 steps

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Page 19: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Possible Plan # 3

Turn left. Walk forward twice.

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Page 20: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Disambiguating Sentence Meaning

• Too many meanings to tractably enumerate them all.

• Therefore, cannot use EM to align sentences with enumerated meanings and thereby disambiguate the training data.


Page 21: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Navigation Plan Constructor, and Semantic Parser Learner; testing components: Instruction, World State, Semantic Parser, Execution Module (MARCO), and the resulting Action Trace]

Page 22: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Context Extractor, and Semantic Parser Learner; testing components: Instruction, World State, Semantic Parser, Execution Module (MARCO), and the resulting Action Trace]

Page 23: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Context Extractor, Lexicon Learner, and Semantic Parser Learner; testing components: Instruction, World State, Semantic Parser, Execution Module (MARCO), and the resulting Action Trace]

Page 24: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Learning system for parsing navigation instructions

[Diagram: training components: Observation (Instruction, World State, Action Trace), Context Extractor, Lexicon Learner, Plan Refinement, and Semantic Parser Learner; testing components: Instruction, World State, Semantic Parser, Execution Module (MARCO), and the resulting Action Trace]

Page 25: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Lexicon Learning

• Learn meanings of words and short phrases by finding correlations with meaning fragments.

[Figure: the words "face", "blue hall", "walk", and "2 steps" shown alongside the plan nodes Turn, Verify (front: BLUEHALL), and Travel (steps: 2)]

Page 26: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Lexicon Learning Algorithm

To learn the meaning of the word/short phrase w:

1. Collect all landmark plans that co-occur with w and add them to the set PosMean(w).

2. Repeatedly take intersections of all possible pairs of members of PosMean(w) and add any new entries, g, to PosMean(w).

3. Rank the entries by the scoring function:

[Equation: scoring function not preserved in the transcript]
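The three steps above can be sketched as follows. Two simplifications are assumed: plans are flattened to sets of components so that intersection is ordinary set intersection, and, because the scoring formula did not survive in this transcript, the sketch uses a contrastive score p(g | w) − p(g | ¬w) purely as a stand-in. The function name `learn_lexicon` and the example data are hypothetical.

```python
from itertools import combinations

def learn_lexicon(examples, phrase):
    """examples: list of (instruction_text, plan), each plan a frozenset of components."""
    with_w = [plan for text, plan in examples if phrase in text]
    without_w = [plan for text, plan in examples if phrase not in text]

    # Step 1: collect all plans that co-occur with the phrase.
    pos_mean = set(with_w)

    # Step 2: repeatedly add pairwise intersections until nothing new appears.
    changed = True
    while changed:
        changed = False
        for g1, g2 in combinations(list(pos_mean), 2):
            g = g1 & g2
            if g and g not in pos_mean:
                pos_mean.add(g)
                changed = True

    # Step 3: rank by the assumed score p(g | w) - p(g | not w).
    def score(g):
        p_g_w = sum(g <= plan for plan in with_w) / len(with_w)
        p_g_not_w = (
            sum(g <= plan for plan in without_w) / len(without_w) if without_w else 0.0
        )
        return p_g_w - p_g_not_w

    return sorted(pos_mean, key=score, reverse=True)

examples = [
    ("turn and walk to the sofa",
     frozenset({"Turn: LEFT", "Verify: front BLUEHALL", "Travel: steps 2", "Verify: at SOFA"})),
    ("walk to the sofa and turn left",
     frozenset({"Travel: steps 1", "Verify: at SOFA", "Turn: LEFT", "Verify: front BLUEHALL"})),
    ("walk forward twice", frozenset({"Travel: steps 2"})),
]
ranked = learn_lexicon(examples, "sofa")
```

With this toy data the top-ranked entry for "sofa" is the intersection of the two plans that mention it, since that fragment appears in every "sofa" example and in none of the others.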

Page 27: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Graph Intersection

Graph 1: “Turn and walk to the sofa.”
[Figure: plan graph with Turn, Travel, and two Verify nodes; arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Graph 2: “Walk to the sofa and turn left.”
[Figure: plan graph with Travel, Turn, and two Verify nodes; arguments steps: 1, at: SOFA, LEFT, front: BLUEHALL]

Page 28: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Graph Intersection

Graph 1: “Turn and walk to the sofa.”
[Figure: plan graph with Turn, Travel, and two Verify nodes; arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Graph 2: “Walk to the sofa and turn left.”
[Figure: plan graph with Travel, Turn, and two Verify nodes; arguments steps: 1, at: SOFA, LEFT, front: BLUEHALL]

Intersections:
Turn ( LEFT ), Verify ( front: BLUEHALL )

Page 29: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Graph Intersection

Graph 1: “Turn and walk to the sofa.”
[Figure: plan graph with Turn, Travel, and two Verify nodes; arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]

Graph 2: “Walk to the sofa and turn left.”
[Figure: plan graph with Travel, Turn, and two Verify nodes; arguments steps: 1, at: SOFA, LEFT, front: BLUEHALL]

Intersections:
Turn ( LEFT ), Verify ( front: BLUEHALL )
Travel ( ), Verify ( at: SOFA )
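The two intersections on this slide can be reproduced with a simplified procedure. The sketch assumes plans are linear sequences of (action, arguments) nodes and looks for maximal runs of consecutive nodes whose actions match, intersecting their arguments; the node ordering used for the two graphs is inferred from the instructions. General graph intersection is more involved, and `intersect_plans` is a hypothetical helper rather than the original implementation.

```python
def intersect_plans(plan_a, plan_b):
    """Return maximal runs of consecutive nodes whose actions match in both plans."""
    results = []
    for i in range(len(plan_a)):
        for j in range(len(plan_b)):
            # Skip positions that only continue a run started one step earlier.
            if i > 0 and j > 0 and plan_a[i - 1][0] == plan_b[j - 1][0]:
                continue
            run = []
            k = 0
            while (i + k < len(plan_a) and j + k < len(plan_b)
                   and plan_a[i + k][0] == plan_b[j + k][0]):
                action = plan_a[i + k][0]
                shared_args = plan_a[i + k][1] & plan_b[j + k][1]
                run.append((action, shared_args))
                k += 1
            # Keep only runs in which at least one node shares an argument.
            if any(args for _, args in run):
                results.append(run)
    return results

graph_1 = [  # "Turn and walk to the sofa."
    ("Turn", frozenset({"LEFT"})),
    ("Verify", frozenset({"front: BLUEHALL", "front: SOFA"})),
    ("Travel", frozenset({"steps: 2"})),
    ("Verify", frozenset({"at: SOFA"})),
]
graph_2 = [  # "Walk to the sofa and turn left."
    ("Travel", frozenset({"steps: 1"})),
    ("Verify", frozenset({"at: SOFA"})),
    ("Turn", frozenset({"LEFT"})),
    ("Verify", frozenset({"front: BLUEHALL"})),
]
intersections = intersect_plans(graph_1, graph_2)
```

Note that the Travel node survives without its step count, because the two graphs disagree on the number of steps.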

Page 30: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Plan Refinement

• Use learned lexicon to determine subset of context representing sentence meaning.

Face the blue hall and walk 2 steps

[Figure: plan graph with nodes Turn, Verify, Travel, Verify and arguments LEFT, front: BLUEHALL, front: SOFA, steps: 2, at: SOFA]
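One way to picture plan refinement is as a filter over the full context. The sketch below keeps a context node only if some lexicon entry whose phrase occurs in the instruction matches it, and keeps only the arguments that those entries mention. The lexicon contents, the substring matching, and the `refine_plan` helper are all assumptions for illustration; the slides do not describe the selection procedure in this detail.

```python
def refine_plan(instruction, context, lexicon):
    """Keep the context nodes and arguments supported by lexicon phrases in the instruction."""
    matched = [entry for phrase, entry in lexicon.items() if phrase in instruction]
    refined = []
    for action, args in context:
        kept = None
        for entry_action, entry_args in matched:
            # An entry supports a node if the action matches and its arguments are present.
            if entry_action == action and entry_args <= args:
                kept = (kept or frozenset()) | entry_args
        if kept is not None:
            refined.append((action, kept))
    return refined

context = [
    ("Turn", frozenset({"LEFT"})),
    ("Verify", frozenset({"front: BLUEHALL", "front: SOFA"})),
    ("Travel", frozenset({"steps: 2"})),
    ("Verify", frozenset({"at: SOFA"})),
]
# Hypothetical learned lexicon: phrase -> (action, arguments).
lexicon = {
    "face": ("Turn", frozenset()),
    "blue hall": ("Verify", frozenset({"front: BLUEHALL"})),
    "walk": ("Travel", frozenset()),
    "2 steps": ("Travel", frozenset({"steps: 2"})),
    "sofa": ("Verify", frozenset({"at: SOFA"})),
}
refined = refine_plan("face the blue hall and walk 2 steps", context, lexicon)
```

For this instruction the final Verify node is dropped, because nothing in the sentence refers to the sofa.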


Page 35: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Evaluation Data Statistics

• 3 maps, 6 instructors, 1-15 followers/direction
• Hand-segmented into single sentence steps

                     Paragraph       Single-Sentence
# Instructions       706             3,236
Avg. # sentences     5.0 (±2.8)      1.0 (±0)
Avg. # words         37.6 (±21.1)    7.8 (±5.1)
Avg. # actions       10.4 (±5.7)     2.1 (±2.4)

Page 36: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

End-to-End Execution Evaluation

• Test how well the system follows novel directions.
• Leave-one-map-out cross-validation.
• Strict metric: Only correct if the final position exactly matches goal location.
• Lower baselines:
  – Simple probabilistic generative model of executed plans w/o language.
  – Semantic parser trained on full context plans
• Upper baselines:
  – Semantic parser trained on human annotated plans
  – Human followers
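The evaluation protocol above can be sketched as a short loop. The dictionary keys (`map`, `goal`) and the `train` and `execute` callables are hypothetical placeholders for whatever learner and execution module are being evaluated.

```python
def leave_one_map_out(examples, train, execute):
    """Hold out each map in turn; score by exact match of final position and goal."""
    accuracies = {}
    for held_out in sorted({ex["map"] for ex in examples}):
        train_set = [ex for ex in examples if ex["map"] != held_out]
        test_set = [ex for ex in examples if ex["map"] == held_out]
        model = train(train_set)
        # Strict metric: an example counts only if the final position equals the goal.
        correct = sum(execute(model, ex) == ex["goal"] for ex in test_set)
        accuracies[held_out] = correct / len(test_set)
    return accuracies

# Toy usage with a stand-in "model" that always predicts position 4.
toy_examples = [
    {"map": "A", "instruction": "Take your first left.", "goal": 4},
    {"map": "A", "instruction": "Walk forward once.", "goal": 5},
    {"map": "B", "instruction": "Walk to the hat rack.", "goal": 4},
]
toy_scores = leave_one_map_out(
    toy_examples,
    train=lambda train_set: None,
    execute=lambda model, ex: 4,
)
```

Holding out a whole map means every test instruction is executed in an environment the learner never saw during training.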

Page 37: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

End-to-End Execution Accuracy

                                    Single-Sentence   Paragraph
Simple Generative Model             11.08%            2.15%
Trained on Full Context             21.95%            2.66%
Trained on Refined Plans            57.28%            19.18%
Trained on Human Annotated Plans    62.67%            29.59%
Human Followers                     N/A               69.64%

Page 38: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Sample Successful Parse

Instruction: “Place your back against the wall of the ‘T’ intersection. Turn left. Go forward along the pink-flowered carpet hall two segments to the intersection with the brick hall. This intersection contains a hatrack. Turn left. Go forward three segments to an intersection with a bare concrete hall, passing a lamp. This is Position 5.”

Parse: Turn ( ), Verify ( back: WALL ), Turn ( LEFT ), Travel ( ), Verify ( side: BRICK HALLWAY ), Turn ( LEFT ), Travel ( steps: 3 ), Verify ( side: CONCRETE HALLWAY )

Page 39: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Mandarin Chinese Experiment

• Translated all the instructions from English to Chinese.

                            Single Sentences   Paragraphs
Trained on Refined Plans    58.70%             20.13%

Page 40: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Problem with Purely Correlational Lexicon Learning

• The correlation between an n-gram w and graph g can be affected by the context.

• Example:
  – Bigram: “the wall”
  – Sample uses:
    • “turn so the wall is on your right side”
    • “with your back to the wall turn left”
  – Co-occurring aspects of context
    • TURN()
    • VERIFY(direction: WALL)
  – But “the wall” is simply an object involving no action

Page 41: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Syntactic Bootstrapping

• Children sometimes use syntactic information to guide learning of word meanings (Gleitman, 1990).

• Complement to Pinker’s semantic bootstrapping in which semantics is used to help learn syntax.


Page 42: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Using POS to Aid Lexicon Learning

• Annotate each n-gram, w, with POS tags.
  – dead/JJ end/NN

• Annotate each node in meaning graph, g, with a semantic-category tag.
  – TURN/Action VERIFY/Action FORWARD/Action

Reason: “dead end” is often followed by the action of turning around to face another direction so that there is a way to go forward

Page 43: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Constraints on Lexicon Entry: (w,g)

• The n-gram w should contain a noun if and only if the graph g contains an Object

• The n-gram w should contain a verb if and only if the graph g contains an Action

dead/JJ end/NN paired with TURN/Action VERIFY/Action FORWARD/Action: Violates the rules! Remove it.

dead/JJ end/NN paired with Front/Relation WALL/Object: Retain it.
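The two constraints translate directly into a boolean check. The sketch assumes Penn Treebank-style tags, where noun tags start with "NN" and verb tags start with "VB"; the function name `satisfies_constraints` is hypothetical.

```python
def satisfies_constraints(ngram_tags, graph_tags):
    """Check the noun <-> Object and verb <-> Action constraints for a pair (w, g)."""
    has_noun = any(tag.startswith("NN") for tag in ngram_tags)
    has_verb = any(tag.startswith("VB") for tag in ngram_tags)
    has_object = "Object" in graph_tags
    has_action = "Action" in graph_tags
    # Both "if and only if" conditions must hold for the entry to be retained.
    return has_noun == has_object and has_verb == has_action

dead_end = ["JJ", "NN"]  # dead/JJ end/NN
print(satisfies_constraints(dead_end, ["Action", "Action", "Action"]))  # removed
print(satisfies_constraints(dead_end, ["Relation", "Object"]))          # retained
```

The first pairing fails both conditions: the n-gram has a noun but the graph has no Object, and the graph has Actions but the n-gram has no verb.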

Page 44: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Experimental Results


Page 45: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011)

• PCFG rules to describe generative process from MR components to corresponding NL words

Page 46: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Series of Grounded Language Learning Papers that Build Upon Each Other

• Kate & Mooney, AAAI-07
• Chen & Mooney, ICML-08
• Liang, Jordan, and Klein, ACL-09
• Kim & Mooney, COLING-10
  – Also integrates Lu, Ng, Lee, & Zettlemoyer, EMNLP-08
• Borschinger, Jones, & Johnson, EMNLP-11
• Kim & Mooney, EMNLP-12

Page 47: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011)

• Generative process
  – Select complete MR to describe
  – Generate atomic MR constituents in order
  – Each atomic MR generates NL words by unigram Markov process
• Parameters learned using EM (Inside-Outside)
• Parse new NL sentences by reading top MR nonterminal from most probable parse tree
  – Output MRs only included in PCFG rule set constructed from training data
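The generative process in the first bullet can be illustrated with a toy sampler. Everything concrete here is assumed for the example: the MR names, the probability tables, and the choice to emit between one and `max_words` words per atomic MR. It only mirrors the three generative steps and is not the PCFG of Borschinger et al.

```python
import random

def generate_sentence(mr_prior, constituents, word_dists, rng, max_words=3):
    """Sample (MR, words): pick an MR, walk its atoms in order, emit words per atom."""
    # Step 1: select the complete MR to describe.
    mrs = list(mr_prior)
    mr = rng.choices(mrs, weights=[mr_prior[m] for m in mrs])[0]
    words = []
    # Step 2: generate the atomic MR constituents in order.
    for atom in constituents[mr]:
        # Step 3: each atom emits words drawn independently from its own distribution.
        vocab = list(word_dists[atom])
        count = rng.randint(1, max_words)
        words += rng.choices(vocab, weights=[word_dists[atom][v] for v in vocab], k=count)
    return mr, words

# Hypothetical toy parameters.
mr_prior = {"Turn(LEFT)": 0.5, "Turn(LEFT),Travel(steps:2)": 0.5}
constituents = {
    "Turn(LEFT)": ["Turn", "LEFT"],
    "Turn(LEFT),Travel(steps:2)": ["Turn", "LEFT", "Travel", "steps:2"],
}
word_dists = {
    "Turn": {"turn": 0.8, "face": 0.2},
    "LEFT": {"left": 1.0},
    "Travel": {"walk": 0.6, "go": 0.4},
    "steps:2": {"twice": 0.5, "two": 0.5},
}
sampled_mr, sampled_words = generate_sentence(
    mr_prior, constituents, word_dists, random.Random(0)
)
```

In the approach described on the slide these probabilities are not set by hand but estimated with EM, and parsing runs the model in reverse by finding the most probable tree for a given sentence.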

Page 48: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Limitations of Borschinger et al. 2011 PCFG Approach

• Only works in low ambiguity settings.
  – Where each sentence can refer to only a few possible MRs.
• Only output MRs explicitly included in the PCFG constructed from the training data
• Produces intractably large PCFGs for complex MRs with high ambiguity.
  – Would require ~10^18 productions for our navigation data.

Page 49: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

Our Enhanced PCFG Model (Kim & Mooney, EMNLP-2012)

• Use learned semantic lexicon to constrain the constructed PCFG.
• Limit each MR to generate only words and phrases paired with this MR in the lexicon.
  – Only ~18,000 productions produced for the navigation data, compared to ~33,000 produced by Borschinger et al. for far simpler Robocup data.
• Output novel MRs not appearing in the PCFG by composing subgraphs from the overall context.
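The effect of the lexicon constraint on grammar size can be seen in a toy count of emission rules. The MR names, vocabulary, and lexicon below are invented for illustration, and `emission_rules` is a hypothetical helper; real production counts depend on the full grammar construction.

```python
def emission_rules(mrs, vocabulary, lexicon=None):
    """List (MR, phrase) emission rules, optionally restricted to lexicon pairs."""
    rules = []
    for mr in mrs:
        # Without a lexicon every MR may emit every phrase in the vocabulary.
        allowed = vocabulary if lexicon is None else lexicon.get(mr, set())
        rules.extend((mr, phrase) for phrase in sorted(allowed))
    return rules

mrs = ["Turn(LEFT)", "Travel(steps:2)", "Verify(at:SOFA)"]
vocabulary = {"turn", "left", "walk", "twice", "sofa", "couch", "the", "to"}
lexicon = {
    "Turn(LEFT)": {"turn", "left"},
    "Travel(steps:2)": {"walk", "twice"},
    "Verify(at:SOFA)": {"sofa", "couch"},
}
unconstrained = emission_rules(mrs, vocabulary)
constrained = emission_rules(mrs, vocabulary, lexicon)
print(len(unconstrained), len(constrained))
```

Here the unconstrained rule set grows with the product of MRs and vocabulary size, while the constrained set grows only with the number of lexicon pairs.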

Page 50: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation

End-to-End Execution Evaluations

                                          Single Sentences   Paragraphs
Mapping to supervised semantic parsing    57.28%             19.18%
Our PCFG model                            57.22%             20.17%

Page 51: Learning to  Interpret  Natural  Language  Navigation Instructions  from Observation


Conclusions

• Challenge problem: Learn to follow NL instructions by just watching people follow them.

• Our goal: Learn without assuming any prior linguistic knowledge.
  – Easily adapt to new languages

• Exploit existing work on learning for semantic parsing in order to produce structured meaning representations that can handle complex instructions.

• Encouraging initial results on learning to navigate in a virtual world, but still far from human-level performance.