
Natural Language Interaction with Robots

Alden Walker

May 7, 2007

Abstract

Natural language communication with robots has obvious uses in almost all areas of life. Computer-based natural language interaction is an active area of research in Computational Linguistics and AI. While there have been several NL systems built for specific computer applications, NL interaction with robots remains largely unexplored. Our research focuses on implementing a natural language interpreter for commands and queries given to a small mobile robot. Our goal is to implement a complete system for natural language understanding in this domain; the system consists of two main parts: a system for parsing the subset of English our robot is to understand and a semantic analyzer used to extract meaning from the natural language. By using such a system, we will be able to demonstrate that a mobile robot is capable of understanding NL commands and queries and responding to them appropriately.


Contents

1 Introduction

2 Overview of the robot and its language capabilities
  2.1 The Robot
    2.1.1 Sensors
    2.1.2 Output
  2.2 Natural Restrictions on Language
  2.3 Myro (Python Module)

3 Natural Language Processing Unit
  3.1 Overview
  3.2 Grammar and Lexicon (and Parsing)
  3.3 General Language Model and Robot Control Architecture
    3.3.1 Language Model (Imperatives)
    3.3.2 Language Model (Queries and Declaratives)
    3.3.3 Model of Thought (Subsumptional Brain Architecture)
    3.3.4 Robot Control Interface
  3.4 Semantic Analysis
    3.4.1 Semantic Analysis Overview
    3.4.2 Imperatives
    3.4.3 Contextual Commands
    3.4.4 (Embedded) Declaratives
    3.4.5 Queries

4 Examples
  4.1 Example: Simple Commands
  4.2 Example: Adjectives, Adverbs
  4.3 Example: Simple Prepositions
  4.4 Example: Contextual Commands
  4.5 Example: Embedded Declaratives
  4.6 Example: Queries
  4.7 Example: Complete Interactions

5 Progress and Future Work

6 Related Work and Background

A Grammar

B Acceptable Sentences
  B.1 Imperatives
    B.1.1 Moving
    B.1.2 Turning
    B.1.3 Lights
    B.1.4 Sound
  B.2 Queries
  B.3 Contextual Commands

1 Introduction

When giving a robot commands, a user typically must give short, blunt commands or remember a long list of precise phrases that must be input exactly as expected. Usually, robots either do not take commands in textual form (they have a joystick, for instance), or take a set of commands which are pre-programmed and must be said exactly as pre-programmed. For instance, a robot might know what “Move quickly” means but have no idea what “Quickly move” means. It might know “Turn left for 45 degrees” and “Turn left slowly” but not “Turn left slowly for 45 degrees.” These limitations arise because making a robot understand commands in a looser, more natural way is much more complicated than simply mapping a list of commands to a list of movements and activity scripts. Finding an appropriate response to natural language is more in the domain of AI and natural language processing, and it is understandable that people who build robots tend to err on the side of dependability and require precise commands.

Coordinating both the basic functioning of the robot and natural language processing and understanding at the same time poses a considerable task. Both are very sensitive and prone to error. Our goal is to use a very simple, pre-made robot and build a natural language processor on top of it which can handle the small subset of English which makes sense for commands and queries. By limiting the problem in such a way, we hope not only to end up with a functioning example of a robot able to follow commands given in natural language but also to develop methods which can be used when creating similar systems for more complicated robots which allow for more complicated interactions with the real world.

In order to satisfactorily solve this problem, we must attempt to bridge gaps between robotics and natural language processing. Natural language processing is typically done on a computer, which processes the input sentence, extracts meaning and represents it in a formalism, and responds appropriately in natural language. With a robot, we must produce a real action. Research in natural language processing can show us ways to understand natural language on a computer. Research in robotics can show us some of the best ways to help robots function in the real world. Transforming loose, natural language commands into the precise low-level commands for the robot involves new kinds of semantic interpretation, and that is the goal of this project.

By using a combination of techniques from robotics and natural language processing, we have developed an architecture which looks promising.

Here is an example of an interaction with our robot using the current version of our natural language processor:

Hello, I’m Scribby!

Type a command: do you see a wall


No

Type a command: beep whenever you see a wall

Type a command: turn right whenever you see a wall to your left

Type a command: turn left whenever you see a wall to your right

Type a command: move for 60 seconds

The result of this sequence of commands is that the robot will move around for a minute avoiding walls and beeping whenever it sees a wall. As is demonstrated above, the robot can respond appropriately to many natural language commands.

2 Overview of the robot and its language capabilities

The natural language robot interface is composed of many pieces. However, there are natural chunks into which the program pieces divide. In our system, all of the language processing is done on a host computer. The host computer communicates with the robot over a wireless Bluetooth connection. All the robot control software also runs on the host computer. Communication takes place using a client-server paradigm. Thus, the robot runs only a small server which listens for commands. The basic design schematic is shown in Figure 1.

In the following sections, we explore the various pieces which make up the lower-level system. This is necessary to understand the decisions we made and how we dealt with the natural language issues which arose during research.

2.1 The Robot

We use the Parallax Scribbler robot, which is actually marketed more or less as a toy: it can carry a Sharpie and draw lines on a piece of paper and avoid obstacles straight out of the box (see Figure 2). Connecting it to a computer through a serial port allows the microcontroller to be reprogrammed as a robot typically would be.

All of our work is done in the context of the Scribbler, but it has broad applications. In order to solve the problem of natural language interaction, the architecture and design have been tailored for the Scribbler. Thus, it is necessary to understand the capabilities of the robot to understand our design choices.


Figure 1: The hardware interface (schematic: commands, queries, and responses pass through the NLPU and Myro software on the host computer, over a serial Bluetooth connection, to the Scribbler robot, whose hardware comprises IR sensors, light sensors, line sensors below, a speaker, three LEDs, and wheels)

Figure 2: The Scribbler robot


2.1.1 Sensors

The robot has three sets of sensors: Proximity (IR), light, and line:

IR: The IR sensors consist of two IR emitters and one IR detector. The emitters shine IR light forward, one slightly left and the other slightly right. Thus, though the robot can only see forward, it can detect if an object is a little to the left or right. In this way, it can turn in the correct direction to avoid obstacles and perform other similar actions. The command do you see a wall to your left would be translated and understood as a procedure for polling the IR sensors.
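To make this concrete, here is a minimal sketch of such a polling procedure. It is an illustration rather than our exact code, and it assumes Myro's getIR call and the Scribbler convention that a reading of 0 means the emitted IR is bouncing off something nearby (check the Myro documentation for your version):

    from myro import init, getIR

    init("/dev/rfcomm0")   # or "COM4" on Windows; the Bluetooth serial port

    def sees_wall_to_left():
        # Poll the left emitter/detector pair; a reading of 0 conventionally
        # means the emitted IR bounced off something nearby.
        return getIR("left") == 0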

Light: The robot has three forward-looking light sensors pointing left, straight, and right. Lower reported values from the light sensor correspond to brighter light. By turning toward the light sensor reporting the lowest value, the robot can follow a bright light. Though not always completely reliable, the light sensors do typically report different values if the incoming light to the sensors is significantly different, so the sensor looking at the brightest light will usually report a lower value than the others.
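A light-following step might then look like the sketch below; step_toward_light is a hypothetical helper, and the turnLeft/turnRight/forward signatures (speed, seconds) are as we recall them from the Myro documentation:

    from myro import getLight, turnLeft, turnRight, forward

    def step_toward_light():
        # Lower values mean brighter light, so steer toward the minimum.
        left, center, right = getLight("left"), getLight("center"), getLight("right")
        if left < center and left < right:
            turnLeft(0.5, 0.2)       # speed, seconds
        elif right < center and right < left:
            turnRight(0.5, 0.2)
        forward(0.5, 0.2)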

Line: The line sensor is actually a pair of IR emitter/detector pairs underneath the robot looking straight down. They are placed side by side, so the robot can see if it is on the edge of a dark area (the right sensor will report white and the other black). By moving slowly and making careful use of these sensors, the robot can limit its movement to a box of a color significantly different from the surrounding area or follow a dark line on a white background.

2.1.2 Output

The robot has three ways of producing output which interacts with the world: it can move around, it can play sounds, and it can turn a bank of three LEDs on and off:

Movement: The robot has two wheels which can be controlled separately. Each can spin forward or backward at a given speed. Because the robot is round and has two wheels controlled so freely, it is very mobile and can spin in place or perform just about any sort of movement. The command move to the wall would be realized by a procedure which polls the IR sensors and then directs the wheels accordingly.
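Under the same getIR assumptions as in the sketch above, such a procedure might look like:

    from myro import forward, stop, getIR

    def move_to_the_wall():
        # Drive in short pulses, re-checking the IR detectors between pulses.
        while getIR("left") != 0 and getIR("right") != 0:
            forward(0.5, 0.2)
        stop()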

Sound generation: The robot has a speaker which can play a wide range of frequencies or a pair of overlapped frequencies.


Lights: There are three LEDs which can be turned on and off. They are small, but useful for indicating status.

2.2 Natural Restrictions on Language

Because of the simplicity of the robot, the set of natural language sentences and commands which could appear in an interaction between a human and the robot is restricted. The small set of ways in which the robot can affect the world (output: movement, sound, and lights) keeps small the set of verbs which make sense in commands to the robot, and likewise the small set of sensors makes small the set of possible queries. This restriction is very good for our project: in order to make natural language interaction between a robot and a human possible, the natural language domain must be restricted, but good interaction will never occur unless the restriction is natural. By using such a simple robot we achieve this.

2.3 Myro (Python Module)

Myro is a high-level interface module for Python created by IPRE. Myro handles the serial connection and allows a program written in Python to communicate easily without bothering with the low-level details. We used this module as our starting base and built our system on top of it. Thus, our NLPU communicates with the robot exclusively through the Myro module.
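For flavor, a typical Myro session looks roughly like this (the function names are from the Myro documentation of the period; the port name depends on the host operating system):

    from myro import *

    init("COM4")              # open the Bluetooth serial link to the Scribbler
    forward(1, 2)             # full speed forward for 2 seconds
    beep(1, 440)              # beep for 1 second at 440 Hz
    readings = (getIR("left"), getIR("right"))   # poll the obstacle detectors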

3 Natural Language Processing Unit

3.1 Overview

The system for natural language understanding can be divided into three parts: the grammar and parser, which perform syntactic analysis using a context-free grammar; the semantic analyzer; and the brain engine, which organizes currently running commands into our model of thought.

The order of this section may seem strange: we will first discuss how we parse the input provided by a user. Then we will talk about how we represent meaning in our language domain. Only after that will we be ready to explore semantic analysis. Why is this? The goal of our project boils down to doing semantic analysis on the parse trees provided by our parser. It doesn’t make sense to talk about semantic analysis until we know what output we should produce. In order to answer this question, we will take a detour into our method for representing the meaning of language and specifically imperative commands, how these representations are realized in practice, and how the robot control architecture interacts with the robot. At that point, we will be ready to bridge the gap with semantic analysis.

Figure 3: The NLPU (schematic: input reaches the Language Interface; the Grammar/Parser, built on NLTK-lite, produces a parse tree; the Semantic Analyzer turns the tree into a layer; the Brain Control Interface converts layers into raw Myro commands sent over the Myro/hardware connection to the Scribbler; example inputs: move to the wall, move again, turn right for 2 seconds)

3.2 Grammar and Lexicon (and Parsing)

The small mobile robot that we elected to work with made the design of a grammar easier. Only a small subset of English is applicable to the situation of giving commands and queries to such a robot, so the language is naturally restricted. Within this subset, however, we tried to be as complete as possible in the design of the grammar. The grammar parses standard declarative sentences and commands (verb phrases) and allows for adverbial and prepositional phrases.

The parsing is done by an NLTK-lite parser class. Though it is not very fast, our parser uses a standard recursive-descent algorithm. The danger with a shift-reduce parsing algorithm is the possibility of not finding valid parses: that would be unacceptable.
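NLTK-lite has since grown into NLTK proper, so the class names have changed, but a present-day approximation of our setup, over a tiny fragment of the grammar in Appendix A, looks like this:

    import nltk

    grammar = nltk.CFG.fromstring("""
        C -> IS
        IS -> VP
        VP -> V AdvP
        AdvP -> Adv
        V -> 'turn'
        Adv -> 'left'
    """)

    parser = nltk.RecursiveDescentParser(grammar)
    for tree in parser.parse("turn left".split()):
        print(tree)   # (C (IS (VP (V turn) (AdvP (Adv left)))))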

Because the parsing is done in such a compartmented fashion, it is useful to let the grammar contain some semantic information and tailor it to the specific situation. We evaluate the meaning of a declarative sentence in the usual way, i.e., as a logical statement, but commands and contextual commands have meaning consisting of a series of Myro commands. Thus, sentences get evaluated in completely different ways depending on whether they are declarative or imperative. The grammar can assist in the discrimination between types of sentences: it immediately distinguishes a contextual command by finding the presence of a contextual adverb such as “again” or “more,” and the grammar considers declarative verb phrases and imperative verb phrases to be completely different. This allows the semantic analysis to be carried out more effectively. An example of this is shown in Figure 4.

Note how the prepositional phrase is marked as a VPP (verb prepositional phrase) and how the sentence is marked at the top as an IS (imperative sentence). In contrast, the contextual command move again is marked as a CC (contextual command). Thus, some of the semantic analysis comes directly out of the syntax of the command. For the complete grammar, refer to Appendix A.

3.3 General Language Model and Robot Control Architecture

We have now covered the way in which we do syntactic analysis. Now we turn to the way in which we represent meaning and how we realize these representations in practice. We will show how we create a fundamental unit of meaning, called a block, and how we build the meaning of a command, a layer, out of these pieces. Once we have shown how to create layers, we will describe our model of thought and how the layers are combined in a dynamic data structure called a brain to create output.


Figure 4: Parse trees. Left: turn quickly left for 5 seconds parses as a C over an IS whose VP contains the verb turn, the adverb phrases quickly and left, and a VPP (the preposition for over a DP containing the NumP 5 and the NP seconds). Right: move again parses as a C over a CC containing the verb move and the contextual adverb again.

3.3.1 Language Model (Imperatives)

In order to understand how the grammar was designed, and especially how the semantic analysis works, it is necessary to understand the reasoning behind our representation of meaning. The best way to see why we use the representation we do is to describe it and show its flexibility. The reasoning described here applies only to imperatives.

We use a verb-centric system of meaning. We consider a verb to have a certain fundamental meaning which is realized as a function. This verb function is the meaning of the verb, and it, when called, generates a compact version of a Myro command performing the appropriate action. Other pieces of an imperative, such as adverbs or verbal prepositional phrases, can modify the function. The function can take arguments as necessary: a function representing a transitive verb, for instance, needs an argument representing the object which is acted upon. Our final meaning for an imperative phrase is the original verb function, modified by the other pieces of the clause. If necessary, the verb function could have access to the general state of the robot, such as whether there is a wall in front of it.

An important question is how the various phrases modify the meaning (function) of the verb. We chose to make the modifications, as far as possible, constraining. That is, “move” has a fundamental meaning, and the prepositional phrase “for 2 seconds” simply serves to inhibit the behavior of the verb function so that after 2 seconds it ceases to have meaning. An imperative like “beep whenever you see a wall” contains the prepositional phrase “whenever...,” which inhibits the verb function until the robot “sees” a wall, at which point the verb function is left free to act (and produce a beep).


Figure 5: The block diagram for turn right for 2 seconds (senses, time, and commands flow into the “for 2 seconds” block, which holds the “turn right” block as its nextBlock and gates the commands it sends out)

This representation of meaning is useful and serves to make the meaning of multiple prepositional phrases clearer: “beep for 2 seconds whenever you see a wall” has two stacked prepositional phrases: when a wall is in sight, the outside blocking preposition “whenever” releases the command under it, which happens to be “beep for 2 seconds.” The second prepositional phrase inhibits the verb function for “beep” so that the activity occurs for the duration of 2 seconds.

We call the fundamental unit of meaning corresponding to a verb or prepositional phrase a block. As described above, blocks can be linked together in such a way that one block inhibits the behavior of another. In our example, the block representation for the command turn right for 2 seconds is shown in Figure 5.
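A minimal sketch of the block idea in Python (Block, step, and the command format here are hypothetical stand-ins, not our exact classes): each block maps the current senses to a command or to None, and an inhibitory block holds the block it gates.

    import time

    class Block(object):
        # One unit of meaning: asked each tick, it returns commands or None.
        def __init__(self, action, next_block=None):
            self.action = action          # function(senses, next_block)
            self.next_block = next_block  # the block this one inhibits

        def step(self, senses):
            return self.action(senses, self.next_block)

    # Verb block: "turn right" emits a motor command whenever it is free to act.
    turn_right = Block(lambda senses, nxt: {"motors": (0.5, -0.5)})

    # Prepositional block: "for 2 seconds" releases its inner block until
    # the deadline passes, after which the command ceases to have meaning.
    def for_seconds(duration):
        deadline = time.time() + duration
        def action(senses, nxt):
            if time.time() < deadline and nxt is not None:
                return nxt.step(senses)   # release the inhibited block
            return None                   # past the deadline: emit nothing
        return action

    # "turn right for 2 seconds": the preposition inhibits the verb.
    layer = Block(for_seconds(2), next_block=turn_right)
    print(layer.step({}))   # {'motors': (0.5, -0.5)} for 2 seconds, then None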

This inhibitory theory works well for prepositional phrases, but adverbs are a more complicated situation: prepositions tend to have meaning more disjoint from the verbs they modify, while the meaning of an adverb is tied up in the meaning of the verb it “modifies.” It is almost as though adverbs trigger a modification rather than create one. To see the distinction, consider the two sentences “turn around” and “move around,” or “move back” and “throw it back.” The way in which the adverbs modify the meaning is strikingly different: turning around is probably the rather rigidly defined act of turning about 180 degrees, while moving around is vaguely defined as walking in a room in various directions, probably in a relaxed manner. Moving back is moving backwards, while throwing something back is probably actually throwing it forwards to the person who threw it last. This last example is weak because the verbs “move” and “throw” are intransitive and transitive, but the point is clear.

In light of the complications with adverbs, we consider the meaning of adverbs to be inherent in the verb. Certainly, there are “default” adverbs that come pre-programmed, even into newly made verbs, so the real-world situation is more complicated. But in our robot-world, adverbs trigger already-defined modifications in the verb functions. Thus, adverbs get bound up with the verb in the verb block: the block representation for the meaning of turn right quickly for 2 seconds would be identical to the block diagram above, except the inhibited verb block would be the block for turn right quickly. Note that we already implicitly did this before with right without calling attention to it.

So far, we have verbs, adverbs, and prepositions. The main area left to cover is nouns and noun phrases. For imperative sentences, nouns and noun phrases tend to have an actual tangible meaning. Obviously, this is not the case in all sentences, but for imperatives a noun carries with it contextual information which allows people to know how to pick it up, turn it on, move it, etc. Nouns might also represent an abstract quantity (a second, for instance), but even these abstract nouns must have some sort of realization, like the duration of an action. Because of the simplicity of the world and range of activities for the robot, we chose to understand a noun as an abstract object. There can be many things associated with a noun object, such as its name or names, attributes that it has, verbs (functions) which can be used to determine its state, command fields that will alter it, etc. When we attempt to find the meaning of a noun phrase, we are searching for a single noun object or a set of objects. Note that an object might be abstract, such as a second. This “second” won’t have many attributes, and certainly no function to check its state or modify it, but it will have the abstract quantity of duration. The abstract noun “three” has basically only the cardinality expression of 3. These abstract quantities can get added together to create “three seconds.”

We actually consider adjectives and nouns to be more or less on equal footing. When we ask for the left LED, we mean to perform an intersection of the set of objects which have the attribute “LED” and the set of objects which are “left.” The intersection will leave us with the single object of the left LED. When we ask for “lights,” we will get everything which is a light. That is, we’ll get a list of objects. The same thing applies to prepositions inside noun phrases, which are basically just fancied-up adjectives.
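As a sketch, with a hypothetical miniature of our object database, the intersection idea looks like this:

    # Each known object carries a set of keywords; nouns and adjectives
    # alike simply intersect the candidate set.
    OBJECTS = [
        {"name": "left LED",   "keywords": {"LED", "light", "left"}},
        {"name": "center LED", "keywords": {"LED", "light", "center"}},
        {"name": "right LED",  "keywords": {"LED", "light", "right"}},
    ]

    def find_objects(*words):
        return [o for o in OBJECTS if all(w in o["keywords"] for w in words)]

    find_objects("LED", "left")   # -> just the left LED
    find_objects("light")         # -> all three lights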

For the most part, nouns (and therefore adjectives) get bound up in fundamental meaning. To see why this is justified, consider the examples “play an A” and “play a B.” The actual physical, imperative meanings of the commands are very different, even though the difference is not in the verb. Here we can see that we need to take nouns directly into the block of a verb or prepositional phrase.

It is clear now that we end up sticking many things inside these so-called “fundamental” blocks, but this comes only out of necessity. In support of this representation, it should be noted that though prepositional phrases are only one syntactic/semantic category that we must consider, they are very prevalent in imperatives. Also, the nouns and adverbs that get stuck inside blocks are simple to handle, and once their meaning is understood we can be done constructing our blocks. Compare this to the prepositional phrases, which naturally seem to regulate and whose meaning can change with time or the situation. With this in mind, it is clear that such a block structure is useful even if it only makes two types of phrases “fundamental.”

It should be noted that some commands, such as move, have an unspecified ending condition, so they will technically continue forever. In our language model, there is an understood temporal restriction which kicks in if the command does not provide some condition under which it stops. This makes sense in general: if someone is told to “turn,” they will turn for a little while, maybe 180 degrees, and then stop, and likewise for such imperatives as “move” and “beep.” In addition to being probably more realistic, this addition to the model simplifies the picture, and it helps keep the robot under control!

3.3.2 Language Model (Queries and Declaratives)

Queries and declarative sentences have a similar structure. We think of both as logical propositions: declarative sentences (our robot can understand only declaratives which are embedded with a subordinating preposition) are simple propositions; queries are just propositions modified to make them questions. In both cases, the meaning of the sentence is its logical truth value. Here we are simply taking the typical linguistic stance about declarative sentences. However, queries need an extra note: a query is asking for a response from the robot, and we can think of that as an action, so the meaning of a query (a response to it) is a block which tells the brain to print a message to the computer screen; it’s still a block. We do not have to worry about this with declaratives because they are embedded in prepositional phrases and thus get integrated into blocks automatically.

3.3.3 Model of Thought (Subsumptional Brain Architecture)

We have seen how we can represent the meaning of a command by breaking it down into blocks inhibiting each other. Once we have this representation, what does it mean for the robot to carry out the meaning? This is an important linguistic and cognitive question, because it involves not only how the robot will carry out a single command (it will just follow orders) but also how the robot will deal with many simultaneous commands. We know now what the meanings of move and turn right when you see a wall are. However, what is the meaning of both of them together? This section attempts to answer that question.


We need some “container” representation to make the discussion of collections of commands easier: we define a layer to be the collection of interconnected blocks which represent the meaning of a command. We consider a layer as the representation of a single, self-contained command. For example, consider the command turn right for two seconds. Verbs, together with adverbs, form single units of meaning, that is, blocks. Thus, we will create one block for “turn right.” We will create another block for “for two seconds,” and we will give the verb block to the prepositional block as the block to inhibit. These two blocks are wrapped up in a single layer, and this layer is the meaning of the command.

The representation of the entire robot thought process is the brain. It consists of a collection of active layers, ordered by precedence. In our case, we consider older commands to have higher precedence. The brain has the same input and output as a single layer, namely the senses and state of the robot and a collection of Myro commands, respectively. However, the brain synthesizes its whole collection of layers to generate just a few commands. Note that the brain and each layer can generate multiple commands because there are three output paths (lights, sound, and movement), and each layer can use any of them. The method for this is as follows: the brain goes through each layer and asks it what it wants to do given the current situation. It then asks the next one, and so on. The highest-precedence response for each of the three output paths is the one taken as the generated command.
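In sketch form (hypothetical names; suppose each layer’s step returns a dictionary mapping output paths to commands, with None or a missing path meaning “no opinion”):

    class Brain(object):
        PATHS = ("motors", "sound", "lights")

        def __init__(self):
            self.layers = []   # index 0 = oldest = highest precedence

        def add_layer(self, layer):
            self.layers.append(layer)   # newer commands get lower precedence

        def step(self, senses):
            chosen = {}
            for layer in self.layers:              # oldest first
                wants = layer.step(senses) or {}
                for path in self.PATHS:
                    # Keep only the highest-precedence response per path.
                    if path not in chosen and wants.get(path) is not None:
                        chosen[path] = wants[path]
            return chosen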

In addition to this, we allow the brain to let the currently working layer have access to what the previous layer wanted to do. In this way, layers can be “nice” and choose to allow other, less powerful layers to have a say. For instance, if one layer says to turn right and another to move forward, the higher-precedence one can be nice and simply combine both desires into forward movement with a slight right slant. In some other cases it is best to have higher layers simply rule out their lower counterparts. It depends on the context. However, this layering effect is very powerful. As an example of what it can achieve, consider the following series of commands (layers), ordered in decreasing precedence.

1. turn right when you see a wall to your left

2. turn left when you see a wall only to your right

3. move forward

When no wall is in sight, the robot will just move forward. When a wall is detected, the more powerful layers will kick in and force the robot to turn in a direction appropriate to avoid the wall. The result of these commands is that the robot will move around, avoiding the walls. This is a rather complicated behavior, but stacking the various pieces of the activity allows it to be broken into simple chunks.


This layer stacking effect is our attempt to incorporate the successful “subsumptional architecture” described in [1]. Because we incorporate new layers on the fly and do incorporate representations into our design, the motivation behind implementing the subsumption architecture is a little different, but it works well.

3.3.4 Robot Control Interface

Note that the brain never actually sends commands to the robot. The brain is our representation of an entire collection of layers operating in concert. When we want to actually use the commands that the brain produces, we can use our brain control interface to do this.

The robot command interface is simple. The proposed interface has two threads, one of which receives input commands, parses, and analyzes them (the Language Interface), and the other of which manages the running brain and passes commands between the brain and the robot (the Brain Control Interface). Because of technical issues, this interface is not finished. The proposed interface differs from the current one most importantly in the fact that the current interface waits for a command to finish before asking for another one. In the proposed interface, things will happen in parallel so that effective layer stacking can take place. Currently, the interface performs a few rather mundane tasks. It runs an endless loop of:

1. Wait for a command

2. Parse and analyze it; if the command is a query, ask the brain for an appropriate response

3. Add the command to the list of previous commands (context)

4. Add the resulting layer to the brain

5. Run the brain until it stops doing anything

This interface is weak because it does not allow commands to be entered while the robot is doing anything (e.g., “Stop!”). Commands with unspecified ending conditions do get truncated, as discussed above, but it is still not ideal. However, it is effective.
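A sketch of that loop (all names hypothetical; raw_input reflects the Python 2 of the time):

    def run_interface(parser, analyzer, brain, nerves):
        history = []                            # context for "again"/"more"
        while True:
            text = raw_input("Type a command: ")
            tree = parser.parse(text)
            layer = analyzer.get_meaning(tree)  # query layers just print
            history.append((text, tree))
            brain.add_layer(layer)
            while brain.is_active():            # blocks until quiescent
                nerves.send(brain.step(nerves.sense()))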

3.4 Semantic Analysis

3.4.1 Semantic Analysis Overview

Now we know what the semantic analyzer gets as input: a parse tree containing some semantic information. We also know what it must produce: a single layer. The method we developed for doing this is as follows.


Once an NLTK-lite tree structure has been created out of an imperative clause using the NLTK-lite parser, our semantic analyzer is called. We tried to keep the analysis as compartmented as possible, and in this vein, it proceeds in a recursive fashion: each node in the tree has an associated function (getVP, for example). Our main analysis function, getMeaning, looks at the top node of the tree and calls the appropriate function. It passes as an argument a list of subtrees. The functions which process a given type of node rely on getMeaning to collect the meanings of the subtrees. The node function is primarily responsible for piecing things together.

We make heavy use of lambda functions during the analysis. Because verbs and prepositions must eventually make their way into the final layer as functions, node functions often build special functions by piecing together the results of getting the meaning of subtrees.

We will look at each type of sentence/command in turn.

3.4.2 Imperatives

The entire workings of each function are too complicated to go into here. However, let’s take a look at our good example. Consider the imperative turn quickly left for 5 seconds. The parse tree for this sentence is shown in Figure 4.

We will go through the process of semantic analysis as our algorithm does; however, for a broad overview, note that the call tree for the function getMeaning called on the above command will be almost identical to the parse tree itself: every time a function needs to get the meaning (in whatever form that is) of a subtree, it calls getMeaning, which in turn calls the appropriate get command. Approximate pseudocode for any of the analysis functions would be:

getMeaning(X)

1. Break X into its constituent pieces (sub-projections) A, B, ...

2. Get the meanings of these pieces by calling getMeaning(A), getMeaning(B), ...

3. Assemble the meanings and return the meaning of X
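Concretely, a toy version of this dispatch, using plain (label, children) pairs in place of NLTK-lite Tree objects, might read:

    def get_meaning(tree):
        label, children = tree
        return HANDLERS[label](children)        # e.g. HANDLERS["VP"] = get_vp

    def get_adv(children):
        return children[0]                      # adverbs are just trigger words

    def get_v(children):
        return {"verb": children[0], "advs": []}

    def get_vp(children):
        # Collect subtree meanings, then fold the adverbs into the verb block.
        parts = [get_meaning(c) for c in children]
        block = [p for p in parts if isinstance(p, dict)][0]
        block["advs"] = [p for p in parts if isinstance(p, str)]
        return block

    HANDLERS = {"VP": get_vp, "V": get_v, "Adv": get_adv}

    get_meaning(("VP", [("V", ["turn"]), ("Adv", ["quickly"]), ("Adv", ["left"])]))
    # -> {"verb": "turn", "advs": ["quickly", "left"]}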

The first interesting work is done by the function getVP. It will first create a block of meaning corresponding to turn quickly left. To do this, it will collect the adverbs and pass them to getV, which will take the simple function for turn and add the adverbs to create the block. The available verb functions come with pre-set adverbial modifications, which the adverbs just trigger.

Next, getVP will call getVPP to get the prepositional block. Here, we will find the DP seconds and add to it the numeral quantity of 5. This will create an object with a duration of 5 seconds. This is collected with the preposition (using getP) to create the inhibitory block for for 5 seconds. If we had been required to parse a command like move to the wall, we would have had to get the meaning of the noun wall. In order to make this possible, we keep a database of all the known objects with which the robot can interact. The phrase right LED, for instance, will refer to the specific object in our database representing the right LED on the robot. We find this meaning by giving each member of the database a series of keywords. If we need to find the right LED, we first find all LEDs and then search in this sublist for all things which are right. In the case of this command, we find the object which represents a second and add the numeral quantity as described above.

getVP now takes the verb block and attaches it as the “next block” onto the prepositional block. Now, the verb block for turn quickly left is monitored and controlled by the prepositional block for for 5 seconds. This chain of blocks is packaged in a layer and is finally returned as the result of the first call to getMeaning.

3.4.3 Contextual Commands

Contextual commands are handled differently: the interface saves the full parse tree of every entered command along with the name of the verb and important characteristics of that command. Thus, after a while, we’ll have a long history of parses and keywords. Basically, contextual commands search through this list to find the previous command being referred to.

This may seem like a rather limited context from which to draw. However, it is sufficient for most interactions with the robot. Consider the string of commands:

1. turn right

2. move to the wall

3. turn left for 3 seconds

4. move again

It is debatable what is actually meant by the last command. Is the prepositional phrase to the wall included in doing the action again? We say yes: the meaning of a contextual repeat command is to repeat exactly some command which has already been done. The getMeaning function will recognize this last command as contextual from its parse, and it will call the getCC function, which will search through the list of keywords from previous commands until it finds the keyword “move.” It will then take the entire stored parse tree and just call getMeaning on it again and return the result.


Why wouldn’t we just store the layer resulting from the semantic analysis and not redo all the hard work? We must accommodate time-sensitive functions. When we say move backwards for 3 seconds, we really mean move back until 3 seconds from now (at least, that is the meaning we arrive at after semantic analysis). However, if we then say move again, we certainly don’t mean to move until 3 seconds from the original time of the first command! Thus, commands sometimes have meaning which depends on the time of the command, and repeating a command necessarily moves the calling time to the present. To most easily accomplish this, we re-analyze the entire parse tree. Note that this is not actually that much of a loss: creating the parse tree is by far the most time-consuming part of the analysis process; re-analyzing is negligible compared to re-parsing (which we do not do).
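A sketch of getCC under this design (hypothetical names; get_meaning stands for the analyzer entry point):

    def get_cc(verb, history):
        # history holds (keywords, parse_tree) pairs, newest last.
        for keywords, tree in reversed(history):
            if verb is None or verb in keywords:
                # Re-analyze the stored tree instead of reusing its old layer,
                # so "for 3 seconds" is measured from now, not from back then.
                return get_meaning(tree)
        return None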

3.4.4 (Embedded) Declaratives

An important class of prepositions is the group of subordinating prepositions, which allow an embedded declarative sentence to appear inside a prepositional phrase, which in turn governs the behavior of a command. For example, turn left if you see a wall. Here we see the embedded declarative you see a wall. This is a special sort of prepositional phrase, because it directly asks a question; regular prepositional phrases usually have a question or some sort of condition built into them (move to the wall has the question “do you see a wall” built into it), but here we have an explicit condition on, for example, turning left. What we need to do here (and for queries, as we will see) is evaluate the truth value of the declarative in the current context of the robot. The meaning of the declarative, as we discussed earlier, is its truth value: once the truth of the embedded declarative has been evaluated, the prepositional phrase can use this truth value to govern its own inhibitory behavior.

Our robot does not use a logic engine: that sort of system would be overkill. The declaratives we need to understand are not adding facts to a knowledge base or anything complicated like that; they are simply asking for the truth value of a simple sentence. In fact, the simplicity of the robot helps us dramatically restrict the domain of possible sentences. If we are asking about the current state of the robot, we must ask about the value of one of its sensors, and in all cases, the only possible verb that can appear in the declarative is “see.” For example, turn if you see a wall asks about the IR sensors, turn if you see a light asks about the light sensors, and move until you see a line asks about the line sensors. Therefore, the declarative analysis functions are looking only for sentences of the form “you see X.” Obviously, this is an extremely simple subset of English. However, the restriction is appropriate here.

The semantic analysis is done by first building a tiny logical tree, so you see a wall becomes (V: see (N: you) (DP: a wall)). Once this has taken place, a second set of functions looks at the tree and builds a lambda function which returns true or false depending on whether the robot (“you”) sees “a wall.” In our database of objects, each object has a function associated with it which senses it, so we find the object “a wall,” perhaps using adjectives and prepositions to pare down the list of possibilities, and call its sensor function. This process seems more complicated than is necessary, since as we’ve already said, the only possible verb is “see”: why don’t we just find the object of the verb and not bother with the tree? We’d like to be as general and flexible as possible; with the current system, we could expand it to understand more verbs if that started to make sense, or, more likely, we could pass the logical tree to a logical engine for understanding.
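A sketch of that second set of functions (hypothetical names; each database object is assumed to carry a sense function, e.g. an IR poll for the wall object):

    def make_condition(logic_tree, objects):
        # logic_tree for "you see a wall" is, say, ("see", "you", ["wall"]).
        verb, subject, noun_words = logic_tree
        if verb != "see":
            raise ValueError("only 'see' is understood")
        # Pare down the candidates with the noun and any adjectives,
        # then close over the surviving object's sensor function.
        matches = [o for o in objects
                   if all(w in o["keywords"] for w in noun_words)]
        target = matches[0]
        return lambda: target["sense"]()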

Once the declarative semantic analysis has taken place, it returns the lambda function, which is then integrated into the preposition (inhibitory) block. In other words, the analysis of turn left if you see a wall to your right makes a lambda function out of the embedded declarative, so the command really becomes turn left if X, where X is a black-box lambda function. Then the sentence is analyzed as an imperative, except getVPP uses the black box, in combination with how “if” deals with true and false, to create the verb preposition layer.

3.4.5 Queries

Queries are just declaratives marked as needing some kind of language response (rather than an action). We deal with queries in a way very similar to declaratives. Currently, we only handle very simple “do” queries. This is not a language restriction imposed by the robot; it certainly makes sense to ask the robot “what do you see?” But that kind of query is quite complicated since it requires a search through all possible objects. We chose to only handle the simpler type of query such as “do you see a wall?” Here, we just analyze the embedded declarative. From our declarative analysis functions, we get a function which gives us the truth value of the declarative in the context of the robot. We then build a block (and layer) out of this function, but we use a special feature of our brain class: a layer can return the commands it wants for the motors, lights, and speaker, and it can pass meta-commands such as “kill me,” but it can also send messages for printing. Our query-analysis functions produce a layer which immediately kills itself and sends a message “yes” or “no” depending on the truth value of the declarative. In other words, we still produce a layer from a query, but it doesn’t perform any action; it just sends a message. In this way, we maintain the communication between the brain and the parser/analyzer: for every command, we produce a layer and add it to the brain. For imperatives, this works as already described. For queries, the layer we add to the brain is special, but it’s still a layer. Keep in mind that the brain is not directly connected to the robot. It must be connected through a data structure we created called a nerves interface, which passes commands to the actual robot from the brain and relays the sensor data. To answer the query without adding a layer, we’d have to connect the analyzer directly to the nerves instantiation and bypass the brain. This goes against both our attempts to keep everything as compartmentalized as possible and our desire for consistent representation of meaning.
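In sketch form, the layer a query produces is just the following (hypothetical names, matching the layer convention sketched earlier):

    def make_query_layer(condition):
        # A one-shot layer: no motor, light, or sound output; it sends a
        # message for printing and a meta-command asking to be removed.
        class QueryLayer(object):
            def step(self, senses):
                return {"message": "Yes!" if condition() else "No",
                        "meta": "kill me"}
        return QueryLayer()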

4 Examples

Here we exhibit a set of example commands and the behaviors they induce in the robot.

4.1 Example: Simple Commands

Simple commands ask for a single, uncomplicated behavior. A selection of nice simple commands:

• move — The robot moves forward for a small amount of time (about 2 seconds)

• turn — The robot turns a little

• beep — The robot beeps

• play a D — The robot plays a D (note).

4.2 Example: Adjectives, Adverbs

Things can get more interesting if we allow ourselves some adjectives and adverbs. All of the following commands have obvious results. Note that none of them specify the length of time for which they should continue, so they all run for the default “small amount of time,” which is about 2 seconds.

• turn left

• move backwards

• turnon your left light

• quickly turn right

• quickly go back


4.3 Example: Simple Prepositions

Prepositions allow us to specify conditions under which an action should be performed. The following commands do what you would expect.

• move to the wall

• turn left to the wall

These simple prepositions do not allow us to do too much. We’ll see later that subordinating prepositions are much more powerful.

4.4 Example: Contextual Commands

With contextual commands, we can refer to previous commands. These commands aren’t too complicated, but they are interesting. Assume that we’ve already given the robot the commands (in order): move backwards, turn right, turn left, beep, move back quickly. Then we could give the following commands:

• again — The robot would move back quickly again.

• beep again — The robot would beep again (not terribly exciting)

• turn again — The robot would turn left.

4.5 Example: Embedded Declaratives

With subordinating prepositions, we can embed declarative sentences inside preposition phrases, and these embedded declaratives now govern how the prepositional phrase inhibits the verb. The results of these commands are self-evident. Note that we no longer have the default 2-second duration, since these commands have explicit termination conditions.

We can now also start to make use of noun prepositions to specify, for instance, which wall (right or left) the robot sees.

• move until you see a wall

• turn left if you see a bright light

• beep whenever you see a light

• go back quickly if you see a line

• turn left quickly if you see a wall to your right


4.6 Example: Queries

Queries allow us to ask about the current state of the robot. The robot responds “Yes!” or “No” depending on the answer to the question.

• Do you see a wall

• Do you see a bright light

• Do you see a wall to your right

4.7 Example: Complete Interactions

Now we can put everything together. First, let’s use the layered, combining ability of the brain to have the robot move around avoiding walls:

turn left if you see a wall to your right

turn right if you see a wall to your left

move for 45 seconds

The robot will move around for 45 seconds and avoid any walls that it sees.

We can also use queries to, for instance, find a corner (“>>>” denotes commands; the other lines are robot responses):

>>> do you see a wall

No

>>> move to a wall — the robot moves forward until it sees a wall anywhere (right, left, or in front)

>>> turn left until you see a wall to your front — make sure the robot is directly facing the wall it just found

>>> do you see a wall — just checking

Yes!

>>> turn left for 90 degrees — make the robot face parallel to the wall. We might also have done the command turn left until you can’t see a wall.

>>> move until you see a wall

The robot should now be facing a wall and be parallel to the first wall it found, so it’s in a corner. In both of these examples, we can replace “wall” with “light” or “line” or a modified version of any of them.

5 Progress and Future Work

Probably the best way to understand the current situation is to look at the schematic in Figure 6.

The major incomplete area of the proposed NLPU is the robot control architecture. Python, the language in which our system is implemented, has support for multi-threading. However, there are technical issues with IDLE, a development environment for Python, which have stalled our efforts in this area. This is unfortunate, because only with multi-threading can we take full advantage of our layering, subsumptional architecture.

6 Related Work and Background

Our project is a blend of natural language processing and robotics, and there are a few key sources which inspired many of our design choices and made our project possible. On a very basic level, we relied on all the research done to make the foundations of natural language processing, like parsing, easy to do. In particular, we used the Natural Language Toolkit (NLTK-Lite) for Python, which implements a recursive-descent parser and a tree data structure. Using NLTK-Lite allowed us to focus our efforts on the design of the language model and the semantic analysis.

The basic goal of our project is inspired most by research in natural language processing. In a very broad sense, this field attempts to develop software which can interact with a person using natural language, and the products of this research are varied. One simple example is the machine which answers telephones and can direct calls based on spoken commands from a user. This machine does mostly phonetic and syntactic processing: the challenge is understanding which words were spoken. On the other end of the natural language processing spectrum are programs which take typed natural language sentences, such as paragraphs of a story, and can reason about what the meaning of the input is. These programs tackle the problem of semantic and pragmatic analysis, and it is these latter programs which most inspired our research.

In terms of semantic analysis, there are two distinct areas: research in more standard, linguistics-oriented efforts to create programs which can understand declarative sentences and reason about them, and research in programs which can carry out natural language commands, that is, procedural semantics. Clearly, our project falls more in line with the second area, but for basic reading, [5] is a good example of the first type of semantic analysis and demonstrates that software which understands a short story (among other things) was quite feasible in the 70’s.

Figure 6: The proposed NLPU (schematic: the same pipeline as Figure 3, but with the Language Interface running in one thread and the Brain Control Interface in a second thread)

While this research is inspiring for us, we are interested more in creating a system which works with procedural semantics. A very important paper here is [7], from 1972, which develops a system for logical deduction and understanding of commands which require complicated behaviors. This is a simulated robot arm program which can manipulate blocks. As an example of the power of this system, if a red block lies on top of a blue block, the system can figure out that if it is required to move the blue block, it will need to set the red block aside. Though everything is simulated, this is a clear predecessor to the project we undertook and was our major source for clarification and understanding of the challenges we faced.

For general background in procedural semantics and natural language processing in general, [4] and [2] were very useful. Following in the footsteps of Winograd are [6] and [3].

The most influential research we looked to for reference was [1]. This paper argues that the standard way of building AI systems takes the wrong approach. The vast majority of research attempts to create an AI system in a simplified environment (all of the papers we have discussed so far do this). For instance, the usual habitat of an AI robot is a block world: the things in the world with which the robot can interact are just blocks, perhaps with color or texture. The hope is that a good AI system in a simplified world can be slowly extended to a good AI system in the real world. Brooks argues that we should instead start with simple robots in the real world: this paper describes the development of several robots which operate in the real world, performing relatively simple tasks, such as searching a building for soda cans to pick up and recycle. The key insight here is the subsumptional architecture: one process directs the robot (the “move around” process) until the robot senses a situation requiring specialized behavior and a higher-precedence process takes over (e.g., the “grab a soda can” process). The idea of having multiple levels of processes which inhibit each other played directly into our design of the model of thought of the robot and our design of the “block” as a linguistic idea (and an implemented data structure). Our realization of the ideas put forward by Brooks has an ironic twist: the very title of [1] contains the thesis that intelligent robots do not need (and in fact are hindered by) a fixed representation of the world. Note that we took the idea of a subsumptional model of control, made it dynamic, and added representations! The inhibitory model of procedural semantics that we inherited from this work made the most difference for us, and it, more than anything else, made our project possible.


A Grammar

Here we show our complete grammar. Some abbreviations: C is “command” (the start symbol); IS, CC, and DS are “imperative sentence,” “contextual command,” and “declarative sentence.” In addition, most standard phrase names are prefixed with a D to indicate that they are phrases in a declarative sentence.

C -> IS | CC | DS | Q
Q -> DQ | WH WHQ
WHQ -> DO DWHQ
DQ -> DO DS
DWHQ -> DS
DO -> do
WH -> what
CC -> V CAV | CAV
CAV -> again | more
DS -> DP DVP
DVP -> (AdvP | Adv) DV (AdvP) (VPP)
DV -> (AUX) RDV
DP -> NP | D NP
IS -> VP
VP -> (Adv) V (AdvP) (VDP) (VPP)
VDP -> (D) VNP
VNP -> VN | VN AP | AP VN
NP -> N | AP NP | N NPP | AP NP NPP | NumP NP | N AP
AP -> A
AdvP -> Adv AdvP |
NPP -> P DP
VPP -> P DP | SubP DS | P DP VPP | SubP DS VPP
NumP -> Num
Num -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
       20 | 45 | 30 | 90 | 180 | 360 | 270 | one |
       two | three | four | five | six | seven | eight |
       nine | ten
Adv -> quickly | slowly | carefully | right | left |
       straight | back | backwards | forward | forwards |
       around
VN -> light | LED | LEDs | sound | beep | song | note |
      A | B | C | D | E | F | G
N -> I | you | little | room | dark | lights | yourself |
     wall | light | box | line | seconds | second | foot |
     feet | inches | meters | meter | degrees | right |
     left | ahead | front
D -> the | your | a | an | that |
A -> right | left | center | high | low | black |
     white | dark | bright | ahead | front
P -> to | until | for | along | toward | towards |
     from | into
SubP -> after | if | unless | until | when | whenever
V -> face | spin | move | go | turn | turnon |
     turnoff | play | light | blink | beep
AUX -> can | can’t | dont
RDV -> face | spin | move | go | turn | turnon |
       turnoff | play | light | blink | beep | say |
       hear | see | is


B Acceptable Sentences

Here is an example suite of sentences. This list is not exhaustive; most standard combinations of the phrases shown here are acceptable.

B.1 Imperatives

B.1.1 Moving

Here, go can be substituted for move, and light can almost always be substituted for wall:

move forward

move back

move backwards

move forward a little

move forward for two feet

move forward for 4 seconds

move forward slowly

move forward quickly

move forward slowly for two feet

move

move to the line

move to the wall

move forward to the wall

move until you can see the wall

move to the light

quickly move forward to the wall

move for 4 seconds whenever you see a wall to your right

B.1.2 Turning

turn right

turn left

turn right for 2 seconds

turn right for 90 degrees

turn right a little

turn right until you see a wall

turn right slowly

turn right quickly


slowly turn

turn slowly right

turn right until you can’t see a wall whenever you see a wall to your left

B.1.3 Lights

turnon your left light

turnon your lights

turn off your lights

turnon your right light when you see the wall

turnon your lights until you see the wall

turnon your left light if you see a wall to your left

B.1.4 Sound

play an A

beep when you see the wall

beep for 2 seconds when you can’t see a bright light

B.2 Queries

do you see a wall

do you see a bright light to your left

do you see a wall to your right

B.3 Contextual Commands

again

move again

play again

References

[1] Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139–159, 1991.

[2] Colleen Crangle and Patrick C. Suppes. Language and Learning for Robots. CSLI Publications, Stanford, CA, USA, 1994.


[3] Masoud Ghaffari, Souma Alhaj Ali, and Ernest L. Hall. A perception-based approach toward robot control by natural language. Intelligent Engineering Systems through Artificial Neural Networks, 14, 2004.

[4] Barbara J. Grosz, Karen Sparck-Jones, and Bonnie Lynn Webber, editors. Readings in Natural Language Processing. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1986.

[5] R. C. Schank and C. K. Riesbeck. Lisp. In R. C. Schank and C. K. Riesbeck, editors, Inside Computer Understanding: Five Programs Plus Miniatures, pages 41–74. Erlbaum, Hillsdale, NJ, 1981.

[6] Stuart C. Shapiro. Natural language competent robots. IEEE Intelligent Systems, 21(4):76–77, 2006.

[7] Terry Winograd. A procedural model of language understanding. In Computation & Intelligence: Collected Readings, pages 203–234. American Association for Artificial Intelligence, Menlo Park, CA, USA, 1995.
