

Meta-Level Incomplete Information

Richard Goodwin and Reid Simmons {rich,reids}@cs.cmu.edu

School of Computer Science Carnegie Mellon University

Pittsburgh, PA, USA 15213-3890

Typically, when we discuss planning with incomplete information, we are referring to incomplete information about the state of the world or the effects of actions. One common approach to dealing with this type of incomplete information is to use a probabilistic model to reason about the partial information that is available. To make reasoning with probabilistic models tractable, assumptions and approximations are made that necessarily make the model an incomplete representation of the partial information available on the state of the world. In this extended abstract, I will discuss incomplete information concerning planning models of the world and the robot. Since this is incomplete information about models of incomplete information, I will call it meta-level incomplete information. I will illustrate the sources of this meta-level incomplete information and describe how a planner can make use of this type of incomplete information. I will use a particular probabilistic model, a partially observable Markov decision process, and examples from work with the Xavier robot to make the discussion more concrete.

Numerous probabilistic techniques for dealing with sensor and actuator noise have been proposed and explored in the context of mobile robotics. One technique that is starting to gain acceptance is to use partially observable Markov decision processes (POMDPs) to model a robot's interaction with the environment (Cassandra, Kaelbling, & Littman 1994). POMDPs provide a method of representing uncertainty in the initial state, methods for representing sensor and actuator noise, and methods for handling problems like sensor aliasing. The approach has produced capable robot systems for indoor navigation (Simmons & Koenig 1995).

Inherent in the POMDP approach is the assumption that the real world obeys the Markov property and that the model captures all the relevant state information. There is also often the assumption that the world is stationary and does not change over time. For tractability reasons, continuous-valued variables are discretized. A robot's location can be discretized on a meter grid, which can violate the Markov assumption. To see why, consider a robot starting at the edge of a stairwell (Figure 1). In one case, the robot backs up 1.2 meters and in the other, the robot backs up 1.8 meters. For this example, let's assume that the robot's actuators are accurate to within 0.1 meters. In both cases, the robot ends up in the same grid cell, 1.0 to 2.0 meters from the edge. If the robot is then commanded to go forward 1.5 meters, we can easily predict the results with certainty based on the robot's history: after backing up 1.2 meters the robot goes over the edge, and after backing up 1.8 meters it stops short of the edge. However, the robot cannot make the same prediction based solely on its grid location. Using the Markov model, the robot would predict that in both cases there was a non-zero, non-certain probability that it would go over the edge. This is because a Markov model typically includes state transitions that allow a 1.5 meter move to move the robot between zero and 2 cells forward, with given probabilities. Suppose the robot got a positive reward for getting closer to the edge, but a negative reward for going over the edge. The robot could use the probabilities from the Markov model to calculate the expected utility to decide whether to move forward 1.5 meters or not. The decision process would be wrong because the underlying model does not justify the estimates used to make the decision. Information is needed about the accuracy of the model to conclude that the probability estimates are not valid and that the planner cannot predict the future with any accuracy.

Figure 1: The robot starts at the edge and moves back either 1.2 or 1.8 meters. In both cases, it ends up in the same grid location, 1.0 to 2.0 meters from the edge.
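The stairwell example can be worked through numerically. The following is a minimal sketch and not code from the Xavier system: the function names are hypothetical, the cell-level transition probabilities are made up for illustration, and the 0.1 meter accuracy is treated as a worst-case bound on each move.

```python
import math

CELL_SIZE = 1.0    # meters per grid cell (from the example)
MOVE_ERROR = 0.1   # assumed worst-case actuator error per move, in meters

# Hypothetical cell-level model of a 1.5 m forward move: probability of
# advancing 0, 1, or 2 cells. The numbers are invented for illustration.
FORWARD_CELLS_PROB = {0: 0.1, 1: 0.5, 2: 0.4}


def grid_cell(distance_from_edge):
    """Index of the cell containing a distance (cell 1 covers 1.0 to 2.0 m)."""
    return math.floor(distance_from_edge / CELL_SIZE)


def outcome_from_history(back_up, forward=1.5):
    """Predict the outcome from the full history of commanded moves.

    Each of the two moves may be off by MOVE_ERROR, so the errors add.
    A negative distance means the robot is past the edge.
    """
    nominal = back_up - forward
    low, high = nominal - 2 * MOVE_ERROR, nominal + 2 * MOVE_ERROR
    if high < 0:
        return "over the edge"
    if low > 0:
        return "safe"
    return "uncertain"


def markov_prob_over_edge(cell):
    """Probability of going over the edge using only the current grid cell."""
    return sum(p for cells, p in FORWARD_CELLS_PROB.items() if cells > cell)


for back_up in (1.2, 1.8):
    cell = grid_cell(back_up)
    print(back_up, cell, outcome_from_history(back_up), markov_prob_over_edge(cell))
```

Both histories map to the same cell, so the cell-level model returns the same intermediate probability for both, while the history-based prediction is certain in each case.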

Any realistic model of a robot interacting with its environment will require a multitude of parameter and probability estimates. The Markov model used to do position estimation for the Xavier robot (Simmons & Koenig 1995) needs estimates of corridor lengths and probabilities for detecting features like open doors. To know the limits of the model, we need to know the accuracy of each of these estimates. This effectively doubles the size of the model creation problem. Fortunately, all is not lost. There are ways of mitigating the effects of model limitations and of learning model limits. One approach is to add sensors and low-level reactive routines to compensate for poor model prediction. Suppose the robot in the example above had a drop-off sensor and a low-level routine to stop the robot when a drop-off was detected. The planner could then safely predict that the probability of ending up in the grid cell over the edge was small. The robot would still need to know the limitations of the Markov model in order to decide to use a model of the low-level behaviour, rather than the Markov model, for predicting the probability of each outcome.

Sources of Incomplete Information

Hardware                 Planning Models                      Meta-level Planning Models
Sensor Noise             Markov and Stationary Assumptions    Planner Performance Model
Sensor Aliasing          Model Parameter Error                Performance Curves
Actuator Imprecision     Model Abstraction
Limited Sensing Range    Independence Assumptions

From AAAI Technical Report SS-96-04. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Another method of reducing the burden of model creation is to learn the model from the robot's experience in the world. Using standard techniques, learning methods can learn a model and estimate the possible error in the model as well. Adapting a standard learning technique, Sven Koenig has created a Markov-model-based position estimation system that starts with an approximate map and learns the lengths of corridors (Koenig, Goodwin, & Simmons 1995). The learned model can also be analyzed to extract model limits in terms of the variance in corridor length estimates.
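As a rough illustration of attaching an error estimate to a learned parameter, the sketch below computes a corridor length and its standard error from repeated noisy measurements. This is plain sample statistics swapped in for illustration, not the learning technique used in the cited system, and the function name and readings are hypothetical.

```python
import math
import statistics


def corridor_length_estimate(measurements):
    """Return (mean, standard error) of a corridor length from noisy traversals."""
    mean = statistics.mean(measurements)
    # The sample variance requires at least two measurements.
    variance = statistics.variance(measurements)
    std_error = math.sqrt(variance / len(measurements))
    return mean, std_error


# Hypothetical odometry readings, in meters, from five traversals of one corridor.
readings = [9.8, 10.1, 10.3, 9.9, 10.0]
mean, std_error = corridor_length_estimate(readings)
print(f"length = {mean:.2f} m, standard error = {std_error:.2f} m")
```

A planner could treat a large standard error as a signal that predictions depending on that corridor length should not be trusted.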

I move on now to briefly discuss meta-level planning and how incomplete information affects it. The meta-level planning problem is to determine how much planning to do before beginning execution and to trade off computation time for plan quality. Some of this tradeoff is done implicitly in the creation of the planning model. For example, abstraction creates smaller models that take less time to evaluate, but may produce less accurate results. Another approach is to use successive-approximation solution techniques and allow a meta-level planner to dynamically trade off computation for solution quality at run time. To do this tradeoff, the meta-level planner needs to estimate the expected cost of its current plan and to model the expected improvements from further planning. Again, to be efficient, these estimates must be generated approximately, using a simplified model. The anytime planning approach creates a performance curve and uses it to make predictions (Dean & Boddy 1988). In many cases, the error in these curves and the accuracy with which they model the expected performance are unknown.
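The run-time tradeoff can be sketched with a performance curve and a simple stopping rule. This is a minimal illustration under stated assumptions, not the planner described here: the exponential shape of the curve, all parameter values, and the function names are hypothetical.

```python
import math


def expected_plan_cost(t, initial_cost=100.0, best_cost=40.0, rate=0.5):
    """Hypothetical performance curve: expected plan cost after t seconds of planning."""
    return best_cost + (initial_cost - best_cost) * math.exp(-rate * t)


def planning_time(time_cost_per_second=5.0, step=0.1, max_time=60.0):
    """Plan in small steps until the predicted gain of one more step is below its cost."""
    t = 0.0
    while t < max_time:
        gain = expected_plan_cost(t) - expected_plan_cost(t + step)
        if gain <= time_cost_per_second * step:
            break  # further planning is predicted to cost more than it saves
        t += step
    return t


print(f"stop planning after about {planning_time():.1f} s")
```

The rule is only as good as the curve: if the assumed curve misstates how quickly plans improve, the stopping time is wrong, and nothing in the rule itself reveals that.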

Again, there are ways of mitigating the effects of model limits. In recent work on using more detailed models for getting better estimates of planner performance, I've found that using more detailed performance estimate models is often not worthwhile. Doing meta-level control with very approximate models of planner performance tends to improve the overall performance of the system. This is because the meta-level planning is very efficient and saves more computation than it uses. However, when more detailed models are used to generate better estimates of the planner's future performance, the system performance tends to degrade. The extra computation needed for the more complex models outweighs the performance gain due to making better meta-level decisions. To see why this might be the case, remember that the decision to begin execution or continue planning is binary. A small change in the estimates will only change the decision if the value was already close to the threshold. In cases where a decision is clear, and the estimates are far from the threshold, improving the quality of the estimate will have no effect on the decision and just wastes computation (Russell & Wefald 1991). In the cases where the value is close to the threshold, the cost of making an incorrect decision is relatively low, so improving the estimate to make a better decision has relatively little value.
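The threshold argument can be checked with a small sketch. It assumes the meta-level decision compares an estimated net gain from further planning against a threshold of zero, and that the estimate error is bounded; the names and numbers are hypothetical.

```python
ERROR = 0.5  # hypothetical bound on the error of the gain estimate


def decide(estimated_gain, threshold=0.0):
    """Binary meta-level decision: keep planning only if the estimated net gain is positive."""
    return "plan" if estimated_gain > threshold else "execute"


def loss_from_estimate(true_gain, estimated_gain):
    """Utility lost by acting on the estimate instead of the true net gain."""
    if decide(true_gain) == decide(estimated_gain):
        return 0.0
    # A wrong decision forgoes (or wastes) exactly the true net gain.
    return abs(true_gain)


for true_gain in (-10.0, -0.3, 0.2, 10.0):
    worst = max(loss_from_estimate(true_gain, true_gain + e) for e in (-ERROR, ERROR))
    print(true_gain, worst)
```

Far from the threshold the estimate error never changes the decision, and near the threshold the loss is at most the error bound, so a costlier estimator has little to recover in either case.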

In summary, we need to recognize that a robot planner has to deal with incomplete information due to its model limitations as well as incomplete information due to its limited and noisy sensors and actuators. In the table above, I summarize aspects of a robot system about which a planner might have incomplete information. The hardware sources of incomplete information are listed on the left. To handle these sources of incomplete information, we build probabilistic planning models. These models are themselves sources of incomplete information, listed in the center. Finally, if we do meta-level planning and use models for predicting planner performance, these meta-level models are also sources of incomplete information.

References

Cassandra, A.; Kaelbling, L.; and Littman, M. 1994. Acting optimally in partially observable stochastic domains. In Proceedings of the AAAI, 1023-1028.

Dean, T., and Boddy, M. 1988. An analysis of time-dependent planning. In Proceedings AAAI-88, 49-54. AAAI.

Koenig, S.; Goodwin, R.; and Simmons, R. 1995. Robot navigation with Markov models: A framework for path planning and learning with limited computational resources. In Proceedings of the International Workshop on Reasoning with Uncertainty in Robotics, Amsterdam, The Netherlands.

Russell, S., and Wefald, E. 1991. Do the Right Thing. MIT Press.

Simmons, R., and Koenig, S. 1995. Probabilistic robot navigation in partially observable environments. In Proceedings of the IJCAI, 1080-1087.