PLOW: A Collaborative Task Learning Agent

Authors: James Allen, Nathanael Chambers, et al.

Presented by: Rex, Linger, Xiaoyi
Nov. 23, 2009

Outline
- Introduction
- The PLOW System
- Demonstration
- Learning Tasks
- Evaluation
- Strengths & Weaknesses
- Related Work
- Q&A and Discussion

Introduction
- Aims to further human-machine interaction
- Quickly learns new tasks from human subjects using modest amounts of interaction
- Acquires task models from intuitive, language-rich demonstrations

Background
- Previous work: learning new tasks by observation, i.e., by watching an expert's demonstrations
- This paper's contribution: acquiring tasks much more quickly, typically from a single example, possibly with some clarification dialogue

The Interface

Agent Architecture

Language Processing
- The focus is on how language is used for task learning, rather than how language processing itself is accomplished
- Built on the TRIPS system
- Based on a domain-independent representation

Instrumentation
- The key issue is getting the right level of analysis for the instrumentation
- DOMs (Document Object Models); see the sketch below
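To illustrate what "the right level of analysis" can mean here, below is a minimal, hypothetical Python sketch of recording a browser action as a DOM-level event rather than as raw pixel coordinates; the class and field names are illustrative assumptions, not the paper's actual instrumentation format.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DomEvent:
    """A single instrumented GUI action, described at the DOM level."""
    action: str                  # e.g. "click", "fill", "select"
    xpath: str                   # path to the DOM node that was acted on
    tag: str                     # HTML tag of the node
    attributes: Dict[str, str] = field(default_factory=dict)
    text: str = ""               # visible text of the node, if any
    value: str = ""              # value typed or selected, for input actions

# Example: the user typed an address into a search box.
event = DomEvent(
    action="fill",
    xpath="/html/body/form/input[1]",
    tag="input",
    attributes={"name": "q", "type": "text"},
    value="Rochester, NY",
)
print(event)

Describing the action at this level keeps it meaningful when the page is rendered differently later, which a pixel-level recording would not.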

A Demo

Task Learning Challenges
- Identifying the correct parameterization
- Hierarchical structure
- Identifying the boundaries of iterative loops
- Loop termination conditions
- Task goals

Primitive Action Learning
- NL interpretation + GUI interpretation
- Heuristic search through the DOM (see the sketch below)
  - Semantic metric
  - Structural distance metric
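As a rough illustration of how a heuristic search over the DOM might combine a semantic metric with a structural distance metric, here is a small Python sketch; the scoring functions, weights, and node representation are assumptions made for illustration, not PLOW's actual algorithm.

from typing import Dict, List

def semantic_score(node: Dict, description: str) -> float:
    """Fraction of words from the spoken description that appear in the
    node's visible text or attribute values (a crude semantic metric)."""
    words = set(description.lower().split())
    node_text = " ".join(
        [node.get("text", "")] + list(node.get("attributes", {}).values())
    ).lower()
    if not words:
        return 0.0
    return sum(1 for w in words if w in node_text) / len(words)

def structural_distance(path_a: List[str], path_b: List[str]) -> int:
    """Number of differing steps between two root-to-node tag paths
    (a crude structural distance metric)."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)

def best_node(candidates: List[Dict], description: str,
              demo_path: List[str], w_sem: float = 1.0, w_str: float = 0.2) -> Dict:
    """Heuristic search: rank DOM candidates by a weighted combination of
    semantic similarity and (negative) structural distance."""
    def score(node: Dict) -> float:
        return (w_sem * semantic_score(node, description)
                - w_str * structural_distance(node["path"], demo_path))
    return max(candidates, key=score)

# Toy example: find the search box the user described as "the address field".
candidates = [
    {"text": "", "attributes": {"name": "address"}, "path": ["html", "body", "form", "input"]},
    {"text": "Go", "attributes": {"type": "submit"}, "path": ["html", "body", "form", "button"]},
]
print(best_node(candidates, "the address field", ["html", "body", "form", "input"]))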

Primitive Action Learning (diagram: natural language interpretation, GUI interpretation, heuristic search with structural distance and semantic metrics)

Parameterization Learning
- Identify the appropriate parameterization
- Object roles:
  - Input/output parameters
  - Constants
  - Relational dependencies
- Information from language

Parameterization Learning (example)
- Output: Hotels
- Input: Address
- Constant: Hotels
- Relational dependency: Zip is a role of Address (see the sketch below)
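A minimal sketch of how these parameter roles could be represented in a learned task step, using the hotel example above; the class and role names are illustrative assumptions rather than PLOW's internal representation.

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    """Possible roles an object can play in a learned task step."""
    INPUT = "input"            # value supplied each time the task runs
    OUTPUT = "output"          # value the task is expected to produce
    CONSTANT = "constant"      # value fixed at demonstration time
    RELATIONAL = "relational"  # value derived from another parameter

@dataclass
class Parameter:
    name: str
    role: Role
    depends_on: Optional[str] = None  # filled in for relational dependencies

# Parameterization of the "find hotels near an address" example:
params = [
    Parameter("address", Role.INPUT),                        # input: Address
    Parameter("hotels", Role.OUTPUT),                        # output: Hotels
    Parameter("search_term", Role.CONSTANT),                 # constant: "hotels"
    Parameter("zip", Role.RELATIONAL, depends_on="address"), # Zip is a role of Address
]

for p in params:
    print(p)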

Hierarchical Structure Learning
- Beginning of new procedures
- A goal
- End of procedure

Iteration Learning
- Iterative procedure learning
- Language support
- PLOW's attempt at iteration
- User corrections / more examples
- Rule/pattern learned

Iteration Learning
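To make the iteration idea concrete, here is a minimal Python sketch of generalizing a demonstrated per-row extraction into a loop with an explicit termination condition; the representation and the "no more rows" test are illustrative assumptions, not PLOW's actual mechanism.

from typing import Callable, List

def learn_iteration(demo_rows: List[dict]) -> Callable[[List[dict]], List[dict]]:
    """Given the row(s) the user extracted during the demonstration, return
    a procedure that repeats the same extraction over a whole result list,
    stopping when no more rows are available (the termination condition)."""
    # Generalize from the demonstrated example: keep the fields the user
    # pointed at, instead of the specific values they happened to see.
    fields = list(demo_rows[0].keys()) if demo_rows else []

    def run(all_rows: List[dict]) -> List[dict]:
        results = []
        i = 0
        while True:
            row = all_rows[i] if i < len(all_rows) else None
            if row is None:          # termination condition: no more rows
                break
            results.append({f: row.get(f) for f in fields})
            i += 1
        return results

    return run

# The user demonstrated extracting the name and price of the first hotel;
# the learned loop then applies the same pattern to every row on the page.
demo = [{"name": "Hotel A", "price": "$120"}]
page = [
    {"name": "Hotel A", "price": "$120", "stars": 4},
    {"name": "Hotel B", "price": "$95", "stars": 3},
]
print(learn_iteration(demo)(page))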

Evaluation
- 16 test subjects with general training
- 3 other systems:
  - One learned entirely from passive observation
  - One used a sophisticated GUI primarily designed for editing procedures, extended to allow the definition of new procedures
  - One used an NL-like query and specification language that requires detailed knowledge of the HTML producing the web page

Evaluation: First Part
- Subjects taught the different systems some subset of predefined test questions
- Evaluators created new test examples by specifying values for the input parameters, then scored the execution results
- PLOW scored 2.82 out of 4 (the other systems' scores are not mentioned)

Evaluation: Second Part
- 10 new test questions, designed by an outside group and unknown to the developers prior to the test
- Subjects had one work day to teach whichever of these tasks they wished, using whichever task learning system they preferred
- PLOW was used to teach 30 of the 55 task models
- PLOW also received the highest average score in the test (2.2/4)

Strengths
- Integrates natural language recognition and understanding (TRIPS, 2001)
- Play-by-play mode gives a great user experience
- Easier to identify parameters, loop boundaries, and termination conditions, to build hierarchical structure, and to recognize goals
- Generalization from one short task
- Learns not only the task, but also the rule

Strengths (continued)
- Error correction from users: "This is wrong. Here is another title."
- PLOW confirms correctness with users when generating data from lists
- Less domain knowledge required, less training
- Close to one-click automation

Weaknesses
- Some remarks on the evaluation:
  - PLOW was ensured of being able to learn the 17 pre-determined test questions; were the other systems?
  - The 10 new tasks have different levels of difficulty, e.g., "For what reason did ... travel ... for ... between ... and ...?"
  - There is no detailed analysis of the evaluation results, so does PLOW really learn robust task models from a single example, or is it just better on certain types of tasks?

Weaknesses (continued)
- Learning and reasoning rely on NL understanding:
  - What happens when it encounters new concepts?
  - Does it require certain patterns of speaking? Are its NL understanding capabilities sufficient?
- It still takes one full work day per person to teach 3 simple tasks
- Users have to construct good task models; PLOW has no error detection mechanism for users

Related Work
- Sheepdog, 2004
  - An implemented system for capturing, learning, and playing back technical support procedures on the Windows desktop
  - Complex technical support procedures vs. the relatively simple procedures in PLOW
  - Records traces to form an alignment problem and uses I/O HMMs to build procedure models; needs many training examples

Related Work
- Tailor, 2005
  - A system that allows users to modify task information through instruction
  - Recognizes user instructions by combining rules with a parser (JavaNLP)
  - Maps sentences to hypothesized changes
  - Reasons about the effects of changes and detects unanticipated behavior
  - Also limited to relatively simple tasks

Related Work
- CHINLE, 2008
  - A system that automatically constructs PBD systems for applications based on their interface specifications
  - Learning from incomplete data and partial learning from inconsistent data
  - PLOW can learn a subset of certain tasks, but users cannot make mistakes

Questions?
More time for the latest PLOW demo?

Thank you!