Beyond Chunking: Learning in Soar
March 22, 2003
John E. Laird, Shelley Nason, Andrew Nuxoll, and a cast of many others
University of Michigan

Research Methodology in Cognitive Architecture
1. Pick basic principles to guide development
2. Pick desired behavioral capabilities
3. Make design decisions consistent with the above
4. Build/modify the architecture
5. Implement tasks
6. Evaluate performance

Soar Basic Principle: Knowledge Search vs. Problem Search
Knowledge search
- Finds knowledge relevant to the current situation
- Architectural: not subject to change with new knowledge
- Not combinatorial or generative
Problem search
- Controlled by knowledge; arises from a lack of knowledge
- Subject to improvement with additional knowledge
- Generative and combinatorial

Desired Behavioral Capabilities
- Interact with a complex world (limited, uncertain sensing)
- Respond quickly to changes in the world
- Use extensive knowledge
- Use methods appropriate to the task
- Goal-driven behavior
- Meta-level reasoning and planning
- Generate human-like behavior
- Coordinate behavior and communicate with others
- Learn from experience
- Integrate the above capabilities across tasks
- Generate behavior with low computational expense

Example Tasks
- TacAir-Soar & RWA-Soar
- Soar Quakebot, Soar Hauntbot, Soar MOUTbot, Amber
- EPIC-Soar
- NL-Soar ("The horse raced past the barn fell")
- R1-Soar

Soar 101
The Eaters grid-world example illustrates the basic decision cycle, with production memory matching against working memory: input, propose operators, compare operators, select an operator, apply it, and produce output.
[Figure: the eater can move North, South, or East; the preferences North > East, South > East, North = South lead to selecting a move operator and issuing the move-direction output.]
Example rules:
- If the cell in a direction is not a wall, then propose a move operator in that direction.
- If one operator will move to a bonus food and another will move to a normal food, then prefer the first (>).
- If an operator will move to an empty cell, then make it less preferred (<).
- If a move operator is selected, then create the output command move-direction.

Soar 102: Subgoals
When the comparison knowledge is inconclusive, a tie impasse arises and Soar creates a subgoal. In the subgoal, evaluate-operator is applied to each tied operator via look-ahead (e.g., North = 10, South = 10, East = 5). Chunking creates a rule that applies evaluate-operator, and rules that create preferences based on what was tested (North > East, South > East, North = South).

Learning Results
[Figure: score vs. decisions for random behavior, look-ahead without chunking, look-ahead during chunking, and look-ahead after chunking.]

Soar 102: Dynamic Task Decomposition
[Figure: TacAir-Soar operator hierarchy. Execute Mission decomposes into Fly-route, Ground Attack, Fly-Wing, and Intercept; Intercept into Achieve Proximity, Employ Weapons, Search, Execute Tactic, and Scram; Employ Weapons into Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, and Launch Missile; Launch Missile into Lock Radar, Lock IR, Fire-Missile, and Wait-for Missile-Clear.]
Example rules:
- If instructed to intercept an enemy, then propose intercept.
- If intercepting an enemy, the enemy is within range, and ROE are met, then propose employ-weapons.
- If employing weapons, a missile has been selected, the enemy is in the steering circle, and LAR has been achieved, then propose launch-missile.
- If launching a missile, it is an IR missile, and there is currently no IR lock, then propose lock-IR.
The system contains >250 goals, >600 operators, and >8000 rules.

Chunking
- Simple architectural learning mechanism
- Automatically builds rules that summarize/cache processing
- Converts deliberate reasoning/planning into reaction (problem search => knowledge search)
- Problem solving in subgoals determines what is learned
- Supports deliberate/reflective learning, leading to many different types of learning strategies
- If reasoning is inductive, so is learning

Why Beyond Chunking?
- Chunking requires deliberate processing (operators) to record experiences, capture statistical regularities, and learn new concepts (data chunking)
- That processing is done only because we want the learning, not because it performs the task
- Learning competes with the task at hand
- Hard to implement, hard to use
- Are there other architectural learning mechanisms?

Episodic Learning [Andrew Nuxoll]
What is it?
- Not facts or procedures, but memories of specific events
- Recording and recalling of experiences with the world
Characteristics of episodic memory
- Autobiographical
- Not confused with the original experience
- Runs forward in time
- Temporally annotated
Why add it to the Soar architecture?
- Not appropriate as reflective learning
- Provides personal history and identity
- Memories can aid future decision making and learning
- Can generalize and analyze when time and more knowledge are available

Episodic Learning
When is a memory recorded?
- After a fixed period of time
- On a significant event
- On a significant change in the highest-activated working memory elements
What are the cues for retrieval?
- Everything
- Only input
- Most activated input / everything
- Domain-specific features
Is retrieval automatic or deliberate?
What is retrieved?
- Changes to input
- Changes to working memory
- Changes to activated elements
How is the memory stored? As a production rule.
What's missing?
- A sense of the time when the episode occurred
- The current implementation is not task independent

Episodic Recall Implementation
During a tie impasse, evaluate-operator uses episodic memory: if a memory matches, it computes the correct next state; if no memory matches, it returns a default evaluation (3).
Two approaches:
1. On-line: build memories as actions are taken; attempt to recall memories during look-ahead; chunk the use of memories during look-ahead.
2. Off-line: randomly explore while memories are recorded; then, off-line, attempt to recall and learn from the recorded memories; chunk the use of memories during look-ahead.
[Figures: results for on-line and off-line episodic learning.]
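The recording and retrieval questions above can be made concrete with a small sketch. The Python below is a minimal illustration of cue-based episodic recall under a tie impasse, assuming episodes are flat attribute-value snapshots, retrieval picks the episode with the largest overlap with the cue, and a default evaluation of 3 is returned when nothing matches. The names (Episode, EpisodicMemory, evaluate_operator) and the matching rule are illustrative assumptions, not the actual Soar implementation.

    from dataclasses import dataclass, field

    DEFAULT_EVALUATION = 3  # returned when no stored episode matches the cue

    @dataclass
    class Episode:
        time: int               # when the episode was recorded
        state: frozenset        # working-memory snapshot at that time
        next_state: frozenset   # the state that followed (used for look-ahead)

    @dataclass
    class EpisodicMemory:
        episodes: list = field(default_factory=list)
        clock: int = 0

        def record(self, state, next_state):
            # Record every step (a "fixed period" policy); event- or
            # activation-based recording policies are alternatives.
            self.episodes.append(
                Episode(self.clock, frozenset(state), frozenset(next_state)))
            self.clock += 1

        def recall(self, cue):
            # Return the episode whose state shares the most elements with
            # the cue, or None if nothing overlaps at all.
            cue = frozenset(cue)
            best = max(self.episodes, key=lambda e: len(e.state & cue), default=None)
            if best is None or not (best.state & cue):
                return None
            return best

    def evaluate_operator(memory, current_state, operator, score):
        # Look-ahead evaluation under a tie impasse: if a memory matches the
        # current state plus the proposed operator, score its recorded next
        # state; otherwise fall back to the default evaluation.
        cue = set(current_state) | {("proposed-operator", operator)}
        episode = memory.recall(cue)
        if episode is None:
            return DEFAULT_EVALUATION
        return score(episode.next_state)

The on-line and off-line approaches differ only in when record() and recall() are interleaved with task behavior; in both, chunking then caches the use of the recalled memory so future decisions skip the look-ahead.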
Reinforcement Learning [Shelley Nason]
Why add it to Soar?
- It might capture statistical regularities automatically and architecturally; chunking can do this only via deliberate learning.
Why Soar?
- Potential to integrate RL with a complex problem solver (quantifiers, hierarchy, ...)
How can RL fit into Soar?
- Learn rules that create numeric (probabilistic) preferences for operators
- Used only when the symbolic preferences are inconclusive
- The decision is based on all preferences that are recalled for an operator
Why is this going to be cool?
- Dynamically compute Q-values based on all rules that match the state
- Get transfer at different levels of generality

Example Numeric Preferences
If six rules match and create numeric preferences for North of 8, 12, 15, 1, 2, and 10, the combined value for North is their average: 48/6 = 8.

Reinforcement Learning
Given proposed operators in state B valued East = 6, South = 3, and North = 11, and a current value of North = 10 in state A, create a rule that generates a numeric preference for North in state A using the values in state B and the max over the proposed operators, according to standard RL.
Conditions of the rule?
- Current: all of state A
- Future: what was tested to produce the evaluation of state B but already existed in state A

Reinforcement Learning Results
[Figure: score vs. actions for random behavior, three learning runs, and the learned greedy policy.]
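The combination and update described above can be sketched in a few lines of Python. The sketch assumes matching rules are keyed by single state features, that preferences are combined by averaging (as in the 48/6 = 8 example), and that the update uses a standard Q-learning target; the class name, parameter values, and method names are illustrative, not Soar-RL's actual interface.

    from collections import defaultdict

    ALPHA = 0.1   # learning rate (assumed)
    GAMMA = 0.9   # discount factor (assumed)

    class NumericPreferences:
        def __init__(self):
            # rule -> numeric preference; each "rule" is keyed by the
            # (feature, operator) pair it tests, a stand-in for real conditions.
            self.rules = defaultdict(float)

        def matching(self, state, operator):
            return [(feature, operator) for feature in state]

        def q_value(self, state, operator):
            # Combined value = average of the numeric preferences created by
            # all matching rules.
            prefs = [self.rules[r] for r in self.matching(state, operator)]
            return sum(prefs) / len(prefs) if prefs else 0.0

        def update(self, state_a, operator, reward, state_b, proposed_ops):
            # Move the value of `operator` in state A toward
            # reward + GAMMA * max over operators proposed in state B,
            # splitting the correction across the rules that matched state A.
            target = reward + GAMMA * max(
                (self.q_value(state_b, op) for op in proposed_ops), default=0.0)
            error = target - self.q_value(state_a, operator)
            rules = self.matching(state_a, operator)
            if not rules:
                return
            for r in rules:
                self.rules[r] += ALPHA * error / len(rules)

Because the combined value comes from every rule that matches, a rule whose conditions test only part of the state contributes to many situations, which is the source of the transfer at different levels of generality mentioned above.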
Architectural Learning
- Automatic and ubiquitous
- Task independent and fixed
- Bounded processing
- Based on single experiences
- Examples: chunking, episodic learning, reinforcement learning, semantic/concept learning?

Deliberate/Reflective Learning
- Deliberately engaged
- Built on top of the architecture
- Uses knowledge to control learning
- Uses architectural learning
- Can change with learning
- Unbounded processing
- Can generalize across multiple examples through recall
- Examples: task acquisition, learning by instruction, learning by analogy, recovery from incorrect knowledge

Reflective Learning
- What is required to support reflective/deliberate learning?
- In Soar, impasses and subgoals are important (see the sketch below).
- What about ACT-R 5.0? It seems to have a declarative strategy? A way to make a decision at the meta-level? Indirect access to memory?
- Syntactically complete: can learn anything that can be represented.
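To make the role of impasses and subgoals concrete, here is a minimal Python sketch of impasse-driven subgoaling with chunking, under the simplifying assumptions that preference rules are plain functions and that the learned chunk tests the whole state; in Soar, a chunk's conditions come only from what the subgoal actually tested. The function name and arguments are illustrative, not Soar's kernel interface.

    def decide(state, candidates, preference_rules, learned_rules, evaluate):
        # Knowledge search: apply every preference rule that matches this state.
        best = {op for rule in preference_rules + learned_rules
                for op in rule(state, candidates)}
        if len(best) == 1:
            return best.pop()                # knowledge was sufficient

        # Tie impasse -> subgoal: problem search via look-ahead evaluation.
        tied = sorted(best) if best else sorted(candidates)
        scores = {op: evaluate(state, op) for op in tied}
        winner = max(scores, key=scores.get)

        # Chunking: cache the subgoal's result as a new rule so the same
        # impasse does not recur (here keyed to the whole state for simplicity).
        def chunk(s, cands):
            return {winner} if s == state and winner in cands else set()
        learned_rules.append(chunk)
        return winner

The first call in a given situation pays the cost of problem search; subsequent calls are resolved by the cached rule during knowledge search, which is the conversion of deliberation into reaction described in the Chunking slide.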