Soar One-hour Tutorial

Posted on 25-Feb-2016


DESCRIPTION

Soar One-hour Tutorial. John E. Laird, University of Michigan, March 2009. http://sitemaker.umich.edu/soar, laird@umich.edu. Supported in part by DARPA and ONR. Tutorial outline: cognitive architecture; Soar history; overview of Soar; details of basic Soar processing and syntax.

TRANSCRIPT

Soar One-hour Tutorial
John E. Laird
University of Michigan
March 2009
http://sitemaker.umich.edu/soar
laird@umich.edu

Supported in part by DARPA and ONR


Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
– Internal decision cycle
– Interaction with external environments
– Subgoals and meta-reasoning
– Chunking
5. Recent extensions to Soar
– Reinforcement Learning
– Semantic Memory
– Episodic Memory
– Visual Imagery


How can we build a human-level AI?

[Figure: two stacks bridging levels of description to tasks. Biological levels: Neurons → Neural Circuits → Brain Structure. Computational levels: Electrical Circuits → Logic Circuits → Computer Architecture → Programs. Tasks: Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on a cell phone. Learning spans the levels.]

[Figure: Soar block diagram. Symbolic long-term memories (Procedural, Semantic, Episodic) with their learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); a symbolic short-term memory with the decision procedure, appraisals, and imagery; perception and action connect the cognitive architecture to the body.]

Cognitive Architecture

Fixed mechanisms underlying cognition:
– Memories, processing elements, control, interfaces
– Representations of knowledge
– Separation of fixed processes and variable knowledge
– Complex behavior arises from composition of simple primitives

Purpose:
– Bring knowledge to bear to select actions to achieve goals

Not just a framework (cf. BDI, neural networks, logic & probability, rule-based systems).

Important constraints:
– Continual performance
– Real-time performance
– Incremental, on-line learning

[Figure: the architecture combines with knowledge and goals to produce behavior in a task environment.]


Common Structures of Many Cognitive Architectures


[Figure: common structure. A short-term memory connected to procedural long-term memory (with procedure learning), declarative long-term memory (with declarative learning), action selection, goals, perception, and action.]

Different Goals of Cognitive Architecture

• Biological plausibility: Does the architecture correspond to what we know about the brain?

• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?

• Functionality: Does the architecture explain how humans achieve their high level of intellectual function? – Building Human-level AI


Short History of Soar


[Timeline, 1980–2005:
– Pre-Soar: problem spaces, production systems, heuristic search
– Functionality: multi-method, multi-task problem solving; subgoaling; chunking
– Modeling: UTC, natural language, HCI, external environments
– Integration: large bodies of knowledge, teamwork, real applications
– Virtual agents; learning from experience, observation, and instruction; new capabilities]

Distinctive Features of Soar

• Emphasis on functionality
– Takes engineering and scaling issues seriously
– Interfaces to real-world systems
– Can build very large systems in Soar that exist for a long time
• Integration with perception and action
– Mental imagery and spatial reasoning
• Integrates reaction, deliberation, and meta-reasoning
– Dynamically switching between them
• Integrated learning
– Chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling
– Expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments
– SML efficiently supports many languages, inter-process communication


System Architecture

• Soar Kernel: Soar 9.0 kernel (C)
• gSKI: higher-level interface (C++)
• KernelSML: encodes/decodes function calls and responses in XML (C++)
• SML: Soar Markup Language
• ClientSML: encodes/decodes function calls and responses in XML (C++)
• SWIG language layer: wrapper for Java/Tcl (not needed if the application is in C++)
• Application: any language

Soar Basics

• Operators: deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
1. Input from environment
2. Elaborate current situation: parallel rules
3. Propose and evaluate operators via preferences: parallel rules
4. Select operator
5. Apply operator: modify internal data structures: parallel rules
6. Output to motor system
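The six steps above can be sketched as one pass of a loop. This is a minimal illustration, not the Soar kernel API; all function names here are hypothetical placeholders.

```python
# One pass of a Soar-like decision cycle. Hypothetical names; real Soar
# fires matching rules in parallel until quiescence at each phase.

def decision_cycle(state, elaborate, propose, apply_op, env_input, env_output):
    state.update(env_input())                          # 1. input from environment
    elaborate(state)                                   # 2. elaborate current situation
    preferences = propose(state)                       # 3. propose/evaluate operators
    operator = max(preferences, key=preferences.get)   # 4. select operator
    apply_op(state, operator)                          # 5. apply: modify structures
    return env_output(state)                           # 6. output to motor system

# Toy usage: an Eaters-like agent that prefers moving toward food.
state = {}
out = decision_cycle(
    state,
    elaborate=lambda s: s.update(hungry=True),
    propose=lambda s: {"move-north": 1.0, "move-east": 0.5} if s["hungry"] else {},
    apply_op=lambda s, op: s.update(selected=op),
    env_input=lambda: {"cell-north": "food"},
    env_output=lambda s: s["selected"],
)
```

Here operator selection is reduced to a numeric `max`; the symbolic preference scheme Soar actually uses is described in the Eaters example below.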


[Figure: an operator takes the agent in a real or virtual world from its current state to a new state; the cycle repeats.]

Basic Soar Architecture

[Figure: production memory (procedural long-term memory, with chunking) and working memory (symbolic short-term memory) connect through perception and action to the body. The decision procedure runs the cycle: Input → Elaborate State → Propose Operators → Evaluate Operators → Select Operator (decide) → Apply Operator (apply) → Output.]

Soar 101: Eaters

[Figure: an Eaters agent can move North, South, or East. Through the cycle Input → Propose Operator → Select Operator → Apply Operator → Output, rules create the preferences North > East, South > East, North = South, and the output move-direction North.]

Example rules:

If cell in direction <d> is not a wall,
--> propose operator move <d>

If operator <o1> will move to a bonus food and operator <o2> will move to a normal food,
--> operator <o1> > <o2>

If operator <o1> will move to an empty cell,
--> operator <o1> <

If an operator is selected to move <d>,
--> create output move-direction <d>
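Resolving such symbolic preferences into a single choice can be sketched as follows; this is a hypothetical helper, not Soar's actual decision procedure, and it handles only better (">"), worst ("<"), and indifferent ("=") preferences.

```python
# Sketch of resolving Soar-style symbolic preferences into one operator.
import random

def select_operator(candidates, better, worst=()):
    # Drop operators marked worst ("<"), unless that would leave nothing.
    viable = [c for c in candidates if c not in worst] or list(candidates)
    # Drop any operator that some viable operator is strictly better than.
    dominated = {w for (b, w) in better if b in viable}
    best = [c for c in viable if c not in dominated] or viable
    # Remaining operators are mutually indifferent ("="): pick at random.
    return random.choice(best)

# North > East and South > East, with North = South, as in the Eaters slide.
choice = select_operator(["North", "South", "East"],
                         better=[("North", "East"), ("South", "East")])
```

With these preferences, East is dominated, so the agent picks North or South at random, matching the North = South indifference on the slide.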

Example Working Memory

(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)

Working memory is a graph. All working memory elements must be “linked” directly or indirectly to a state.

[Figure: the working-memory graph rooted at S1, with ^block edges to b1 and b2 and a ^table edge to t1; b2's attributes (^color yellow, ^type block, ^size 1, ^name B, ^weight 14, ^under, ^ontop) are shown as labeled edges.]
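The same working memory can be written down as (identifier, attribute, value) triples; the following is a sketch of that representation and of the "linked to a state" constraint, not Soar's internal data structure.

```python
# Working memory as (identifier, attribute, value) triples. Multi-valued
# attributes (the two ^block links on s1) are simply repeated triples.
wm = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", 1), ("b1", "type", "block"), ("b1", "weight", 14),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", 1), ("b2", "type", "block"), ("b2", "under", "b1"),
    ("b2", "weight", 14),
    ("t1", "color", "gray"), ("t1", "shape", "square"),
    ("t1", "type", "table"), ("t1", "under", "b2"),
]

def linked_to_state(wm, state="s1"):
    """Check the slide's constraint: every element is reachable from the state."""
    reachable, frontier = {state}, {state}
    while frontier:
        frontier = {v for (i, a, v) in wm if i in frontier} - reachable
        reachable |= frontier
    return all(i in reachable for (i, a, v) in wm)
```

An element whose identifier is not reachable from `s1` would violate the constraint, and `linked_to_state` would report it.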

Soar Processing Cycle

[Figure: the decision cycle Input → Elaborate State → Propose Operators → Evaluate Operators → Select Operator (decide) → Apply Operator (apply) → Output. When rules cannot resolve a decision, an impasse arises and a subgoal is created, in which the same cycle runs recursively.]

TankSoar

[Figure: TankSoar map legend: borders (stone), walls (trees), health charger, energy charger, missile pack, the red tank's shield, a blue tank, and the green tank's radar.]

Soar 103: Subgoals

[Figure: the decision cycle with subgoaling. A selected top-level operator is implemented by operators in a subgoal: Wander decomposes into Move and Turn; Attack decomposes into Shoot.]

If enemy not sensed, then wander.
If enemy is sensed, then attack.

TacAir-Soar [1997]

• Controls simulated aircraft in real-time training exercises (>3,000 entities)
• Flies all U.S. air missions
• Dynamically changes missions as appropriate
• Communicates and coordinates with computer- and human-controlled planes
• Large knowledge base (8,000 rules)
• No learning

TacAir-Soar Task Decomposition

[Figure: goal hierarchy. Execute Mission decomposes into Fly-Route, Fly-Wing, Ground Attack, and Intercept; Intercept into Achieve Proximity, Employ Weapons, Search, Execute Tactic, and Scram; Employ Weapons into Select Missile, Get Missile LAR, Get Steering Circle, Sort Group, and Launch Missile; Launch Missile into Lock Radar, Lock IR, Fire-Missile, and Wait-for-Missile-Clear.]

If instructed to intercept an enemy, then propose intercept.

If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons.

If employing-weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.

If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.

>250 goals, >600 operators, >8,000 rules

Impasse/Substate Implications:

• Substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve impasse
– Generate operator
– Select operator (deliberate control)
– Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
– Operator creation, selection, application, state elaboration
• Substate is where knowledge to resolve the impasse can be found
• Hierarchies of substates/subgoals arise through recursive impasses


Tie Subgoals and Chunking

[Figure: a tie impasse among move North, South, and East creates a subgoal in which evaluate-operator is applied to each candidate: North = 10, South = 10, East = 5. Chunking creates a rule that applies evaluate-operator, and rules that create the preferences North > East, South > East, North = South, based on what was tested.]
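The effect of chunking here, caching the result of subgoal reasoning so the impasse never recurs, can be sketched as follows. This is a hypothetical structure for illustration; real chunking analyzes the rule firings that produced the result, not the scores themselves.

```python
# Sketch of chunking's effect on a tie impasse: evaluate deliberately once,
# then cache a rule that yields the resulting preferences directly.

def resolve_tie(operators, evaluate, chunks):
    key = frozenset(operators)
    if key in chunks:                  # a learned chunk fires: no impasse
        return chunks[key]
    scores = {op: evaluate(op) for op in operators}   # subgoal: evaluate each
    best = max(scores.values())
    preferred = sorted(op for op, s in scores.items() if s == best)
    chunks[key] = preferred            # chunk created from what was tested
    return preferred

chunks = {}
evals = {"North": 10, "South": 10, "East": 5}
first = resolve_tie(["North", "South", "East"], evals.get, chunks)   # subgoal runs
again = resolve_tie(["North", "South", "East"], evals.get, chunks)   # chunk fires
```

On the second call the cached result is returned without re-evaluation, mirroring how deliberate reasoning is converted to reaction.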

Chunking Analysis

• Converts deliberate reasoning/planning to reaction
• Generality of learning is based on generality of reasoning
– Leads to many different types of learning
– If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse-driven
– Learning arises from a lack of knowledge


Extending Soar

• Learn from internal rewards
– Reinforcement learning
• Learn facts
– What you know
– Semantic memory
• Learn events
– What you remember
– Episodic memory
• Basic drives and …
– Emotions, feelings, mood
• Non-symbolic reasoning
– Mental imagery
• Learn from regularities
– Spatial and temporal clusters

[Figure: the extended Soar architecture. Symbolic long-term memories (Procedural, Semantic, Episodic) with Chunking, Reinforcement Learning, Semantic Learning, and Episodic Learning; a symbolic short-term memory with the decision procedure, an appraisal detector, clustering, and visual imagery; perception and action connect to the body.]

Theoretical Commitments

Stayed the same:
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules

Changed:
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
– Symbol generation (clustering)
– Control (numeric preferences)
– Learning control (reinforcement learning)
– Intrinsic reward (appraisals)
– Aiding memory retrieval (WM activation)
– Non-symbolic reasoning (visual imagery)


Reinforcement Learning
[Shelly Nason]


RL in Soar

1. Encode the value function as operator evaluation rules with numeric preferences.

2. Combine all numeric preferences for an operator dynamically.

3. Adjust value of numeric preferences with experience.

[Figure: RL loop. Perception and reward feed the internal state; the value function is updated from reward and drives action selection, which produces actions.]


The Q-function in Soar

The value function is stored in rules that test the state and operator and create numeric preferences:

sp {rl-rule
   (state <s> ^operator <o> +) …
-->
   (<s> ^operator <o> = 0.34)}

Operator Q-value = the sum of all its numeric preferences. Selection: epsilon-greedy or Boltzmann.

O1: {.34, .45, .02} → Q = .81
O2: {.25, .11, .12} → Q = .48
O3: {-.04, .14, -.05} → Q = .05

epsilon-greedy: with probability ε the agent selects an action at random; otherwise it takes the action with the highest expected value. [Balances exploration/exploitation.]
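Summing numeric preferences into Q-values and selecting with epsilon-greedy can be sketched as below; the helper names are hypothetical, not the Soar-RL API.

```python
# Sketch: operator Q-values as sums of numeric preferences,
# with epsilon-greedy selection over them.
import random

def q_values(numeric_prefs):
    # Q-value of each operator = sum of all its numeric preferences.
    return {op: sum(vals) for op, vals in numeric_prefs.items()}

def epsilon_greedy(numeric_prefs, epsilon=0.1):
    q = q_values(numeric_prefs)
    if random.random() < epsilon:        # explore: random operator
        return random.choice(list(q))
    return max(q, key=q.get)             # exploit: highest Q-value

prefs = {"O1": [0.34, 0.45, 0.02],      # Q = .81
         "O2": [0.25, 0.11, 0.12],      # Q = .48
         "O3": [-0.04, 0.14, -0.05]}    # Q = .05
```

With `epsilon=0.0` the agent always exploits and selects O1; raising ε trades some of that exploitation for exploration.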


Updating Operator Values

Sarsa update:

Q(s,O1) ← Q(s,O1) + α[r + λQ(s′,O2) − Q(s,O1)]

In the example: Q(s,O1) = .33, the sum of O1's numeric preferences (R1(O1) = .20, R2(O1) = .15, R3(O1) = −.02); r = reward = .2; Q(s′,O2) = .11, the sum of the numeric preferences of the next selected operator (O2). The error term r + λQ(s′,O2) − Q(s,O1) = .2 + .9 × .11 − .33 ≈ −.03, and the update is split evenly among the rules contributing to O1 (−.01 each): R1 = .19, R2 = .14, R3 = −.03.
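The split update can be sketched as follows. This is illustrative only; the constants follow the slide's example setup, and the slide's printed rule values are rounded.

```python
# Sketch of a Sarsa update whose delta is split evenly across the rules
# whose numeric preferences contributed to the selected operator.

def sarsa_update(rule_values, reward, q_next, alpha=0.1, gamma=0.9):
    q = sum(rule_values.values())                   # Q(s,O1): sum of prefs
    delta = alpha * (reward + gamma * q_next - q)   # scaled TD error
    share = delta / len(rule_values)                # even split across rules
    return {rule: v + share for rule, v in rule_values.items()}

rules = {"R1": 0.20, "R2": 0.15, "R3": -0.02}       # Q(s,O1) = .33
updated = sarsa_update(rules, reward=0.2, q_next=0.11)
```

Because the error term is negative here, every contributing rule's numeric preference is nudged down by the same share.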


Results with Eaters

[Figure: total score vs. move number, comparing a random agent with RL agents after 5, 10, 15, and 20 training runs.]

RL TankSoar Agent

[Figure: average margin of victory over successive games.]

Semantic Memory
[Yongjia Wang]


Memory Systems

[Figure: memory taxonomy. Memory divides into long-term and short-term memory; long-term memory into declarative (semantic memory, episodic memory, perceptual representation system) and procedural (procedural memory); short-term memory contains working memory.]


Declarative Memory Alternatives

• Working memory
– Keep everything in working memory
• Retrieve dynamically with rules
– Rules provide asymmetric access
– Data chunking to learn (complex)
• Separate declarative memories
– Semantic memory (facts)
– Episodic memory (events)


Basic Semantic Memory Functionalities

• Encoding
– What to save?
– When to add a new declarative chunk?
– How to update knowledge?
• Retrieval
– How is the cue placed and matched?
– What are the different types of retrieval?
• Storage
– What are the storage structures?
– How are they maintained?

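Cue-based retrieval can be sketched as exact feature matching over stored chunks; this is a hypothetical structure for illustration, not Soar's semantic memory implementation.

```python
# Sketch of cue-based retrieval from a store of attribute-value chunks.

def retrieve(memory, cue):
    """Return the first stored chunk whose features all match the cue."""
    for chunk_id, features in memory.items():
        if all(features.get(attr) == val for attr, val in cue.items()):
            return chunk_id
    return None  # retrieval failure

memory = {
    "c1": {"type": "block", "color": "blue", "name": "A"},
    "c2": {"type": "block", "color": "yellow", "name": "B"},
    "c3": {"type": "table", "color": "gray"},
}
hit = retrieve(memory, {"type": "block", "color": "yellow"})
miss = retrieve(memory, {"type": "block", "color": "red"})
```

A cue is a partial description: any stored chunk consistent with every cue feature is a candidate, and an unmatched cue signals retrieval failure.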

Semantic Memory Functionalities

[Figure: interactions between working memory and semantic memory: saving structures (including auto-commit), placing a cue, feature matching, retrieval, expanding retrieved structures, updating with complex structure, and remove-no-change.]

Episodic Memory
[Andrew Nuxoll]



Episodic vs. Semantic Memory

• Semantic memory
– Knowledge of what we “know”
– Example: what state the Grand Canyon is in
• Episodic memory
– History of specific events
– Example: a family vacation to the Grand Canyon

Characteristics of Episodic Memory (Tulving)

• Architectural:
– Does not compete with reasoning
– Task-independent
• Automatic:
– Memories created without deliberate decision
• Autonoetic:
– Retrieved memory is distinguished from sensing
• Autobiographical:
– Episode remembered from own perspective
• Variable duration:
– The time period spanned by a memory is not fixed
• Temporally indexed:
– Rememberer has a sense of when the episode occurred


Episodic Memory: Current Implementation

[Figure: working memory (with input, output, cue, and retrieved links) alongside long-term procedural memory (production rules); episodic learning stores episodes into a separate episodic memory.]

• Encoding initiation: when the agent takes an action.
• Encoding content: the entire working memory is stored in the episode.
• Storage (episode structure): episodes are stored in a separate memory.
• Retrieval initiation/cue: the cue is placed in an architecture-specific buffer.
• Retrieval: the closest partial match is retrieved.
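Closest-partial-match retrieval can be sketched by scoring each stored episode on how many cue elements it contains; this is a hypothetical scoring scheme, and the real implementation also considers factors such as activation.

```python
# Sketch of "closest partial match" episodic retrieval: score each stored
# episode (a snapshot of working memory) against the cue, return the best.

def best_partial_match(episodes, cue):
    def score(snapshot):
        return sum(1 for wme in cue if wme in snapshot)
    return max(episodes, key=lambda t: score(episodes[t]))

episodes = {  # time -> snapshot of working memory elements
    1: {("see", "wall"), ("x", 2), ("y", 3)},
    2: {("see", "charger"), ("x", 5), ("y", 3)},
    3: {("see", "charger"), ("x", 5), ("y", 4)},
}
cue = {("see", "charger"), ("y", 3)}
when = best_partial_match(episodes, cue)
```

Episode 2 matches both cue elements while the others match only one, so it is retrieved even though no episode matches the cue exactly.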

Cognitive Capability: Virtual Sensing

• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
– Have I seen a charger from here?
– Have I seen a place where I can see a charger?


Virtual Sensors Results

[Figure: average number of moves vs. subsequent searches, comparing random search with episodic-memory search.]

Cognitive Capability: Action Modeling

[Figure: the agent attempts to choose a direction (North, South, East), but its knowledge is insufficient, so an impasse arises. In the subgoal it evaluates moving in each available direction: create a memory cue, retrieve the best matching memory (episodic retrieval), retrieve the next memory, and use the change in score to evaluate the proposed action (e.g., Move North = 10 points).]

Episodic Memory: Multi-Step Action Projection
[Andrew Nuxoll]

• Learn tactics from prior success and failure
– Fight/flight
– Back away from enemy (and fire)
– Dodging

[Figure: average margin of victory over successive games.]

Episodic Memory Enables Cognitive Capabilities

• Sensing
– Detect changes
– Detect repetition
– Virtual sensing
• Reasoning
– Model actions
– Use previous successes/failures
– Model the environment
– Manage long-term goals
– Explain behavior
• Learning
– Retroactive learning
– Allows reanalysis given new knowledge
– “Boost” other learning mechanisms


Mental Imagery and Spatial Reasoning
[Scott Lathrop, Sam Wintermute]

See AGI talks.


What Is Visual Imagery?

Visual imagery divides into:

• Visual-spatial
– Location, orientation
– Sentential, quantitative representations
– Sentential/algebraic algorithms: linear algebra and computational geometry
• Visual-depictive
– Shape, color, topology, spatial properties
– Depictive, pixel-based representations
– Depictive/ordinal algorithms: image algebra

Where can you put A next to I?


Spatial Problem Solving with Mental Imagery
[Scott Lathrop & Sam Wintermute]

[Figure: Soar exchanges qualitative descriptions of object relationships with a spatial scene; the environment supplies quantitative descriptions of environmental objects, and new objects are described qualitatively in relation to existing ones. Example: given (on A I), the agent imagines A left of I, but the imagined A′ intersects obstacle O; it then imagines A right of I, where there is no intersection, and issues (move_right_of A I).]

Upcoming Challenges

• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
– Language, spatial, and temporal reasoning, …
• Scaling up to large bodies of knowledge
– Built up from instruction, experience, exploration, …


Soar Community

• Soar website
– http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
– June 22–26, 2009
• Soar-group
– http://lists.sourceforge.net/lists/listinfo/soar-group
– Low traffic


Thanks to

Funding agencies: NSF, DARPA, ONR

Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu

Research programmers: Karen Coulter, Jonathan Voigt

Continued inspiration: Allen Newell


Challenges in Cognitive Architecture Research

• Dynamic taskability
– Pursue novel tasks
• Learning
– Always learning, in unexpected and unplanned ways (wild learning)
– Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
– Active area, but much left to do
• Social behavior
– Interaction with humans and other entities
• Connect to the real world
– Cognitive robotics with long-term existence
• Applications
– Expand domains and problems
– Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI
