J. William Murdock 1/42
Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation
J. William Murdock
Intelligent Decision Aids Group
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, DC [email protected] http://bill.murdocks.org
Presentation at NIST – March 18, 2002

Adaptation
• People adapt very well.
  – They figure out how to do new things.
  – If something doesn’t work, they try something else.
  – They understand how and why they are doing things.
• Computer programs do not adapt very well.
  – They can only do what they are programmed for.
  – They keep making the same mistakes.
  – They have no understanding of themselves.
Can we make computer programs adapt?

REM (Reflective Evolutionary Mind)
• Operating environment for intelligent agents
• Provides support for adaptation to new functional requirements
• Uses functional models, generative planning, and reinforcement learning
• J. William Murdock and Ashok K. Goel

Example: Web Browsing Agent
• A mock-up of web browsing software
• Based on Mosaic for X Windows, version 2.4
• Imitates not only behavior but also internal process and information of Mosaic 2.4
[Figure: document types html, txt, ps, and pdf, with a “???” label]

Example: Disassembly and Assembly
• Software agent for disassembly in the domain of cameras
  – Information about cameras
  – Information about relevant actions
    • e.g., pulling, unscrewing, etc.
  – Information about disassembly processing
    • e.g., decide how to disconnect subsystems from each other and then decide how to disassemble those subsystems separately.
• Agent now needs to assemble a camera

TMK (Task-Method-Knowledge)
• TMK models provide the agent with knowledge of its own design.
• TMK encodes:
  – Tasks: functional specification / requirements and results
  – Methods: behavioral specification / composition and control
  – Knowledge: domain concepts and relations
[Figure: example TMK model with the labels Access; Remote, Local; Request, Receive, Store; and URLs, servers, documents, etc.]

REM Reasoning Process
[Figure: REM reasoning process. Labels: Implemented Task, A Method, Unimplemented Task, Set of Input Values, Execution, Adaptation, ADAPTED Implemented Task, ADAPTED Method, Trace, Set of Output Values]

Adaptation Process
[Figure: adaptation process. Labels: Proactive Model Transfer, Failure-Driven Model Transfer, Generative Planning, Situator (for Q-Learning), Task, Similar Implemented Task, A Method, Existing Method, Trace, Set of Input Values, ADAPTED Implemented Task, ADAPTED Method]

Execution Process
[Figure: execution process. Labels: Select Method, Select Next Task Within Method, Execute Primitive Task, Implemented Task, A Method, Set of Input Values, Trace, Set of Output Values]

Selection: Q-Learning
• Popular, simple form of reinforcement learning.
• In each state, each possible decision is assigned an estimate of its potential value (“Q”).
• For each decision, preference is given to higher Q values.
• Each decision is reinforced, i.e., its Q value is altered based on the results of the actions.
• These results include actual success or failure and the Q values of next available decisions.
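The reinforcement step described above can be sketched as a standard tabular Q-learning update. This is a minimal illustration rather than REM's implementation: the function name `q_update`, the table `Q`, and the learning rate `alpha` and discount `gamma` are assumptions introduced here.

```python
from collections import defaultdict

def q_update(Q, state, decision, reward, next_state, next_decisions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (alpha and gamma are illustrative values)."""
    # Best estimate among the decisions available next; 0.0 if none remain.
    best_next = max((Q[(next_state, d)] for d in next_decisions), default=0.0)
    # The target combines the observed result with the value of next decisions.
    target = reward + gamma * best_next
    # Move the current estimate part of the way toward the target.
    Q[(state, decision)] += alpha * (target - Q[(state, decision)])
    return Q[(state, decision)]

Q = defaultdict(float)  # every Q value starts at 0
# A decision that leads directly to success is reinforced first ...
q_update(Q, "s1", "b", reward=1.0, next_state="end", next_decisions=[])
# ... and an earlier decision then inherits value from the decisions after it.
q_update(Q, "s0", "a", reward=0.0, next_state="s1", next_decisions=["b", "c"])
```

After these two calls, the later decision holds the larger Q value and the earlier one holds a discounted share of it, which is how success propagates backward through a sequence of decisions.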

Q-Learning in REM
• Decisions are made for method selection and for selecting new transitions within a method.
• A decision state is a point in the reasoning (i.e., task, method) plus the set of all decisions that have been made in the past.
• Initial Q values are set to 0.
• Decides on the option with the highest Q value or randomly selects an option with probabilities weighted by Q value (configurable).
• A decision receives positive reinforcement when it leads immediately (without any other decisions) to the success of the overall task.
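The decision state and the two selection modes listed above might look like the following sketch. It is an illustration under stated assumptions, not REM's code: the names `decision_state` and `choose`, the option labels, the uniform fallback when every weight is zero, and the clipping of negative Q values are all choices made here.

```python
import random

def decision_state(point, past_decisions):
    """A decision state: a point in the reasoning plus all past decisions."""
    return (point, frozenset(past_decisions))

def choose(options, q, state, greedy=True, rng=random):
    """Pick an option greedily, or at random with Q-weighted probabilities."""
    values = [q.get((state, o), 0.0) for o in options]  # initial Q is 0
    if greedy:
        return options[values.index(max(values))]  # ties go to the first option
    weights = [max(v, 0.0) for v in values]  # assumption: ignore negative Q
    if sum(weights) == 0:
        return rng.choice(options)  # assumption: uniform when all weights are 0
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical option labels, used only for illustration.
state = decision_state("select-display-command", ["external-display"])
q = {(state, "alternate-method"): 1.0}
picked = choose(["base-method", "alternate-method"], q, state)
```

Representing the past decisions as a `frozenset` makes the state hashable and independent of the order in which the decisions were made, which matches the "set of all decisions" wording.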

Task-Method-Knowledge Language (TMKL)
• A new, powerful formalism of TMK developed for REM.
• Uses LOOM, a popular off-the-shelf knowledge representation framework: concepts, relations, etc.
REM models not only the tasks of the domain but also itself in TMKL.

Tasks in TMKL
• All tasks can have input & output parameter lists and given & makes conditions.
• A non-primitive task must have one or more methods which accomplish it.
• A primitive task must include one or more of the following: source code, a logical assertion, a specified output value.
• Unimplemented tasks have neither methods nor any of these primitive elements.
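The three kinds of task described above can be told apart by which fields are filled in. The sketch below is a hypothetical Python rendering, not TMKL itself; the class `Task`, its field names, and the `kind` method are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Task:
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    given: Optional[str] = None    # condition required before the task
    makes: Optional[str] = None    # condition that holds after the task
    methods: List[str] = field(default_factory=list)  # non-primitive tasks
    procedure: Optional[Callable] = None  # primitive: source code
    asserts: Optional[str] = None         # primitive: logical assertion
    value: Optional[object] = None        # primitive: specified output value

    def kind(self) -> str:
        """Classify the task by which parts of its definition are present."""
        if self.methods:
            return "non-primitive"
        if (self.procedure is not None or self.asserts is not None
                or self.value is not None):
            return "primitive"
        return "unimplemented"

browse = Task("communicate-with-www-server",
              inputs=["input-url"], outputs=["server-reply"],
              methods=["communicate-with-server-method"])
assert_dep = Task("assert-dependency", asserts="(node-precedes ...)")
assemble = Task("assemble")
```

An unimplemented task such as `assemble` is exactly the case that triggers adaptation: it has a specification but nothing that accomplishes it.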

TMKL Task
(define-task communicate-with-www-server
  :input (input-url)
  :output (server-reply)
  :makes (:and (document-at-location (value server-reply)
                                     (value input-url))
               (document-at-location (value server-reply)
                                     local-host))
  :by-mmethod (communicate-with-server-method))

Methods in TMKL
• Methods have provided and additional result conditions which specify incidental requirements and results.
• In addition, a method specifies a start transition for its processing control.
• Each transition specifies requirements for using it and a new state that it goes to.
• Each state has a task and a set of outgoing transitions.
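The control structure described above (a start transition, states that each carry a task, and outgoing transitions with requirements) amounts to a small state machine. The following sketch illustrates that reading; the names `Transition`, `State`, `series`, and `run_method` are assumptions, and taking the first usable transition is a simplification, since REM can also select among transitions by Q-learning.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Transition:
    provided: Callable[[dict], bool]  # requirement for using this transition
    next_state: Optional[str]         # None ends the method

@dataclass
class State:
    task: Callable[[dict], None]      # the task carried out in this state
    transitions: List[Transition] = field(default_factory=list)

def series(tasks):
    """Build a linear method from a list of tasks, one state per task."""
    always = lambda env: True
    states = {}
    for i, task in enumerate(tasks):
        nxt = f"s{i + 1}" if i + 1 < len(tasks) else None
        states[f"s{i}"] = State(task, [Transition(always, nxt)])
    start = Transition(always, "s0" if tasks else None)
    return start, states

def run_method(start, states, env):
    """Follow transitions from the start transition; return visited states."""
    trace, transition = [], start
    while transition is not None and transition.next_state is not None:
        state = states[transition.next_state]
        trace.append(transition.next_state)
        state.task(env)
        # Simplification: take the first transition whose requirement holds.
        transition = next((t for t in state.transitions if t.provided(env)), None)
    return trace

def step(name):
    """A stand-in task that only records that it ran."""
    return lambda env: env["log"].append(name)

env = {"log": []}
start, states = series([step("select-display-command"),
                        step("compile-display-command"),
                        step("execute-display-command")])
trace = run_method(start, states, env)
```

A linear sequence of tasks is the simplest case; loops and early termination come from adding further transitions with their own requirements to a state.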
Simple TMKL Method
(define-mmethod external-display
:provided (:not (internal-display-tag (value server-tag)))
:series (select-display-command
compile-display-command
execute-display-command))

Complex TMKL Method
(define-mmethod make-plan-node-children-mmethod
  :series (select-child-plan-node
           make-subplan-hierarchy
           add-plan-mappings
           set-plan-node-children))
(tell (transition>links make-plan-node-children-mmethod-t3
                        equivalent-plan-nodes
                        child-equivalent-plan-nodes)
      (transition>next make-plan-node-children-mmethod-t5
                       make-plan-node-children-mmethod-s1)
      (:create make-plan-node-children-terminate transition)
      (reasoning-state>transition make-plan-node-children-mmethod-s1
                                  make-plan-node-children-terminate)
      (:about make-plan-node-children-terminate
              (transition>provided
               '(terminal-addam-value (value child-plan-node)))))

Knowledge in TMKL
Foundation: LOOM
  – Concepts, instances, relations
  – Concepts and relations are instances and can have facts about them.
Knowledge representation in TMKL involves LOOM + some TMKL-specific reflective concepts and relations.

Some TMKL Knowledge Modeling
(defconcept location)
(defconcept computer :is-primitive location)
(defconcept url :is-primitive location :roles (text))
(defrelation text :range string :characteristics :single-valued)
(defrelation document-at-location :domain reply :range location)
(tell (external-state-relation document-at-location))

Sample Meta-Knowledge in TMKL
• relation characteristics
  – single-valued/multiple-valued
  – symmetric, commutative
• relations over relations
  – external/internal
  – state/definitional
• generic relations
  – same-as
  – instance-of
  – inverse-of
• concepts involving concepts
  – thing
  – meta-concept
  – concept

Web Browsing Agent
• Interactive Domain: Web agent is affected by the user and by the network
• Dynamic Domain: Both users and networks often change
• Knowledge Intensive Domain: Documents, networks, servers, local software, etc.
Mock-up of a web browser: Steps through the web-browsing process

Tasks and Methods of Web Agent
[Figure: task-method hierarchy of the web agent]
Process URL
  Process URL Method
    Communicate with WWW Server
      Communicate with WWW Server Method
        Request from Server
        Receive from Server
    Display File
      Display File Method
        Interpret Reply
        Display Interpreted File
          External Display
            Select Display Command
            Compile Display Command
            Execute Display Command
          Internal Display
            Execute Internal Display

Example: PDF Viewer
• The web agent is asked to browse the URL for a PDF file. It does not have any information about external viewers for PDF.
• Because the agent already has a task for browsing URLs, that task is executed first.
• When the system fails, the user provides feedback indicating the correct viewer.
• Failure-Driven Model Transfer

Web Agent Adaptation
[Figure: External Display before and after adaptation. Before: Select Display Command, Compile Display Command, Execute Display Command. After: Select Display Command is expanded into Select Display Command Base Method and Select Display Command Alternate Method, with Select Display Command Base Task and Select Display Command Alternate Task; Compile Display Command and Execute Display Command remain.]
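The split of Select Display Command into a base and an alternate branch can be pictured with a small sketch. It is only an illustration of the idea of failure-driven adaptation, not REM's mechanism: the function names, the viewer table, and the viewer command strings are hypothetical placeholders.

```python
def select_display_command_base(doc_type, viewers):
    """Base behavior: look up a known external viewer (None means failure)."""
    return viewers.get(doc_type)

def adapt_from_feedback(base, failed_doc_type, correct_command):
    """Wrap the base behavior with an alternate branch for the failed case."""
    def adapted(doc_type, viewers):
        if doc_type == failed_doc_type:    # alternate branch: the observed failure
            return correct_command
        return base(doc_type, viewers)     # base branch: everything else
    return adapted

viewers = {"ps": "ps-viewer"}              # hypothetical viewer table
first_try = select_display_command_base("pdf", viewers)   # fails: no PDF viewer
select_display_command = adapt_from_feedback(
    select_display_command_base, "pdf", "pdf-viewer")     # user feedback
```

The adaptation is local: only the selection step changes, while compiling and executing the display command are reused as they were.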

Physical Device Disassembly
• ADDAM: Legacy software agent for case-based, design-level disassembly planning and (simulated) execution
• Interactive: Agent connects to a user specifying goals and to a complex physical environment
• Dynamic: New designs and demands
• Knowledge Intensive: Designs, plans, etc.

Disassembly → Assembly
• A user with access to the ADDAM disassembly agent wishes to have this agent do assembly instead.
• ADDAM has no assembly method and thus must adapt first.
• Since assembly is similar to disassembly, REM selects Proactive Model Transfer.

Pieces of ADDAM which are key to Disassembly → Assembly
[Figure: task-method hierarchy of ADDAM]
Disassemble
  Plan Then Execute Disassembly
    Adapt Disassembly Plan
      Topology Based Plan Adaptation
        Make Plan Hierarchy
          Make Equivalent Plan Nodes Method
            Make Equivalent Plan Node
            Add Equivalent Plan Node
        Map Dependencies
          Select Dependency
          Assert Dependency
    Execute Plan
      Hierarchical Plan Execution
        Select Next Action
        Execute Action

New Adapted Task in Disassembly → Assembly
[Figure: adapted task-method hierarchy for Assemble. COPIED: Plan Then Execute Disassembly, Adapt Disassembly Plan, Execute Plan, Hierarchical Plan Execution, Topology Based Plan Adaptation, Make Plan Hierarchy, Make Equivalent Plan Nodes Method, Make Equivalent Plan Node, Add Equivalent Plan Node, Map Dependencies, Select Dependency. INVERTED: Assert Dependency. INSERTED: Inversion Task 1, Inversion Task 2. Unmarked: Select Next Action, Execute Action.]

Task: Assert Dependency
Before:
define-task Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-precedes (value target-before-node)
                          (value target-after-node))
After:
define-task Mapped-Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-follows (value target-before-node)
                         (value target-after-node))
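The change from the Before task to the After task can be read as replacing the asserted relation with its inverse. The sketch below shows that single step on assertions written as nested tuples; the table `INVERSE_OF` and the function `invert_assertion` are assumptions introduced here, and the pairing of `node-precedes` with `node-follows` is taken from the two task definitions above.

```python
# Inverse pairs, recorded in both directions.
INVERSE_OF = {
    "node-precedes": "node-follows",
    "node-follows": "node-precedes",
}

def invert_assertion(assertion):
    """Replace the relation of an assertion with its inverse; keep the arguments."""
    relation, *args = assertion
    return (INVERSE_OF[relation], *args)

before = ("node-precedes",
          ("value", "target-before-node"),
          ("value", "target-after-node"))
after = invert_assertion(before)
```

Only the relation changes; the arguments keep their order, which mirrors how the After definition reuses the same two input nodes.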
Task: Make Equivalent Plan Node
define-task make-equivalent-plan-node
input: base-plan-node, parent-plan-node, equivalent-topology-node
output: equivalent-plan-node
makes: (:and
(plan-node-parent (value equivalent-plan-node)
(value parent-plan-node))
(plan-node-object (value equivalent-plan-node)
(value equivalent-topology-node))
(:implies (plan-action (value base-plan-node))
(type-of-action (value equivalent-plan-node)
(type-of-action (value base-plan-node)))))
by procedure ...

Task: Inserted-Reversal-Task
define-task inserted-reversal-task
input: equivalent-plan-node
asserts: (type-of-action
(value equivalent-plan-node)
(inverse-of
(type-of-action
(value equivalent-plan-node))))

ADDAM Example: Layered Roof

Roof Assembly
[Figure: elapsed time in seconds (logarithmic axis, 1 to 1,000,000) versus number of boards (1 to 7) for REM: Meta-CBR, REM: Graphplan, and REM: Q-Learning]

Modified Roof Assembly: No Conflicting Goals
[Figure: elapsed time in seconds (logarithmic axis, 1 to 100,000) versus number of boards (1 to 7) for REM: Meta-CBR, REM: Graphplan, and REM: Q-Learning]

Applicability of Proactive Model Transfer
• Knowledge about the concepts and relations in the domain
• Knowledge about how the tasks and methods affect these concepts and relations
• Differences between the old task and the new one map onto knowledge of the concepts and relations in the domain.

Applicability of Failure-Driven Model Transfer
• May need less knowledge about the domain itself since the adaptation is grounded in a specific incident.
  – e.g., feedback about PDF for a specific example instead of advance knowledge of all document types.
• Still requires knowledge about how the tasks and methods interact with the domain.

Additional Mechanisms
• Model-based adaptation may leave some design decisions unsolved.
  – These decisions may be solved by traditional decision making mechanisms, e.g., reinforcement learning.
• Models may be unavailable or irrelevant for some tasks or subtasks.
  – Generative planning can combine primitive actions.

Level of Decomposition
• Level of decomposition may be dictated by the nature of the agent.
  – Some tasks simply cannot be decomposed.
• In other situations, level of decomposition may be guided by the nature of adaptation to be done.
  – Can be brittle if unpredicted demands arise.
• REM enables autonomous decomposition of primitives, which addresses this problem.

Computational Costs
• Reasoning about models incurs some costs.
  – For very easy problems, this overhead may not be justified.
  – For other problems, the benefits enormously outweigh these costs.
Models can localize planning and learning.

Knowledge Requirements
• Someone has to build an agent.
• Builder should know what that agent does and how it does it → Can make model.
• Analyst may be able to understand builder’s notes, etc. → Can make model.
• Some evidence for this in the context of software engineering / architectural extraction.

Current Work: AHEAD
• Theme: Analyzing hypotheses regarding asymmetric threats (e.g., criminals, terrorists).
  – Input: Hypotheses regarding a potential threat
  – Output: Argument for and/or against the hypotheses
• Technique: Analogy over functional models
  – An extension to TMKL will encode known behaviors for asymmetric threats and the purposes that the behaviors serve.
  – Analogical reasoning will enable retrieval and mapping of new hypotheses to existing models.
  – Models will provide arguments about how observed actions do or do not support the purposes of the hypothesized behavior.
• Naval Research Laboratory / DARPA Evidence Extraction and Link Discovery program
• David Aha, J. William Murdock, Len Breslow

Summary
• REM (Reflective Evolutionary Mind)
  – Operating environment for agents that adapt
• TMKL (Task-Method-Knowledge Language)
  – The language for agents in REM
  – Functional modeling language for encoding computational processes
• Adaptation
  – Some kinds of adaptation can be performed using specialized model-based techniques
  – Others require more generic planning & learning mechanisms (localized using models)