
Understanding behaviours of a situated agent: A Markov chain analysis

John S Gero, Wei Peng Key Centre of Design Computing and Cognition, University of Sydney, NSW 2006, Australia, {john, wpeng} @arch.usyd.edu.au

+61-2-9351 2328, +61-2-9351 3031 (fax)

Abstract: This paper briefly describes situated agents and constructive memory before modeling the behaviour of such an agent applied in a design optimization domain. Markov analysis is used to represent the dynamic behaviour of the memory system of the agent. It shows that the constructive memory behaves as expected and that reasoning moves from reactive and reflective to reflexive as the agent acquires more similar experiences that are increasingly grounded.

Keywords: Situated agents; Constructive memory; Markov chain; Design optimization

1. Introduction

Situated design computing is a new paradigm for design computing that draws concepts from situated cognition (Clancey, 1997). Situated agents are computational models that have been developed on the notion of “situatedness” (Clancey, 1997). These agents can be used to build a new generation of computer-aided design tools that learn through use (Gero, 2003; Peng and Gero, 2006; Peng, 2006). Central to the concept of “situatedness” is what is called “constructive memory” (Gero, 1999), which entails the means by which an agent develops its experience through its interaction with the environment.

“Situatedness” has its roots in work on empirical naturalism (Dewey, 1896 reprinted in 1981) and cognitive psychology (Bartlett, 1932 reprinted in 1977). The notion of “situatedness” is considered a conditio sine qua non for any form of “true” intelligence, natural or artificial (Lindblom and Ziemke, 2002). A situation can be viewed as a worldview that biases a person’s interpretation and expectation. A simple example of how this worldview affects a person’s behaviour is that two designers, given the same set of requirements, produce quite different designs. The theory of situatedness claims that every human thought and action is adapted to the environment. They are situated because what people perceive, how people conceive of their activity, and what people physically do develop together (Clancey, 1997). Vygotsky (1978) contributed to the concept of “situatedness” by introducing activity theory, which holds that activities of the mind cannot be separated from overt behaviour, or from the social context in which they occur. Social and mental structures interpenetrate each other (Clancey, 1995). In this vein, situatedness is inseparable from interactions in which knowledge is dynamically constructed. A situated agent constructs a situation, which can be represented as first-person memories obtained by taking account of both its contexts and its experience and the interactions between them.

Memory in computational systems often refers to a place that holds data and information called “memories”. It is indexed so that it can be queried more efficiently later. The structure, contents and indexes are fixed and independent of their use (Gero, 2006). From a cognitive point of view, however, memory is not a place where descriptions of what has been done or said before are stored; it is indistinguishable from a person’s capability to make sense, to learn a new skill, to compose something new (Clancey, 1991). This is the essence of Bartlett’s model of constructive memory (Bartlett, 1932 reprinted in 1977). It is argued that the contents and structures of a constructive memory are changed by their use (Gero, 2006). Memories are constructed initially from experience (previous memories) in response to a demand for a memory of that experience, but the construction of the memory is connected to the current view of the world at the time of the demand (Gero, 1999). The notion of a constructive memory reflects how the system adapts to its environment (Gero and Smith, 2006). A memory construction process can be viewed as the way a system uses its previous memory structures and contents to conceptualize, to give meanings to, its environmental stimuli. A constructive memory model (Gero, 1999) provides a conceptual framework within which the concept of “situatedness” can be implemented in a software agent.

In this paper, the behaviours of a situated agent are investigated in a number of time-series experiments in design optimization, in which the agent is exposed to heterogeneous design scenarios and develops unsupervised behaviours. These


2 J.S. Gero, W. Peng / Engineering Applications of Artificial Intelligence

behaviours are explored at two different levels – the microscopic and the macroscopic. The microscopic behaviours (hereafter called micro behaviours) are the detailed processes and constraints defined by the situated agent architecture, for example, sensation, perception, experience activation and reactivation. Investigating the agent’s micro behaviours and their dependencies in relation to contexts may lead to a better understanding of the causes of situated behaviours. Analyzing functional aggregations of these micro behaviours, called the agent’s macro behaviours, enables us to describe the characteristics of situated behaviours in time-series events. These macro behaviours are defined as reflexive, reactive and reflective behaviour (Maher and Gero, 2002).

What is interesting is whether there are behaviour patterns for a situated agent in time-series events and how these patterns can be explained. A Markov chain approach is utilized to analyze behaviours obtained from test data. Markov chains have been used as stochastic models to study the time-dependent behaviours of dynamic systems (Siu, 1994) and complex adaptive systems (Spears, 1998; 1999). The behaviours of these systems are specified as transition probabilities between the system’s states over time. The fundamental assumption for a first-order Markov process is that the conditional probability distribution of a future state, given the present and past states of the process, depends only on the current state and not on past states.1 A second-order Markov chain takes account of both the current state and the previous state. The Markov chain is an ideal tool for constructing a descriptive model of time-dependent relationships among the behaviours of a situated agent.
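The first-order estimation described above amounts to counting consecutive state pairs in an observed sequence and normalizing by the number of times each source state occurs. A minimal sketch, using a hypothetical behaviour-state trace (the function name and the trace are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

def first_order_transitions(states):
    """Estimate first-order Markov transition probabilities by counting
    consecutive state pairs in an observed sequence."""
    pair_counts = Counter(zip(states, states[1:]))  # (from, to) pair counts
    totals = Counter(states[:-1])                   # outgoing counts per state
    probs = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        probs[a][b] = n / totals[a]
    return dict(probs)

# Hypothetical behaviour-state trace using the paper's state labels.
trace = ["S1", "S2", "S3", "S4", "S9", "S1", "S2", "S3", "S7", "S5"]
P = first_order_transitions(trace)
# P["S1"] == {"S2": 1.0}; P["S3"] == {"S4": 0.5, "S7": 0.5}
```

A second-order variant would count triples `(states[i], states[i+1], states[i+2])` instead, conditioning each transition on the two preceding states.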

2. Micro Behaviours in Design Optimization Experiments

Design optimization is selected as the test bed for the Markov chain analysis. Design optimization is concerned with identifying optimal design solutions that meet design objectives while conforming to design constraints. A large number of optimization algorithms have been developed and are commercially available. Many design optimization tools focus on gathering a variety of mathematical programming algorithms and providing the means for the user to access them to solve design problems.2 For example, Matlab Optimization Toolbox 3.0 includes a variety of functions for linear programming, quadratic programming, nonlinear optimization and nonlinear least squares. Choosing a suitable optimizer becomes the bottleneck in a design optimization process. The recognition of appropriate optimization models is fundamental to design decision problems (Radford and Gero, 1988). In this paper, a situated agent wraps around a design optimization tool (Matlab Optimization Toolbox), learns concepts of how the tool is used in optimizing a design and adapts its behaviours based on these concepts.

1 http://en.wikipedia.org/wiki/Markov_chain
2 http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide/

The focus of these design optimization experiments is to observe and analyze the agent’s behaviours in heterogeneous design optimization scenarios. A sequence of 15 design scenarios is created and executed. Each scenario represents a design task which is further composed of a number of design actions. For example, a typical design optimization task consists of a number of actions: defining the objective function; identifying the objective function type; defining design variables and variable types; describing design constraints and constraint types; defining gradients of the objective function and constraints; defining matrices, such as the Hessian matrix and its type, and the A and b matrices (only available for Matlab users); selecting optimizers; submitting or editing the design problem; and submitting feedback on the agent’s outputs.

A typical sequence of tasks is: {L, Q, Q, L, NL, Q, NL, L, L, NL, Q, Q, L, L, L}

“Q”, “L” and “NL” represent quadratic, linear and nonlinear design optimization problems respectively. The initial experience of the agent holds one instance of a design optimization scenario solved by a quadratic programming optimizer.
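The composition of this task sequence can be tallied directly; a trivial sketch (the sequence itself is transcribed from the text above):

```python
from collections import Counter

# The 15-scenario task sequence used in the tests.
tasks = ["L", "Q", "Q", "L", "NL", "Q", "NL", "L", "L", "NL", "Q", "Q", "L", "L", "L"]
counts = Counter(tasks)
# counts == Counter({"L": 7, "Q": 5, "NL": 3})
```

The imbalance (seven linear, five quadratic, three nonlinear problems) matters later, when the agent's dominant behaviours are explained by how often it meets stimuli matching its experience.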

2.1. Behaviours of a situated agent

Situated agents can sense and effect changes in the environment via sensors and effectors. Sensors gather environmental changes into data structures called sense-data. Sensation (S) is the process that transfers sense-data into multi-modal sensory experiences. This occurs through “push” and “pull” processes. A push process is a data-driven process in which changes in the external world trigger changes in the agent’s internal world, for example, in the agent’s experience. A pull process is an expectation-driven process in which the agent updates the internal world according to expectation-biased external changes (Gero and Fujii, 2000; Gero and Kannengiesser, 2006). The push and pull processes can occur at different levels of processing, for example, sensation, perception and conception. The pushed sense-data are also called exogenous sense-data (Se). They are triggered by external environmental changes, that is, actions performed by designers in using the design tool. The pulled sense-data are intentionally collected during the agent’s expectation-driven process. In the pull process, sensors are triggered by the agent’s higher-level processes (that is, perception and conception) and draw in environmental changes to update their sense-data. Sensory data (Se+a) consist of two types of variables: the exogenous sense-data (Se) and the autogenous sensory experience (Sa). Sa is created by matching the agent’s exogenous sense-data (Se) with the agent’s sensory-level experience. Sensory experiences (Se+a) are a combination of the agent’s exogenous sense-data (Se) and the related autogenous information (Sa).

For instance, sense-data Se is captured by sensors as a sequence of unlabelled events:

Se(t) = {…… “a mouse click on a certain text field”, key strokes of “x”, “y” ……}

Based on the lowest level of sensory experience, which holds modality information, the agent creates an autogenous variable (Sa) with its initial label for the Se:

Sa(t) = {“Label for the clicked text field”}

Thus, the sensory experience Se+a can be created as:

Se+a(t) = {…… [“Label for the clicked text field” | Key strokes “x”, “y”] ……}

Perception (P) generates percepts based on the agent’s sensory experiences. Percepts are intermediate data structures that are generated by mapping sensory data into categories. The sensory experience Se+a is further processed and categorized to create an initial percept Pi which can be used to generate a memory cue. The initial percept can be structured as a triplet “Percept (Object, Property, Values of properties)”. It is expressed as:

Pi(t) = Object {Property for the clicked text field, value of that property “xy”}

The perceptual object can be used to cue a memory of the agent’s experience. A cue refers to a stimulus that can be used to activate the agent’s experience to obtain a memory of that experience. It is generated by matching percepts with the agent’s perceptual experience. A cue is subsequently assigned an activation value to trigger responses from the agent’s experience. The cueing function is implemented using experience activation and reactivation (Ia and Ir), in which a memory cue is applied to the experience structure to obtain a response.

Conception (C) is the process of categorizing perceptual sequences and chunks in order to form proto-concepts. A concept is regarded as the result of an interaction process in which meanings are attached to environmental stimuli. In order to illustrate a concept learning process, the term “proto-concept” is used to describe the intermediate state of a concept. A proto-concept is a knowledge structure that depicts the agent’s interpretations and anticipations about its external and internal environment at a particular time. Conception consists of three basic functions: conceptual labeling (C1), constructive learning (C2) and induction (C3). Conceptual labeling creates proto-concepts based on experiential responses to an environment cue. This includes deriving anticipations from these responses and identifying the target. Constructive learning allows the agent to accumulate lower-level experiences. Induction generalizes abstractions from the lower-level experience and is responsible for generating conceptual knowledge structures.

The hypothesizing process (H) generates a hypothesis from currently learned proto-concepts. It is where reinterpretation takes place, allowing the agent to learn in a “trial and error” manner. A situated agent reinterprets its environment using hypotheses, which are explanations deduced from its domain knowledge (usually conceptual). The agent then refocuses on, or constructs, a new proto-concept based on these hypotheses.

Validation (Vd) is the process in which the agent verifies its proto-concepts and hypotheses. It pulls information from the environment to observe whether the environment is changing as expected. A valid concept or experience is grounded into the agent’s experience by incorporation, reconfiguration or reinforcement.

The grounding process refers to experiential grounding. It reinforces the valid concepts or activated experience by changing the structures of the experience so that the likelihood of the grounded experience being activated in a similar circumstance is increased. This is implemented by grounding via a weight adaptation process (Wa), which adjusts the weights of each excitatory connection of the valid concept in an IAC neural network (McClelland, 1981; 1995), so that nodes that fire together become more strongly connected. A reflexive experience response (Rex) occurs when the experiential response to the current sensed data is strong enough to reach a reflexive threshold. A sensory experience can then affect action directly; in this circumstance, the agent responds reflexively to environmental stimuli based solely on its experience, without activation.

As aggregations of the above-mentioned micro behaviours, macro behaviours represent the functions by which an agent copes with its environment. The agent’s reflexive behaviour (Rx)3 is triggered by environmental stimuli that are able to cause a reflexive experience response (Rex). A snapshot of reflexive behaviour can be expressed as “environment stimuli → sensor → S → P → Cue → Rex → Vd → Wa and/or C3”.

In its reactive behaviour (Ra), an agent reasons by applying its experience to respond to an environmental stimulus in a self-organized way. In this mode, the agent activates its experience structures (an IAC neural network) to obtain a response. This can be expressed as “environment stimuli → sensor → S → P → Cue → Ia → C1 → Vd → Wa and/or C3”.

In the reflective behaviour (Rf), the agent reasons about its actions by drawing new sense-data from a lower level, reactivating its experience and/or hypothesizing a new proto-concept. This involves the higher-level conceptual experience and the hypothesizer. In this mode, the agent’s behaviours can be aggregated as “environment stimuli → sensor → S → P → Cue → Ir → C1 → Vd → Wa and/or C3” or “S → P → H → Ir → C1 → Vd → Wa and/or C3”.

3 Note Rx is different from Rex, which is a micro-level behaviour

Knowledge construction behaviour (Kc) is a special form of macro behaviour in which an agent learns new experience via the constructive learning function (C2) of the conception process. It can be represented as “environment stimuli → sensor → S+P → S+P → … → C2”. Table 1 shows the symbols for the various behaviours used in this analysis.

Table 1. Symbols that represent various micro behaviours

Symbol | Micro Behaviour (Be)                                         | Related Macro Behaviours
S      | Sensation                                                    | Ra, Rf, Rx, Kc
P      | Perception                                                   | Ra, Rf, Rx, Kc
C1     | Conception process 1 – conceptual labelling                  | Ra, Rf
C2     | Conception process 2 – conception via constructive learning  | Kc
C3     | Conception process 3 – conception via inductive learning     | Ra, Rf, Rx, Kc
Ia     | IAC neural network activation                                | Ra
Ir     | IAC neural network re-activation                             | Rf
H      | Hypothesising                                                | Rf
Rex    | Reflexive experience response                                | Rx
Vd     | Validation                                                   | Ra, Rf, Rx
Wa     | Weight adaptation                                            | Ra, Rf, Rx
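Table 1 can be read as a lookup from each micro behaviour to the macro behaviours it participates in; intersecting the sets along a trace narrows down which macro behaviour is being observed. A minimal sketch (the function name is hypothetical, the table contents are transcribed from Table 1):

```python
# Table 1 as a lookup: micro-behaviour symbol -> related macro behaviours.
MICRO_TO_MACRO = {
    "S":   {"Ra", "Rf", "Rx", "Kc"},
    "P":   {"Ra", "Rf", "Rx", "Kc"},
    "C1":  {"Ra", "Rf"},
    "C2":  {"Kc"},
    "C3":  {"Ra", "Rf", "Rx", "Kc"},
    "Ia":  {"Ra"},
    "Ir":  {"Rf"},
    "H":   {"Rf"},
    "Rex": {"Rx"},
    "Vd":  {"Ra", "Rf", "Rx"},
    "Wa":  {"Ra", "Rf", "Rx"},
}

def compatible_macros(micro_trace):
    """Return the macro behaviours consistent with every micro behaviour
    in a trace (intersection of the per-symbol sets)."""
    macros = {"Ra", "Rf", "Rx", "Kc"}
    for symbol in micro_trace:
        macros &= MICRO_TO_MACRO[symbol]
    return macros
```

For example, an Ia step narrows a trace to reactive behaviour (Ra), while an H step narrows it to reflective behaviour (Rf), mirroring the aggregation chains given above.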

Table 2 presents the Markov states of the system used in the following tests. Sensation (S) and Perception (P) are grouped into one state and serve as the base for the behaviours in the other states, such as Ia in S2, because they run in parallel at a low level and support the other behaviours.

Table 2. Markov states and behaviours in these states

State | Behaviours | Dominant Behaviour
S1    | S+P        | S and P – low-level behaviours
S2    | S+P+Ia     | Ia – activating experience
S3    | S+P+C1     | C1 – conceptual labelling
S4    | S+P+Vd     | Vd – validation
S5    | S+P+Ir     | Ir – reactivating experience
S6    | S+P+Rex    | Rex – reflexive experiential response
S7    | S+P+H      | H – hypothesizing
S8    | S+P+C2+C3  | C2 and C3 – constructive learning and then inductive learning
S9    | S+P+Wa+C3  | Wa and C3 – weight adaptation and then inductive learning
S10   | S+P+Wa     | Wa – weight adaptation
S11   | S+P+C2     | C2 – constructive learning

2.2. Markov analysis for test 1

The purpose of Test 1 is to investigate a situated agent’s behaviour in a dynamic environment consisting of a sequence of design optimization scenarios. The question is how an agent that initially holds a quadratic programming design optimization experience changes its reasoning and memory construction processes in relation to environmental changes. The sequence of tasks {L, Q, Q, L, NL, Q, NL, L, L, NL, Q, Q, L, L, L} is created and adopted. Based on data acquired from this test, a transition matrix is produced. There are altogether 11 states and 73 state transitions in this test. A Markov state transition diagram represents the possible transitions between these states in graphical form, which allows us to understand the Markov system. The dark-shaded nodes (S8 – S11) in Fig. 1 are terminal states, at which the system steps out of an experiment.

Fig. 1. The Markov state transition diagram for the agent in test 1.
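Terminal states can be recovered mechanically from a transition list: a state that the system enters but never leaves is a state at which a run steps out. A minimal sketch over a hypothetical subset of the Test 1 transitions (the transition list here is illustrative, not the full 73-transition data set):

```python
def terminal_states(transitions):
    """States that appear as transition targets but never as sources,
    i.e. states with no outgoing transitions."""
    sources = {a for a, _ in transitions}
    targets = {b for _, b in transitions}
    return targets - sources

# Hypothetical subset of the Test 1 transitions (state labels from Table 2).
observed = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S9"),
            ("S1", "S6"), ("S6", "S9"), ("S1", "S11"), ("S3", "S10")]
# terminal_states(observed) == {"S9", "S10", "S11"}
```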

The first-order time-series associations between the states in Test 1, Fig. 1, are sorted by state in Table 3. The causalities that drive these associated behaviours are discussed both from the agent’s functional processes at the microscopic level and from the environmental contexts the agent encountered at the macroscopic level.

Table 3. A detailed discussion of associated behaviours

Time-series Association | Transition Probability | Behaviour Descriptions and Causalities
S1 → S1 | 0.34 | The agent continues sensing (S) and perceiving (P), because it faces a new environment context.
S1 → S2 | 0.50 | The agent senses (S), perceives (P) the environment stimuli and activates its experience (Ia). This is an internal reaction-related process in relation to a familiar environment context.
S1 → S6 | 0.04 | The agent responds reflexively to an environment context (Rex). This indicates that the agent has a very strong experience for that context, but this association is rare, with a probability of only 0.04.
S1 → S8 | 0.04 | The agent performs constructive learning (C2). Since it is accompanied by inductive learning (C3), we can conclude that the agent is not in the initial stage of the test.
S1 → S11 | 0.08 | The agent performs constructive learning (C2). This occurs at the initial stage of the test (note that no inductive learning is performed).

S2 → S3 | 1.00 | The agent activates its experience and selects a concept. Ia and C1 have a strong dependence due to the internal functional constraints of the agent, in which the agent always selects a concept for an activated experience. The agent meets a familiar environment stimulus.
S3 → S4 | 0.73 | The agent selects a concept to react (C1) and then observes the environmental changes in order to validate (Vd) that concept. This is also an internal reaction- and reflection-related process.
S3 → S7 | 0.20 | The agent sometimes hypothesizes (H) after the focused or refocused concepts (C1) fail to validate. This is a reflection-related mechanism.
S3 → S10 | 0.07 | The agent grounds (Wa) its focused concept (C1) when receiving direct positive feedback from the user. The absence of inductive learning shows that the agent is in the early stage of the test.
S4 → S1 | 0.08 | The agent senses (S) and perceives (P) its environment to validate a concept (Vd). This is a function related to validation.
S4 → S5 | 0.25 | The agent re-activates (Ir) its experience when an existing experience cannot be validated (Vd). This is a reflection-related process.
S4 → S9 | 0.50 | The agent reinforces the validated experience (Wa) and induces new conceptual knowledge (C3) to refresh its experience. This is the grounding-related mechanism that operates when an experience (reactive or reflective) proves to be useful in interactions.
S4 → S10 | 0.17 | The agent’s validated experience is grounded (Wa) but no inductive learning (C3) occurs, due to insufficient data. This happens at the early stage of the test; the behaviour is no longer observed after the agent has accumulated enough perceptual experience to induce conceptual knowledge.
S5 → S3 | 0.50 | Once the agent re-activates its experience (Ir), it re-focuses on a new concept in C1. This is the agent’s reflection-related process when it is not able to validate its activated experience in reaction.
S5 → S4 | 0.17 | The agent validates (Vd) the re-activated experience (Ir). This is a reflection-related process during which the agent creates hypotheses, refocuses on a new concept and then observes the impacts of the refocused concept in interactions.
S5 → S9 | 0.33 | The agent reinforces (Wa) the re-activated experience (Ir) and induces new knowledge (C3). This is also a grounding-related process in which the agent’s reflective experience proves to be useful.
S6 → S9 | 1.00 | The agent responds reflexively (Rex) and reinforces (Wa) the reflexive experience. This shows that the agent’s reflexive experiences are always grounded.
S7 → S5 | 1.00 | The agent makes a hypothesis (H) and re-activates (Ir) its experience based on the hypothesis. This is a strong behaviour, showing the agent’s internal reflective processes in relation to a new or confusing environment context.

Based on the state diagram in Fig. 1 and the detailed discussion in Table 3, some findings are as follows:

There is a primary path with the highest transition probabilities (S1 → S2 → S3 → S4 → S9). This shows that the system is most likely to perform reaction-related behaviours. The explanation is that the agent initially holds experience (in quadratic programming) corresponding to certain environmental stimuli (there are seven Ls and five Qs) and this experience is subsequently reinforced during the test.

The system performs other behaviours: knowledge construction behaviours (S1 → S8, S1 → S11), reflection-related behaviours (S3 → S7 → S5 → S3) and reflexive behaviour (S1 → S6 → S9). These behaviours occur with lower probabilities than the reaction-related behaviours. As with the reaction-related behaviours, the decisive factor is the agent’s experience and its contextual environment: the availability and strength of a certain experience are the result of the agent’s interaction with its environment and, in turn, that experience decides how the agent behaves in a particular environment.

Strong dependences exist between behaviour states, such as S2 → S3, S6 → S9 and S7 → S5, representing characteristics of the reactive, reflexive and reflective behaviours. The causes of these behaviour patterns are related to the environmental context, the agent’s experience and the way in which the agent processes the contextual stimuli.
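The primary path can be recovered from the Table 3 probabilities by greedily following the most probable successor from S1; a minimal sketch (the transition dictionary is transcribed from Table 3, the function name is hypothetical):

```python
def greedy_path(P, start, max_len=10):
    """Follow the highest-probability successor from each state until a
    terminal state (no outgoing transitions) or the length cap is reached."""
    path = [start]
    while path[-1] in P and len(path) < max_len:
        successors = P[path[-1]]
        path.append(max(successors, key=successors.get))
    return path

# Transition probabilities transcribed from Table 3 (Test 1).
P = {
    "S1": {"S1": 0.34, "S2": 0.50, "S6": 0.04, "S8": 0.04, "S11": 0.08},
    "S2": {"S3": 1.00},
    "S3": {"S4": 0.73, "S7": 0.20, "S10": 0.07},
    "S4": {"S1": 0.08, "S5": 0.25, "S9": 0.50, "S10": 0.17},
    "S5": {"S3": 0.50, "S4": 0.17, "S9": 0.33},
    "S6": {"S9": 1.00},
    "S7": {"S5": 1.00},
}
# greedy_path(P, "S1") == ["S1", "S2", "S3", "S4", "S9"]
```

The `max_len` cap guards against cycles; in this data the greedy walk reaches the terminal grounding state S9 after four steps, matching the reaction-dominated path reported above.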

This test shows that a situated agent learns and modifies behaviours based on the experience it has, the contextual environment it encounters and the interactions between the agent and the environment.

2.4. Markov analysis for test 2

In this section, the agent as trained at the end of Test 1 is tested further. The purpose of this test is to investigate the agent’s behaviours when it is exposed to the same sequence it has already experienced in Test 1. It is interesting to know whether the agent simply repeats its behaviours or develops new behaviours from the interactions in this test, given that it now holds different experiences from the agent at the start of Test 1.

The agent contains four experience nodes which were learned in Test 1. Through this extended test, 64 state transitions are gathered. The transition matrix contains 8 states and their transition probabilities. The agent exhibits different characteristics in its behaviours compared to those it showed in Test 1. Whilst in Test 1 the agent had diversified behaviours in reaction, reflection, reflexion and constructive learning, in this test there are higher probabilities that the agent performs reflexion. As shown in Fig. 2, the state S8, which is associated with constructive learning, is not used in this experiment. There is only one terminal state, S9, which is related to grounding via weight adaptation. The probability of reflexion (0.67) is high compared to the other transition probabilities.

Fig. 2. The Markov state transition diagram for the agent in Test 2.
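The transition probabilities reported here are simple maximum-likelihood estimates: each entry is the count of an observed transition divided by the total number of transitions leaving that state, and a terminal state is one that is entered but never left. A minimal sketch; the observed pairs below are illustrative only, not the actual 64 transitions recorded in Test 2:

```python
from collections import Counter, defaultdict

def transition_matrix(transitions):
    """Estimate first-order Markov transition probabilities from
    a list of observed (state, next_state) pairs."""
    counts = defaultdict(Counter)
    for s, t in transitions:
        counts[s][t] += 1
    return {s: {t: n / sum(nxt.values()) for t, n in nxt.items()}
            for s, nxt in counts.items()}

def terminal_states(transitions):
    """States that appear as a destination but never as a source."""
    sources = {s for s, _ in transitions}
    targets = {t for _, t in transitions}
    return targets - sources

# Illustrative fragment only -- not the experiment's recorded data.
obs = [("S1", "S6"), ("S6", "S9"), ("S1", "S6"), ("S6", "S4"),
       ("S4", "S9"), ("S1", "S2"), ("S2", "S3"), ("S3", "S9")]
P = transition_matrix(obs)
print(P["S1"])               # S6: 2/3, S2: 1/3
print(terminal_states(obs))  # {'S9'}
```

The same counting procedure, applied to the 64 recorded transitions, yields the matrix summarized in Fig. 2 and Table 4.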

The state diagram in Fig. 2 is discussed in detail in Table 4.

Table 4. A detailed discussion of associated behaviours in Test 2 (time-series association, transition probability, behaviour description and causality)

S1→S2 (0.33): The agent senses (S), perceives (P) the environment stimuli and activates its experience (in Ia). This is an internal reaction-related process in relation to a familiar environment context.

S1→S6 (0.67): The agent reflexes (S+P → S+P+Rex) to an environment context. This indicates that the agent has a very strong experience for that context.

S2→S3 (1.00): The agent activates its experience and selects a concept. Ia and C1 have a strong dependence that is caused by the internal reaction process in relation to a familiar environment stimulus.

S3→S4 (0.50): The agent selects a concept (in C1) to react and then observes the environmental changes in order to validate (in Vd) that concept. This is also an internal reaction- and reflection-related process.

S3→S7 (0.40): The agent sometimes hypothesizes (in H) after the focused or refocused concepts (in C1) fail to validate. This is a reflection-related mechanism.

S3→S9 (0.10): This is a new behaviour dependence compared to those in Test 1. The agent grounds the refocused concept (in Wa) and learns new conceptual knowledge from induction (in C3).

S4→S5 (0.50): The agent re-activates (in Ir) its experience when an existing experience is not able to validate (in Vd). This is a reflection-related process.

S4→S9 (0.50): The agent reinforces the validated experience (in Wa) and induces new conceptual knowledge (in C3) to refresh its experience. This is the grounding-related mechanism when an experience (reactive or reflective) is proved to be useful in interactions (in Vd).

S5→S3 (0.56): Once the agent re-activates its experience (in Ir), it re-focuses on a new concept (in C1). This is the agent's reflection-related process when it is not able to validate its activated experience in reaction.

S5→S9 (0.44): The agent reinforces the re-activated experience (in Wa) and induces new knowledge (in C3). This is also a grounding-related process in which the agent's reflective experience is proved to be useful (in Ir).

S6→S4 (0.50): This is a new behaviour dependence compared to those in Test 1. The agent validates its reflexive experience.

S6→S9 (0.50): The agent reflexes (in Rex) and reinforces the reflexive experience (in Wa). This shows that the agent's reflexive experiences always become grounded.

S7→S5 (1.00): The agent makes a hypothesis (in H) and re-activates its experience based on the hypothesis (in Ir). This is a strong dependence that results from the agent's internal reflective processes in relation to a new or confusing environment context.

Based on the state diagram in Fig. 2 and the detailed discussion in Table 4, the following conclusions are reached:

There is a primary path with the highest transition probabilities for reflexive behaviours (with the main stream S1→S6 and S1→S6→S4). The experiences for linear programming and quadratic programming are highly grounded, such that the agent reflexes rather than using its experience structures to react;

There is a considerably high probability that the system performs reflective behaviours in this test (with the paths S3→S7→S5 and S3→S4→S5). The agent uses its experience to reflect on environmental changes. Considering that the agent encounters the same sequence it experienced in Test 1, the reason the agent still reflects is its experiential changes;

New behaviour dependences (S6→S4 and S3→S9) emerge due to the agent's different experience and the environment context it faces in this test;

Some states (S8, S10 and S11) and behaviour dependences (S1→S8, S3→S10, S4→S10, S1→S11, S1→S1) present in Test 1 are missing in this test. The lack of S8 and S11, which are related to constructive learning, demonstrates that the system does not face a new context in this test. S10 is the grounding by weight adaptation process, which occurs at the initial stage of Test 1 when there is not enough perceptual data to induce conceptual knowledge.
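Under the first-order Markov assumption, the probability of following such a path is simply the product of its edge probabilities. A small sketch, using the S1→S6 and S6→S4 values from Table 4:

```python
def path_probability(P, path):
    """Probability of following a specific state path under
    first-order Markov dynamics (missing edges count as 0)."""
    p = 1.0
    for s, t in zip(path, path[1:]):
        p *= P.get(s, {}).get(t, 0.0)
    return p

# Edge probabilities taken from Table 4 (Test 2).
P = {"S1": {"S2": 0.33, "S6": 0.67},
     "S6": {"S4": 0.50, "S9": 0.50}}

print(path_probability(P, ["S1", "S6", "S4"]))  # 0.67 * 0.50 = 0.335
```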

In summary, the agent is more likely to reflex as some parts of its experience are highly grounded. New behaviour dependences emerge because an agent with new experience behaves differently in the same environment. Even though


exposed to the same environment, a situated agent produces different behaviours.

2.5. The Markov analysis for test 3

Test 3 enables us to study changes imposed by the environment from another perspective. This test treats Test 1 followed by Test 2 as a single test. The combined data set contains 11 states and 137 state transitions. The state transition diagram is shown in Fig. 3, in which probability changes relative to Test 1, in terms of increase and decrease, are indicated by arrows following these probabilities.

The trend of the agent's state changes is caused by the agent's experiential grounding in Test 2, which increased the probabilities for reflexive and reflective behaviours at the expense of the other behaviours, such as reactive and grounding behaviours. This shows that the agent develops experience in Test 2 and therefore adapts its behaviours based on that experience.
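The increase/decrease arrows in Fig. 3 amount to an entry-wise comparison of two estimated transition matrices. A sketch of that comparison; the two matrices below are hypothetical two-edge examples, not the actual Test 1 and Test 3 values:

```python
def compare(P_old, P_new):
    """Mark each transition probability in P_new as up, down or same
    relative to P_old (missing entries count as 0)."""
    edges = ({(s, t) for s in P_old for t in P_old[s]} |
             {(s, t) for s in P_new for t in P_new[s]})
    out = {}
    for s, t in sorted(edges):
        old = P_old.get(s, {}).get(t, 0.0)
        new = P_new.get(s, {}).get(t, 0.0)
        trend = "up" if new > old else "down" if new < old else "same"
        out[(s, t)] = (new, trend)
    return out

# Hypothetical edge probabilities, for illustration only.
P1 = {"S1": {"S2": 0.60, "S6": 0.40}}
P2 = {"S1": {"S2": 0.44, "S6": 0.56}}
for (s, t), (p, trend) in compare(P1, P2).items():
    print(f"{s}->{t}: {p:.2f} ({trend})")
```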

Fig. 3. The state diagram generated from the data obtained in Test 3.

3. The Markov Analysis of the Macro Behaviours of the Situated Agent

This section describes experiments that enable us to understand the behaviours of the situated agent at a macroscopic level. Functional aggregations of detailed micro behaviours are investigated. The Markov states can be reduced to 4 states, which represent the macro behaviours "Ra", "Rf", "Rx" and "Kc". Unlike the previous detailed behaviour analysis, which focused on investigating the statistical distribution among tasks, this analysis examines the system's behaviours over time, for example how the agent changes its behaviours across tasks. The results are obtained from 8 tests:

1. Test 1, as described before, consists of 15 design tasks {L, Q, Q, L, NL, Q, NL, L, L, NL, Q, Q, L, L, L} with a quadratic design experience;
2. Test 2 (also mentioned before) uses the same sequence of tasks as Test 1, but with a different initial agent experience (obtained from Test 1);
3. Test 3 (also mentioned before) has a combined data set from Test 1 and Test 2;
4. Test 4 uses the same agent with the initial single experience of a quadratic function to run through another sequence {Q, L, Q, L, NL, Q, NL, Q, Q, Q, Q, Q, L, Q, NL};
5. Test 5 examines the agent obtained from Test 4, using the same sequence as Test 4;
6. Test 6 has a combined data set from Test 4 and Test 5;
7. Test 7 uses the sequence of Test 4 to examine the agent after Test 1. It differs from Test 5, which uses an agent that has learned from Test 4; the agent used in this test holds the experience learned in Test 1;
8. Test 8 has combined data obtained from Test 1 and Test 7.

Table 5 shows the macro behaviours obtained from these tests.

Table 5. The macro behaviours obtained from the tests; "T" denotes the tasks and "B" the macro behaviours exhibited, in task order (a task may give rise to more than one behaviour)

Test 1
T: L Q Q L NL Q NL L L NL Q Q L L L
B: Kc Ra Ra Ra Kc Ra Ra Ra Ra Ra Rf Kc Ra Rf Ra Rf Ra Ra Rx

Test 2
T: L Q Q L NL Q NL L L NL Q Q L L L
B: Rx Ra Rf Ra Rf Rx Ra Rf Ra Ra Rf Rx Rx Rx Rf Rx Rx Rx Rx Rx

Test 3 (Test 1 data followed by Test 2 data)
T: L Q Q L NL Q NL L L NL Q Q L L L
B: Kc Ra Ra Ra Kc Ra Ra Ra Ra Ra Rf Kc Ra Rf Ra Rf Ra Ra Rx
T: L Q Q L NL Q NL L L NL Q Q L L L
B: Rx Ra Rf Ra Rf Rx Ra Rf Ra Ra Rf Rx Rx Rx Rf Rx Rx Rx Rx Rx


Test 4
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Ra Kc Ra Ra Kc Ra Rx Rf Kc Rx Rx Rx Rx Rx Ra Rx Ra Rf

Test 5
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Rx Ra Rx Ra Ra Rf Rx Rx Rf Rx Rx Rx Rx Rx Ra Rf Rx Ra

Test 6 (Test 4 data followed by Test 5 data)
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Ra Kc Ra Ra Kc Ra Rx Rf Kc Rx Rx Rx Rx Rx Ra Rx Ra Rf
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Rx Ra Rx Ra Ra Rf Rx Rx Rf Rx Rx Rx Rx Rx Ra Rf Rx Ra

Test 7
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Ra Rf Rx Ra Rf Rx Ra Rf Ra Rx Ra Rf Rx Rx Rx Rx Rx Rx Rx Ra Rf

Test 8 (Test 1 data followed by Test 7 data)
T: L Q Q L NL Q NL L L NL Q Q L L L
B: Kc Ra Ra Ra Kc Ra Ra Ra Ra Ra Rf Kc Ra Rf Ra Rf Ra Ra Rx
T: Q L Q L NL Q NL Q Q Q Q Q L Q NL
B: Ra Rf Rx Ra Rf Rx Ra Rf Ra Rx Ra Rf Rx Rx Rx Rx Rx Rx Rx Ra Rf
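The reduction from the eleven micro states to the four macro behaviours can be sketched as a relabelling of the micro-state sequence followed by collapsing consecutive repeats, so that a run of micro states belonging to one macro behaviour counts once. The mapping below is hypothetical, for illustration only; the paper's actual assignment of S1–S11 to Ra, Rf, Rx and Kc is not reproduced here:

```python
# Hypothetical micro-to-macro mapping, for illustration only.
MACRO = {"S2": "Ra", "S3": "Ra", "S5": "Rf", "S7": "Rf",
         "S6": "Rx", "S8": "Kc"}

def to_macro(micro_seq, mapping):
    """Relabel a micro-state sequence with macro behaviours and
    collapse consecutive repeats."""
    macro = [mapping[s] for s in micro_seq if s in mapping]
    return [m for i, m in enumerate(macro) if i == 0 or m != macro[i - 1]]

print(to_macro(["S2", "S3", "S7", "S5", "S6"], MACRO))  # ['Ra', 'Rf', 'Rx']
```

The macro-level transition matrices in Table 6 are then estimated from such collapsed sequences in the usual way.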

3.1. Conclusions from first-order Markov state diagrams

As shown in Table 6, the Markov state diagrams for Tests 1 – 8 share some common features and, at the same time, depend on the tasks and the experience the agent learned from the sequence of tasks.

Some interesting findings from Table 6 are:

1. “Ra” is strongly dependent on “Kc”. This shows that knowledge construction (Kc) is likely to be followed by reactive behaviour (Ra). However, this is not necessarily a causal relation, because it depends on how the agent responds to the environment. For example, in Test 1 the agent constructs new knowledge (Kc) in Task “L” and subsequently reacts based on its initial experience in Task “Q” (not on the knowledge newly learned in Task “L”).

Table 6. State diagrams for Tests 1 – 8; each first-order Markov state diagram is summarized by its strongest associations

Test 1 (1): Strong associations: 1. “Kc→Ra” (1.00); 2. “Rf→Ra” (0.67); 3. “Ra→Ra” (0.59). These show how the agent interacts with the environment. The agent mainly performs “Ra” due to the strong links to “Ra”.

Test 2 (1 after 1): Strong associations: 1. “Ra→Rf” (0.80); 2. “Rx→Rx” (0.67); 3. “Rf→Rx” (0.60). “Kc” is missing and the agent focuses on “Rf” and “Rx”, with strong links to them.

Test 3 (1 and 1): Strong associations: 1. “Kc→Ra” (1.00); 2. “Rx→Rx” (0.70); 3. “Rf→Ra” (0.50). The weights of the links to “Ra”, “Rf” and “Rx” are evenly distributed.

Test 4 (2): Strong associations: 1. “Rf→Kc” (1.00); 2. “Kc→Ra” (0.67); 3. “Rx→Rx” (0.57). The agent focuses on “Rx”, “Kc” and “Ra” due to the strong links to them.

Test 5 (2 after 2): Strong associations: 1. “Rf→Rx” (1.00); 2. “Rx→Rx” (0.50); 3. “Ra→Rf” (0.50). The agent focuses on “Rx” and “Rf”.

Test 6 (2 and 2): Strong associations: 1. “Rf→Rx” (0.80); 2. “Kc→Ra” (0.67); 3. “Rx→Rx” (0.53). The weights of the links to “Ra”, “Rf” and “Rx” are evenly distributed.

Test 7 (2 after 1): Strong associations: 1. “Ra→Rf” (0.83); 2. “Rf→Rx” (0.67); 3. “Rx→Rx” (0.60). The agent focuses on “Rx” and “Rf”.

Test 8 (1 and 2): Strong associations: 1. “Kc→Ra” (1.00); 2. “Rx→Rx” (0.54); 3. “Rx→Ra” (0.46); 4. “Ra→Rf” (0.45). The weights of the links to “Ra”, “Rf” and “Rx” are evenly distributed.

2. “Kc” is missing from all the follow-up tests (Tests 2, 5, 7), in which the agent has already gained some experience from previous tests. This implies that the agent can use what it has learned to react, reflect and reflex;

3. From Test 1 and the follow-up tests based on it (Test 2 and Test 7), we find that the agent has higher transition probabilities for “Rx→Rx” in the follow-up tests (0.67 for Test 2 and 0.60 for Test 7) compared to Test 1 (0). On the other hand, the agent has a higher transition probability for “Ra→Ra” in Test 1 (0.59). These observations imply different characteristics of the agent’s behaviours in the lead test and the follow-up tests;

4. The above-mentioned phenomenon does not hold for Test 4 and the follow-up test based on it (Test 5), in which both tests have higher transition probabilities for “Rx→Rx” than for “Ra→Ra”. The tasks of Test 4 allow the agent to reinforce its initial experience in quadratic programming and move into reflexive behaviour within Test 4.

3.2. The Markov analysis of second-order Markov state diagrams for test 1

This section depicts the analysis of the second-order Markov transitions for Test 1, to allow us to further understand the agent’s time-series behaviours. Task numbers, the related design optimization problem for each task and the macro behaviours of the system are recorded in Table 7.

Table 7. Tasks and macro behaviours obtained from Test 1 (a task may give rise to more than one behaviour)

Task 1 (L): Kc; Task 2 (Q): Ra; Task 3 (Q): Ra; Task 4 (L): Ra; Task 5 (NL): Kc; Task 6 (Q): Ra; Task 7 (NL): Ra; Task 8 (L): Ra; Task 9 (L): Ra; Task 10 (NL): Ra Rf Kc; Task 11 (Q): Ra Rf; Task 12 (Q): Ra Rf; Task 13 (L): Ra; Task 14 (L): Ra; Task 15 (L): Rx

The second-order Markov analysis is performed on these time-series data. As indicated in Table 8, the second-order Markov analysis allows us to identify clusters (or chunks) of structures and their relationships over time. However, the causalities that drive these behaviour patterns need to be investigated further, along with the problems the agent encountered and the agent’s experience.

Table 8. Second-order Markov transition matrix for the agent in Test 1

Transition   Kc    Ra    Rf    Rx
Kc Kc        0     0     0     0
Kc Ra        0     0.67  0.33  0
Kc Rf        0     0     0     0
Kc Rx        0     0     0     0
Ra Ra        0.14  0.58  0.14  0.14
Ra Kc        0     1.00  0     0
Ra Rf        0.33  0.67  0     0
Ra Rx        0     0     0     0
Rf Rf        0     0     0     0
Rf Kc        0     1.00  0     0
Rf Ra        0     0.50  0.50  0
Rf Rx        0     0     0     0
Rx Rx        0     0     0     0
Rx Kc        0     0     0     0
Rx Ra        0     0     0     0
Rx Rf        0     0     0     0
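The entries of Table 8 can be reproduced by counting, for every ordered pair of behaviours, the distribution of the behaviour that follows. A sketch using the Test 1 macro-behaviour sequence read off Table 7 in task order:

```python
from collections import Counter, defaultdict

# Macro behaviour sequence for Test 1, in task order (Table 7).
seq = ["Kc", "Ra", "Ra", "Ra", "Kc", "Ra", "Ra", "Ra", "Ra", "Ra",
       "Rf", "Kc", "Ra", "Rf", "Ra", "Rf", "Ra", "Ra", "Rx"]

counts = defaultdict(Counter)
for a, b, c in zip(seq, seq[1:], seq[2:]):
    counts[(a, b)][c] += 1          # count pair -> successor

P2 = {pair: {c: n / sum(nxt.values()) for c, n in nxt.items()}
      for pair, nxt in counts.items()}

print(round(P2[("Kc", "Ra")]["Ra"], 2))  # 0.67, as in Table 8
print(round(P2[("Ra", "Rf")]["Ra"], 2))  # 0.67, as in Table 8
```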

The patterns of behaviours, sorted according to their probabilities, are given in Table 9, followed by a discussion of the underlying reasons for these patterns.

Table 9. A detailed discussion of second-order behaviour patterns in Test 1

Ra→Kc→Ra (Tasks 4-5-6), probability 1.00, no causal relation inside the pattern: The agent performs Ra in Task 4 (reacting to “L”), Kc in Task 5 (“NL”) and Ra in Task 6 (“Q”). This pattern is determined by the environment context and by how the agent’s experience responds when it is exposed to these contexts.

Rf→Kc→Ra (Tasks 10-11), probability 1.00, no causal relation inside the pattern: The agent reflects and constructs a new experience in Task 10 (Rf→Kc) in relation to an “NL” problem. In Task 11, the agent uses its previously accumulated experience to react to a “Q” problem.

Kc→Ra→Ra (Tasks 1-2-3 and 5-6-7), probability 0.67, causal relations inside the pattern: In Tasks 1-2-3, the agent learns an “L” in Task 1 and reacts to a “Q” in Task 2 with its initial experience; in Task 3, it reacts to a “Q” using its grounded experience from Task 2 (causal relation in “Ra→Ra”). In Tasks 5-6-7, the agent learns an “NL” in Task 5 and reacts to a “Q” using its experience; in Task 7, it uses the “NL” experience learned in Task 5 to react to an “NL” problem (causal relation in “Kc…Ra”).

Ra→Rf→Ra (Tasks 11-12 and 12-13), probability 0.67, causal relation in one case only: In Tasks 11-12, the agent reacts and reflects on a “Q” in Task 11 and then uses this grounded experience in reacting to a “Q” in Task 12 (causal relation in “Ra→Rf→Ra”). In Tasks 12-13, the agent reacts and reflects on a “Q” in Task 12 and then reacts to an “L” in Task 13 using previously obtained experience in “L”; there is no causal relation in this pattern.

Ra→Ra→Ra (Tasks 6…10), probability 0.58, causal relation inside the pattern: The agent reacts to “Q” problems in Tasks 6 to 10. There is a causal relationship in “Ra→Ra→Ra”.

In conclusion, some causal relationships in the first-order and second-order Markov chains are identified. For example, a “Kc” is certainly followed by an “Ra” (probability 1.00, obtained from the first-order state diagrams in Table 6), and an “Ra” is very likely (probability 0.58) to be followed by another two “Ra”s. There are patterns that cannot be explained using second-order Markov analysis, simply because some behaviours are historically dependent and related to environment contexts – they are situated.

3.3. The Markov analysis for the agent’s behaviour in monotonic tasks

The multidimensionality4 of the task space introduces complexity into understanding a situated agent’s behaviours. We cannot deduce causal relationships between macro behaviours because multiple task types are involved in the tests. It is therefore necessary to investigate the agent’s behaviour in monotonic tasks, that is, one particular task type over time, for example the agent’s behaviours in linear programming (“L”) within a test. Samples are taken from the data of Test 1 (described in Table 5) to create monotonic task data, which are depicted in Table 10.

In a monotonic task, a situated agent’s behaviours are much more predictable. As shown in Table 10, the Markov state diagram for the agent in an “L” task exhibits a regular pattern, “Kc→Ra→Rx”. A situated agent learns via “Kc” and subsequently uses the newly learned experience to react. After a number of exposures to “L”, the agent reflexes due to its highly grounded experience in linear programming.
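This monotonic pattern can be checked directly: taking the behaviours of the seven “L” tasks of Test 1 as one sample sequence (consistent with the probabilities reported in Table 10) and estimating first-order transitions over it:

```python
from collections import Counter, defaultdict

# Behaviours in the seven "L" tasks of Test 1
# (tasks 1, 4, 8, 9, 13, 14, 15), sampled from Table 5.
l_sample = ["Kc", "Ra", "Ra", "Ra", "Ra", "Ra", "Rx"]

counts = defaultdict(Counter)
for a, b in zip(l_sample, l_sample[1:]):
    counts[a][b] += 1

P = {s: {t: n / sum(nxt.values()) for t, n in nxt.items()}
     for s, nxt in counts.items()}

print(P["Kc"])  # {'Ra': 1.0}
print(P["Ra"])  # {'Ra': 0.8, 'Rx': 0.2}
```

The resulting estimates match the Kc→Ra (1.00), Ra→Ra (0.80) and Ra→Rx (0.20) associations listed for Task “L” in Table 10.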

The grounding of an experience in one particular design scenario has the effect of reducing the grounding of the agent’s experience in other design scenarios. As shown in Table 10, for design scenario “NL” the agent initially learns new knowledge and subsequently reacts to a similar problem in the environment. However, the agent also exhibits high transition probabilities for “Ra→Rf” (0.50) and “Rf→Kc” (1.00).

Similar results can be deduced from the agent’s behaviour in the task “Q”. The agent mainly reacts using its original experience, but there are circumstances where the agent produces a reflection cue due to the un-grounding of its experience.

Table 10. The Markov state diagrams for monotonic samples from Test 1

Task “L”: Strong associations: 1. Kc→Ra (1.00); 2. Ra→Ra (0.80); 3. Ra→Rx (0.20). This shows how the agent learns new knowledge and grounds that knowledge.

Task “NL”: Strong associations: 1. Kc→Ra (1.00); 2. Rf→Kc (1.00); 3. Ra→Ra (0.50); 4. Ra→Rf (0.50). The agent learns new knowledge and then reacts based on that knowledge. Its reflection leads to new knowledge construction.

Task “Q”: Strong associations: 1. Rf→Ra (1.00); 2. Ra→Ra (0.60); 3. Ra→Rf (0.40). The agent reacts based on its initial experience in “Q”. There are circumstances in which it performs reflection.

4 Multidimensionality refers to the number of design optimization problem types.

4. Conclusion

This paper uses the Markov chain approach to analyze the behaviours of a situated agent in design optimization experiments. Time-series dependences between the agent’s behaviours are disclosed through a number of experiments and related analyses. As demonstrated by the micro-behaviour analytical results, the dependences among the agent’s microscopic behaviours reflect how the agent’s internal processes respond to what it confronts within its environment. Behaviour patterns and their dependences can be found through the macro-behaviour analysis. Some can be explained by the first-order and second-order Markov analyses. Others can be traced back to higher-order relations in time, which result from what and how the agent learns from its environment. The Markov analysis of monotonic samples unveils causal relationships between a situated agent’s macro behaviours, which shows that a constructive memory behaves as expected and that reasoning moves from reactive and reflective to reflexive as the agent acquires more similar experiences that are increasingly grounded.

These results show that there are structures and mechanisms that produce the agent’s state transitions. However, the non-uniformity of these transitions implies that a situated system is not a stationary system whose behaviours can simply be predicted; no hidden Markov states can be deduced. The factors behind situated behaviours depend on what has been experienced (the past memories of the agent) in response to what is active in the environment at the time the agent constructs a memory. A situated agent is an open and multi-dimensional system, which can react, reflect and reflex depending on its internal processes and its interactions with the environment. It can be concluded that situated behaviours are history- and process-dependent, in the sense that the agent’s initial experience and the environment context from which the agent processes information shape how a situated agent behaves.

Acknowledgements

This research is supported by a grant from the Australian Research Council, grant number DP0559885.

References

Bartlett, F.C., 1932 reprinted in 1977. Remembering: A Study in Experimental and Social Psychology, Cambridge University Press, Cambridge.

Clancey, W., 1995. A tutorial on situated learning, in: Self J. (Eds.), Proceedings of the International Conference on Computers and Education. Charlottesville, VA, AACE, Taiwan, pp. 49-70.

Clancey, W., 1997. Situated Cognition: On Human Knowledge and Computer Representations, Cambridge University Press, Cambridge.

Dewey, J., 1896 reprinted in 1981. The reflex arc concept in psychology. Psychological Review 3, 357-370.

Gero, J.S., 2006. Understanding situated design computing: Newton, Mach, Einstein and quantum mechanics, Intelligent Computing in Engineering and Architecture (to appear).

Gero, J.S., 1999. Constructive memory in design thinking, in: Goldschmidt, G., Porter, W. (Eds.), Design Thinking Research Symposium: Design Representation. MIT, Cambridge, pp. 29-35.

Gero, J.S., 2003. Design tools as situated agents that adapt to their use, in: Dokonal, W., Hirschberg, U. (Eds.), eCAADe21. eCAADe, Graz University of Technology, pp. 177-180.

Gero, J.S., Fujii, H., 2000. A computational framework for concept formation in a situated design agent. Knowledge-Based Systems 13(6), 361-368.

Gero, J.S., Kannengiesser, U., 2006. A framework for situated design optimization, in: Leeuwen J.V., Timmermans, H. (Eds.), Innovations in Design Decision Support Systems in Architecture and Urban Planning. Springer, Berlin, pp. 309-324.

Gero, J.S., Smith, G.J., 2006. A computational framework for concept formation for a situated design agent, Part B: Constructive memory. Working Paper, Key Centre of Design Computing and Cognition, University of Sydney.

Lindblom, J., Ziemke, T., 2002. Social situatedness: Vygotsky and beyond, in: Prince, C.G., Demiris, Y., Marom, Y., Kozima, H. and Balkenius, C. (Eds.), Proceedings Second International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Edinburgh, Scotland, pp. 71-78.

Maher, M.L., Gero, J.S., 2002. Agent models of 3D virtual worlds, ACADIA 2002: Thresholds. California State Polytechnic University, pp. 127-138.

McClelland, J.L., 1981. Retrieving general and specific information from stored knowledge of specifics, Proceedings of the Third Annual Meeting of the Cognitive Science Society. Erlbaum, Hillsdale, NJ, pp. 170-172.

McClelland, J.L., 1995. Constructive memory and memory distortion: A parallel distributed processing approach, in: Schacter D.L. (Eds.), Memory Distortion: How Minds, Brains, and Societies Reconstruct the Past. Harvard University Press, Cambridge, MA, pp. 69-90.

Peng, W., 2006. A Design Interaction Tool that Adapts, PhD Thesis. University of Sydney, Sydney.

Peng, W., Gero, J.S., 2006. Concept formation in a design optimization tool, in: Leeuwen J.V., Timmermans, H. (Eds.), Innovations in Design Decision Support Systems in Architecture and Urban Planning. Springer, Berlin, pp. 293-308.

Radford, A.D., Gero, J.S., 1988. Design by Optimization in Architecture and Building, Van Nostrand Reinhold, New York.

Siu, N., 1994. Risk assessment for dynamic systems: an overview. Reliability Engineering & System Safety 43(11), 43-73.

Spears, W.M., 1998. A Compression Algorithm for Probability Transition Matrices. SIAM Matrix Analysis and Applications 20(1), 60-77.

Spears, W.M., 1999. Aggregating models of evolutionary algorithms, Proceedings of the Congress on Evolutionary Computation. IEEE Press, Washington D.C., USA, pp. 631-638.

Vygotsky, L.S., 1978. Mind in Society: The Development of Higher Psychological Processes, Harvard University Press, (Original work published in 1934), Cambridge, MA.