
Page 1:

Representing Systems with Hidden State

Dorna KASHEF HAGHIGHI, Chris HUNDT*, Prakash PANANGADEN,

Joelle PINEAU and Doina PRECUP

School of Computer Science, McGill University (*now at UC Berkeley)

AAAI Fall Symposium Series

November 9, 2007

Page 2:

How should we represent systems with hidden state?

Partially Observable Markov Decision Processes (POMDPs)

• The system is in some “true” latent state.

• We perceive observations that depend probabilistically on the state.

• Very expressive model, good for state inference and planning, but:

– Very hard to learn from data.

– Hidden state may be artificial (e.g. dialogue management).

Predictive representations (e.g. PSRs, OOMs, TD-nets, diversity)

• State is defined as a sufficient statistic of the past that allows predicting the future.

• Good for learning, because state depends only on observable quantities.

Our goal: Understand and unify different predictive representations.

Page 3:

Partially Observable Markov Decision Processes

• A set of states, S

• A set of actions, A

• A set of observations, O

• A transition function: δ_a(s, s′) = P(s_{t+1} = s′ | s_t = s, a_t = a), ∀a ∈ A

• An observation emission function: γ_a(s, o) = P(o_{t+1} = o | a_t = a, s_{t+1} = s), ∀a ∈ A

• For this discussion, we omit rewards (they may be considered part of the observation vector).
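To make the two functions concrete, here is a minimal sketch of one possible encoding; the dict layout and the specific entries are our own illustrative assumptions, not part of the talk:

```python
from typing import Dict

# Hypothetical encoding of the two functions above (our own layout).
# delta[a][s] is a distribution over next states s';
# gamma[a][s_next] is a distribution over the observation emitted on arrival.
Dist = Dict[str, float]

S = ["s1", "s2", "s3", "s4"]
A = ["N", "S", "E", "W"]
Obs = ["Red", "Blue"]

delta: Dict[str, Dict[str, Dist]] = {
    # illustrative entry only -- the full layout depends on the grid:
    "E": {"s1": {"s2": 1.0}},  # deterministic transition: E takes s1 to s2
}
gamma: Dict[str, Dict[str, Dist]] = {
    "E": {"s2": {"Red": 0.5, "Blue": 0.5}},  # arriving in s2: each color w.p. 0.5
}
```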

Page 4:

A simple example

• Consider the following domain: S={s1, s2, s3, s4}, A={N, S, E, W}

• For simplicity, assume the transitions are deterministic.

• In each square, the agent observes the color of one of the adjacent walls, O={Red, Blue}, with equal probability.

Question: What kinds of predictions can we make about the system?

[Figure: the grid domain with four states and colored walls]

Page 5:

A simple example: Future predictions

Consider the following predictions:

– If I am in state s1 and go North, I will certainly see Blue.

– If I go West then North, I will certainly see Blue.

– If I go East, I will see Red with probability 0.5.

– If I go East then North, I will see Red twice with probability 0.25.

The action sequences are experiments that we can perform on the system.

For each experiment, we can verify the predicted observations from data.


Page 6:

Tests and Experiments

• A test is a sequence of actions followed by an observation:

t = a1 … an o, n ≥ 1

• An experiment is a non-empty sequence of tests:

e = t1 … tm, m ≥ 1

– Note that special cases of experiments are s-tests (Littman et al., 2002) and e-tests (Rudary & Singh, 2004).

• A prediction for an experiment e starting in s ∈ S, denoted s | e, is the conditional probability that by doing the actions of e, we will get the predicted observations.
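A prediction s | e can be computed by forward simulation. Here is a hedged Python sketch, assuming the dict-based delta/gamma encoding sketched earlier; an experiment is represented as a list of (action sequence, observation) tests:

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]

def predict(s: str,
            e: List[Tuple[List[str], str]],
            delta: Dict[str, Dict[str, Dist]],
            gamma: Dict[str, Dict[str, Dist]]) -> float:
    """Prediction s | e: the probability that executing all actions of e
    yields each test's predicted observation (intermediate observations
    are marginalized out)."""
    # Unnormalized belief: state -> P(reach state AND all predictions so far held).
    belief: Dist = {s: 1.0}
    for actions, obs in e:
        for i, a in enumerate(actions):
            nxt: Dist = {}
            for state, p in belief.items():
                for s2, q in delta.get(a, {}).get(state, {}).items():
                    w = p * q
                    if i == len(actions) - 1:
                        # Last action of this test: require the predicted observation.
                        w *= gamma.get(a, {}).get(s2, {}).get(obs, 0.0)
                    if w > 0.0:
                        nxt[s2] = nxt.get(s2, 0.0) + w
            belief = nxt
    return sum(belief.values())
```

For instance, once delta and gamma are fully specified, s1 | NB from the next slide is predict("s1", [(["N"], "Blue")], delta, gamma).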

Page 7:

A simple example: Looking at predictions

Consider our predictions again:

– If I am in state s1 and go North, I will certainly see Blue.

s1 | NB = 1

– If I go West then North, I will certainly see Blue.

s | WNB = 1, ∀s ∈ S

Note that for any sequence of actions preceding the West action, the above prediction would still be the same.


Page 8:

Equivalence relations

• Two experiments are equivalent if their predictions are the same for every state:

e1 ~ e2 ⟺ s | e1 = s | e2, ∀s

Note: If two experiments always give the same results, they are redundant, and only one is necessary.

• Two states are equivalent if they cannot be distinguished by any experiment:

s1 ~ s2 ⟺ s1 | e = s2 | e, ∀e

Note: Equivalent states produce the same probability distribution over future trajectories, so they are redundant.
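Both relations can be checked numerically with the predict sketch above. Note that state equivalence quantifies over all experiments, so in practice we can only test against a finite probe set; agreement is then evidence, not proof:

```python
def experiments_equivalent(e1, e2, states, delta, gamma, tol=1e-9):
    # e1 ~ e2  iff  s | e1 = s | e2 for every state s.
    return all(abs(predict(s, e1, delta, gamma) - predict(s, e2, delta, gamma)) <= tol
               for s in states)

def states_equivalent(s1, s2, probe_experiments, delta, gamma, tol=1e-9):
    # s1 ~ s2  iff  s1 | e = s2 | e for every experiment e; here we can only
    # check a finite probe set of experiments.
    return all(abs(predict(s1, e, delta, gamma) - predict(s2, e, delta, gamma)) <= tol
               for e in probe_experiments)
```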

Page 9:

A simple example: Equivalent predictions

• Consider the following experiment: NRNR

– This is equivalent to: SRSR, NRSR, NNRSSSR, …

– This is an infinite equivalence class, which we denote by a chosen exemplar, e.g. [NRNR].

– The predictions for this class: s1 | [NRNR] = 0, s2 | [NRNR] = 0.25


Page 10:

Dual perspectives

• Forward view: Given a certain state, what predictions can we make about the future?

– In classical AI, this view enables forward planning.

– It is centered around the notion of state.

• Backward view: Suppose that we want a certain experiment to succeed: in what state should the system initially be?

– This view enables backward planning.

– It is centered around the experiments.

Page 11:

A simple example: Dual perspectives

• Forward view:

Q: If we know that the system is in s1, what predictions can we make about the future?


Page 12:

A simple example: Dual perspectives

• Backward view:

Q: Suppose we want the experiment NR to succeed: in what state should the system be?

A: If the system starts either in state s2 or s4, the test will succeed with probability 0.5.

• We can associate with the experiment NR a vector of predictions of how likely it is to succeed from every state: [0 0.5 0 0.5]ᵀ


Page 13:

The dual machine

• The backward view can be implemented in a dual machine.

• States of the dual machine are equivalence classes of experiments [e].

• Observations of the dual machine are states from the original machine.

• The emission function represents the prediction probability s | [e], ∀s ∈ S.

• The transition function is deterministic: [e] —a→ [ae]
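A fragment of the dual machine can be enumerated mechanically: group experiments by their prediction vectors, which serve as the dual emission function. A sketch reusing the predict function from earlier, restricted to single-test experiments (the rounding used to bucket floating-point vectors is an assumption):

```python
from itertools import product

def dual_fragment(actions, observations, states, delta, gamma, max_len=2):
    # Map each one-observation experiment (up to max_len actions) to its
    # prediction vector; experiments sharing a vector form one dual state [e].
    classes = {}  # prediction vector -> exemplar experiment
    for n in range(1, max_len + 1):
        for seq in product(actions, repeat=n):
            for o in observations:
                e = [(list(seq), o)]  # a single test a1 ... an o
                vec = tuple(round(predict(s, e, delta, gamma), 9) for s in states)
                classes.setdefault(vec, e)
    return classes
```

The dual transition [e] —a→ [ae] is then obtained by prepending a to an exemplar's action sequence.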

Page 14:

A simple example: A fragment of the dual machine

[Figure: the original machine alongside a fragment of the dual machine, with states [NR], [NB], [WR], [ER], [WB] and arcs labeled by the actions N, S, E, W. Emissions include γ(s1) = γ(s3) = 0, γ(s2) = γ(s4) = 0.5 for [NR], and γ(s1) = γ(s3) = 1, γ(s2) = γ(s4) = 0.5 for [NB].]

• This fragment of the dual machine captures experiments with 1 observation.

E.g. [NR] —W→ [WR], because s | WNR = s | WR, ∀s.

• There are separate fragments for experiments with 2 observations, 3 observations, etc.


Page 15:

Notes on the dual machine

• The dual provides, for each experiment, the set of states from which the experiment succeeds.

– Note that the emission function is not normalized.

– Given an initial state distribution, we can get proper probabilities Pr(s|[e]).

• Experiments with different numbers of observations usually end up in disconnected components.

• Arcs represent temporal-difference relations, similar to those in TD-nets (Sutton & Tanner, 2005).

– This is consistent with previous observations (Rudary & Singh, 2004) that e-tests yield TD-relationships and s-tests don't.

Page 16:

Can we do this again?

• In the dual, we get a proper machine, with states, actions, transitions, emissions.

• Can we think about experiments on the dual machine?

– Repeat previous transformations on the dual machine.

– Consider classes of equivalent experiments.

– Reverse the roles of experiments and states.

• What do we obtain?

Page 17:

The double dual machine

• States of the double dual machine are bundles of predictions for all possible experiments, e.g. [s1]′ and [s2]′.

– Equivalence classes of the type [s]′ can be viewed as homing sequences (Even-Dar et al., 2005).

• The double dual assigns the same probability to any experiment as the original machine. So they are equivalent machines.

• The double dual is always a deterministic system! (But can be much larger than the original machine.)

Page 18:

A simple example: The double dual machine

[Figure: the original machine, the dual fragment from the previous slide, and the double dual. The double dual has two states S1 and S2: actions N and S are self-loops, E leads to S2, and W leads to S1. Emissions for S1: γ(NR) = 0, γ(NB) = 1, γ(ER) = 0.5, γ(WB) = 1, γ(WR) = 0, … Emissions for S2: γ(NR) = 0.5, γ(NB) = 0.5, γ(ER) = 0.5, γ(WB) = 1, γ(WR) = 0, …]

Equivalent states are eliminated.

Two simple homing sequences:

• Action W forces the system into s1.

• Action E forces the system into s2.

Page 19:

Conjecture: Different representations are useful for different tasks

• Learn the double dual

– Advantage: it's deterministic.

– Problem: in general, the double dual is an infinite representation. (In our example, it is compact due to deterministic transitions in the original.)

– Focus on predicting accurately only the results of some experiments.

• Plan with the dual

– For a given experiment, the dual tells us its probability of success from every state.

– Given an initial state distribution: search over experiments to find one with high prediction probability with respect to goal criteria (see the sketch below).

– Start with dual fragments with short experiments, then move to longer ones.
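A minimal sketch of this search, reusing the predict function from earlier and assuming an initial state distribution b0 (a dict mapping states to probabilities) and a candidate set of experiments:

```python
def best_experiment(b0, candidates, delta, gamma):
    # Score each candidate experiment by its success probability under the
    # initial state distribution b0: the dual's prediction vector dotted with b0.
    def score(e):
        return sum(p * predict(s, e, delta, gamma) for s, p in b0.items())
    return max(candidates, key=score)
```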

Page 20:

A simple learning algorithm

Consider the following non-deterministic automaton:

• A set of states, S

• A set of actions, A

• A set of observations, O

• A joint transition-emission relation: γ ⊂ S × A × O × S, with s′ ∈ γ(s, a, o) if s —a,o→ s′

Can we learn this automaton (or an equivalent one) directly from data?
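As a data structure, the relation can be stored directly; a hypothetical encoding:

```python
# Hypothetical encoding of the relation: map (s, a, o) to the set of possible
# successor states s'; an empty set means (a, o) cannot occur in state s.
gamma_rel = {
    ("s1", "E", "Red"): {"s2"},   # illustrative entry: s1 --E,Red--> s2
}

def successors(s, a, o):
    return gamma_rel.get((s, a, o), set())
```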

Page 21:

Merge-split algorithm

• Define:

– Histories: h = a1 o1 a2 o2 … am om

– The empty history: ε

• Construct a “history” automaton, H.

• Algorithm:

– Start with one state, corresponding to the empty history: H = {ε}

– Consider all possible next states h′ = hao.

– The merge operation checks for an equivalent existing state h″ ~ h′, where h′ ~ h″ ⟺ ⟦h′⟧ = ⟦h″⟧ and ⟦h⟧ denotes the set of all possible future trajectories after h. If such an h″ is found, we set the transition function accordingly: δ(h, ao) = h″.

– Otherwise the split operation is applied: H = H ∪ {h′}, δ(h, ao) = h′.
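Here is a hedged sketch of merge-split. The oracle futures(h), which returns a hashable summary of the set of possible future trajectories after history h (estimated from data, e.g. a frozenset of bounded-length continuations) and None for histories that cannot occur, is our own assumption:

```python
from collections import deque

def merge_split(futures, actions, observations):
    # The empty history is represented as an empty tuple of (a, o) pairs.
    empty = ()
    states = {futures(empty): empty}   # trajectory summary -> exemplar history
    delta = {}                         # (exemplar, (a, o)) -> exemplar
    queue = deque([empty])
    while queue:
        h = queue.popleft()
        for a in actions:
            for o in observations:
                h2 = h + ((a, o),)
                key = futures(h2)
                if key is None:
                    continue               # (a, o) cannot occur after h
                if key in states:          # merge: an equivalent state exists
                    delta[(h, (a, o))] = states[key]
                else:                      # split: h2 becomes a new state
                    states[key] = h2
                    delta[(h, (a, o))] = h2
                    queue.append(h2)
    return set(states.values()), delta
```

Because there are finitely many equivalence classes of histories (next slide), the queue eventually drains and the loop terminates.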

Page 22:

Example

[Figure: the flip automaton (Holmes & Isbell, 2006)]

[Figure: the learned automaton]

Page 23:

Comments

• Merge-split constructs a deterministic history automaton.

• There is a finite number of equivalence classes of histories.

– Worst case: the size is exponential in the number of states of the original machine.

• The automaton is well defined (i.e., it makes the same predictions as the original model).

• This is the minimal such automaton.

• Extending this to probabilistic machines is somewhat messy… but we are working on it.

Page 24:

Final discussion

• Interesting to consider the same dynamical system from different perspectives.

– There is a notion of duality between state and experiment.

– Such a notion of duality is not new, e.g. observability vs. controllability in systems theory.

• There is a large body of existing work on learning automata, which I did not comment on [Rivest & Schapire, 1994; James & Singh, 2005; Holmes & Isbell, 2006; …].

• Many interesting questions remain:

– Can we develop a sound approximation theory for our duality?

– Can we extend this to continuous systems?

– Can we extend the learning algorithm to probabilistic systems?