Approximate Inference for Complex Stochastic Processes: Parametric & Nonparametric Approaches
Approximate Inference for Complex Stochastic Processes: Parametric & Nonparametric Approaches
Brenda Ng & John Bevilacqua
16.412J/6.834J Intelligent Embedded Systems
10/24/2001
Overview

Problem statement
Given a complex system with many state variables that evolve over time, how do we monitor and reason about its state?
– Robot localization and map building
– Network for monitoring freeway traffic

Approach
1. Representation: model the problem as a Dynamic Bayesian Network
2. Inference: approximate inference techniques
• Exact inference on an approximate model (parametrized approach: Boyen-Koller Projection)
• Approximate inference on an exact model (nonparametrized approach: Particle Sampling)
Contribution

Reduce the complexity of the problem via approximate methods, rendering the monitoring of complex dynamic systems tractable.
What is a Bayesian Network?

A Bayesian network, or belief network, is a graph in which the following holds:
• A set of random variables makes up the nodes of the network.
• A set of directed links connects pairs of nodes to denote causal relations between variables.
• Each node has a conditional probability distribution that quantifies the effects that the parents have on the node.
• The graph is directed and acyclic.
Courtesy of Russell & Norvig
Why Bayesian Networks?

• Bayesian networks achieve compactness by factoring the joint distribution into local, conditional distributions for each variable given its parents.
• Bayesian networks lend themselves easily to evidential reasoning.
[Figure: two example networks: an abstract network with nodes x1-x5, and the Season → Sprinkler/Rain → Wet → Slippery network]
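The factorization that gives Bayesian networks their compactness can be made concrete with the Sprinkler network; a minimal sketch, in which all numeric probabilities are illustrative assumptions rather than values from the slides:

```python
# Joint probability of the Season -> {Sprinkler, Rain} -> Wet -> Slippery
# network as a product of local conditional distributions.
# All numeric probabilities here are illustrative assumptions.

P_season = {"dry": 0.6, "rainy": 0.4}             # P(Season)
P_sprinkler = {"dry": 0.5, "rainy": 0.1}          # P(Sprinkler=on | Season)
P_rain = {"dry": 0.1, "rainy": 0.7}               # P(Rain=yes | Season)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(Wet=yes | Sprinkler, Rain)
         (False, True): 0.9, (False, False): 0.0}
P_slippery = {True: 0.8, False: 0.0}              # P(Slippery=yes | Wet)

def joint(season, sprinkler, rain, wet, slippery):
    """P(season, sprinkler, rain, wet, slippery) via the BN factorization."""
    p = P_season[season]
    p *= P_sprinkler[season] if sprinkler else 1 - P_sprinkler[season]
    p *= P_rain[season] if rain else 1 - P_rain[season]
    p *= P_wet[(sprinkler, rain)] if wet else 1 - P_wet[(sprinkler, rain)]
    p *= P_slippery[wet] if slippery else 1 - P_slippery[wet]
    return p

# The factored form needs only 1 + 2 + 2 + 4 + 2 = 11 independent parameters,
# versus 2**5 - 1 = 31 for a full joint table over the five variables.
p = joint("rainy", False, True, True, True)       # 0.4 * 0.9 * 0.7 * 0.9 * 0.8
```

Because each local table is normalized, the product automatically defines a valid joint distribution.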
Dynamic Bayesian Networks

Dynamic Bayesian networks capture the process of variables changing over time by representing multiple copies of the state variables, one for each time step.
• A set of variables Xt denotes the world state at time t and a set of sensor variables Et denotes the observations available at time t.
• Keeping track of the world means computing the current probability distribution over world states given all past observations, P(Xt | E1, …, Et).
• The observation model is P(Et | Xt) and the transition model is P(Xt+1 | Xt).
Courtesy of Koller & Lerner
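With these two models, computing P(Xt | E1, …, Et) is the classic predict/update recursion; a minimal sketch, assuming a made-up 2-state chain with 2 possible observations:

```python
import numpy as np

# Exact monitoring P(X_t | E_1, ..., E_t) via the predict/update recursion.
# The 2-state transition matrix T and observation matrix O are
# illustrative assumptions, not values from the presentation.

T = np.array([[0.7, 0.3],   # T[i, j] = P(X_{t+1} = j | X_t = i)
              [0.2, 0.8]])
O = np.array([[0.9, 0.1],   # O[i, k] = P(E_t = k | X_t = i)
              [0.3, 0.7]])

def filter_step(belief, evidence):
    """One monitoring step: propagate through T, then condition on E."""
    prior = belief @ T                   # predict: P(X_{t+1} | E_1..E_t)
    posterior = prior * O[:, evidence]   # update: weight by P(E_{t+1} | X_{t+1})
    return posterior / posterior.sum()   # renormalize

belief = np.array([0.5, 0.5])            # uniform initial belief
for e in [0, 0, 1]:                      # a short observation sequence
    belief = filter_step(belief, e)
```

This exact recursion is cheap for a tiny chain, but for a DBN with many state variables the belief state itself becomes exponentially large, which motivates the approximations below.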
Avi's categorization of approximate inference algorithms:

Approximate inference algorithms
• Approximate computation on exact model: sampling methods, search methods, loopy propagation, variational methods
• Exact computation on approximate model: mini-buckets, Boyen-Koller projection method for DBNs
Why Approximate Inference?

The variables in a DBN can become correlated very quickly as the network is unrolled. Consequently, no decomposition of the belief state is possible and exact probabilistic reasoning methods are infeasible.
Monitoring Task I

• Belief state at time t−1: σ(t−1)[si]
• Propagate through the state evolution model T to obtain the prior distribution.
• Condition on the observation at time t via the observation model O to obtain the posterior distribution.
• The result is the belief state at time t: σ(t)[si]
Monitoring Task II

• Belief state at time t: σ(t)[si]
• Propagate through the state evolution model T to obtain the prior distribution.
• Condition on the observation at time t+1 via the observation model O to obtain the posterior distribution.
• The result is the belief state at time t+1: σ(t+1)[si]
Boyen-Koller Projection

Algorithm
• Decide on some computationally tractable representation for an approximate belief state, e.g. one that decomposes into independent factors.
• Propagate the approximate belief state at time t through the transition model and condition it on the evidence at time t+1.
• In general, the resulting state for time t+1 will not fall into the class we have chosen to maintain.
• Approximate the resulting belief state by one that does fall into that class, and continue.
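One BK step can be sketched on a toy DBN with two binary variables; the joint transition matrix and evidence likelihoods below are illustrative assumptions, while the projection step (collapsing the exact posterior to a product of per-variable marginals) is the BK idea itself:

```python
import numpy as np

# One Boyen-Koller step on a toy DBN with two binary variables (A, B).
# The joint transition matrix and evidence likelihoods are illustrative
# assumptions; the projection to per-variable marginals is the BK method.

# Joint states of (A, B), ordered (0,0), (0,1), (1,0), (1,1).
T = np.array([[0.6, 0.2, 0.1, 0.1],
              [0.1, 0.6, 0.1, 0.2],
              [0.2, 0.1, 0.5, 0.2],
              [0.1, 0.1, 0.2, 0.6]])          # T[i, j] = P(next=j | current=i)
likelihood = np.array([0.9, 0.4, 0.4, 0.1])   # P(e | A, B) per joint state

def bk_step(marg_a, marg_b):
    """Propagate a factored belief exactly, condition, then project."""
    # 1. Tractable representation: belief = product of the two marginals.
    joint = np.array([marg_a[a] * marg_b[b] for a in (0, 1) for b in (0, 1)])
    # 2. Exact propagation through T and conditioning on the evidence.
    post = (joint @ T) * likelihood
    post /= post.sum()
    # 3. Projection: the exact posterior is generally not factored, so
    #    keep only its single-variable marginals and continue.
    new_a = np.array([post[0] + post[1], post[2] + post[3]])
    new_b = np.array([post[0] + post[2], post[1] + post[3]])
    return new_a, new_b

a, b = np.array([0.5, 0.5]), np.array([0.5, 0.5])
a, b = bk_step(a, b)
```

Note that the propagation and conditioning in step 2 are exact; only the projection in step 3 discards information (the correlation between A and B).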
Assumptions
• T must be ergodic for the error to be bounded.
• The approximate belief state must be decomposable.

Approximation Error
• Gradual accumulation of approximation errors
• Spontaneous amplification of error due to instability
Monitoring Task (Revisited)

• Approximate belief state at time t: σ̃(t)[si]
• Propagate through the state evolution model T to obtain the approximate prior distribution.
• Condition on the observation at time t+1 via the observation model O to obtain the approximate posterior distribution σ̂(t+1).
• Belief state approximation: project σ̂(t+1) back onto the tractable class to obtain the approximate belief state at time t+1: σ̃(t+1)[si]
Approximation Error

Approximation error results from two sources:
• old error “inherited” from the previous approximation σ̃(t)
• new error derived from approximating σ̂(t+1) using σ̃(t+1)

Suppose that each approximation introduces an error of ε, increasing the distance between the exact belief state and our approximation to it. How is the error going to be bounded?

[Figure: the exact belief state σ(t) and the approximate belief state σ̃(t), with the per-step error ε shrinking toward a point of convergence]
Idea of Contraction
• To ensure that the error is bounded, T and O must reduce the distance between the two belief states by a constant factor.
Distance Measure

If φ and ψ are two distributions over the same space Ω, the relative entropy of φ and ψ is:

D(φ ‖ ψ) = Σx φ(x) ln(φ(x) / ψ(x))
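As a quick sketch of this distance measure (the two example distributions are made up for illustration):

```python
import math

# Relative entropy D(phi || psi) = sum_x phi(x) * ln(phi(x) / psi(x)),
# the distance in which the BK contraction results are stated.
# The two example distributions below are made up for illustration.

def relative_entropy(phi, psi):
    # Terms with phi(x) = 0 contribute 0 by convention.
    return sum(p * math.log(p / q) for p, q in zip(phi, psi) if p > 0)

exact  = [0.7, 0.2, 0.1]   # e.g. the true belief state
approx = [0.6, 0.3, 0.1]   # e.g. a factored approximation
d = relative_entropy(exact, approx)
# D is zero iff the distributions coincide, and it is not symmetric.
```

Relative entropy is not a metric (it is asymmetric and violates the triangle inequality), but it is the natural measure for the contraction analysis.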
Contraction Results

These contraction results show that the approximate belief state and the true belief state are driven closer to each other at each propagation of the stochastic dynamics.
Thus, the BK approximation method never introduces unbounded error: BK is a stable approximation method!
Rao-Blackwellised Particle Filters
• Sampling-based inference/learning algorithms for Dynamic Bayesian Networks
• Exploit the structure of a DBN by sampling some of the variables and marginalizing out the rest in order to increase the efficiency of particle filtering
• Lead to more accurate estimates than standard Particle Filters
Particle Filtering

Belief state: bt = P(st | ht)
1) Resample particles according to their weights.
2) Propagate according to action at, yielding P(st+1 | at, ht).
3) Reweight according to observation zt+1, yielding bt+1 = P(st+1 | ht+1), where ht+1 = (zt+1, at, ht).
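The resample/propagate/reweight loop can be sketched for a toy 2-state system; the transition and observation probabilities are illustrative assumptions (and there is no action here, so propagation uses a single transition model):

```python
import random

# Minimal particle filter implementing the resample / propagate / reweight
# loop for a toy 2-state system. The transition and observation
# probabilities are illustrative assumptions.

T = {0: [0.7, 0.3], 1: [0.2, 0.8]}   # P(s_{t+1} | s_t)
O = {0: [0.9, 0.1], 1: [0.3, 0.7]}   # P(z | s)

def pf_step(particles, weights, z, rng):
    """One step: resample by weight, propagate through T, reweight on z."""
    n = len(particles)
    # 1) Resample particles in proportion to their weights.
    particles = rng.choices(particles, weights=weights, k=n)
    # 2) Propagate each particle through the transition model.
    particles = [rng.choices([0, 1], weights=T[s])[0] for s in particles]
    # 3) Reweight by the observation likelihood and normalize.
    weights = [O[s][z] for s in particles]
    total = sum(weights)
    return particles, [w / total for w in weights]

rng = random.Random(0)
particles = [rng.choice([0, 1]) for _ in range(1000)]
weights = [1.0 / 1000] * 1000
for z in [0, 0, 1]:
    particles, weights = pf_step(particles, weights, z, rng)
# Weighted fraction of particles in state 1 estimates P(s_t = 1 | z_1..z_t).
est = sum(w for s, w in zip(particles, weights) if s == 1)
```

With enough particles, `est` approaches the exact filtered posterior; the spikes in error under unlikely evidence mentioned later arise because most particle weights collapse toward zero in that case.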
Rao-Blackwellization

Conditions for using RBPFs
• The system contains discrete states and discrete observations.
• The system contains variables which can be integrated out analytically.

Algorithm
• Decompose the dynamic network via factorization.
• Identify variables whose values can be discerned by observation of the system and the other variables.
• Perform particle filtering on these variables and compute the relevant sufficient statistics.
Example of Rao-Blackwellization
• Have a system with variables time, position, velocity
• Realize that, given the current and previous position and time, velocity can be determined
• Remove velocity from system model and perform particle filtering on position
• Based on sampled value for position, calculate the distribution for the velocity variable
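A minimal sketch of this decomposition, assuming a hypothetical 1-D motion model (unit mean step, small noise): only position is sampled, while velocity is derived analytically from consecutive positions instead of being carried as a sampled variable:

```python
import random

# Rao-Blackwellization sketch for the position/velocity example:
# only position is sampled; velocity is computed analytically from
# consecutive positions rather than being part of the sampled state.
# The 1-D motion model (unit mean step, 0.1 noise) is a made-up assumption.

def rb_trajectory(steps, dt=1.0, seed=0):
    rng = random.Random(seed)
    pos, prev = 0.0, 0.0
    history = []
    for _ in range(steps):
        prev, pos = pos, pos + rng.gauss(1.0, 0.1)  # sample position only
        velocity = (pos - prev) / dt                # exact, given the positions
        history.append((pos, velocity))
    return history

# Velocity never enters the sampled state, so the particles live in a
# lower-dimensional space, which reduces the variance of the estimates.
traj = rb_trajectory(5)
```

The same pattern generalizes: any variable that is a deterministic or analytically tractable function of the sampled variables can be marginalized out rather than sampled.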
Application: Robot Localization I

Robot Localization & Map Building Scenario
A robot can move on a discrete, 1-D grid.

Objective
To learn a map of the environment, represented as a matrix that stores the color of each grid cell.

Sources of Stochasticity
• Imperfect sensors
• Mechanical faults (motor failure & wheel slippage)

The robot needs to know where it is to learn the map, but needs to know the map to figure out where it is.
Application: Robot Localization II

A. Exact inference   B. RBPF with 50 samples   C. Fully-factorized BK

Summary of results
• RBPF provides nearly the same estimation accuracy as exact inference.
• BK gets confused because it ignores correlations between the map cells.
Application: BAT Network I

The BAT network is used for monitoring freeway traffic.

Analysis procedure
• The process is monitored using the BK/sampling method with some fixed decomposition or some fixed number of particles.
• Performance is compared with the result derived from exact inference.
BAT Results: BK

• The errors obtained in practice are significantly lower than those predicted by the theoretical analysis.
• The algorithm works optimally when clusters are chosen to correspond to the structure of weakly correlated subprocesses.
• Accuracy can be further improved by using conditionally independent approximations.
• Evidence boosts contraction and significantly reduces the overall error of the approximation (good for likely evidence, bad for unlikely evidence).
• By exploiting the structure of a process, this approach can achieve orders-of-magnitude faster inference with only a small degradation in accuracy.
BAT Results: Sampling
• Error drops sharply initially and then the improvements become smaller and smaller.
• The average KL-error changes dramatically over the sequence. The spikes correspond to unlikely evidence, in which case the samples become less reliable. This suggests that a more adaptive sampling scheme may be advantageous.
• Particle filtering gives a very good estimate of the true distribution over the variable.
BK vs. RBPF

• Boyen-Koller projection
– Applicable for inference on networks with structure that lends itself easily to decomposition
– Requires the transition model to be ergodic
• Rao-Blackwellized particle filters
– Applicable for inference on networks with redundant information
– Applicable for difficult distributions
Contributions

• Introduced Dynamic Bayesian Networks and explained their role in inference for complex stochastic processes
• Examined two different approaches to analyzing DBNs
– Exact computation on an approximate model: BK
– Approximate computation on an exact model: RBPF
• Compared the performance and applicability of these two approaches