the markov chain monte carlo method isabelle stanton may 8, 2008 theory lunch

The Markov Chain Monte Carlo Method

Isabelle Stanton, May 8, 2008, Theory Lunch

Monte Carlo vs Las Vegas

Las Vegas Algorithms are randomized and always give the correct results but gamble with computation time

Quicksort

Monte Carlo algorithms have fixed running time but may be wrong

Simulated Annealing

Estimating volume

Markov Chains

A memoryless stochastic process, e.g., flipping a coin

[Figure: Markov chain on the six faces of a die (states 1 through 6); every transition has probability 1/6]

Other Examples of Markov Chains

Shuffling cards

Flipping a coin

PageRank Model

Particle systems - focus of MCMC work

General Idea

Model the system using a Markov Chain

Use a Monte Carlo Algorithm to perform some computation task

Applications

Approximate Counting - # of solutions to 3-SAT or Knapsack

Statistical Physics – when do phase transitions occur?

Combinatorial optimization – simulated annealing type of algorithms

We'll focus on counting

Monte Carlo Counting

How do you estimate the volume of a complex solid?

Render with environment maps efficiently?

Estimate an integral numerically?
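As a minimal sketch of the volume question, the snippet below estimates the volume of the unit ball in 3D by sampling points uniformly from the enclosing cube and counting hits. The function name `estimate_ball_volume`, the choice of solid, and the sample count are assumptions made for illustration; they are not from the slides.

```python
import math
import random

def estimate_ball_volume(num_samples, seed=0):
    """Estimate the volume of the unit ball in 3D by uniform sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Sample a point uniformly from the cube [-1, 1]^3.
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if x * x + y * y + z * z <= 1.0:
            hits += 1
    cube_volume = 8.0
    # Fraction of hits times the cube volume estimates the ball volume.
    return cube_volume * hits / num_samples

if __name__ == "__main__":
    print("estimate:", estimate_ball_volume(200_000))
    print("exact:   ", 4.0 * math.pi / 3.0)
```

This works well here because the ball fills a constant fraction of the cube; the estimate degrades when the target set is a tiny fraction of the sampling space.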

(Picnic) Knapsack

Holds 20

weighs 4

weighs 10

weighs 4

weighs 2

weighs 5

What is a solution?

How many solutions are there?
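For an instance this small, both questions can be answered by brute force. The sketch below assumes the picnic figure shows five items with weights 4, 10, 4, 2, 5 and a knapsack that holds 20, and that a solution is any subset of items whose total weight fits. The function name `count_knapsack_solutions` is hypothetical.

```python
from itertools import product

def count_knapsack_solutions(weights, capacity):
    """Count 0/1 vectors x with sum(weights[i] * x[i]) <= capacity."""
    count = 0
    for x in product((0, 1), repeat=len(weights)):
        if sum(w * bit for w, bit in zip(weights, x)) <= capacity:
            count += 1
    return count

picnic_weights = [4, 10, 4, 2, 5]   # assumed reading of the figure
picnic_capacity = 20

if __name__ == "__main__":
    # 28 of the 32 subsets fit; only the four subsets weighing 21, 21, 23 and 25 do not.
    print(count_knapsack_solutions(picnic_weights, picnic_capacity))
```

Enumeration takes time exponential in the number of items, which is why the rest of the talk looks at estimating the count instead.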

Counting Knapsack Solutions

Item weights: $a = (a_0, \dots, a_{n-1})$

Knapsack size: a real number b

Estimate the number of {0,1} vectors x that satisfy $a \cdot x \le b$

Let N denote the number of solutions

Naïve Solution

Randomly generate x

Calculate $a \cdot x$

If $a \cdot x \le b$ return $2^n$, else return 0

This will return N in expectation: $\frac{0 \cdot (2^n - N) + 2^n \cdot N}{2^n} = N$
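The naïve estimator can be sketched in a few lines. This version averages many independent trials; the name `naive_estimate`, the trial count, and the fixed seed are assumptions for illustration only.

```python
import random

def naive_estimate(weights, capacity, num_trials, seed=0):
    """Average of the naive estimator over num_trials independent trials."""
    rng = random.Random(seed)
    n = len(weights)
    total = 0
    for _ in range(num_trials):
        # Randomly generate x, a uniform vector in {0,1}^n.
        x = [rng.randint(0, 1) for _ in range(n)]
        # Return 2^n if x is a solution, else 0.
        if sum(w * bit for w, bit in zip(weights, x)) <= capacity:
            total += 2 ** n
    return total / num_trials

if __name__ == "__main__":
    # On a small instance where most vectors are solutions, the average is close to N.
    print(naive_estimate([4, 10, 4, 2, 5], 20, 20_000))
```

The estimator is unbiased, but its usefulness depends on how often a random vector happens to be a solution.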

Is this fast?

No.

Counterexample: a = (1, ..., 1) and b = n/3

Any solution has at most n/3 ones

There are $\sum_{k \le n/3} \binom{n}{k}$ solutions

$\Pr[\text{sample } x \text{ with } \|x\| \le n/3] \le e^{-n/18}$ (Hoeffding's inequality)

In expectation, we need to generate exponentially many x's (at least $e^{n/18}$) before we get a single solution!

Any polynomial number of trials will grossly underestimate N
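To see how quickly the counterexample breaks the naïve estimator, the probability that a uniform random vector has at most n/3 ones can be computed exactly from binomial coefficients. This is a small sketch; the function name `hit_probability` and the chosen values of n are illustrative assumptions.

```python
from math import comb

def hit_probability(n):
    """Exact probability that a uniform x in {0,1}^n has at most n/3 ones."""
    limit = n // 3
    solutions = sum(comb(n, k) for k in range(limit + 1))
    return solutions / 2 ** n

if __name__ == "__main__":
    for n in (30, 90, 300):
        p = hit_probability(n)
        # 1 / p is the expected number of samples before the first solution.
        print(n, p, 1.0 / p)
```

The expected number of samples before the first hit grows exponentially with n, so a polynomial number of trials will usually see no solution at all.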

Knapsack with MCMC

Let $M_{knap}$ be a Markov chain with state space $\Omega(b) = \{x \mid a \cdot x \le b\}$

This will allow us to sample a solution

Various $M_{knap}$

[Figure: state spaces of $M_{knap}$ on 3-bit vectors for several instances, including a = (0, .5, .5), b = 1.5 (all eight vectors 000 through 111) and a = (0, 1, 1), b = 1.5 (000, 001, 010, 100, 110, 101)]

$M_{knap}$ Transitions

With probability 1/2, x transitions to x

Otherwise, select an i u.a.r. from 0 to n-1 and flip the ith bit of x to get x'

If x' is a solution, transition there; otherwise stay at x
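The transition rule can be sketched directly. States are represented as tuples of bits; the names `mknap_step` and `walk` are hypothetical, and staying put when the flipped vector is not a solution is the reading assumed here.

```python
import random

def mknap_step(x, weights, capacity, rng):
    """One transition of the knapsack chain from state x (a tuple of bits)."""
    if rng.random() < 0.5:
        return x  # lazy step: stay at x with probability 1/2
    i = rng.randrange(len(x))  # index chosen uniformly from 0..n-1
    candidate = x[:i] + (1 - x[i],) + x[i + 1:]
    weight = sum(w * bit for w, bit in zip(weights, candidate))
    # Move only if the flipped vector is still a solution.
    return candidate if weight <= capacity else x

def walk(weights, capacity, num_steps, seed=0):
    """Simulate the chain from the all-zeros vector for num_steps steps."""
    rng = random.Random(seed)
    x = (0,) * len(weights)
    for _ in range(num_steps):
        x = mknap_step(x, weights, capacity, rng)
    return x

if __name__ == "__main__":
    print(walk([0, 1, 1], 1.5, 1000))
```

Because infeasible moves are rejected, the walk never leaves the set of solutions.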

[Figure: transition graph of $M_{knap}$ for a = (0, 1, 1), b = 1.5 on the states 000, 001, 010, 100, 110, 101; self-loops are labeled 0.5 and bit-flip edges are labeled 1/6]

Connected?

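Is $M_{knap}$ connected?

Yes. To get from x to x', go through 0: with nonnegative weights, clearing the 1s of x one at a time never leaves $\Omega(b)$, and neither does setting the 1s of x' one at a time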
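The path through 0 can be written out explicitly. The sketch below assumes nonnegative weights, so every intermediate vector is a subset of either the start or the end solution and therefore still fits. The function name `path_through_zero` is hypothetical.

```python
def path_through_zero(x, y):
    """List the states visited going from x to y via the all-zeros vector."""
    current = list(x)
    path = [tuple(current)]
    # Phase 1: clear the 1s of x one at a time until reaching 0.
    for i in range(len(current)):
        if current[i] == 1:
            current[i] = 0
            path.append(tuple(current))
    # Phase 2: set the 1s of y one at a time.
    for i in range(len(y)):
        if y[i] == 1:
            current[i] = 1
            path.append(tuple(current))
    return path

if __name__ == "__main__":
    for state in path_through_zero((1, 1, 0), (1, 0, 1)):
        print(state)
```

Each consecutive pair of states differs in exactly one bit, so every hop is a legal transition of the chain.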

Ergodicity

What is the stationary distribution of Knapsack?

Sample each solution with prob 1/N

A MC is ergodic if the probability distribution over the states converges to the stationary distribution of the system, regardless of the starting configuration

Is $M_{knap}$ ergodic? Yes.
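For a tiny instance, the claim that the uniform distribution is stationary can be checked exactly by building the transition matrix with rational arithmetic. This is a sketch under the assumption that rejected moves stay put; the helper names `solutions`, `transition_matrix`, and `is_uniform_stationary` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def solutions(weights, capacity):
    """All 0/1 vectors whose total weight fits in the knapsack."""
    return [x for x in product((0, 1), repeat=len(weights))
            if sum(w * bit for w, bit in zip(weights, x)) <= capacity]

def transition_matrix(weights, capacity):
    """Exact transition matrix of the chain over the solution set."""
    states = solutions(weights, capacity)
    index = {x: k for k, x in enumerate(states)}
    n = len(weights)
    P = [[Fraction(0)] * len(states) for _ in states]
    for x in states:
        row = P[index[x]]
        row[index[x]] += Fraction(1, 2)          # lazy self-loop
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            target = y if y in index else x      # rejected moves stay at x
            row[index[target]] += Fraction(1, 2 * n)
    return states, P

def is_uniform_stationary(P):
    """Check that the uniform distribution pi satisfies pi P = pi."""
    size = len(P)
    pi = [Fraction(1, size)] * size
    next_pi = [sum(pi[j] * P[j][k] for j in range(size)) for k in range(size)]
    return next_pi == pi

if __name__ == "__main__":
    states, P = transition_matrix((0, 1, 1), 1.5)
    print(len(states), is_uniform_stationary(P))
```

The check succeeds because the matrix is symmetric: x moves to a neighbor y with the same probability that y moves to x.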

Algorithm Idea

Start at 0 and simulate $M_{knap}$ for enough steps that the distribution over the states is close to uniform

Why does uniformity matter?

Does this fix the problem yet?

The trick

Assume that $a_0 \le a_1 \le \dots \le a_{n-1}$ (0,1,2,…,n-1,n)

Let $b_0 = 0$ and $b_i = \min\{b, \sum_{j=0}^{i-1} a_j\}$

$|\Omega(b_{i-1})| \le |\Omega(b_i)|$ - why?

$|\Omega(b_i)| \le (n+1)\,|\Omega(b_{i-1})|$ - why?

Change any element of $\Omega(b_i)$ to one of $\Omega(b_{i-1})$ by switching the rightmost 1 to a 0

How does that help?

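$|\Omega(b)| = |\Omega(b_n)| = \frac{|\Omega(b_n)|}{|\Omega(b_{n-1})|} \times \frac{|\Omega(b_{n-1})|}{|\Omega(b_{n-2})|} \times \dots \times \frac{|\Omega(b_1)|}{|\Omega(b_0)|} \times |\Omega(b_0)|$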

We can estimate each of these ratios by doing a walk on $\Omega(b_i)$ and computing the fraction of samples in $\Omega(b_{i-1})$, which estimates $|\Omega(b_{i-1})| / |\Omega(b_i)|$

Good estimate since $|\Omega(b_{i-1})| \le |\Omega(b_i)| \le (n+1)\,|\Omega(b_{i-1})|$
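Putting the pieces together, the product-of-ratios estimator might look like the sketch below. It makes several assumptions that are not in the slides: weights are positive integers (so that $|\Omega(b_0)| = 1$ and the running weight stays exact), each sample comes from a fresh walk started at 0, and `samples_per_ratio` and `steps_per_sample` are free parameters rather than the values from the analysis. The names `mknap_sample` and `estimate_count` are hypothetical.

```python
import random

def mknap_sample(weights, capacity, num_steps, rng):
    """Run the chain from 0 for num_steps steps; return the final weight."""
    x = [0] * len(weights)
    weight = 0
    for _ in range(num_steps):
        if rng.random() < 0.5:
            continue  # lazy step
        i = rng.randrange(len(x))
        new_weight = weight + (weights[i] if x[i] == 0 else -weights[i])
        if new_weight <= capacity:  # accept only if still a solution
            x[i] = 1 - x[i]
            weight = new_weight
    return weight

def estimate_count(weights, capacity, samples_per_ratio, steps_per_sample, seed=0):
    """Estimate the number of knapsack solutions as a product of ratios."""
    rng = random.Random(seed)
    a = sorted(weights)
    # thresholds[i] = b_i = min(b, a_0 + ... + a_{i-1}), with b_0 = 0.
    thresholds = [0]
    prefix = 0
    for w in a:
        prefix += w
        thresholds.append(min(capacity, prefix))
    estimate = 1.0  # |Omega(b_0)| = 1, assuming all weights are positive
    for i in range(1, len(thresholds)):
        inside = 0
        for _ in range(samples_per_ratio):
            weight = mknap_sample(a, thresholds[i], steps_per_sample, rng)
            if weight <= thresholds[i - 1]:
                inside += 1
        if inside == 0:
            raise RuntimeError("no samples landed in the smaller set")
        # inside / samples estimates |Omega(b_{i-1})| / |Omega(b_i)|; multiply by its inverse.
        estimate *= samples_per_ratio / inside
    return estimate

if __name__ == "__main__":
    # Brute force gives 28 solutions for this assumed picnic instance.
    print(estimate_count([4, 10, 4, 2, 5], 20, 2000, 100))
```

Each ratio is bounded below by 1/(n+1), so a modest number of samples sees enough hits for every factor, unlike the naïve estimator.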

Analysis

Ignoring bias, the expectation of each trial is $|\Omega(b_{i-1})| / |\Omega(b_i)|$

We perform $t = 17 \varepsilon^{-2} n^2$ steps

Focus on $\mathrm{Var}(X) / E(X)^2$ in analyzing efficiency for MCMC methods

Analysis

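If Z is the product of the trials, $E[Z] = \prod_{i=1}^{n} |\Omega(b_{i-1})| / |\Omega(b_i)| = |\Omega(b_0)| / |\Omega(b)|$

*Magic Statistics Steps*

$\mathrm{Var}(Z) / (E[Z])^2 \le \varepsilon^2 / 16$

By Chebyshev's inequality:

$\Pr[(1 - \varepsilon/2)\,E[Z] \le Z \le (1 + \varepsilon/2)\,E[Z]] \ge 3/4$, so for $\varepsilon \le 1$, $|\Omega(b_0)| / Z$ estimates $|\Omega(b)|$ to within a factor of $1 \pm \varepsilon$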
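The Chebyshev step can be spelled out as a short derivation. It uses only the variance bound $\mathrm{Var}(Z)/(E[Z])^2 \le \varepsilon^2/16$ and a deviation of $(\varepsilon/2)\,E[Z]$; it is meant as a sketch of where the 3/4 comes from.

```latex
\begin{align*}
\Pr\bigl[\,|Z - E[Z]| \ge \tfrac{\varepsilon}{2}\,E[Z]\,\bigr]
  &\le \frac{\mathrm{Var}(Z)}{\bigl(\tfrac{\varepsilon}{2}\,E[Z]\bigr)^{2}}
   = \frac{4}{\varepsilon^{2}} \cdot \frac{\mathrm{Var}(Z)}{(E[Z])^{2}} \\
  &\le \frac{4}{\varepsilon^{2}} \cdot \frac{\varepsilon^{2}}{16}
   = \frac{1}{4},
\end{align*}
so $Z$ lies within a factor $1 \pm \varepsilon/2$ of $E[Z]$ with probability at least $3/4$.
```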

Analysis

We used $nt = 17 \varepsilon^{-2} n^3$ steps

This is an FPRAS (Fully Polynomial Randomized Approximation Scheme)

Except... what assumption did I make?

Mixing Time

Assumption: We are close to the uniform distribution in $17 \varepsilon^{-2} n^2$ steps

The number of steps needed to get close is known as the mixing time

It is unknown if this distribution mixes in polynomial time
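On a small instance, mixing can be observed directly by evolving the exact distribution of the chain and measuring its total variation distance to uniform after each step. This is only an illustrative sketch for tiny n (it enumerates all solutions) and assumes rejected moves stay put; the function name `distance_to_uniform` is hypothetical.

```python
from itertools import product

def distance_to_uniform(weights, capacity, num_steps):
    """Total variation distance to uniform after each step, starting at 0."""
    n = len(weights)
    states = [x for x in product((0, 1), repeat=n)
              if sum(w * bit for w, bit in zip(weights, x)) <= capacity]
    feasible = set(states)
    dist = {x: 0.0 for x in states}
    dist[(0,) * n] = 1.0  # all probability mass starts on the zero vector
    uniform = 1.0 / len(states)
    distances = []
    for _ in range(num_steps):
        new_dist = {x: 0.0 for x in states}
        for x, mass in dist.items():
            new_dist[x] += 0.5 * mass  # lazy self-loop
            for i in range(n):
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                target = y if y in feasible else x  # rejected moves stay at x
                new_dist[target] += mass / (2 * n)
        dist = new_dist
        distances.append(0.5 * sum(abs(p - uniform) for p in dist.values()))
    return distances

if __name__ == "__main__":
    d = distance_to_uniform([4, 10, 4, 2, 5], 20, 60)
    for t in (1, 5, 10, 20, 40, 60):
        print(t, d[t - 1])
```

The distance shrinks quickly on this five-item example; the open question on the slide is how the required number of steps scales with n.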

Mixing Time

What does mix in polynomial time?

Dice - 1 transition

Shuffling cards - 7 shuffles

Ferromagnetic Ising model at high temperature - O(n log n)

What doesn't?

Ferromagnetic Ising model at low temperature - starts to form magnets

Wes Weimer Memorial Conclusion Slide

The Markov Chain Monte Carlo method models the problem as a Markov Chain and then uses random walks

Mixing time is important

#P problems are hard

Wes likes trespassing
