
INFORMATION AND COMPUTATION 82, 93-133 (1989)

Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains*

ALISTAIR SINCLAIR AND MARK JERRUM

Department of Computer Science, University of Edinburgh, Edinburgh EH9 3JZ, Scotland

The paper studies effective approximate solutions to combinatorial counting and uniform generation problems. Using a technique based on the simulation of ergodic Markov chains, it is shown that, for self-reducible structures, almost uniform generation is possible in polynomial time provided only that randomised approximate counting to within some arbitrary polynomial factor is possible in polynomial time. It follows that, for self-reducible structures, polynomial time randomised algorithms for counting to within factors of the form $(1 + n^{-\beta})$ are available either for all $\beta \in \mathbb{R}$ or for no $\beta \in \mathbb{R}$. A substantial part of the paper is devoted to investigating the rate of convergence of finite ergodic Markov chains, and a simple but powerful characterisation of rapid convergence for a broad class of chains based on a structural property of the underlying graph is established. Finally, the general techniques of the paper are used to derive an almost uniform generation procedure for labelled graphs with a given degree sequence which is valid over a much wider range of degrees than previous methods: this in turn leads to randomised approximate counting algorithms for these graphs with very good asymptotic behaviour. © 1989 Academic Press, Inc.

1. INTRODUCTION

This paper is concerned with two classes of problems involving a finite set of combinatorial structures: counting them and generating them uniformly at random.

Combinatorial counting problems have a long and distinguished history. Apart from their intrinsic interest, they arise naturally from investigations in numerous other branches of mathematics and the natural sciences and have given rise to a rich and beautiful theory. Uniform generation problems are less well studied but have a number of computational applications. For example, they can be seen as a way of exploring a large set of structures and constructing typical representatives of it. These may

* An extended abstract of this paper appeared in the “Proceedings of the 13th International Workshop on Graph-Theoretic Concepts in Computer Science, Staffelstein, June/July 1987”; published by Springer-Verlag as Lecture Notes in Computer Science Vol. 314.



be used to formulate conjectures about the set, or perhaps as test data for the empirical analysis of some heuristic algorithm which takes inputs from the set.

The study of counting problems as a class from a computational perspective was initiated by Valiant (1979a). A parallel approach to generation problems was proposed more recently by Jerrum, Valiant, and Vazirani (1986). In this paper, we improve and extend some results of the latter authors concerning the relationship between counting and generation problems, and in particular the existence of efficient approximation algorithms for their solution.

Typically, the sets of structures we encounter are defined implicitly by some other combinatorial entity drawn from a family of problem instances, together with a relation R which associates with each instance x a finite set R(x) of solutions, as in the following examples:

1. Problem instances: Boolean formulae B in disjunctive normal form (DNF). Solution set R(B): all satisfying assignments of B.

2. Problem instances: undirected graphs G. Solution set R(G): all 1-factors (perfect matchings) of G.

3. Problem instances: positive integers n. Solution set R(n): all partitions of n.

Thus we can talk about the counting and (uniform) generation problems associated with a relation R: given as input a problem instance x, count or generate the elements of the solution set R(x).

Many naturally occurring relations of this kind exhibit a self-reducibility property, first studied by Schnorr (1976). Informally, this means that the solutions in R(x) have a simple inductive construction from the solution sets of a few smaller instances of the same problem. (For precise definitions of terms in this Introduction, the reader is referred to the next section.) All of the above examples are self-reducible: in 1, for instance, there is an obvious (1-1)-correspondence between the satisfying assignments of B and those of the reduced formulae $B_T$ and $B_F$, which are obtained from B by substituting for one of its variables the values true and false respectively.

For self-reducible relations, an efficient (i.e., polynomial time) algorithm for the counting problem immediately yields a polynomial time uniform generation procedure. This approach has been used extensively in the literature to generate particular combinatorial structures, such as 3 above, for which counting information is readily available, typically in the form of a recurrence relation (see, e.g., Guenoche, 1983; Nijenhuis and Wilf, 1978; Wilf, 1981).

Efficient exact solutions to counting problems are, however, relatively uncommon: indeed there are many natural relations, among them 1 and 2


above, whose counting problem is #P-complete and hence apparently intractable (Valiant, 1979a, 1979b), but whose construction problem can be solved in polynomial time. (Given a problem instance x, the construction problem asks for a solution y E R(x) if one exists, and the answer “no” otherwise.) In some cases, such as 1 above, the structures can also be generated efficiently. Note that generation seems to be harder in general than construction since it requires that all solutions be “equally accessible.”

In circumstances where exact methods are elusive, it is natural to seek efficient procedures for counting structures approximately in some appropriate sense. Following Stockmeyer (1983) and Karp and Luby (1983), we allow our counting algorithms to flip coins, and demand that they produce an answer which approximates $|R(x)|$ within some specified factor with high probability.

It turns out that, for self-reducible relations, the problems of approximate counting and almost uniform generation, in which a small bias in the output distribution over R(x) is allowed, are closely related. (An almost uniform generator will generally be as useful in practice as a uniform one and may be effectively indistinguishable from it.) Specifically, Jerrum, Valiant, and Vazirani (1986) show how the standard reduction from generation to exact counting can be modified so as to yield an almost uniform generator given only approximate counting estimates, provided these are within a factor of $1 + O(|x|^{-k_R})$ of the correct values, where $|x|$ is the input size and $k_R > 0$ is a constant depending on R. Conversely, approximate counting to within any factor of the form $1 + |x|^{-\beta}$, with $\beta \in \mathbb{R}$, is polynomial time reducible to almost uniform generation. They also locate these two problems for general NP relations within the second level of the (probabilistic) polynomial time hierarchy (Stockmeyer, 1977).

In this paper, we present an improved polynomial time reduction from almost uniform generation to approximate counting for self-reducible relations, in which the counting estimates need only be within a factor of $1 + O(|x|^{\alpha})$ of the exact values, for an arbitrary real constant $\alpha$. Thus a very crude counting procedure (to within a constant factor, say) can be used to generate solutions almost uniformly, and so can in turn be bootstrapped in polynomial time to a counting procedure which approximates within a factor of $1 + \varepsilon$ for any specified $\varepsilon > 0$. Moreover, the runtime of the improved procedure depends only polynomially on $\varepsilon^{-1}$. (Such an algorithm is often called a fully polynomial randomised approximation scheme.) A remarkable consequence of this result is that the concept of approximate counting to within factors of the form $1 + O(|x|^{-\beta})$, for $\beta \in \mathbb{R}$, is robust with respect to polynomial time computation for the large class of self-reducible relations.

The novel reduction is accomplished by stochastic simulation of an ergodic Markov chain whose structure mirrors the self-reducibility of the relation in question. The states of the chain include the solutions of


interest, and as the chain evolves it converges to a stationary distribution which is uniform over these states. Therefore, provided the convergence is rapid enough, a modest number of simulation steps will ensure that the final state is almost uniformly distributed over the solution set. A similar approach, based on a rather different type of Markov chain, can be used to generate in more direct fashion various structures such as matchings in graphs. This is the subject of another paper (Jerrum and Sinclair, 1988; see also Sinclair, 1988).

As a stepping stone to the above result, we derive a simple characterisation of rapid convergence, in a suitably defined sense, for a broad class of finite ergodic Markov chains in terms of a structural property of the underlying weighted graph. This characterisation, which is related to recent work by Alon (1986) on eigenvalues and expander graphs, appears to be quite generally applicable and we believe it to be of independent interest. Further examples of its use appear in (Jerrum and Sinclair, 1988; Sinclair, 1988). Very recently, a similar characterisation for Markov chains was discovered independently by Lawler and Sokal (1988).

Finally, as a concrete example of these ideas in action we consider the problem of generating labelled graphs with specified vertex degrees and a specified excluded subgraph. Using a result of McKay (1985) which provides analytic counting estimates for these graphs, we show that it is possible to generate them in polynomial time from a distribution which is very close to uniform provided only that the maximum degree grows no faster than $O(m^{1/4})$, where m is the number of edges. Although the problem is apparently not self-reducible under this restriction, our techniques can still be applied with a little extra work. This result represents a considerable improvement on hitherto known methods (Bollobás, 1980; Wormald, 1984). It also implies the existence of polynomial time randomised algorithms for counting such graphs to within a factor of $1 + m^{-\beta}$, for any desired $\beta \in \mathbb{R}$. Since the approach here is quite general, it seems likely that other natural combinatorial counting and generation problems can be treated in similar fashion.

The remainder of the paper is organised as follows. In Section 2 we introduce some definitions and notation. Section 3 covers the characterisation of rapid convergence for Markov chains and is largely self-contained. In Section 4 we use this characterisation to establish the improved reduction from almost uniform generation to approximate counting and discuss the implications of this result. Lastly, in Section 5 we apply the ideas of Section 4 to the degree constrained graph problem mentioned above.


2. DEFINITIONS AND NOTATION

This section is devoted to establishing a formal framework for the notions mentioned in the Introduction.

Let $\Sigma$ be a fixed finite alphabet in which the combinatorial structures of interest are to be encoded, and let $R \subseteq \Sigma^* \times \Sigma^*$ be a binary relation over $\Sigma$. For each string (problem instance) $x \in \Sigma^*$ the corresponding solution set with respect to R is

$$R(x) = \{\, y \in \Sigma^* : (x, y) \in R \,\}.$$

We shall always assume that these sets are finite. Note that we make no distinction between strings which do not encode a “valid” problem instance and those which encode a problem instance with empty solution set. Thus the formal counterpart of Example 1 of the Introduction is

$$R = \{\, (x, y) : x \in \Sigma^* \text{ encodes a Boolean formula } B \text{ in DNF and } y \in \Sigma^* \text{ encodes a satisfying assignment of } B \,\}.$$

Throughout we shall move freely between the formal and informal problem descriptions, assuming always that the encoding scheme used is “reasonable” in the sense of Garey and Johnson (1979).

The counting problem for a relation R over $\Sigma$ involves computing the function $\#R: \Sigma^* \to \mathbb{N}$ defined by $\#R(x) = |R(x)|$. As indicated in the Introduction, we shall be concerned with effective approximate solutions to this problem which estimate the value of the function within a specified factor. This notion of approximation, which is familiar from combinatorial optimisation and asymptotic analysis, has also been applied to counting problems in computer science by Stockmeyer (1983) and Karp and Luby (1983). (A less conventional and much more severe definition of approximate counting is studied by Cai and Hemachandra, 1986.)

If $a$, $\hat{a}$, and $r$ are non-negative real numbers with $r \geq 1$, we say that $\hat{a}$ approximates $a$ within ratio $r$ if $\hat{a} r^{-1} \leq a \leq \hat{a} r$. Let R be a relation over $\Sigma$, and $\rho$ a real-valued function of the natural number n such that $\rho(n) \geq 1$ for all $n \in \mathbb{N}$. A randomised approximate counter for R within ratio $\rho$ is a probabilistic algorithm $\mathcal{C}$ whose output on input $x \in \Sigma^*$ is a non-negative real-valued random variable $\mathcal{C}(x)$ satisfying

$$\Pr(\mathcal{C}(x) \text{ approximates } \#R(x) \text{ within ratio } \rho(|x|)) \geq \tfrac{3}{4}.$$

If $\mathcal{C}$ is in fact deterministic then it is an approximate counter for R within ratio $\rho$. In either case, $\mathcal{C}$ is polynomially time-bounded if it runs within time $p(|x|)$ for some polynomial $p$ and all inputs $x \in \Sigma^*$.

The significance of the lower bound of $\tfrac{3}{4}$ in the above definition lies in the


fact that it allows the counter to be "powered" so that the probability of producing a bad estimate becomes very small in polynomial time. (This would still hold if $\tfrac{3}{4}$ were replaced by any fixed constant greater than $\tfrac{1}{2}$.) More precisely, we have

LEMMA 2.1. If there exists a polynomially time-bounded randomised approximate counter $\mathcal{C}$ for R within ratio $\rho$, then there exists a probabilistic algorithm $\mathcal{C}'$ which on inputs $(x, \delta) \in \Sigma^* \times \mathbb{R}^+$ runs in time polynomial in $|x|$ and $\lg \delta^{-1}$, and whose output is a random variable $\mathcal{C}'(x, \delta)$ satisfying

$$\Pr(\mathcal{C}'(x, \delta) \text{ approximates } \#R(x) \text{ within ratio } \rho(|x|)) \geq 1 - \delta.$$

Proof. The required procedure $\mathcal{C}'$ makes $p(\lg \delta^{-1})$ calls to $\mathcal{C}$, with input x, for a suitable polynomial $p$ and returns the median of the values obtained. For the details, see Lemma 6.1 of (Jerrum et al., 1986). ∎
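The powering step of Lemma 2.1 is easy to sketch in code. The following minimal illustration is ours, not the paper's: `counter` stands for an arbitrary randomised approximate counter whose single-run output meets the target ratio with probability at least 3/4, and the constant 8 in the repetition count is a generous, unoptimised choice.

```python
import math
import random

def approximates_within_ratio(estimate, a, r):
    """The paper's notion: estimate approximates a within ratio r (r >= 1)."""
    return estimate / r <= a <= estimate * r

def powered_counter(counter, x, delta):
    """Median-of-repeats amplification in the spirit of Lemma 2.1.

    counter(x) is any randomised approximate counter whose single-run output
    is within the target ratio with probability >= 3/4.  Taking the median of
    k = O(lg(1/delta)) independent runs drives the probability of a bad
    estimate below delta: a bad median requires at least half of the runs to
    be bad, which by a Chernoff bound happens with exponentially small
    probability in k.
    """
    k = max(1, math.ceil(8 * math.log(1.0 / delta)))  # generous constant
    estimates = sorted(counter(x) for _ in range(k))
    return estimates[k // 2]

# Toy usage: a counter that is within ratio 2 of the true value 100 with
# probability 3/4, and wildly wrong otherwise.
def noisy_counter(x):
    return random.uniform(50, 200) if random.random() < 0.75 else 10_000.0

print(powered_counter(noisy_counter, "instance", delta=1e-6))
```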

In the (uniform) generation problem for a relation $R \subseteq \Sigma^* \times \Sigma^*$, we are given an input $x \in \Sigma^*$ and asked to select an element of R(x) at random in such a way that each solution has equal a priori probability of being chosen. In practice, the strict uniformity requirement can generally be weakened slightly, and we say that a probabilistic algorithm $\mathcal{G}$ is an almost uniform generator for R if its output on inputs $(x, \varepsilon) \in \Sigma^* \times \mathbb{R}^+$ is a random variable $\mathcal{G}(x, \varepsilon)$ satisfying

(i) $\mathcal{G}(x, \varepsilon)$ takes values in the set $R(x) \cup \{?\}$ with $? \notin \Sigma^*$, and
$$R(x) \neq \emptyset \;\Rightarrow\; \Pr(\mathcal{G}(x, \varepsilon) = \,?) \leq \tfrac{1}{2}.$$

(ii) There exists a function $\phi: \Sigma^* \times \mathbb{R}^+ \to (0, 1]$ such that, for all $y \in \Sigma^*$,
$$y \notin R(x) \;\Rightarrow\; \Pr(\mathcal{G}(x, \varepsilon) = y) = 0,$$
$$y \in R(x) \;\Rightarrow\; (1 + \varepsilon)^{-1}\phi(x, \varepsilon) \leq \Pr(\mathcal{G}(x, \varepsilon) = y) \leq (1 + \varepsilon)\phi(x, \varepsilon).$$

Thus $\varepsilon$ represents the pointwise bias tolerated in the output distribution. An almost uniform generator is fully polynomial (f.p.) if it runs in time bounded by a polynomial in $|x|$ and $\lg \varepsilon^{-1}$: in this case, the generator may be regarded as effectively indistinguishable from a uniform one by polynomial time statistical tests. Note that as a matter of convenience we allow the generator to fail (i.e., output the special symbol ?) with probability $\leq \tfrac{1}{2}$. This can always be made to decay exponentially fast using repeated trials; moreover, if the construction problem is solvable in polynomial time, as is often the case for the relations we consider, then we can force the generator never to fail when R(x) is non-empty.


Remark. In the above definitions we have not been precise about our model of randomised computation. For definiteness, we take this to be the probabilistic Turing machine of Gill (1977), in which the only source of randomness is a fair coin. However, we shall feel free to express algorithms in terms of much more general branching probabilities involving the ratio of two previously computed integers. This behaviour can always be simulated by a fair coin to a high degree of accuracy at negligible extra cost. In particular, the notions of polynomial time approximation algorithm presented here are robust with respect to such changes in the model of computation.

Next we formalise the concept of self-reducibility described in the Introduction. A relation $R \subseteq \Sigma^* \times \Sigma^*$ is (polynomial time) self-reducible, in the sense of Schnorr (1976), if

(i) There exists a polynomial time computable length function $l_R: \Sigma^* \to \mathbb{N}$ such that $l_R(x) = O(|x|^{k_R})$ for some constant $k_R > 0$, and
$$y \in R(x) \;\Rightarrow\; |y| = l_R(x) \qquad \forall x, y \in \Sigma^*.$$

(ii) For all $x \in \Sigma^*$ with $l_R(x) = 0$, the predicate $\lambda \in R(x)$ can be tested in polynomial time. ($\lambda$ denotes the empty string over $\Sigma$.)

(iii) There exist polynomial time computable functions $\psi: \Sigma^* \times \Sigma^* \to \Sigma^*$ and $\sigma: \Sigma^* \to \mathbb{N}$ satisfying
$$\sigma(x) = O(\lg |x|), \qquad l_R(x) > 0 \Rightarrow \sigma(x) > 0 \qquad \forall x \in \Sigma^*,$$
$$|\psi(x, w)| \leq |x| \qquad \forall x, w \in \Sigma^*,$$
$$l_R(\psi(x, w)) = \max\{l_R(x) - |w|, 0\} \qquad \forall x, w \in \Sigma^*,$$
and such that each solution set can be expressed in the form
$$R(x) = \bigcup_{w \in \Sigma^{\sigma(x)}} \{\, wy : y \in R(\psi(x, w)) \,\}.$$

Condition (iii) provides an inductive construction of the solution sets as

follows: if the solution length $l_R(x)$ is greater than zero, R(x) can be partitioned into classes according to the small initial segment w of length $\sigma(x)$, and each class can then be expressed as the solution set of another instance $\psi(x, w)$, concatenated with w. The partitioning of satisfying assignments of a DNF formula indicated in the Introduction is easily seen to be of the required form, under some natural encoding. An atom is an instance $x \in \Sigma^*$ with solution length $l_R(x) = 0$: in the above example, these would include (encodings of) the constants true and false, viewed as DNF formulae.


Condition (ii) says that, for atoms x, we can test in polynomial time whether $R(x) = \emptyset$ or $R(x) = \{\lambda\}$. Note that this, together with condition (iii), implies that we can test whether a candidate solution $y \in \Sigma^*$ belongs to any solution set R(x) in time polynomial in $|x| + |y|$. In view of condition (i), the existence problem associated with R is therefore in NP. (R is called a p-relation in Jerrum et al., 1986.) It appears that the vast majority of naturally occurring relations can be formulated so as to be self-reducible; a small concrete illustration of Example 1 is sketched below.
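The following toy rendering of the DNF example is ours rather than the paper's: it assumes a particular encoding of a formula as a pair (n, terms) over variables x0..x_{n-1}, with each term a conjunction given as a list of (variable index, required value) literals; $\sigma$ fixes one bit per step and $\psi$ substitutes that bit for x0.

```python
# Toy self-reducibility for DNF satisfiability (Example 1).  A formula is
# encoded as (n, terms): n variables x0..x_{n-1}; each term is a conjunction,
# given as a list of (index, wanted_value) literals.  All names illustrative.

def l_R(instance):
    """Solution length: one bit per variable."""
    n, _terms = instance
    return n

def sigma(instance):
    """Length of the initial segment fixed at each step (one bit)."""
    return 1 if l_R(instance) > 0 else 0

def psi(instance, w):
    """Reduced instance after substituting w ('0' or '1') for x0 and shifting
    the remaining variable indices down by one."""
    n, terms = instance
    val = (w == "1")
    reduced = []
    for term in terms:
        literals, falsified = [], False
        for idx, wanted in term:
            if idx == 0:
                if wanted != val:        # literal violated: the whole term dies
                    falsified = True
                    break
                # literal satisfied: simply drop it
            else:
                literals.append((idx - 1, wanted))
        if not falsified:
            reduced.append(literals)
    return (n - 1, reduced)

def atom_has_solution(instance):
    """Condition (ii) for atoms: with no variables left, every surviving term
    is the empty (trivially true) conjunction, so R(x) = {empty string} iff
    at least one term survived."""
    _n, terms = instance
    return len(terms) > 0

# Condition (iii): R(x) = union over w in {'0','1'} of { w + y : y in R(psi(x, w)) }.
```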

It is conceptually helpful to capture the inductive construction of solutions of a self-reducible relation explicitly in a tree structure. For each $x \in \Sigma^*$ with $R(x) \neq \emptyset$, the tree of derivations $T_R(x)$ is a rooted tree in which each vertex v bears both a problem instance label $\mathrm{inst}(v) \in \Sigma^*$ and a partial solution label $\mathrm{sol}(v) \in \Sigma^*$, defined inductively as follows:

(i) The root u has labels $\mathrm{inst}(u) = x$ and $\mathrm{sol}(u) = \lambda$.

(ii) For any vertex v in $T_R(x)$, if the problem instance $z = \mathrm{inst}(v)$ is an atom then v is a leaf. Otherwise, define
$$W(v) = \{\, w \in \Sigma^{\sigma(z)} : R(\psi(z, w)) \neq \emptyset \,\}.$$
(Note that $W(v)$ is non-empty.) Then v has a child $v_w$ for each $w \in W(v)$, with labels $\mathrm{inst}(v_w) = \psi(z, w)$ and $\mathrm{sol}(v_w) = \mathrm{sol}(v) \cdot w$.

Note that the labels $\mathrm{sol}(v)$ are distinct, while the $\mathrm{inst}(v)$ are in general not. It should be clear that the labels $\mathrm{sol}(v)$ for leaves v are precisely the elements of R(x), so there is a (1-1)-correspondence between leaves and solutions. More generally, for any vertex v of $T_R(x)$ there is a (1-1)-correspondence between the solution set $R(\mathrm{inst}(v))$ and the leaves of the maximal subtree rooted at v. The bounds on $\sigma$ and $\psi$ in the definition of self-reducibility ensure that the depth of $T_R(x)$ is bounded by $l_R(x) = O(|x|^{k_R})$, and that the number of children of any vertex is also polynomially bounded.

In order to infer the structure of the tree of derivations, it is clearly necessary to solve the existence problem for the relation in question. Since we will not always be able to do this with certainty, it is useful to define the self-reducibility tree $\hat{T}_R(x)$ as above except that the restriction $R(\psi(z, w)) \neq \emptyset$ in the definition of $W(v)$ is removed. Obviously $\hat{T}_R(x)$ contains $T_R(x)$ as a subgraph and their labels agree. All solutions in R(x) still occur precisely once as labels of leaves of $\hat{T}_R(x)$, but there may be other leaves whose partial solution labels are not in R(x). The depth and vertex degree of $\hat{T}_R(x)$ remain polynomially bounded as before.

Most known uniform generation algorithms for combinatorial structures (see, e.g., Nijenhuis and Wilf, 1978) may be viewed as instances of the following generic reduction to the corresponding counting problem. Given


that the structures are described by a self-reducible relation R, select a random path from the root of the tree of derivations to a leaf (solution), at each stage choosing the next edge with probability proportional to the number of solutions in the maximal subtree rooted at its lower end. This information may be obtained from a counter which evaluates the function $\#R$ for appropriate problem instance labels in the tree. By appending a correction process based on the a posteriori probability of the path, this procedure can be made to work even if the counter is slightly inaccurate, specifically if it is within ratio $1 + O(n^{-k_R})$, where $k_R > 0$ is a constant satisfying $l_R(x) = O(|x|^{k_R})$ (see Jerrum et al., 1986, and also Bach, 1983). Furthermore, if the counter is randomised then a f.p. almost uniform generator is still obtained. (In the latter case we have to work with the self-reducibility tree rather than the tree of derivations.) Since a f.p. almost uniform generator can itself be used to construct a polynomially time-bounded randomised approximate counter within ratio $1 + n^{-\beta}$ for any desired $\beta \in \mathbb{R}$, counters within the threshold ratio $1 + O(n^{-k_R})$ can be bootstrapped to achieve arbitrarily good asymptotic behaviour (Jerrum et al., 1986).
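In outline, the generic reduction just described is the following procedure. This is a sketch under the assumption of an exact counter; `count` and `children` are hypothetical helpers supplying $\#R$ and the pairs $(w, \psi(x, w))$, and the correction process needed for slightly inaccurate counters is omitted.

```python
import random

def generate_uniform(instance, count, children):
    """One-pass reduction from uniform generation to exact counting: walk down
    the tree of derivations, at each vertex choosing an edge with probability
    proportional to the number of solutions in the subtree below it."""
    if count(instance) == 0:
        return None                       # R(instance) is empty
    solution = ""
    while True:
        kids = children(instance)         # [(w, psi(instance, w)), ...]
        if not kids:                      # atom reached: solution complete
            return solution
        weights = [count(sub) for _w, sub in kids]
        w, instance = random.choices(kids, weights=weights, k=1)[0]
        solution += w
```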

When rather cruder counting information is available (to within a constant factor, say) the above "one-pass" technique breaks down owing to the accumulation of errors which are too large to be corrected. We will therefore adopt a more flexible, self-correcting approach in which a random process moves dynamically around the tree, with backtracking allowed. The generator we will construct in Section 4 views the vertices of the tree of derivations as the states of a Markov chain $\mathcal{MC}(x)$ in which there is a non-zero transition probability between two states iff they are adjacent in the tree. The transition probabilities themselves are computed with the aid of the crude approximate counter. Clearly all states communicate, so that, leaving aside questions of periodicity, if the chain is allowed to evolve for t steps from any initial state the distribution of its final state approaches a unique stationary distribution as $t \to \infty$. Now suppose that this distribution is uniform over the leaves of the tree. Then we get an almost uniform generator by simulating the chain for sufficiently many steps starting at (say) the root and, if the final state is a leaf, outputting the corresponding solution.

The efficiency of this procedure will of course depend crucially on the rate of convergence of the chain. In particular, since the size of the tree is in general exponential in $|x|$, we require the chain to be very close to stationarity after visiting only a small fraction of its states. There appear to be no quantitative results in the literature which would readily provide useful analytic bounds on the rate of approach to stationarity in this case. Accordingly, in the next section we develop a characterisation of rapid convergence, in a suitably defined sense, for a broad class of Markov


chains. This will enable us to show in Section 4 that the almost uniform generation procedure sketched above is in fact fully polynomial.

Remark. The Markov chain approach to almost uniform generation also points to a fundamentally different strategy which appeals neither to self-reducibility nor to counting. Here transitions are made more or less directly between solutions by means of local perturbations, in a manner suggested by Broder (1986) and familiar from Monte Carlo studies in statistical physics (Binder, 1976). We can also consider generating solutions from more general distributions by making appropriate adjustments to the stationary distribution of the chain. The machinery developed in the next section can be used to analyse Markov chains of this kind as well: for applications, the reader is referred to (Jerrum and Sinclair, 1988; Sinclair, 1988).

3. MARKOV CHAINS AND RAPID MIXING

We assume that the reader is familiar with the elementary theory of finite Markov chains in discrete time: an introduction can be found in (Feller, 1968, Chap. XV). First, we establish some terminology and quote some basic facts.

Let the sequence of random variables $(X_t)_{t=0}^{\infty}$ be a time-homogeneous Markov chain on a finite state space $[N] = \{0, 1, \ldots, N-1\}$, $N \geq 1$, with transition matrix $P = (p_{ij})_{i,j=0}^{N-1}$. (All Markov chains in this paper will be assumed to be of this form.) Thus for any ordered pair i, j of states the quantity $p_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$ is the transition probability from state i to state j and is independent of t. The matrix P is non-negative and stochastic, i.e., its row sums are all unity. For $s \in \mathbb{N}$, the s-step transition matrix is simply the power $P^s = (p_{ij}^{(s)})$; thus $p_{ij}^{(s)} = \Pr(X_{t+s} = j \mid X_t = i)$, independent of t. We denote the distribution of $X_t$ by the row vector $\pi^{(t)\prime} = (\pi_i^{(t)})_{i=0}^{N-1}$, so that $\pi_i^{(t)} = \Pr(X_t = i)$. Here $\pi^{(0)\prime}$ denotes the initial distribution, and $\pi^{(t)\prime} = \pi^{(0)\prime} P^t$ for all $t \in \mathbb{N}$. Usually we will have $\pi_i^{(0)} = 1$ for some $i \in [N]$ (and 0 elsewhere); i is then called the initial state.

The chain is ergodic if there exists a distribution $\pi' = (\pi_j) > 0$ over $[N]$ such that

$$\lim_{s \to \infty} p_{ij}^{(s)} = \pi_j \qquad \forall i, j \in [N].$$

In this case, we have that $\pi^{(t)\prime} = \pi^{(0)\prime} P^t \to \pi'$ pointwise as $t \to \infty$, and the limit is independent of $\pi^{(0)\prime}$. The stationary distribution $\pi'$ is the unique vector satisfying $\pi' P = \pi'$, $\sum_i \pi_i = 1$, i.e., the unique normalised left eigenvector of P with eigenvalue 1. Necessary and sufficient conditions for ergodicity are that the chain should be (a) irreducible, i.e., for each pair of


states $i, j \in [N]$, there is an $s \in \mathbb{N}$ such that $p_{ij}^{(s)} > 0$ (j can be reached from i in a finite number of steps); and (b) aperiodic, i.e., $\gcd\{s : p_{ij}^{(s)} > 0\} = 1$ for all $i, j \in [N]$.

Suppose now that we wish to sample elements of the state space, assumed very large, according to the stationary distribution $\pi'$. This problem arises frequently in the mathematical modelling of physical systems, where states correspond to configurations of the system and appropriate functions of the stationary process to physical constants or parameters (Binder, 1976), and is also fundamental to stochastic optimisation techniques such as simulated annealing (Kirkpatrick, Gelatt, and Vecchi, 1983). In the applications we have in mind here, some of the states can be identified with certain combinatorial structures of interest and $\pi'$ is uniform over these states. However, our approach will address the general problem.

The desired distribution can be realised by picking an arbitrary initial state and simulating the transitions of the Markov chain according to the probabilities $p_{ij}$, which we assume can be computed locally as required. As the number t of simulation steps increases, the distribution of the random variable $X_t$ will approach $\pi'$. Clearly, for this process to be effective it is necessary to know a priori how many steps are required to achieve a distribution sufficiently close to $\pi'$ for our purposes, or in other words to have some bound on the rate of convergence of the chain. As a time-dependent measure of deviation from the limit, we define the relative pointwise distance (r.p.d.) over a non-empty subset $U \subseteq [N]$ after t steps by

$$\Delta_U(t) = \max_{i, j \in U} \frac{|p_{ij}^{(t)} - \pi_j|}{\pi_j}.$$

Thus $\Delta_U(t)$ gives the largest relative difference between $\pi^{(t)\prime}$ and $\pi'$ at any state $j \in U$, maximised over all possible initial states $i \in U$.¹ The inclusion of the parameter U merely allows us to specify that certain portions of the state space are not relevant in the sampling process, as will prove helpful later. The aim of this section is to obtain a useful upper bound on $\Delta_U$ as a function of t. In particular, we want to investigate conditions under which convergence is rapid in the sense that $\Delta_{[N]}(t)$ becomes very close to 0 while $t \ll N$: this is sometimes referred to as the "rapid mixing" property (Aldous, 1981).
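For small chains the r.p.d. can be computed directly from matrix powers. The following numerical sketch is ours (using numpy) and simply evaluates the definition for a lazy random walk on a 4-cycle.

```python
import numpy as np

def rpd(P, pi, t, U=None):
    """Relative pointwise distance Delta_U(t) = max_{i,j in U} |p_ij^(t) - pi_j| / pi_j."""
    N = P.shape[0]
    U = range(N) if U is None else U
    Pt = np.linalg.matrix_power(P, t)
    return max(abs(Pt[i, j] - pi[j]) / pi[j] for i in U for j in U)

# Lazy random walk on a 4-cycle; the stationary distribution is uniform.
P = np.array([[0.5 , 0.25, 0.0 , 0.25],
              [0.25, 0.5 , 0.25, 0.0 ],
              [0.0 , 0.25, 0.5 , 0.25],
              [0.25, 0.0 , 0.25, 0.5 ]])
pi = np.full(4, 0.25)
for t in (1, 5, 20):
    print(t, rpd(P, pi, t))   # decays geometrically with t
```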

A number of techniques for investigating the rate of convergence of

¹ This measure, which is a symmetrical version of the separation distance defined in (Aldous and Diaconis, 1986), has been chosen by analogy with our definition of almost uniform generation in Section 2. We could alternatively have used a measure based on the variation distance, namely $\Delta_U'(t) = \max_{i \in U} \sum_j |p_{ij}^{(t)} - \pi_j|$. For most interesting chains, this choice makes no essential difference to the rapid convergence criterion.


Markov chains have recently been proposed by other authors. Methods based on coupling (Aldous, 1981) and stopping times (Aldous and Diaconis, 1986) are attractive and yield tight bounds for simple chains, such as random walks on hypercubes and various card-shuffling processes. However, the analysis involved appears to become extremely complex in more interesting cases where the chain lacks a highly symmetrical structure. The approach used here based on the eigenvalues of the transition matrix is more classical, but seems hitherto to have been of little practical value. Our contribution is to develop from it a simple yet powerful tool for obtaining good analytic bounds for a broad class of chains. Crucially, we will be able to apply this tool to chains which have not proved amenable to analysis by other means.

An ergodic Markov chain is said to be time-reversible if either (and hence both) of the following equivalent conditions holds:

(i) For all $i, j \in [N]$, $p_{ij}\pi_i = p_{ji}\pi_j$.

(ii) The matrix $D^{1/2} P D^{-1/2}$ is symmetric, where $D^{1/2}$ is the diagonal matrix $\mathrm{diag}(\pi_0^{1/2}, \ldots, \pi_{N-1}^{1/2})$ and $D^{-1/2}$ is its inverse.

Condition (i) says that in the stationary distribution the expected numbers of transitions per unit time from state i to state j and from state j to state i are equal, and is usually called the “detailed balance” property. As we shall see, time-reversible chains are particularly susceptible to detailed analysis, and for this reason play a major role in applications where a rigorous quantitative treatment is necessary (see, e.g., Keilson, 1979).

It is illuminating to identify an ergodic time-reversible chain with a weighted undirected graph containing self-loops as follows. The vertex set of the graph is the state space $[N]$ of the chain, and for each pair of states i, j (which need not be distinct) the edge (i, j) has weight $w_{ij} = \pi_i p_{ij} = \pi_j p_{ji}$. By detailed balance this definition is consistent. Thus there is an edge of non-zero weight between i and j iff $p_{ij} > 0$. We call this graph the underlying graph of the chain. It should be clear that such a chain is uniquely specified by its underlying graph.

As already stated, the stationary distribution $\pi'$ of an ergodic chain is a left eigenvector of P with eigenvalue $\lambda_0 = 1$. Let $\{\lambda_i : 1 \leq i \leq N-1\}$, with $\lambda_i \in \mathbb{C}$, be the remaining eigenvalues (not necessarily distinct) of P. By standard Perron-Frobenius theory for non-negative matrices (Seneta, 1981), these satisfy $|\lambda_i| < 1$ for $1 \leq i \leq N-1$. Furthermore, the transient behaviour of the chain, and hence its rate of convergence, is governed by the magnitude of the eigenvalues $\lambda_i$. In the time-reversible case, condition (ii) of the definition implies that the eigenvalues of P are just those of the similar symmetric matrix $D^{1/2} P D^{-1/2}$, and so are all real. This fact leads to a clean formulation of the above dependence.


PROPOSITION 3.1. Let P be the transition matrix of an ergodic time-reversible Markov chain, $\pi'$ its stationary distribution and $\{\lambda_i : 0 \leq i \leq N-1\}$ its (necessarily real) eigenvalues, with $\lambda_0 = 1$. Then for any non-empty subset $U \subseteq [N]$ and all $t \in \mathbb{N}$, the relative pointwise distance $\Delta_U(t)$ satisfies

$$\Delta_U(t) \leq \frac{\lambda_{\max}^t}{\min_{i \in U} \pi_i},$$
where $\lambda_{\max} = \max\{|\lambda_i| : 1 \leq i \leq N-1\}$.

Proof. Let $D^{1/2}$ and $D^{-1/2}$ be as in the definition of time-reversibility, so that the matrix $A = D^{1/2} P D^{-1/2}$ is symmetric with the same eigenvalues as P, and these are real. Hence we can select an orthonormal basis $\{e^{(i)\prime} : 0 \leq i \leq N-1\}$ for $\mathbb{R}^N$ consisting of left eigenvectors of A, where $e^{(i)\prime} = (e_j^{(i)})$ has associated eigenvalue $\lambda_i$ and $e_j^{(0)} = \pi_j^{1/2}$ for $j \in [N]$.

Following (Keilson, 1979) A has the spectral representation

$$A = \sum_{i=0}^{N-1} \lambda_i e^{(i)} e^{(i)\prime} = \sum_{i=0}^{N-1} \lambda_i E^{(i)},$$

where $E^{(i)} = e^{(i)} e^{(i)\prime}$ is a dyad (i.e., has rank 1) with $E^{(i)} E^{(j)} = 0$ for $i \neq j$, and $E^{(i)2} = E^{(i)}$. It follows that, for any $t \in \mathbb{N}$, $A^t = \sum_i \lambda_i^t E^{(i)}$, and hence

$$P^t = D^{-1/2} A^t D^{1/2} = \sum_{i=0}^{N-1} \lambda_i^t (D^{-1/2} e^{(i)})(e^{(i)\prime} D^{1/2}) = \mathbf{1}_N \pi' + \sum_{i=1}^{N-1} \lambda_i^t (D^{-1/2} e^{(i)})(e^{(i)\prime} D^{1/2}),$$
where $\mathbf{1}_N$ is the N-vector all of whose entries are 1; in component form,
$$p_{ij}^{(t)} = \pi_j + \sum_{k=1}^{N-1} \lambda_k^t \, e_i^{(k)} e_j^{(k)} \left(\frac{\pi_j}{\pi_i}\right)^{1/2}.$$

By definition, the r.p.d. $\Delta_U(t)$ is therefore given by
$$\Delta_U(t) = \max_{i, j \in U} \frac{|p_{ij}^{(t)} - \pi_j|}{\pi_j} = \max_{i, j \in U} \frac{\bigl|\sum_{k=1}^{N-1} \lambda_k^t e_i^{(k)} e_j^{(k)}\bigr|}{(\pi_i \pi_j)^{1/2}} \leq \lambda_{\max}^t \max_{i, j \in U} \frac{\sum_{k} |e_i^{(k)}|\,|e_j^{(k)}|}{(\pi_i \pi_j)^{1/2}} \leq \frac{\lambda_{\max}^t}{\min_{i \in U} \pi_i},$$
where the second inequality follows from the Cauchy-Schwarz inequality


(Kolmogorov and Fomin, 1970, p. 38) and the orthonormality of the $e^{(i)}$. ∎

Proposition 3.1 says that a time-reversible Markov chain will be rapidly mixing in the sense indicated earlier provided that $\pi_i$ is not extremely small at any state of interest and that $\lambda_{\max}$ is bounded away from 1. The first of these conditions can be checked immediately from our knowledge of $\pi'$ and is rarely violated in practice. We therefore focus our attention on the second condition, which is not so easily handled. (Note that P is a large matrix so that direct numerical evaluation of the eigenvalues is not feasible.)

Suppose the eigenvalues of P are ordered so that $1 = \lambda_0 > \lambda_1 \geq \cdots \geq \lambda_{N-1} > -1$. Then the value of $\lambda_{\max}$ is governed by $\lambda_1$ and $\lambda_{N-1}$, the latter being significant only if some of the eigenvalues are negative. Negative eigenvalues correspond to oscillatory, or "near-periodic" behaviour and cannot occur if each state is equipped with a sufficiently large self-loop probability. Specifically, it is enough to have $\min_i p_{ii} \geq \tfrac{1}{2}$. To see this, let $I_N$ denote the $N \times N$ identity matrix and consider the non-negative matrix $2P - I_N$, whose eigenvalues are $\mu_i = 2\lambda_i - 1$. By Perron-Frobenius, $\mu_i \geq -1$ for all $i \in [N]$, which implies that $\lambda_{N-1} \geq 0$.

In fact, negative eigenvalues never present an essential obstacle to rapid mixing because any chain can be modified in a simple way so that the above condition holds without risk of slowing down the convergence too much:

PROPOSITION 3.2. Let P be the transition matrix of an ergodic time-reversible Markov chain, and $1 = \lambda_0 > \lambda_1 \geq \cdots \geq \lambda_{N-1} > -1$ its eigenvalues. Then the modified chain with transition matrix $P' = \tfrac{1}{2}(I_N + P)$ is also ergodic and time-reversible with the same stationary distribution, and its eigenvalues $\{\lambda_i'\}$, similarly ordered, satisfy $\lambda_{N-1}' \geq 0$ and $\lambda_{\max}' = \lambda_1' = \tfrac{1}{2}(1 + \lambda_1)$.
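Proposition 3.2 is easy to check numerically. The sketch below (ours) builds a small reversible chain from a random symmetric weight matrix and compares the spectra of $P$ and $\tfrac{1}{2}(I_N + P)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A reversible chain from a random symmetric weight matrix W:
# p_ij = w_ij / sum_k w_ik, so pi_i is proportional to the row sum of W.
W = rng.random((5, 5))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)

eig      = np.sort(np.linalg.eigvals(P).real)
eig_lazy = np.sort(np.linalg.eigvals((np.eye(5) + P) / 2).real)

print("eigenvalues of P:        ", eig)        # may contain negative values
print("eigenvalues of (I + P)/2:", eig_lazy)   # all >= 0; top of spectrum is (1 + lambda_1)/2 and 1
```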

From the above discussion, it is sufficient for rapid mixing to bound the second eigenvalue $\lambda_1$ away from 1. We shall do this by relating $\lambda_1$ to a more accessible structural property of the underlying graph.

Intuitively, we would expect an ergodic chain to converge rapidly if it is unlikely to "get stuck" in any subset S of the state space whose total stationary probability is fairly small. We can formalise this idea by considering the cut edges which separate S from the rest of the space in the underlying graph, and stipulating that these must be capable of supporting a sufficiently large "flow" in the graph, viewed as a network. With this in mind, for any non-empty subset S of states with non-empty complement $\bar{S}$ in $[N]$ we define the quantity $\Phi_S = F_S / C_S$, where


$$C_S = \sum_{i \in S} \pi_i, \quad \text{the capacity of } S;$$
$$F_S = \sum_{\substack{i \in S \\ j \in \bar{S}}} p_{ij}\pi_i, \quad \text{the ergodic flow out of } S.$$

Note that $0 < F_S \leq C_S < 1$. $\Phi_S$ may be visualised as the conditional probability that the stationary process crosses the cut from S to $\bar{S}$ in a single step, given that it starts in S. Finally, we define the global conductance of the chain by
$$\Phi = \min\{\, \Phi_S : 0 < |S| < N, \; C_S \leq \tfrac{1}{2} \,\}.$$

It is easy to see that $F_S = F_{\bar{S}}$ for all such sets S. This implies that $\Phi_{\bar{S}} = \Phi_S C_S (1 - C_S)^{-1}$, so we may equivalently write

$$\Phi = \min_{0 < |S| < N} \max\{\Phi_S, \Phi_{\bar{S}}\}.$$

Now suppose that the chain is time-reversible, and let G be its underlying graph. Then for all S as above we have

$$F_S = F_{\bar{S}} = \sum_{\substack{i \in S \\ j \in \bar{S}}} w_{ij},$$

a function of the edge weights of G. The conductance $\Phi \equiv \Phi(G)$ may then be viewed as a structural property of the weighted graph G. In view of the above remarks, we might hope that $\Phi(G)$, which in some sense measures the minimum relative connection strength between "small" subsets S and the rest of the space, has some bearing on the rate of convergence of the chain. This relationship is manifested via a bound on the second eigenvalue $\lambda_1$.

LEMMA 3.3. For an ergodic time-reversible Markov chain with underlying graph G, the second eigenvalue $\lambda_1$ of the transition matrix satisfies
$$\lambda_1 \leq 1 - \frac{\Phi(G)^2}{2}.$$

Proof. Let $e' = (e_i)_{i=0}^{N-1}$ be an eigenvector of P with associated eigenvalue $\lambda < 1$, and define the matrix $Q = I_N - P$ (the "Laplace operator" associated with P). Then clearly
$$e'Q = (1 - \lambda)e'. \qquad (1)$$


Define the subset of states $S = \{i \in [N] : e_i > 0\}$. It is easy to check that, since P is stochastic and $\lambda < 1$, $\sum_i e_i = 0$. Hence $0 < |S| < N$, and we may assume without loss of generality that $C_S = \sum_{i \in S} \pi_i \leq \tfrac{1}{2}$. Now let $\hat{e}' = (\hat{e}_i)$ be the vector defined by

$$\hat{e}_i = \begin{cases} e_i / \pi_i, & \text{for } i \in S; \\ 0, & \text{otherwise.} \end{cases}$$

Renumbering states as necessary, we shall assume that $\hat{e}_0 \geq \hat{e}_1 \geq \cdots \geq \hat{e}_{N-1}$, which implies also that $S = \{0, 1, \ldots, r\}$ for some r with $0 \leq r < N - 1$.

Taking the scalar product of (1) with $\hat{e}'$ gives

$$(e'Q, \hat{e}') = (1 - \lambda)(e', \hat{e}'). \qquad (2)$$

The right-hand side of (2) is just

$$(1 - \lambda) \sum_{i \in S} \frac{e_i^2}{\pi_i}. \qquad (3)$$

Note that if $Q = (q_{ij})$ then $q_{ij} = -p_{ij}$ for $i \neq j$, and $q_{ii} = 1 - p_{ii} = \sum_{j \neq i} p_{ij}$, so we can expand the left-hand side of (2) as

$$\sum_{i \in S} \sum_{j \in [N]} \hat{e}_i q_{ji} e_j \;\geq\; \sum_{i \in S} \sum_{j \in S} \hat{e}_i q_{ji} e_j
= -\sum_{i \in S} \sum_{\substack{j \in S \\ j \neq i}} w_{ij} \hat{e}_i \hat{e}_j + \sum_{i \in S} \sum_{j \neq i} w_{ij} \hat{e}_i^2
= -2 \sum_{i < j} w_{ij} \hat{e}_i \hat{e}_j + \sum_{i < j} w_{ij} (\hat{e}_i^2 + \hat{e}_j^2)
= \sum_{i < j} w_{ij} (\hat{e}_i - \hat{e}_j)^2, \qquad (4)$$

where the inequality follows from the fact that all contributions with $j \notin S$ are positive. Using (3) and (4), Eq. (2) therefore yields

$$1 - \lambda \geq \frac{\sum_{i < j} w_{ij} (\hat{e}_i - \hat{e}_j)^2}{\sum_{i \in S} \pi_i \hat{e}_i^2}. \qquad (5)$$

Now consider the sum
$$\sum_{i < j} w_{ij} (\hat{e}_i + \hat{e}_j)^2 \leq 2 \sum_{i < j} w_{ij} (\hat{e}_i^2 + \hat{e}_j^2) \leq 2 \sum_{i \in S} \pi_i \hat{e}_i^2.$$


Combining this with (5) gives

$$1 - \lambda \geq \frac{\sum_{i < j} w_{ij} (\hat{e}_i - \hat{e}_j)^2}{\sum_{i \in S} \pi_i \hat{e}_i^2} \cdot \frac{\sum_{i < j} w_{ij} (\hat{e}_i + \hat{e}_j)^2}{2 \sum_{i \in S} \pi_i \hat{e}_i^2} \geq \frac{1}{2} \left( \frac{\sum_{i < j} w_{ij} (\hat{e}_i^2 - \hat{e}_j^2)}{\sum_{i \in S} \pi_i \hat{e}_i^2} \right)^2, \qquad (6)$$

where we have used the Cauchy-Schwarz inequality. To complete the proof, we need to relate the quotient in (6) to the quantity $\Phi(G)$. To do this, consider the increasing sequence $(S_k)_{k=0}^{r}$ of subsets of S with $S_k = \{0, \ldots, k\}$. The numerator of the quotient in (6) may be expressed in terms of ergodic flows across the boundaries between successive sets $S_k$ as follows:

$$\sum_{i < j} w_{ij} (\hat{e}_i^2 - \hat{e}_j^2) = \sum_{k=0}^{r} (\hat{e}_k^2 - \hat{e}_{k+1}^2) \sum_{\substack{i \in S_k \\ j \notin S_k}} w_{ij} = \sum_{k=0}^{r} (\hat{e}_k^2 - \hat{e}_{k+1}^2) F_{S_k}. \qquad (7)$$

Now the capacities of the $S_k$ satisfy $C_{S_k} \leq C_S \leq \tfrac{1}{2}$ for $0 \leq k \leq r$, and hence by definition of $\Phi$, $F_{S_k} \geq \Phi(G) C_{S_k}$. We therefore get from (7)

$$\sum_{i < j} w_{ij} (\hat{e}_i^2 - \hat{e}_j^2) \geq \Phi(G) \sum_{k=0}^{r} (\hat{e}_k^2 - \hat{e}_{k+1}^2) \sum_{i=0}^{k} \pi_i = \Phi(G) \sum_{i=0}^{r} \pi_i \sum_{k=i}^{r} (\hat{e}_k^2 - \hat{e}_{k+1}^2) = \Phi(G) \sum_{i \in S} \pi_i \hat{e}_i^2.$$

This inequality ensures that the quotient in (6) is bounded below by $\Phi(G)$, so that finally
$$1 - \lambda \geq \frac{\Phi(G)^2}{2},$$

as required. ∎


Combining Propositions 3.1 and 3.2 and Lemma 3.3 we arrive at the main result of this section, which says that the number of steps required for an ergodic time-reversible Markov chain to lose its memory (approach stationarity) is $O(\Phi(G)^{-2} \lg(1/\pi_{\min}))$, where G is its underlying graph and $\pi_{\min}$ the minimum stationary probability of any state. Thus, under mild assumptions about the stationary distribution, convergence is rapid if $\Phi(G)$ is not too small.

THEOREM 3.4. Let G be the underlying graph of an ergodic time-reversible Markov chain, modified if necessary as in Proposition 3.2 to ensure that $\min_i p_{ii} \geq \tfrac{1}{2}$, and $\pi'$ its stationary distribution. Then for any non-empty subset $U \subseteq [N]$ and all $t \in \mathbb{N}$, the relative pointwise distance $\Delta_U(t)$ satisfies

$$\Delta_U(t) \leq \frac{(1 - \Phi(G)^2/2)^t}{\min_{i \in U} \pi_i}.$$
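The quantities appearing in Theorem 3.4 can be checked by brute force on toy examples. The sketch below is ours: it computes $\Phi(G)$ by enumerating all subsets of the state space of a lazy random walk on a 6-vertex path and compares $\Delta_{[N]}(t)$ with the stated bound. The enumeration is exponential in N and intended purely as an illustration.

```python
import itertools
import numpy as np

def conductance(P, pi):
    """Brute-force conductance: min of F_S / C_S over subsets S with C_S <= 1/2,
    where F_S = sum_{i in S, j not in S} pi_i p_ij."""
    N = len(pi)
    best = 1.0
    for size in range(1, N):
        for S in itertools.combinations(range(N), size):
            C = sum(pi[i] for i in S)
            if C > 0.5:
                continue
            F = sum(pi[i] * P[i, j] for i in S for j in range(N) if j not in S)
            best = min(best, F / C)
    return best

# Lazy random walk on a path of 6 vertices (every self-loop probability >= 1/2).
N = 6
P = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i + 1):
        if 0 <= j < N:
            P[i, j] = 0.25
    P[i, i] = 1 - P[i].sum()
pi = np.full(N, 1.0 / N)        # the walk is doubly stochastic, so pi is uniform

Phi = conductance(P, pi)
for t in (10, 50, 200):
    Pt = np.linalg.matrix_power(P, t)
    delta = max(abs(Pt[i, j] - pi[j]) / pi[j] for i in range(N) for j in range(N))
    bound = (1 - Phi ** 2 / 2) ** t / pi.min()
    print(t, delta, "<=", bound)
```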

Remarks. (a) Theorem 3.4 has a converse which states that, under the same assumptions, $\Delta_{[N]}(t) \geq (1 - 2\Phi(G))^t$ (Sinclair, 1988). Hence we effectively have a characterisation of rapid mixing for time-reversible chains in terms of the graph-theoretic quantity $\Phi$. (Note however that this does not cover cases in which $\pi_i$ is extremely small for some i. Such chains may exhibit a range of convergence behaviour regardless of the value of $\Phi(G)$.)

(b) In the interest of simplicity, we have appealed to the rather crude device of Proposition 3.2 for eliminating negative eigenvalues: the effect of this operation on the conductance is to reduce it by a factor of 2. In practice it may often be possible to reason about negative eigenvalues on an ad hoc basis for the chain at hand. Proposition 3.1 and Lemma 3.3 may then be used directly to get a bound on $\Delta_U(t)$.

(c) Lemma 3.3 and its proof parallel an earlier continuous result of Cheeger (1970) for Riemannian manifolds. In the discrete setting, the lemma and its converse are closely related to recent work of Alon (1986) and Alon and Milman (1985) (see also Dodziuk, 1984) in which a relationship between a similar structural property of simple, unweighted graphs and the second eigenvalue of the adjacency matrix is established. This property, called the magnification, measures the minimum number of vertices adjacent to a small subset S as a fraction of $|S|$, and is a generalisation of the widely studied concept of expansion for bipartite graphs. Our conductance $\Phi$ is a weighted edge analogue of magnification, and is the natural quantity to study in the present application. The significance of Alon's result as a sufficient condition for rapid mixing for certain Markov chains has been noted by several authors; in particular, Aldous (1987) states a restricted form of Theorem 3.4 for random walks on


regular graphs. Our characterisation based on the conductance seems to provide a cleaner and more natural formulation of this connection. Very recently, Lawler and Sokal (1988) have independently discovered results similar to those presented in this section but in a rather more general context.

Theorem 3.4 allows us to investigate the rate of convergence of a time-reversible chain by examining the structure of its underlying graph. For rapid mixing, this will typically involve deriving bounds of the form $\Phi(G) = \Omega((\lg N)^{-k})$ as the number N of states increases, for some constant k. The exciting feature of this characterisation is that, with a bit more work, suitable conductance bounds may actually be derived analytically for a number of interesting chains. In this way we are able for the first time to establish the rapid mixing property for chains which lack a high degree of symmetry and which have not proved amenable to analysis by other methods. In the next section, we show how this can be done for the chain based on the tree of derivations mentioned at the end of the last section. Further applications may be found in Jerrum and Sinclair (1988) and Sinclair (1988).

4. REDUCTIONS

We return now to the main theme of this paper, namely the construction of an efficient almost uniform generator for a self-reducible relation given only very approximate counting information.

Let $R \subseteq \Sigma^* \times \Sigma^*$ be self-reducible, and $x \in \Sigma^*$ a problem instance with $R(x) \neq \emptyset$. As advertised in Section 2, our aim is to set up an ergodic Markov chain $\mathcal{MC}(x)$ whose states are the vertices of the tree of derivations $T_R(x)$ and whose stationary distribution is uniform over the leaves of the tree.

The chain is based on an elaboration of the standard reduction from uniform generation to exact counting indicated at the end of Section 2. We may view the counter in this reduction as assigning to each edge of the tree of derivations an integer weight equal to the number of leaves in the subtree rooted at its lower end; the process is then a transient Markov chain in which the transition probabilities from any vertex (state) to its children are proportional to the corresponding edge weights. Suppose now that the process is no longer constrained to move downwards but may also backtrack from any vertex to its parent, the transition probabilities to all adjacent vertices being proportional to the edge weights: thus from any internal vertex upward and downward movements are equally likely. To eliminate periodicity, we add to each state a self-loop probability of $\tfrac{1}{2}$.


Viewing this process as a symmetric random walk with reflecting barriers on the levels of the tree, it is easy to see that it converges rapidly (essentially in time polynomial in the depth of the tree) to a stationary distribution which is uniform over levels and also uniform over leaves. Hence a short simulation of the chain generates leaves almost uniformly, and the probability of failure can be made small by repeated trials. Now suppose that we have available only an approximate counter for R, so that the edge weights in the tree are no longer accurate. Then we have grounds for optimism that this procedure might still work efficiently: the hope is that, since each edge weight influences transitions in both directions, the process will actually be self-correcting.

Suppose then that we are given a polynomially time-bounded approximate counter $\mathcal{C}$ for R within ratio $\rho(n) = 1 + O(n^{\alpha})$ for some $\alpha \in \mathbb{R}$. Thus the error ratio in $\mathcal{C}$ need not even be constant, but may increase polynomially with the problem size. Note first that, since R is self-reducible, $\mathcal{C}$ can always be modified so as to give an exact answer (which will be either 0 or 1) when its input is an atom; also, its output may always be rounded up to the nearest integer at the cost of adding at most 1 to $\rho$. We shall assume throughout this section that $\mathcal{C}$ incorporates these modifications. We may also assume without loss of generality that $\rho$ is monotonically increasing. To begin with, we shall consider the case where $\mathcal{C}$ is deterministic; the additional technical problems posed by randomised counters will be dealt with later.

For a problem instance x as above, let V, E be the vertex and edge sets respectively of $T_R(x)$, and set $m = l_R(x)$, $r = \rho(|x|)$. Note that both m and r are polynomially bounded in $|x|$, and that the depth of the tree is at most m. For each edge $(u, v) \in E$ define the quantity

$$f(u, v) = \begin{cases} \mathcal{C}(\mathrm{inst}(v)), & \text{if } u \text{ is the parent of } v; \\ \mathcal{C}(\mathrm{inst}(u)), & \text{otherwise.} \end{cases} \qquad (8)$$

(Recall that $\mathrm{inst}(\cdot)$ gives the problem instance associated with any vertex of the tree.) Since $\mathcal{C}$ is deterministic, $f: E \to \mathbb{N}^+$ is a well-defined function on E. The crucial property to bear in mind is that for any edge $e \in E$, $f(e)$ approximates within ratio r the number of leaves in the maximal subtree below e.

Next we define for each vertex $v \in V$ a degree

$$d(v) = \sum_{u : (u, v) \in E} f(u, v). \qquad (9)$$

Note that $d(v) \geq 1$ for all $v \in V$, and that $d(v) = 1$ if v is a leaf because $\mathcal{C}$ is


exact for atoms. For each ordered pair u, v of vertices, the transition probability $p_{vu}$ from v to u is then defined to be

$$p_{vu} = \begin{cases} f(u, v)/2d(v), & \text{if } (u, v) \in E; \\ 1/2, & \text{if } u = v; \\ 0, & \text{otherwise.} \end{cases} \qquad (10)$$

Thus there is a non-zero transition probability between two states iff they are adjacent in the tree. The self-loop probability $\tfrac{1}{2}$ ensures that the chain is aperiodic. It is clearly also irreducible, and hence ergodic, and it is a simple matter to verify that the stationary distribution $\pi' = (\pi_v)_{v \in V}$ is proportional to the degrees, i.e.,

$$\pi_v = \frac{d(v)}{D} \qquad \forall v \in V, \qquad (11)$$

where $D = \sum_{v \in V} d(v)$. Let us first check that sampling from V according to the distribution $\pi'$

does in fact give us an efficient generation procedure for R, so that the approach of the previous section applies. Since leaves of the tree correspond to solutions, while other vertices must necessarily correspond to failure of the generator, we have to verify that $\pi'$ is uniform and sufficiently large over the leaves. (Recall that an almost uniform generator must have bounded failure probability.) Uniformity follows directly from the fact that $d(v) = 1$ for all leaves v, so $\pi_v = 1/D$. That this is not too small is a consequence of the following lemma.
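A single step of $\mathcal{MC}(x)$ according to Eqs. (8)-(10) can be sketched as follows. This is our illustration, not the paper's code: `parent`, `neighbours`, `inst` and `counter` are assumed helpers exposing the tree of derivations and the approximate counter $\mathcal{C}$.

```python
import random

def edge_weight(u, v, parent, inst, counter):
    """f(u, v) from Eq. (8): the counter's estimate for the instance labelling
    the lower endpoint of the edge, i.e. the child of the pair."""
    lower = v if parent(v) == u else u
    return counter(inst(lower))

def degree(v, neighbours, parent, inst, counter):
    """d(v) from Eq. (9): the sum of f over the edges incident to v."""
    return sum(edge_weight(u, v, parent, inst, counter) for u in neighbours(v))

def step(v, neighbours, parent, inst, counter):
    """One transition of MC(x) from state v, per Eq. (10): with probability 1/2
    stay put, otherwise move to a neighbour u chosen with probability
    proportional to f(u, v) (which normalises to f(u, v) / (2 d(v)))."""
    if random.random() < 0.5:
        return v
    nbrs = list(neighbours(v))
    weights = [edge_weight(u, v, parent, inst, counter) for u in nbrs]
    return random.choices(nbrs, weights=weights, k=1)[0]
```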

LEMMA 4.1. In the stationary distribution of $\mathcal{MC}(x)$, the probability of being at a leaf is at least $1/2rm$.

Proof. Observe that the degree sum D over the tree $T_R(x)$ may be written

$$D = \sum_{v \in V} d(v) = 2 \sum_{e \in E} f(e).$$

Now consider the collection of edges at some fixed level of the tree. By (8) the weight of each such edge approximates within ratio r the number of leaves in the maximal subtree rooted at its lower end. Since these subtrees are disjoint, the aggregated weight of all edges at this level is at most $r \, \#R(x)$. Summing over all levels of the tree yields the bound

$$D = 2 \sum_{e \in E} f(e) \leq 2rm \, \#R(x). \qquad (12)$$


Since $\pi_v = 1/D$ for each leaf v, the stationary probability of being at a leaf is
$$\frac{\#R(x)}{D} \geq \frac{1}{2rm},$$

as claimed. ∎

Recall that m and r are each polynomially bounded in $|x|$, so the failure probability can be reduced to $\tfrac{1}{2}$ by repeating the entire experiment only polynomially many times.

We now address the trickier question of the rate of convergence of the chain $\mathcal{MC}(x)$, assuming for the moment that an efficient stepwise simulation from some initial state can be performed. It is easy to see that the chain is time-reversible by virtue of detailed balance.² We will therefore try to estimate the conductance $\Phi$ of its underlying graph, as defined in the previous section, and appeal to Theorem 3.4.

LEMMA 4.2. Let G be the underlying graph of the Markov chain $\mathcal{MC}(x)$ defined above. Then the conductance of G satisfies

$$\Phi(G) \geq (4r^2 m)^{-1}.$$

Proof. Note first that in G each edge $(u, v) \in E$ has weight $w_{uv} = f(u, v)/2D$, while the loop at v has weight $d(v)/2D$ and all other edges have weight zero. In what follows, we will identify subsets of V with the subgraphs of $T_R(x)$ which they induce. If $S \subseteq V$ is a subtree (connected subgraph) of $T_R(x)$, we let $\mathrm{root}(S)$ denote the vertex of S at minimum distance from $\mathrm{root}(V)$, the root of $T_R(x)$.

In order to bound the conductance of G, we claim that it suffices to consider flows out of all subtrees S with root(S) ≠ root(V). (Informally, the process will converge fast because it is quite likely to emerge from any such subtree, travelling upwards, within a small number of steps.) To see this, note first that Φ(G) ≥ min Φ_S, where the minimisation is over all non-empty subsets S ⊆ V with root(V) ∉ S. But we may write any such S as the union T_0 ∪ ... ∪ T_l of disjoint subtrees no pair of which is connected by an edge in T_R(x), and we have

    Φ_S = F_S/C_S = (Σ_i F_{T_i}) / (Σ_i C_{T_i}) ≥ min_i F_{T_i}/C_{T_i} = min_i Φ_{T_i}.

Hence it is clear that Φ(G) ≥ min Φ_S, where the minimisation is now over all subtrees S of T_R(x) with root(S) ≠ root(V), as claimed.

² This is actually also an immediate corollary of the fact that MC(x) is a tree process, i.e., the edges corresponding to non-zero transition probabilities form a tree.


A lower bound on Φ_S for such subtrees is readily obtained. We may assume without loss of generality that S is maximal. Then the flow out of S is just F_S = f(cut(S))/2D, where cut(S) is the cut edge connecting S to the rest of the tree. But since f(cut(S)) approximates the number of leaves L(S) in S within ratio r, the flow is bounded below by

    F_S ≥ L(S)/2rD.                                                     (13)

On the other hand, summing edge weights in the subtree S as in the proof of Lemma 4.1, we may easily derive the bound

    Σ_{v ∈ S} d(v) = f(cut(S)) + 2 Σ_{e ∈ E(S)} f(e) ≤ 2rm L(S),        (14)

where E(S) is the set of edges in S. Since C_S = Σ_{v ∈ S} d(v)/D, putting (13) and (14) together yields

    Φ_S = F_S/C_S ≥ 1/4r²m,

which completes the proof of the lemma. ∎

Since both m and r are at most polynomial in the problem size |x|, the bound in Lemma 4.2 is sufficient to ensure that the chains MC(x) are rapidly mixing. More precisely, for each x ∈ Σ* with R(x) ≠ ∅, let Δ^(x)(t) denote the r.p.d. of MC(x) over the whole state space V after t steps. Then we have:

LEMMA 4.3. There exists a function q: Σ* × R⁺ → N such that q(x, ε) is polynomially bounded in |x| and lg ε⁻¹, and for each x ∈ Σ* with R(x) ≠ ∅,

    Δ^(x)(t) ≤ ε/2    for all t ≥ q(x, ε).

Proof. For each such x, the chain MC(x) satisfies the conditions of Theorem 3.4. Furthermore, we have seen that min_{v ∈ V} π_v = 1/D, which by (12) is bounded below by (2rm|Σ|^m)⁻¹. (Note that solutions are strings of length m over the alphabet Σ, so #R(x) ≤ |Σ|^m.) Applying Theorem 3.4, and using the bound on Φ(G) obtained in Lemma 4.2, we get

    Δ^(x)(t) ≤ 2rm|Σ|^m (1 − (32r⁴m²)⁻¹)^t.

The function q defined by

    q(x, ε) = 32r⁴m² (ln(2rm) + m ln|Σ| + ln(2/ε))

then clearly satisfies the requirements of the lemma. ∎


We are now in a position to state the first major result of this section.

THEOREM 4.4. Let R ⊆ Σ* × Σ* be self-reducible. If there exists a polynomially time-bounded (deterministic) approximate counter for R within ratio 1 + O(n^α) for some α ∈ R, then there exists a fully polynomial almost uniform generator for R.

Proof. Let 𝒞 be the approximate counter for R as specified above. We proceed to construct an almost uniform generator 𝒢 for R which uses 𝒞 as an oracle.

On inputs (x, ε) ∈ Σ* × R⁺, 𝒢 initially calls 𝒞 with input x and halts with output ? if 𝒞(x) = 0, which is the case if and only if R(x) = ∅. Otherwise, 𝒢 simulates the Markov chain MC(x) defined above, starting at the root of T_R(x). From their definition in (10), the transition probabilities from any state can be computed by appropriate calls to 𝒞, since we may easily keep track of the problem instance labels of the vertices. (Note that we are also inferring the structure of the tree locally in the process.) The simulation halts after q(x, ε) steps, where q is the function specified in Lemma 4.3, outputting the corresponding solution if the final state is a leaf and ? otherwise. Since the degree of the tree is bounded by a polynomial in |x| and all problem instance labels have size at most |x|, each step can be simulated in polynomial time. Together with the bound on q from Lemma 4.3, this ensures that 𝒢 always halts in time bounded by a polynomial in |x| and lg ε⁻¹.
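For concreteness, here is a minimal sketch of the generator 𝒢 just described, again with hypothetical oracles (step, is_leaf, solution) standing in for the machinery above; the number of simulation steps follows the formula for q(x, ε) in Lemma 4.3.

    import math

    def generate(x, eps, root, step, is_leaf, solution, count_root, r, m, sigma_size):
        # Sketch of the generator of Theorem 4.4; all arguments other than
        # x and eps are hypothetical stand-ins:
        #   root       -- root of the tree of derivations T_R(x)
        #   step(v)    -- one transition of MC(x) (e.g. as sketched earlier)
        #   is_leaf(v) / solution(v) -- leaf test and the solution at a leaf
        #   count_root -- counter estimate for the instance x itself
        #   r, m       -- counting ratio and derivation length for x
        #   sigma_size -- alphabet size |Sigma|
        if count_root == 0:
            return None                      # R(x) is empty: output "?"
        # Number of simulation steps, following Lemma 4.3.
        q = math.ceil(32 * r**4 * m**2 * (math.log(2 * r * m)
                                          + m * math.log(sigma_size)
                                          + math.log(2.0 / eps)))
        v = root
        for _ in range(q):
            v = step(v)
        return solution(v) if is_leaf(v) else None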

Now let 𝒢(x, ε) be the output random variable of 𝒢 on inputs (x, ε). Clearly 𝒢 only ever outputs valid solutions, so Pr(𝒢(x, ε) = y) = 0 if y ∉ R(x). Moreover, since the chain has been allowed to evolve for sufficiently many steps, we may deduce from Lemma 4.3 that

    (1 − ε/2)/D ≤ Pr(𝒢(x, ε) = y) ≤ (1 + ε/2)/D

for any solution y ∈ R(x), where D depends only on x and is defined at (11). Assuming as we may that ε ≤ 1, this ensures that the bias is within the required bound ε. Finally, if R(x) ≠ ∅ Lemma 4.1 implies that

    Pr(𝒢(x, ε) = ?) ≤ (1 − 1/2rm)(1 + ε/2).

Assuming further that ε ≤ 1/rm, this bound can be reduced to 1/e < 1/2 using only (2rm)² iterations of the procedure 𝒢. ∎

If the approximate counter in Theorem 4.4 is randomised as defined in Section 2, so that it may occasionally produce arbitrarily bad results, the reduction still goes through but at the cost of some tiresome technicalities. We summarise the proof in this case.


THEOREM 4.5. The result of Theorem 4.4 still holds even if the approximate counter for R is randomised.

Proof (sketch). Let x be a problem instance for which R(x) ≠ ∅. As before, assume ρ is monotonic and set m = l_R(x), r = ρ(|x|). We begin by considering the intermediate case where the counter 𝒞 is randomised but always produces estimates which are within ratio r of their correct values. We again define a Markov chain MC(x) on the tree T_R(x), whose transition probabilities are determined as follows. Suppose the process is currently at vertex v, and let U be the set of children of v. For each u ∈ U ∪ {v}, make a call 𝒞(inst(u)) to the counter and denote the result c(u); then make a further independent set of calls 𝒞(inst(u)) for the same vertices u and denote their sum d(v). Finally, make a transition to an adjacent vertex u with probability

    c(u)/4r²d(v),   if u is a child of v;
    c(v)/4r²d(v),   if u is the parent of v,                            (15)

and remain at v otherwise. (Note that the factor 1/4r² ensures that these transitions are always well defined, and that there is a self-loop probability of at least 1/2 in each state; we have used 1/4 rather than 1/2 for consistency with the second part of the proof.) Clearly, if 𝒞 is deterministic this reduces (except for a uniform factor of 1/2r²) to the original chain. In the randomised case, it is easy to see that the transition probability p_vu from v to u is actually the expectation

    p_vu = E(f(u, v)/4r²d(v)) = E(f(u, v)) E(d(v)⁻¹)/4r²,

where the random variable f(u, v) is defined as in (8) and is independent of d(v). The stationary distribution π' therefore satisfies

    π_v ∝ 1/E(d(v)⁻¹)    for all v ∈ V,

and the fact that 𝒞 is exact for atoms implies that d(v) = 1 with probability 1 for leaves v. The chain is clearly still time-reversible, and the rest of the proof goes through essentially as in the deterministic case, with d(v) and f(u, v) replaced by 1/E(d(v)⁻¹) and E(f(u, v)) respectively.
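A sketch of a single transition of this intermediate chain, following (15) as reconstructed above; the counter count is called afresh on every use, and the helpers are hypothetical as before. The sketch assumes the intermediate case, i.e., every counter value is within ratio r of the truth.

    import random

    def randomized_step(v, count, children, parent, inst, r):
        # One transition of the intermediate chain of Theorem 4.5, per (15).
        U = list(children(v))
        c = {u: count(inst(u)) for u in U}
        c[v] = count(inst(v))
        # A second, independent round of calls, summed to give d(v).
        d = sum(count(inst(u)) for u in U) + count(inst(v))
        if d == 0:
            return v
        moves, probs = [], []
        for u in U:                                   # to a child u
            moves.append(u)
            probs.append(c[u] / (4 * r * r * d))
        if parent(v) is not None:                     # to the parent of v
            moves.append(parent(v))
            probs.append(c[v] / (4 * r * r * d))
        # In the intermediate case the move probabilities sum to well below 1,
        # so the self-loop probability below is at least 1/2.
        stay = max(0.0, 1.0 - sum(probs))
        moves.append(v)
        probs.append(stay)
        return random.choices(moves, weights=probs)[0]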

Now suppose that the counter may in addition produce arbitrarily bad results with some small probability δ: by Lemma 2.1 we may assume that δ ≤ 2^{−p(|x|)} for all problem instances in the tree, where p is any desired polynomial. Since we are no longer able to infer the structure of T_R(x) with certainty, we must now work in the larger self-reducibility tree T̂_R(x) (cf. Section 2). We let V, V̂ denote the vertex sets of T_R(x) and T̂_R(x) respectively.


Note that V̂\V consists of a union of disjoint maximal subtrees of T̂_R(x). Some modifications to the transition probabilities are also necessary. At vertex v ∈ V̂, we compute values c(u), for u ∈ U ∪ {v}, and d(v) as before, where now U is the set of children of v in T̂_R(x). If d(v) = 0 then we make a transition to the parent of v (if it exists) with probability 1/2, and remain at v with probability 1/2. Otherwise, we test whether Σ_u c(u) > 4r²d(v): if so, we remain at v; if not, we make a transition to a neighbouring vertex with probabilities as in (15). (Note that the self-loop probability in each state is at least 1/2.) Once again, the leaves of T_R(x) are treated as a special case.

This chain is clearly ergodic on some subset of V̂ containing V, namely those states which communicate with the root. Henceforth we redefine V̂ to include only such states. The chain is also still time-reversible because it is a tree process. Let us first observe that the new vertices in V̂\V have negligible effect. All transitions from V to V̂\V occur with at most tiny probability δ, so if started in V the process is unlikely to leave V during the course of the simulation. Should it enter a subtree in V̂\V, however, the random variable d(u) at the root vertex u will take the value 0 with probability very close to 1, thus causing the chain to leave the subtree rapidly. In fact, it is not hard to see that the stationary probability π_u of a vertex u ∈ V̂\V is at most O(δ^k), where k is the distance of u from V in T̂_R(x). As a result, the total weight of V̂\V in the stationary distribution is small. Furthermore, the large exit probability from subtrees S ⊆ V̂\V ensures a lower bound on Φ_S similar to that in the proof of Lemma 4.2.

Examination of the transition probabilities within V reveals that we can view this portion as a chain of the restricted kind described in the first part of the proof whose transition probabilities have been perturbed by a factor in the range (1 ± δ'), where δ' depends on δ and can be made exponentially small. It is then easy to see that the stationary probabilities of states in V undergo similarly small perturbations. As a result, a lower bound as in the proof of Lemma 4.2 also holds for subtrees S with root(S) ∈ V, and so for all subtrees, which again implies that the conductance Φ(G) is suitably bounded below. Assuming that the simulation starts at the root, we therefore get rapid convergence over the subset V of the state space,³ which is sufficient since V includes all leaves of T_R(x). A test applied to leaf labels ensures that no non-solutions are output. ∎

Remark. There is actually a simpler way to prove Theorem 4.5, though the resulting algorithm is less natural and the process is no longer strictly a Markov chain. Note that the simulation of Theorem 4.4 can still be

³ More precisely, we are using the r.p.d. over the subset V here, as defined in Section 3. Theorem 3.4 implies a sufficient condition for rapid mixing with respect to this measure also.


performed using a randomised counter if we arrange to remember the outputs of the counter on all previous calls so that each edge weight is computed at most once. Provided all values returned by the counter are accurate within the given ratio, we are effectively in the situation of Theorem 4.4 and our earlier analysis applies. By powering the counter, we can ensure that this condition fails to hold with very small probability, so the effect on the overall process will be negligible.
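The bookkeeping this Remark requires amounts to memoising the counter. The following sketch wraps a randomised counter so that repeated queries about the same problem instance return the stored first answer; it is illustrative only and assumes problem instances are hashable values.

    def memoised(counter):
        # Wrap a randomised counter so each problem instance is evaluated at
        # most once: later queries reuse the stored first answer, so one run
        # of the simulation sees a fixed set of edge weights.
        cache = {}
        def wrapped(instance):
            if instance not in cache:
                cache[instance] = counter(instance)
            return cache[instance]
        return wrapped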

Theorem 4.5 has an interesting consequence for counting problems. First let us generalise our notion of approximate counting to allow the error ratio in the estimate to be specified as part of the input. If R is a relation over Σ, then a randomised approximation scheme for #R is a probabilistic algorithm 𝒞 whose output on input (x, ε) ∈ Σ* × R⁺ is a non-negative real-valued random variable 𝒞(x, ε) satisfying

    Pr(𝒞(x, ε) approximates #R(x) within ratio 1 + ε) ≥ 3/4.

𝒞 is a fully polynomial randomised approximation scheme (fpras) if it runs in time bounded by a polynomial in |x| and ε⁻¹ for all inputs (x, ε). Note that the definition of fully polynomial here differs from that for almost uniform generators in the absence of a logarithm: the dependence is on ε⁻¹ rather than lg ε⁻¹.

Jerrum, Valiant, and Vazirani (1986) show how to construct a fpras for #R for a self-reducible relation R given a f.p. almost uniform generator for R. In view of Theorem 4.5, this means that we can bootstrap a very crude counter for R to one with arbitrarily good asymptotic behaviour as follows. Suppose there exists a polynomially time-bounded randomised approximate counter for R within ratio 1 + O(n^α) for some real α (which we may think of as large). Then by Theorem 4.5 there exists a f.p. almost uniform generator for R, and hence by the above result of Jerrum et al. a fpras for #R. (Recall that Jerrum et al. establish this only for α < −k_R, a small threshold value as defined in Section 2.) We have therefore proved our next result.

THEOREM 4.6. Let R ⊆ Σ* × Σ* be self-reducible. If there exists a polynomially time-bounded randomised approximate counter for R within ratio 1 + O(n^α) for some α ∈ R, then there exists a fully polynomial randomised approximation scheme for #R.

The chief significance of Theorem 4.6 is that it establishes a notion of approximate counting which is robust with respect to polynomial time computation, at least for the large class of self-reducible relations: a randomised approximate counter within ratio 1 + O(n^α) can always be improved to one within ratio 1 + n^{−β} for any desired real β with at most a polynomial increase in runtime. Thus we are justified in classifying the


counting problem for a self-reducible relation R as tractable if there exists a polynomial time randomised algorithm which with high probability approximates #R(x) to within some factor of the form 1 + O(|x|^α), with α ∈ R. We suggest that this notion will be useful in the future classification of hard counting problems, as studied, e.g., by Stockmeyer (1983) and Karp and Luby (1983).

Remarks. (a) The bootstrapping described in Theorem 4.6 actually holds for a rather trivial reason if the relation R has a property which we might call self-embeddability. Informally, R is self-embeddable if there exists an efficiently computable function ξ which takes a pair x₁, x₂ of problem instances and embeds them in an instance ξ(x₁, x₂), whose size is at most linear in |x₁| and |x₂| and whose solution set is in (1-1)-correspondence with the product set R(x₁) × R(x₂). An example is the relation which associates with a directed graph G its set of (directed) Hamiltonian paths: the required embedding function ξ takes a pair G₁, G₂ of graphs and adds a new vertex u, together with edges from u to all vertices of G₁ and from all vertices of G₂ to u. To bootstrap a counter for a self-embeddable relation, given a problem instance x we apply the embedding construction repeatedly to obtain an instance z with #R(z) = #R(x)^{p(|x|)} for some suitable polynomial p; we then use the counter to approximate #R(z) and take the p(|x|)th root of the result, which yields an improved estimate of #R(x), as sketched below. Although many natural relations turn out to be self-embeddable, there seem to be a number of significant exceptions among self-reducible relations, including DNF-satisfiability and natural restricted versions of familiar relations, such as Hamiltonian paths in planar graphs. Moreover, the Markov chain reduction technique presented here can sometimes be applied even in the absence of self-reducibility. Evidence for this is provided by the relation GRAPHS discussed in the next section, which is apparently neither self-embeddable nor self-reducible under the degree restrictions imposed there.
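A sketch of the root-taking bootstrap for a self-embeddable relation, with hypothetical helpers crude_count (a counter within ratio 1 + n^α) and embed_power (repeated application of the embedding ξ so that #R(z) = #R(x)^p):

    def bootstrap_estimate(x, p, crude_count, embed_power):
        # Root-taking bootstrap for a self-embeddable relation (a sketch).
        #   crude_count(z)    -- hypothetical counter within ratio 1 + |z|^alpha
        #   embed_power(x, p) -- hypothetical repeated self-embedding yielding
        #                        an instance z with #R(z) = #R(x) ** p
        z = embed_power(x, p)
        estimate = crude_count(z)        # within ratio (1 + |z|^alpha) of #R(x)**p
        return estimate ** (1.0 / p)     # within ratio (1 + |z|^alpha)**(1/p)

Since |z| grows only polynomially with p and |x|, the error ratio of the final estimate is roughly (1 + |z|^α)^{1/p}, which can be driven down to 1 + |x|^{−β} by taking p to be a suitably large polynomial in |x|.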

(b) Theorem 4.5 can also be used to derive a surprising bootstrapping result for generators. Specifically, given a polynomially time-bounded generator for a self-reducible relation R which is almost uniform with bias O(|x|^{−k_R}), where k_R is a constant as above, it is possible to construct a f.p. almost uniform generator for R (Sinclair, 1988). The new generator of course achieves exponentially small bias in polynomial time.

5. GRAPHS WITH SPECIFIED DEGREES

Given a sequence g = (g₀, ..., g_{n−1}) of non-negative integers, is it possible to efficiently generate labelled graphs with vertex set {0, 1, ..., n − 1} in which vertex i has degree g_i, 0 ≤ i ≤ n − 1, such that each graph occurs with


roughly equal probability? We conclude this paper by showing how the approach of the previous section can be used to answer this question affirmatively, provided that the maximum degree does not grow too rapidly with the number of vertices n.

Our motivation for looking at this problem is twofold. First, there is its inherent interest as indicated below. Second, and perhaps more important, it serves to illustrate how our ideas may be used to exploit a much wider class of asymptotic counting results than has hitherto been possible, even for structures which are not self-reducible. We suggest that other natural structures can be handled similarly.

The special case of the problem in which the graphs are regular, i.e., g_i = k for all i and some k, is of particular interest and has been considered by several authors. Regular graphs are a natural class to study in their own right and have recently become an important model in the theory of random graphs (Bollobás, 1985). A generation procedure would provide a means of examining "typical" regular graphs with a given number of vertices and given degree and investigating their properties, about many of which little is known. Furthermore, it has recently been shown by Wormald (1987) that generation techniques for labelled graphs with a given degree sequence can be used in the uniform generation of isomorphism classes of regular graphs.

Wormald (1984) gives efficient algorithms for uniformly generating labelled cubic and degree-4 graphs on n vertices. However, these are based on specific recurrence relations and do not generalise easily to higher degrees. A simpler method discussed in (Wormald, 1984), and already implicit in the work of Bollobás (1980), uniformly generates labelled regular graphs of arbitrary degree k, but the probability of failure remains polynomially bounded only if k = O((log n)^{1/2}). When the degree is permitted to increase more rapidly with n, it seems difficult to generate the graphs with anything approaching equal probabilities: in the approach of Tinhofer (1979), for example, the probabilities associated with different graphs may vary widely. Our method, which relies on the reduction to counting developed in the previous section, requires only that k = O(n^{1/3}) and achieves a distribution over the graphs which is asymptotically very close to uniform.

In keeping with our general approach, we begin by defining a relation which describes the graphs of interest. For the sake of clarity, we shall not refer in this section to an encoding scheme: it should however be clear how to translate everything into the formal framework of Section 2. A (labelled) degree sequence on vertex set [n] = {0, ..., n − 1} is a sequence g = (g₀, ..., g_{n−1}) of non-negative integers such that Σ_i g_i = 2e(g) is even, and a graph on g is a graph with vertex set [n] in which vertex i has degree g_i, 0 ≤ i ≤ n − 1. (All graphs here are assumed to be simple and undirected.)


If the vertex set is understood, we shall identify a graph with its edge set. It is actually convenient to generalise the above problem by allowing a set of forbidden edges to be specified. Accordingly, we define the relation GRAPHS which associates with each problem instance of the form (g, X), where g is a degree sequence on [n] and X is a labelled graph with vertex set [n], the solution set

    GRAPHS(g, X) = {G : G is a graph on g having no edge in common with X}.

We refer to X as an excluded graph for g. Although this relation is self-reducible as it stands, we get a more symmetrical structure using the relation R defined by

    R(g, X) = {(G, ω) : G ∈ GRAPHS(g, X) and ω is an edge-ordering of G}.

Clearly, we can move freely between these relations since any solution set R(g, X) contains precisely e(g)! ordered copies of each element of GRAPHS(g, X).

Next we specify a self-reducibility on R by defining the tree of derivations T_R(g, X), assuming that R(g, X) ≠ ∅. In this tree, the object (G, ω) will be derived by successively adding the edges of G in the order determined by ω. More precisely, the partial solution labels of the tree are in (1-1)-correspondence with pairs (H, ω), in which H is a graph with vertex set [n] which can be extended to at least one graph in GRAPHS(g, X), and ω is an edge-ordering of H. The root has label (∅, ∅), while the children of the vertex with label (H, ω) have labels of the form (H ∪ {(i, j)}, ω + (i, j)) for some edge (i, j), where ω + (i, j) denotes the extension of ω in which (i, j) is the largest element. The problem instance label of a vertex u is determined by its partial solution label (H, ω) as follows. Let h̄ = (h̄₀, ..., h̄_{n−1}) be the degree sequence of H, and define h = g − h̄, where the subtraction is pointwise. Also, let Y be the subgraph of X ∪ H obtained by deleting all edges (i, j) for which either h̄_i = g_i or h̄_j = g_j. Then the problem instance label of u is (h, Y).

Note that the deletion of redundant constraints from X ∪ H is not necessary for the consistency of the tree, but it will prove useful later, in the proof of Lemma 5.3, that Y represents only the essential excluded graph. From now on, we will in fact assume that all problem instances (g, X) have had redundant constraints removed. In particular, this means that the problem instance label of the root of the tree is just (g, X). It also justifies our use of e(g) as a measure of input size for this problem when stating approximation results below.
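As an illustration of these definitions, the following sketch computes the problem instance label (h, Y) of a partial solution; the representation (degree tuples and sets of two-element frozensets) is an assumption made for the example, not the paper's encoding.

    def instance_label(g, X, H):
        # Problem instance label (h, Y) of a partial solution H (a sketch).
        # g is a tuple of degrees; X and H are sets of two-element frozensets
        # over the vertex set range(len(g)).
        n = len(g)
        hbar = [0] * n
        for edge in H:                   # degree sequence of H
            for i in edge:
                hbar[i] += 1
        h = tuple(g[i] - hbar[i] for i in range(n))    # residual degrees
        # Keep only the essential excluded edges: an edge is dropped as soon
        # as either endpoint is saturated (residual degree zero).
        Y = {edge for edge in (X | H) if all(h[i] > 0 for i in edge)}
        return h, Y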

Now that we have a tree of derivations for R, Theorem 4.4 will give us an efficient almost uniform generator for R, and hence for GRAPHS, provided we can count these structures with sufficient accuracy. The


counting problem for GRAPHS has received much attention over a number of years, where the aim has chiefly been to extend the validity of asymptotic estimates to a wider range of degrees (see McKay, 1985 for a brief survey). The best result available to date is due to McKay, and we quote this below.

Given a degree sequence g on [n] and an excluded graph X for g, let x = (x₀, ..., x_{n−1}) be the degree sequence of X, and define γ(g, X) = max{g_max², g_max x_max}, where g_max = max_i g_i and x_max = max_i x_i. We shall use γ to express bounds on the degrees involved in the problem. Furthermore, if g_max > 0 set

    λ(g) = (1/4e(g)) Σ_i g_i(g_i − 1);    μ(g, X) = (1/2e(g)) Σ_{(i,j) ∈ X} g_i g_j.

THEOREM 5.1 (McKay, 1985). There exists a positive constant r₀ with the property that, for any problem instance (g, X) with g_max > 0 and γ(g, X) < e(g)/10, the quantity

    (2e(g))! / (e(g)! 2^{e(g)} Π_{i=0}^{n−1} g_i!) · exp(−λ(g) − λ(g)² − μ(g, X))        (16)

approximates #GRAPHS(g, X) within ratio exp(r₀ γ(g, X)²/e(g)).

Remarks. (a) Actually, McKay’s result is slightly stronger than this: we have stated it in a simplified form which is adequate for our purposes.

(b) The estimate in Theorem 5.1 immediately leads to a simple method, suggested by Wormald (1984) and implicit in the earlier work of Bollobás (1980), for generating graphs whose degrees grow slowly with the number of edges: make g_i copies of vertex i for each i, generate a pairing (i.e., a perfect matching in the complete graph on these vertices) uniformly at random, and then collapse the copies to a single vertex again. The result will be a multigraph on g, and the distribution over GRAPHS(g, X) is uniform, but the procedure may fail since not all the graphs generated in this way will be simple or avoid X. The exponential factor in (16) can be interpreted as approximating the probability that a randomly chosen pairing yields an element of GRAPHS(g, X). It is then clear from the definitions of λ and μ that, provided γ(g, X) = O(log e(g)), this probability is polynomially bounded below, so that the method is effective in this range. For regular graphs, this implies a degree bound of O((log n)^{1/2}).
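A sketch of the pairing method just described, kept deliberately naive: X is a set of two-element frozensets, and the routine simply retries until the collapsed pairing happens to be simple and to avoid X (which, as noted, is only likely to succeed quickly when γ(g, X) = O(log e(g))).

    import random

    def pairing_method(g, X, max_attempts=1000):
        # Make g[i] copies ("stubs") of vertex i, choose a uniformly random
        # perfect matching on the stubs, collapse, and reject if the result
        # has a loop, a repeated edge, or an edge of X.
        stubs = [i for i, d in enumerate(g) for _ in range(d)]
        for _ in range(max_attempts):
            random.shuffle(stubs)        # uniform pairing: match consecutive stubs
            edges = set()
            ok = True
            for a, b in zip(stubs[0::2], stubs[1::2]):
                e = frozenset((a, b))
                if a == b or e in edges or e in X:
                    ok = False
                    break
                edges.add(e)
            if ok:
                return edges
        return None

Conditioned on success the output is uniform over GRAPHS(g, X), since each simple graph on g avoiding X arises from exactly the same number of pairings.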

Let us now restate Theorem 5.1 in a more convenient form.

COROLLARY 5.2. Let Q, B be fixed real numbers with Q > 0 and B ≥ 100Q⁴. Then for all problem instances (g, X) for which either e(g) ≤ B


or γ(g, X) ≤ Q²e(g)^{1/2}, the quantity #R(g, X) can be approximated in polynomial time within a constant ratio.

Proof. We have already observed that #R(g, X) = e(g)! #GRAPHS(g, X), so we need only approximate the latter. Note that when e(g) > B the bound on γ ensures also that γ(g, X) < e(g)/10, so we may appeal to Theorem 5.1. The expression in (16) can clearly be evaluated in polynomial time and yields an approximation within the constant ratio exp(r₀Q⁴) in all relevant cases, except when g_max = 0 or possibly when e(g) ≤ B. The first case is trivial; to handle the second, observe that for fixed B there are only a constant number of instances, up to relabelling of the vertices, for which e(g) ≤ B, so all counting in this range may be done exactly by explicit enumeration. (Alternatively, in practice any convenient approximation method may be used, subject to the proviso that it yields the answer 0 iff #GRAPHS(g, X) = 0: this property can be tested in polynomial time using matching techniques.) ∎
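For e(g) > B the estimate (16) can be evaluated directly; the sketch below does so in log-space to avoid overflow, using the reconstructed expressions for λ and μ given before Theorem 5.1 (and so inherits whatever imprecision that reconstruction carries).

    import math

    def mckay_estimate(g, X):
        # Evaluate (16) in log-space (a sketch).  Assumes g_max > 0; only
        # meaningful when gamma(g, X) < e(g)/10.
        e = sum(g) // 2
        lam = sum(gi * (gi - 1) for gi in g) / (4.0 * e)
        mu = sum(g[i] * g[j] for i, j in (tuple(edge) for edge in X)) / (2.0 * e)
        log_count = (math.lgamma(2 * e + 1)                    # (2e)!
                     - math.lgamma(e + 1)                      # / e!
                     - e * math.log(2.0)                       # / 2^e
                     - sum(math.lgamma(gi + 1) for gi in g)    # / prod g_i!
                     - lam - lam * lam - mu)                   # exponential factor
        return math.exp(log_count)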

Now let us see whether Corollary 5.2 is powerful enough to allow us to construct a generation algorithm for GRAPHS via the reduction to counting embodied in Theorem 4.4. Ideally, we might hope to handle instances for which γ(g, X) grows as Ω(e(g)^{1/2}). However, this does not follow immediately since the relation R is no longer self-reducible when restricted in this way. In other words, even if g_max and x_max are suitably bounded, the tree T_R(g, X) will in general contain vertices whose problem instances (h, Y) are unbalanced in the sense that the degrees are rather large compared to the number of edges e(h), so that we cannot guarantee reasonable counting estimates over the whole tree. We will overcome this problem by naively pruning the tree in such a way as to leave only problem instances which do fall within the bounds of Corollary 5.2, though we will have to do a little work to check that the effects of this are not too drastic.

For any pair Q, B of real numbers with Q > 0 and B ≥ 100Q⁴, we call a problem instance (g, X) (Q, B)-balanced if either e(g) ≤ B or γ(g, X) ≤ Q²e(g)^{1/2}. If (g, X) is (Q, B)-balanced and R(g, X) ≠ ∅, then the pruned tree T_R^{(Q,B)}(g, X) with respect to Q, B is obtained by deleting from T_R(g, X) each vertex whose problem instance label is not (Q, B)-balanced, together with the entire subtree rooted at the vertex.
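The pruning rule itself reduces to a one-line test; a sketch (again with X a set of two-element frozensets):

    import math

    def is_balanced(g, X, Q, B):
        # The (Q, B)-balance test used to prune the tree of derivations.
        n = len(g)
        e = sum(g) // 2
        if e <= B:
            return True
        xdeg = [0] * n
        for edge in X:
            for i in edge:
                xdeg[i] += 1
        g_max = max(g)
        x_max = max(xdeg) if X else 0
        gamma = max(g_max * g_max, g_max * x_max)   # gamma(g, X) as defined above
        return gamma <= Q * Q * math.sqrt(e)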

Now consider defining a time-reversible Markov chain MC(g, X) on the pruned tree in precisely the same manner as in Section 4, using the counting estimates of Corollary 5.2. Our first claim is that the conductance bound of Lemma 4.2 still holds, so that MC(g, X) is rapidly mixing. To see this, imagine a corresponding chain on the complete tree T_R(g, X) in which all counting estimates are within the constant ratio of Corollary 5.2: clearly, in this case the conductance is bounded as in Lemma 4.2. But MC(g, X) is obtained from this chain simply by deleting some subtrees


and, as the reader may readily verify, the removal of extremal portions of a Markov chain cannot decrease its conductance. Hence the bound of Lemma 4.2 applies to MC(g, X) also.

We turn now to the effect of the pruning operation on the stationary distribution. As before, the distribution will be proportional to the "degrees" d(v) defined as in (9) and can be made uniform over leaves by counting exactly at this level. (When we speak of "leaves" of the pruned tree, we shall always mean those vertices which are also leaves of the original tree T_R(g, X).) However, since we have lost some leaves by pruning, it is by no means obvious that the induced distribution on GRAPHS(g, X) obtained by forgetting the edge orderings is even close to uniform, or that the failure probability is still bounded. Both these facts will follow from the lemma below, which says that in the pruning process we lose at most a small fraction of the leaves corresponding to any graph in GRAPHS(g, X) provided that the constants Q, B are suitably chosen.

LEMMA 5.3. Let ℱ be a family of problem instances (g, X) satisfying max{g_max, x_max} = O(e(g)^{1/4}) and β a real constant. Then there exists a pair of real numbers Q, B as above (which depend on ℱ and β) such that, for each instance (g, X) ∈ ℱ with GRAPHS(g, X) ≠ ∅, and each G ∈ GRAPHS(g, X), the pruned tree T_R^{(Q,B)}(g, X) contains at least e(g)!(1 − e(g)^{−β}/4) leaves with solution label G.

We postpone the rather technical proof of this lemma until we have examined its consequences, which constitute the central results of this section.

THEOREM 5.4. For any fixed real β, there exists a polynomial time algorithm which generates elements of GRAPHS(g, X) almost uniformly with bias at most e(g)^{−β}, provided that the degrees involved are bounded as max{g_max, x_max} = O(e(g)^{1/4}).

Proof. We assume without loss of generality that β ≥ 0 and that e(g) > 0. Let Q, B be real numbers satisfying the conditions of Lemma 5.3 for the given value of β. Assuming that GRAPHS(g, X) ≠ ∅, simulate the Markov chain MC(g, X) as defined above. By the discussion preceding Lemma 5.3, the chain is rapidly mixing, so a polynomially bounded simulation suffices to ensure a r.p.d. of at most e(g)^{−β}/4. But by Lemma 5.3, the stationary distribution of the chain induces a distribution over GRAPHS(g, X) which is almost uniform with bias at most e(g)^{−β}/4, since e(g) ≥ 1 and β ≥ 0. The overall bias is then at most e(g)^{−β}, as required. Finally, again by Lemma 5.3, the stationary probability of being at a leaf is bounded below as in Lemma 4.1 except for an additional factor due to pruning of (1 − e(g)^{−β}/4) ≥ 1/2. ∎


COROLLARY 5.5. For any fixed real β, there exists a polynomial time algorithm which generates labelled k-regular graphs on n vertices almost uniformly with bias at most n^{−β}, provided that the degree is bounded as k = O(n^{1/3}).

We could of course allow the bias in the above algorithm to be specified as part of the input. However, there is no reason to suppose that the resulting generator would be fully polynomial since we can say nothing useful about the behaviour of the counter in Corollary 5.2 for “small” instances as Q and B vary. Thus the polynomial bias claimed here is apparently the best we can achieve in polynomial time. Note that the source of the bias is essentially just the pruning operation on the tree: the effect of the truncation of the Markov chain is exponentially small as in Theorem 4.4, and thus negligible by comparison.

It remains now for us to prove Lemma 5.3. For this we require a preliminary technical result.

PROPOSITION 5.6. Let Z be a random variable denoting the number of green objects in a random sample (without replacement) of size s > 0 from a population of size m ≥ 2s made up of g green and b = m − g blue objects, and let μ = E(Z) = sg/m. Then for any real α > 0,

    Pr(Z ≥ αμ) ≤ s(2e/α)^{αμ}.

Proof. Note that Z is distributed hypergeometrically with mean E(Z) = μ as claimed. Now set r = αμ. If r ≤ sg/(m − s) then the right-hand side of the above inequality is greater than 1 and there is nothing to prove. Assume therefore that r > sg/(m − s). For each i, 1 ≤ i ≤ s, the probability that the ith choice yields a green object, conditional on the preceding choices, certainly cannot exceed g/(m − s), since there are always at least m − s elements remaining in the pool. Thus for any r' ∈ N with r' ≥ r we have

    Pr(Z = r') ≤ (s choose r') (g/(m − s))^{r'}.

But we have also

    (s choose r') = s!/(r'!(s − r')!) ≤ s^{r'}/r'! ≤ (es/r')^{r'},

by Stirling's approximation, so that

    Pr(Z = r') ≤ (esg/(r'(m − s)))^{r'}.


Now the function f(x) = (c/x)^x, with c ∈ R⁺, is monotonically decreasing for c/x < e; hence, since r' ≥ r > sg/(m − s), we have the bound

    Pr(Z = r') ≤ (esg/(r(m − s)))^r,

and consequently

    Pr(Z ≥ r) ≤ Σ_{r' ≥ r} Pr(Z = r') ≤ s(2e/α)^{αμ},

as required. ∎
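Proposition 5.6 is easy to sanity-check numerically; the following sketch (not part of the proof) draws hypergeometric samples and compares the empirical tail with the stated bound. Note that the bound is only non-trivial once α is somewhat larger than 2e.

    import math
    import random

    def tail_check(m, g, s, alpha, trials=100000):
        # Empirical tail Pr(Z >= alpha*mu) versus the bound
        # s * (2e/alpha) ** (alpha*mu).  Requires m >= 2*s.
        mu = s * g / m
        threshold = alpha * mu
        population = [1] * g + [0] * (m - g)           # 1 = green, 0 = blue
        hits = sum(sum(random.sample(population, s)) >= threshold
                   for _ in range(trials))
        bound = s * (2 * math.e / alpha) ** threshold
        return hits / trials, bound

    # For example, tail_check(m=1000, g=50, s=100, alpha=8.0) returns an
    # empirical tail comfortably below the bound of roughly 2e-5.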

Proof of Lemma 5.3. By virtue of the asymptotic bounds on g_max and x_max, we may choose Q > 0 such that max{g_max, x_max} ≤ (Q/4)e(g)^{1/4} for all instances in the family. This implies a lower bound on B of 100Q⁴; further constraints on B will be introduced below. Note that all instances in the family are certainly (Q, B)-balanced.

For problem instances with e(g) ≤ B there is nothing to prove, as no pruning takes place in the tree T_R(g, X). So let (g, X) be an instance in the family with m = e(g) > B, and G be any graph in GRAPHS(g, X), assumed non-empty. In order to estimate the proportion of all m! derivations of G present in the pruned tree T_R^{(Q,B)}(g, X), we estimate the probability that a randomly chosen derivation of G is present. More precisely, consider the random process (H^{(t)})_{t=0}^{m}, where H^{(0)} = ∅ and, for t ≥ 1, H^{(t)} is a subgraph of G having precisely t edges which is obtained from H^{(t−1)} by adding a single edge, all unused edges of G being equiprobable. If we identify H^{(t)} with a problem instance label (h^{(t)}, Y^{(t)}) in the tree of derivations as before, then a random derivation (H^{(t)}) is still present after pruning iff (h^{(t)}, Y^{(t)}) is (Q, B)-balanced for 0 ≤ t ≤ m. The proportion of all m! derivations of G which are present after pruning is therefore just

    Pr((h^{(t)}, Y^{(t)}) is (Q, B)-balanced for 0 ≤ t ≤ m).            (17)

We proceed to obtain a lower bound on (17) by showing that, for each t separately, (h^{(t)}, Y^{(t)}) is almost surely (Q, B)-balanced, provided we make B large enough. Clearly, this is just the event that the problem instance corresponding to a randomly chosen t-edge subgraph of G is (Q, B)-balanced. The proof divides into four stages, corresponding to various ranges of values of t.

(i) If 0 ≤ t ≤ m/2, then Pr((h^{(t)}, Y^{(t)}) is (Q, B)-balanced) = 1. We always have h^{(t)}_max ≤ g_max and y^{(t)}_max ≤ x_max + g_max, so that γ(h^{(t)}, Y^{(t)}) ≤ 2γ(g, X). Furthermore, for all t in this range, e(h^{(t)}) ≥ e(g)/2. From our


initial choice of Q, we conclude that (h^{(t)}, Y^{(t)}) is (Q, B)-balanced for all such t.

(ii) If m/2 < t ≤ m − m^{5/8}, then Pr((h^{(t)}, Y^{(t)}) is (Q, B)-balanced) ≥ 1 − m^{−β−1}/4. Recall that H^{(t)} can be viewed as a randomly chosen t-edge subgraph of G, or equivalently, its complement H̄^{(t)} in G as a randomly chosen s-edge subgraph of G, where s = m − t. Now we have

    γ(h^{(t)}, Y^{(t)}) = max{(h^{(t)}_max)², h^{(t)}_max y^{(t)}_max}
                       ≤ h^{(t)}_max (x_max + g_max) ≤ h^{(t)}_max Qe(g)^{1/4}.

Hence (h^{(t)}, Y^{(t)}) will certainly be (Q, B)-balanced if h^{(t)}_max ≤ Qe(h^{(t)})^{1/2} e(g)^{−1/4}, i.e., if the maximum vertex degree h^{(t)}_max of the random s-edge subgraph H̄^{(t)} does not exceed Qs^{1/2}m^{−1/4}. We can estimate the probability of this event using Proposition 5.6 as follows: let j ∈ [n] be any vertex with g_j > 0. Then if we colour green all g_j edges of G adjacent to j, and all other edges of G blue, the random variable h_j^{(t)} is distributed as the number of green edges in a random sample (without replacement) of s edges of G. We are therefore in the situation of Proposition 5.6, with Z = h_j^{(t)}, g = g_j, and tail value αμ = Qs^{1/2}m^{−1/4}, where μ = sg_j/m is the mean of h_j^{(t)}. The factor α is quite large, viz.,

    α = Qm^{3/4}/(s^{1/2}g_j) ≥ √2 Qm^{1/4}/g_max ≥ 4√2,

where we have used the facts that s ≤ m/2 and g_max ≤ (Q/4)m^{1/4}. The tail value itself satisfies

    αμ = Qs^{1/2}/m^{1/4} ≥ Qm^{1/16},

since also s ≥ m^{5/8}. Proposition 5.6 therefore yields

    Pr(h_j^{(t)} ≥ αμ) ≤ s(2e/α)^{αμ} ≤ sc^{−m^{1/16}},

where c = (2√2/e)^Q > 1. Thus the probability that any vertex degree h_j^{(t)} exceeds the bound is at most m²c^{−m^{1/16}}, which is less than m^{−β−1}/4 for all m > B provided B is chosen large enough.

(iii) If m − m^{5/8} < t ≤ m − B, then Pr((h^{(t)}, Y^{(t)}) is (Q, B)-balanced) ≥ 1 − m^{−β−1}/4. As in (ii) above, let s = m − t and view H̄^{(t)} as a randomly chosen s-edge subgraph of G. In view of (i), we may assume that s ≤ m/2. By definition of γ, (h^{(t)}, Y^{(t)}) will be (Q, B)-balanced if h^{(t)}_max and y^{(t)}_max are each bounded above by Qe(h^{(t)})^{1/4}. In the case of h^{(t)}_max, we proceed via


Proposition 5.6 precisely as in (ii), only this time with tail value αμ = Qs^{1/4}. We find that

    α = Qm/(s^{3/4}g_j) ≥ 4m^{9/32},

since now s ≤ m^{5/8}. Further, αμ = Qs^{1/4} ≥ QB^{1/4}, so we get the tail estimate

    Pr(h_j^{(t)} ≥ αμ) ≤ s(2e/α)^{αμ}.

Thus the probability that any vertex degree h_j^{(t)} exceeds Qs^{1/4} is at most m²(cm)^{−β'}, where c > 0 is fixed and β' can be made arbitrarily large by suitable choice of B. By setting B appropriately, we can clearly make this less than m^{−β−1}/8 for all m > B.

A similar argument can be used to handle y^{(t)}_max: for a vertex j ∈ [n] with g_j > 0, let Γ(j) be the set of vertices adjacent to j in G. At this point we make use of the fact (refer to the definition of problem instance labels in the tree) that Y^{(t)} includes only essential excluded edges, i.e., edges (i, k) for which both h_i^{(t)} > 0 and h_k^{(t)} > 0. From this it is clear that

    y_j^{(t)} ≤ |{i ∈ Γ(j) : h_i^{(t)} > 0}|.                           (18)

Now colour green all edges of G with an endpoint in Γ(j), and the remainder blue, and again view H̄^{(t)} as a random sample of size s from the edge set of G. Each time a green edge is selected, it contributes at most two to the right-hand side of (18). Thus y_j^{(t)} ≤ 2Z, where the random variable Z is the number of green edges in the sample, so the required tail probability may be estimated from Proposition 5.6 with g = Σ_{i ∈ Γ(j)} g_i ≤ g_max², and αμ = Qs^{1/4}/2. The bounds on s in this range imply

    α ≥ Qm/(2s^{3/4}g_max²) ≥ (8/Q)m^{1/32},

and αμ ≥ QB^{1/4}/2, so that

    Pr(y_j^{(t)} ≥ 2αμ) ≤ Pr(Z ≥ αμ) ≤ s(2e/α)^{αμ}.

Exactly as above, this ensures that the probability that any vertex degree y_j^{(t)} exceeds Qs^{1/4} is at most m^{−β−1}/8 for all m > B, provided we make B large enough. Combining the bounds for h^{(t)}_max and y^{(t)}_max, we arrive at (iii).

(iv) If t ≥ m − B, then Pr((h^{(t)}, Y^{(t)}) is (Q, B)-balanced) = 1. This is true by definition, since e(h^{(t)}) = m − t ≤ B.


In view of (i)-(iv), the probability of the conjunction in (17) is now easily seen to be bounded below by 1 − m^{−β}/4, as claimed in the lemma. ∎

We conclude our discussion of graphs with specified degrees with some remarks on the counting problem. The reduction in (Jerrum et al., 1986) from approximate counting to almost uniform generation mentioned at the end of Section 4 may be viewed more generally as a means of approximating the number of leaves in a rooted tree T given an almost uniform generator for the leaves in the maximal subtree rooted at any vertex. For any such subtree S, let L(S) denote the number of leaves in S. The idea is to generate leaves of T almost uniformly and compute the fraction s of the sample which belong to the subtree S rooted at some suitably chosen child of root(T): this will be a reliable estimate of the true fraction if the latter is not too small and the sample is large enough. An estimate of L(T) is then obtained by recursively estimating L(S) and multiplying the result by s⁻¹. The aggregate of the sample sizes required to achieve an approximation of L(T) within ratio 1 + ε with high probability is bounded by a polynomial function of ε⁻¹ and the depth m and maximum degree of T, assuming the generators have bias at most about ε/m.
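In outline, the leaf-counting recursion looks as follows; sample_leaves stands in for the almost uniform leaf generator, and the tree-navigation helpers are hypothetical. Choosing the child that captured the largest share of the sample is one simple way to keep the estimated fraction bounded away from zero; the precise selection rule of Jerrum et al. (1986, Theorem 6.4) differs in detail.

    def estimate_leaves(tree, sample_leaves, sample_size):
        # Sketch of the recursive leaf-counting reduction described above.
        #   tree.root, tree.is_leaf(v), tree.children(v)  -- navigation
        #   tree.child_containing(v, leaf) -- child of v below which leaf lies
        #   sample_leaves(v, k) -- k almost uniformly generated leaves of the
        #                          maximal subtree rooted at v
        def recurse(v):
            if tree.is_leaf(v):
                return 1.0
            leaves = sample_leaves(v, sample_size)
            counts = {c: 0 for c in tree.children(v)}
            for leaf in leaves:
                counts[tree.child_containing(v, leaf)] += 1
            # Recurse into the child that captured the largest share of the
            # sample, so the estimated fraction is not too small.
            child, hits = max(counts.items(), key=lambda kv: kv[1])
            fraction = hits / sample_size
            return recurse(child) / fraction
        return recurse(tree.root)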

Now consider the situation of Theorem 5.4: can we apply the above technique to estimate the number of leaves in the pruned tree T_R^{(Q,B)}(g, X)? Note first that an almost uniform generator for the leaves in any maximal subtree S is available, since we may simulate just this portion of the Markov chain MC(g, X), transitions out of S being censored. Moreover, the reduced chain clearly inherits the rapid mixing property, so the generator will be efficient provided only that the subtree has sufficiently many leaves. It is not hard to see that, by modifying slightly the method in (Jerrum et al., 1986, Theorem 6.4) for selecting a subtree for the recursion, we can ensure that this condition always holds with high probability.

Choosing ε = m^{−β}/4 for some β ∈ R, we therefore get a randomised approximate counter which estimates the number of leaves in T_R^{(Q,B)}(g, X) within ratio 1 + m^{−β}/4 in polynomial time. But by Lemma 5.3 this number itself approximates m! #GRAPHS(g, X) within ratio 1 + m^{−β}/2, so we are in fact able to approximate #GRAPHS(g, X) within ratio 1 + m^{−β} in polynomial time for any desired β ∈ R. We summarise this discussion in our final theorem.

THEOREM 5.7. For any fixed real β, there exists a polynomially time-bounded randomised approximate counter for GRAPHS within ratio 1 + e(g)^{−β}, provided that the degrees are bounded as max{g_max, x_max} = O(e(g)^{1/4}).


Theorem 5.7 implies the existence of a polynomial time algorithmic method for computing the number of labelled graphs with specified degrees (assuming that these are not too large) with a relative error smaller than any desired inverse power of the number of edges. The asymptotic behaviour of such a counter thus compares very favourably with available analytic estimates, such as Theorem 5.1. While this is a remarkable theoretical result, we suspect that the various powers and constants accumulated in the reductions will render the method impractical if a high degree of accuracy is required.

Finally, we should observe that the counting problem for GRAPHS is apparently hard to solve exactly even under the degree restrictions imposed in this section, so that the approximation approach pursued here is justified. More precisely, we can say that the problem of evaluating #GRAPHS for instances (g, X) whose degrees are bounded as max{g_max, x_max} = O(e(g)^{1/4}) is #P-complete. To see this, note first that there is a simple reduction from the well-known #P-complete problem of counting perfect matchings in a graph G, under which the excluded graph X is the complement of G and the degree sequence is (1, 1, ..., 1). The #P-completeness of the restricted version follows from the fact that the former problem remains #P-complete even for very dense graphs G, specifically when G has minimum vertex degree n − O(n^{1/4}), as can be shown using a reduction of Broder (1986).

ACKNOWLEDGMENT

The concept of self-embeddability mentioned in Section 4 arose in discussions with Keith Edwards.

RECEIVED October 13, 1987; ACCEPTED May 24, 1988

REFERENCES

ALDOUS, D. (1981), Random walks on finite groups and rapidly mixing Markov chains, in "Séminaire de Probabilités XVII," Lecture Notes in Mathematics Vol. 986, pp. 243-297, Springer-Verlag, New York/Berlin.

ALDOUS, D. (1987), On the Markov chain simulation method for uniform combinatorial distributions and simulated annealing, Prob. Engrg. Inform. Sci. 1, 33-46.

ALDOUS, D., AND DIACONIS, P. (1986), Shuffling cards and stopping times, Amer. Math. Monthly 93, 333-348.

ALDOUS, D., AND DIACONIS, P. (1987), Strong uniform times and finite random walks, Adv. in Appl. Math. 8, 69-97.

ALON, N. (1986), Eigenvalues and expanders, Combinatorica 6, 83-96.

ALON, N., AND MILMAN, V. D. (1985), λ₁, isoperimetric inequalities for graphs, and superconcentrators, J. Combin. Theory Ser. B 38, 73-88.


BACH, E. (1983), How to generate random integers with known factorisation, in "Proceedings, 15th ACM Symposium on Theory of Computing," pp. 184-188.

BINDER, K. (1976), Monte Carlo investigations of phase transitions and critical phenomena, in "Phase Transitions and Critical Phenomena," Vol. 5b (C. Domb and M. S. Green, Eds.), pp. 1-105, Academic Press, London.

BOLLOBÁS, B. (1980), A probabilistic proof of an asymptotic formula for the number of labelled regular graphs, European J. Combin. 1, 311-316.

BOLLOBÁS, B. (1985), "Random Graphs," Academic Press, London.

BRODER, A. Z. (1986), How hard is it to marry at random? (On the approximation of the permanent), in "Proceedings, 18th ACM Symposium on Theory of Computing," pp. 50-58; see also Erratum in "Proceedings, 20th ACM Symposium on Theory of Computing, 1988," p. 551.

CAI, J.-Y., AND HEMACHANDRA, L. A. (1986), "Exact Counting Is as Easy as Approximate Counting," Technical Report TR 86-761, Department of Computer Science, Cornell University.

CHEEGER, J. (1970), A lower bound for the smallest eigenvalue of the Laplacian, in "Problems in Analysis" (R. C. Gunning, Ed.), pp. 195-199, Princeton University Press, Princeton, NJ.

DODZIUK, J. (1984), Difference equations, isoperimetric inequality and transience of certain random walks, Trans. Amer. Math. Soc. 284, 787-794.

FELLER, W. (1968), "An Introduction to Probability Theory and Its Applications," Vol. I, 3rd ed., Wiley, New York.

GAREY, M. R., AND JOHNSON, D. S. (1979), "Computers and Intractability: A Guide to the Theory of NP-Completeness," Freeman, San Francisco.

GILL, J. (1977), Computational complexity of probabilistic Turing machines, SIAM J. Comput. 6, 675-695.

GUÉNOCHE, A. (1983), Random spanning tree, J. Algorithms 4, 214-220.

JERRUM, M. R., AND SINCLAIR, A. J. (1988), Conductance and the rapid mixing property for Markov chains: The approximation of the permanent resolved, in "Proceedings, 20th ACM Symposium on Theory of Computing," pp. 235-244.

JERRUM, M. R., VALIANT, L. G., AND VAZIRANI, V. V. (1986), Random generation of combinatorial structures from a uniform distribution, Theoret. Comput. Sci. 43, 169-188.

KARP, R. M., AND LUBY, M. (1983), Monte-Carlo algorithms for enumeration and reliability problems, in "Proceedings, 24th IEEE Symposium on Foundations of Computer Science," pp. 56-64.

KEILSON, J. (1979), "Markov Chain Models - Rarity and Exponentiality," Springer-Verlag, New York.

KIRKPATRICK, S., GELATT, C. D., AND VECCHI, M. P. (1983), Optimisation by simulated annealing, Science 220, 671-680.

KOLMOGOROV, A. N., AND FOMIN, S. V. (1970), "Introductory Real Analysis," Prentice-Hall, Englewood Cliffs, NJ.

LAWLER, G. F., AND SOKAL, A. D. (1988), Bounds on the L² spectrum for Markov chains and Markov processes: A generalization of Cheeger's inequality, Trans. Amer. Math. Soc. 309, 557-580.

MCKAY, B. D. (1985), Asymptotics for symmetric 0-1 matrices with prescribed row sums, Ars Combin. 19A, 15-25.

NIJENHUIS, A., AND WILF, H. S. (1978), "Combinatorial Algorithms," 2nd ed., Academic Press, Orlando, FL.

SCHNORR, C. P. (1976), Optimal algorithms for self-reducible problems, in "Proceedings, 3rd International Colloquium on Automata, Languages and Programming," pp. 322-337.

SENETA, E. (1981), "Non-negative Matrices and Markov Chains," 2nd ed., Springer-Verlag, New York.


SINCLAIR, A. J. (1988), "Randomised Algorithms for Counting and Generating Combinatorial Structures," Ph.D. thesis, University of Edinburgh.

STOCKMEYER, L. (1977), The polynomial-time hierarchy, Theoret. Comput. Sci. 3, 1-22.

STOCKMEYER, L. (1983), The complexity of approximate counting, in "Proceedings, 15th ACM Symposium on Theory of Computing," pp. 118-126.

TINHOFER, G. (1979), On the generation of random graphs with given properties and known distribution, Appl. Comput. Sci., Ber. Prakt. Inf. 13, 265-297.

VALIANT, L. G. (1979a), The complexity of computing the permanent, Theoret. Comput. Sci. 8, 189-201.

VALIANT, L. G. (1979b), The complexity of enumeration and reliability problems, SIAM J. Comput. 8, 410-421.

WILF, H. S. (1981), The uniform selection of free trees, J. Algorithms 2, 204-207.

WORMALD, N. C. (1984), Generating random regular graphs, J. Algorithms 5, 247-280.

WORMALD, N. C. (1987), Generating random unlabelled graphs, SIAM J. Comput. 16, 717-727.
