

THE COMPLEXITY OF SATISFIABILITY PROBLEMS

Thomas J. Schaefer*
Department of Mathematics
University of California, Berkeley, California 94720

ABSTRACT

The problem of deciding whether a given propositional formula in conjunctive normal form is satisfiable has been widely studied. It is known that, when restricted to formulas having only two literals per clause, this problem has an efficient (polynomial-time) solution. But the same problem on formulas having three literals per clause is NP-complete, and hence probably does not have any efficient solution.

In this paper, we consider an infinite class of satisfiability problems which contains these two particular problems as special cases, and show that every member of this class is either polynomial-time decidable or NP-complete. The infinite collection of new NP-complete problems so obtained may prove very useful in finding other new NP-complete problems. The classification of the polynomial-time decidable cases yields new problems that are complete in polynomial time and in nondeterministic log space.

We also consider an analogous class of problems, involving quantified formulas, which has the property that every member is either polynomial-time decidable or complete in polynomial space.

1. INTRODUCTION -- A GENERALIZED SATISFIABILITY PROBLEM

We start with an introductory example. Let $R(x,y,z)$ be a 3-place logical relation whose truth-table is $\{(1,0,0),(0,1,0),(0,0,1)\}$ -- that is, $R(x,y,z)$ is true iff exactly one of its three arguments is true. Consider the problem of deciding whether an arbitrary conjunction of clauses of the form $R(x,y,z)$ is satisfiable. We call this the ONE-IN-THREE SATISFIABILITY problem. For example, the formula $R(x,y,z) \land R(x,y,u) \land R(u,u,y)$ is satisfiable, because it is made true by assigning the values 0,1,0,0 to the variables x,y,z,u respectively. As will be seen, the ONE-IN-THREE SATISFIABILITY problem is NP-complete.
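The introductory example can be checked mechanically. The following is a minimal sketch, assuming a clause is encoded as a tuple of variable names; the function names are hypothetical and the exhaustive search takes exponential time, so it only illustrates the definition.

```python
from itertools import product

# Truth table of the relation R from the text: exactly one argument is 1.
R = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

# The example formula R(x,y,z) AND R(x,y,u) AND R(u,u,y), one tuple per clause.
clauses = [("x", "y", "z"), ("x", "y", "u"), ("u", "u", "y")]
variables = ["x", "y", "z", "u"]

def satisfies(assignment, clauses):
    """Return True if every clause's argument values form a tuple of R."""
    return all(tuple(assignment[v] for v in clause) in R for clause in clauses)

def satisfying_assignments(variables, clauses):
    """Enumerate all 2^n assignments and keep the satisfying ones."""
    result = []
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, clauses):
            result.append(assignment)
    return result

solutions = satisfying_assignments(variables, clauses)
```

Under this encoding, the assignment 0,1,0,0 given in the text turns out to be the only satisfying one: the clause $R(u,u,y)$ forces u=0 and y=1, and the remaining clauses then force x=0 and z=0.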

The similarity between this problem and the standard satisfiability problem for propositional formulas in conjunctive normal form leads to the generalization which is the subject of this paper. Consider the problem of deciding whether a given CNF formula with 3 literals in each clause is satisfiable -- a well-known NP-complete problem. Since a clause may contain any number of negated variables from 0 to 3, there are four distinct relations among variables which occur as conjuncts in the formulas of this problem -- namely, the relations $R_0, R_1, R_2, R_3$ defined by $R_0(x,y,z) \equiv x \lor y \lor z$, $R_1(x,y,z) \equiv \lnot x \lor y \lor z$, $R_2(x,y,z) \equiv \lnot x \lor \lnot y \lor z$ and $R_3(x,y,z) \equiv \lnot x \lor \lnot y \lor \lnot z$. An input to this satisfiability problem is just a conjunction of clauses of the form $R_i(\xi,\xi',\xi'')$ for various variables $\xi,\xi',\xi''$ and various $i \in \{0,1,2,3\}$.

*Author's current address: CALMA, 527 Lakeside Drive, Sunnyvale, CA 94086

This sets the stage for the following generalization. Let $S = \{R_1,\ldots,R_m\}$ be any finite set of logical relations. (A logical relation is defined to be any subset of $\{0,1\}^k$ for some integer $k \ge 1$. The integer k is called the rank of the relation.) Define an S-formula to be any conjunction of clauses, each of the form $\hat{R}_i(\xi_1,\xi_2,\ldots)$, where $\xi_1,\xi_2,\ldots$ are variables whose number matches the rank of $R_i$, $i \in \{1,\ldots,m\}$, and $\hat{R}_i$ is a relation symbol representing the relation $R_i$. The S-satisfiability problem is the problem of deciding whether a given S-formula is satisfiable. We denote by SAT(S) the set of all satisfiable S-formulas.
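To make these definitions concrete, here is a small sketch that represents a logical relation as a set of 0/1 tuples and tests membership in SAT(S) by exhaustive search. The dictionary keys, the clause encoding and the function names are assumptions made for this illustration, not notation from the paper, and the search is exponential in the number of variables.

```python
from itertools import product

def relation(rank, predicate):
    """Build a logical relation of the given rank as a set of 0/1 tuples."""
    return {t for t in product((0, 1), repeat=rank) if predicate(*t)}

# The four 3-CNF clause relations R0..R3 (index = number of negated variables).
S = {
    "R0": relation(3, lambda x, y, z: x or y or z),
    "R1": relation(3, lambda x, y, z: (not x) or y or z),
    "R2": relation(3, lambda x, y, z: (not x) or (not y) or z),
    "R3": relation(3, lambda x, y, z: (not x) or (not y) or (not z)),
}

def is_satisfiable(S, formula):
    """Brute-force membership test for SAT(S).

    `formula` is a list of clauses of the form (relation_name, (var, var, ...)).
    """
    variables = sorted({v for _, args in formula for v in args})
    for values in product((0, 1), repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(tuple(s[v] for v in args) in S[name] for name, args in formula):
            return True
    return False

# (x OR y OR z) AND (NOT x OR NOT y OR NOT z): satisfiable.
f1 = [("R0", ("x", "y", "z")), ("R3", ("x", "y", "z"))]
# (x OR x OR x) AND (NOT x OR NOT x OR NOT x): unsatisfiable.
f2 = [("R0", ("x", "x", "x")), ("R3", ("x", "x", "x"))]
```

Note that a variable may be repeated within a clause, as in `f2`; the definition of an S-formula only requires the number of arguments to match the rank.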

The main result of this paper characterizes the complexity of SAT(S) for every finite set S of logical relations. The most striking feature of this characterization is that for any such S, SAT(S) is either polynomial-time decidable or NP-complete. This dichotomy is somewhat surprising, since one might expect that any such large and diverse class of problems, that includes both polynomial-time decidable and NP-complete members, would also contain some representatives of the many intermediate degrees of complexity which presumably lie between these two extremes.

Furthermore, we give an interesting classification of the polynomial-time decidable cases. We show that (assuming $\mathrm{P} \neq \mathrm{NP}$) SAT(S) is polynomial-time decidable only if at least one of the following conditions holds:

(a) Every relation in S is satisfied when all variables are 0.

(b) Every relation in S is satisfied when all variables are 1.

(c) Every relation in S is definable by a CNF formula in which each conjunct has at most one negated variable.

(d) Every relation in S is definable by a CNF formula in which each conjunct has at most one unnegated variable.


(e) Every relation in S is definable by a CNF formula having at most 2 literals in each conjunct.

(f) Every relation in S is the set of solutions of a system of linear equations over the two-element field $\{0,1\}$.

Sections 2-4 are devoted to the statement and proof of this Dichotomy Theorem. (Although we use the word "dichotomy" to describe this result, it should be borne in mind that the dichotomy holds only if $\mathrm{P} \neq \mathrm{NP}$; if P=NP, the dichotomy would collapse.)

A variation of the problem consists of allowing the constants 0 and 1 to occur in input formulas (e.g. a clause $R(x,0,y)$ is allowed). We denote this "satisfiability-with-constants" problem by $\mathrm{SAT}_c(S)$. Our results for $\mathrm{SAT}_c(S)$ are sharper than for SAT(S): we obtain a complete characterization up to log-space equivalence. For any finite set S of logical relations, $\mathrm{SAT}_c(S)$ lies in one of seven log-space equivalence classes, described as follows:

1. $\mathrm{SAT}_c(S)$ is decidable deterministically in log space.

2. The complement of $\mathrm{SAT}_c(S)$ is log-equivalent to the graph reachability problem (given a graph G and nodes s,t of G, do s and t lie in the same connected component of G?).

3. The complement of $\mathrm{SAT}_c(S)$ is log-equivalent to the digraph reachability problem (given a directed graph G and nodes s,t of G, is there a directed path from s to t?). In this case, $\mathrm{SAT}_c(S)$ is log-complete in co-NSPACE(log n).

4. $\mathrm{SAT}_c(S)$ is log-equivalent to the problem of deciding whether a graph is bipartite.

5. $\mathrm{SAT}_c(S)$ is log-equivalent to the problem of whether an arbitrary system of linear equations over the field $\{0,1\}$ is consistent.

6. $\mathrm{SAT}_c(S)$ is log-complete in P.

7. $\mathrm{SAT}_c(S)$ is log-complete in NP.

This result is presented in Section 5. For "most" sets S, $\mathrm{SAT}_c(S)$ is essentially identical to, and has the same complexity as, SAT(S). (See Lemma 4.2.) Of course, it is not known that the above seven classes are distinct.

In Section 6, we present a polynomial-space analogue of the Dichotomy Theorem, involving quantified formulas. We define $\mathrm{QF}_c(S)$ to be the analog of $\mathrm{SAT}_c(S)$ in which formulas contain universal and existential quantifiers quantifying the propositional variables. The main theorem of this section states that for any finite set S of logical relations, $\mathrm{QF}_c(S)$ is either polynomial-time decidable or log-complete in polynomial space. For both $\mathrm{QF}_c(S)$ and $\mathrm{SAT}_c(S)$ the polynomial-time decidable cases are just the cases (c)-(f) listed above; cases (a) and (b) are excluded.

We mention here a few particular completeness results which follow from these general theorems. Problems NP1, NP2 and NP3 are NP-complete.

NP1. ONE-IN-THREE SATISFIABILITY

Given sets $S_1,\ldots,S_m$ each having at most 3 members, is there a subset T of the members such that for each i, $|T \cap S_i| = 1$?

NP2. NOT-ALL-EQUAL SATISFIABILITY

Given sets $S_1,\ldots,S_m$ each having at most 3 members, can the members be colored with two colors so that no set is all one color?

NP3. TWO-COLORABLE PERFECT MATCHING

Given a graph G, can the nodes of G be colored with two colors so that each node has exactly one neighbor the same color as itself? (G may be restricted to be planar and cubic.) (Theorem 7.1)

Problems P1 and P2 are log-complete in P.

P1. SAT3W (Weakly Positive Satisfiability)

Given a CNF formula having at most 3 literals in each clause, and having at most one negated variable in each clause, is it satisfiable? (Corollary 5.2)

P2. NOT-EXACTLY-ONE SATISFIABILITY

Given sets $S_1,\ldots,S_m$ each having at most 3 members, and a distinguished member s, can one choose a subset of the members, containing s, so that no set has exactly one member chosen? (Corollary 5.2)

This paper contains a full proof of the Dichotomy Theorem. The other results are, for the most part, stated without proof.

Technical Note. The definition of "logical relation" given above is deficient in that it fails to differentiate between empty relations of differing ranks. Therefore, we formally define a logical relation to be a pair (k,R) with $R \subseteq \{0,1\}^k$; but informally we shall continue to regard R itself as being the relation.

2. THE DICHOTOMY THEOREM

This section states and discusses the main result of this paper, the following theorem.

Theorem 2.1. (Dichotomy Theorem for Satisfiability). Let S be a finite set of logical relations. If S satisfies one of the conditions (a)-(f) below, then SAT(S) is polynomial-time decidable. Otherwise, SAT(S) is log-complete in NP. (See below for definitions.)

(a) Every relation in S is 0-valid.

(b) Every relation in S is 1-valid.

(c) Every relation in S is weakly positive.

(d) Every relation in S is weakly negative.

(e) Every relation in S is affine.

(f) Every relation in S is bijunctive.

Definitions. (The following definitions were all invented for this paper and should not be assumed to agree with terminology used elsewhere.)

The logical relation R is 0-valid if $(0,\ldots,0) \in R$. The logical relation R is 1-valid if $(1,\ldots,1) \in R$.
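These two definitions explain why conditions (a) and (b) of Theorem 2.1 give trivially decidable cases: if every relation in S is 0-valid (resp. 1-valid), the constant assignment 0 (resp. 1) satisfies every S-formula. The sketch below illustrates this under an assumed encoding; the relation names `IMP` and `NAND` and the function names are hypothetical.

```python
def is_0_valid(R, rank):
    """R is 0-valid if the all-zeros tuple of the given rank belongs to R."""
    return (0,) * rank in R

def is_1_valid(R, rank):
    """R is 1-valid if the all-ones tuple of the given rank belongs to R."""
    return (1,) * rank in R

def constant_assignment_satisfies(S, formula, i):
    """Check that giving every variable the constant value i satisfies the formula."""
    return all((i,) * len(args) in S[name] for name, args in formula)

# Two illustrative rank-2 relations (names are assumptions for this example).
S = {
    "IMP": {(0, 0), (0, 1), (1, 1)},   # x implies y: both 0-valid and 1-valid
    "NAND": {(0, 0), (0, 1), (1, 0)},  # not both: 0-valid but not 1-valid
}

formula = [("IMP", ("x", "y")), ("NAND", ("y", "z")), ("NAND", ("x", "x"))]
```

Here every relation in `S` is 0-valid, so the all-zeros assignment satisfies `formula` without any search, whereas the all-ones assignment fails on the `NAND` clauses.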


The logical relation R is weakly positive (resp. weakly negative) if $\hat{R}(x_1,\ldots)$ is logically equivalent to some CNF formula having at most one negated (resp. unnegated) variable in each conjunct.

The logical relation R is bijunctive if $\hat{R}(x_1,\ldots)$ is logically equivalent to some CNF formula having at most 2 literals in any conjunct.

The logical relation R is affine if $\hat{R}(x_1,\ldots)$ is logically equivalent to some system of linear equations over the two-element field $\{0,1\}$; that is, if $\hat{R}(x_1,\ldots)$ is logically equivalent to a conjunction of formulas of the forms $\xi_1 \oplus \cdots \oplus \xi_n = 0$ and $\xi_1 \oplus \cdots \oplus \xi_n = 1$, where $\oplus$ denotes addition modulo 2.

Complexity-theoretic notions, such as P, NP, log-space reducibility, etc. are defined briefly in the Appendix.

Examples

The relation $R_1 = \{(1,0,0,0),(0,1,1,0),(0,1,0,1),(1,0,1,1)\}$ is affine, since $\hat{R}_1(u,x,y,z)$ is equivalent to $(u \oplus x = 1) \land (x \oplus y \oplus z = 0)$.

The relation $R_2 = \{(0,0,0),(0,0,1),(0,1,0),(1,1,0)\}$ is bijunctive and weakly negative, since $\hat{R}_2(x,y,z)$ is equivalent to $(\lnot x \lor y) \land (\lnot y \lor \lnot z)$. It is also, obviously, 0-valid.

The relation $R_3 = \{(0,1),(1,0)\}$ is defined by the formula $(x \lor y) \land (\lnot x \lor \lnot y)$, or equivalently, $x \oplus y = 1$. Hence this relation is bijunctive and affine. It is not, however, weakly positive or weakly negative -- this can be shown using Lemma 3.1W.

The relation $R_4 = \{(0,0,0),(1,1,1)\}$ is defined by the formula $(x \equiv y) \land (y \equiv z)$, or equivalently, $(x \lor \lnot y) \land (y \lor \lnot x) \land (y \lor \lnot z) \land (z \lor \lnot y)$. Hence, it is 0-valid, 1-valid, weakly positive, weakly negative, affine and bijunctive.

The relation $R_5 = \{(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0)\}$ is the complement of $R_4$. It does not have any of the six properties listed for $R_4$ -- this can be proved using Lemmas 3.1A, 3.1B, and 3.1W. Thus, this example shows that none of these properties is preserved under complement.

The relation $R_6 = \{(0,0,1),(0,1,0),(1,0,0)\}$ is the relation "exactly one of three" mentioned in the Introduction. It can be shown, using Lemmas 3.1A, 3.1B and 3.1W, that it is not weakly positive, not weakly negative, not affine and not bijunctive.

By applying Theorem 2.1 with $S=\{R_5\}$ and $S=\{R_6\}$ respectively, it can be deduced that the NOT-ALL-EQUAL and ONE-IN-THREE satisfiability problems, defined in Section 1, are log-complete in NP. We omit the proofs.
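The defining formulas quoted in these examples can be checked against the listed tuples by enumerating truth tables. The following sketch does so for $R_1$ through $R_5$; the helper `relation_of` and the dictionary `checks` are names introduced only for this illustration.

```python
from itertools import product

def relation_of(rank, predicate):
    """Set of 0/1 tuples of the given rank on which the predicate is true."""
    return {t for t in product((0, 1), repeat=rank) if predicate(*t)}

R1 = {(1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 1), (1, 0, 1, 1)}
R2 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 0)}
R3 = {(0, 1), (1, 0)}
R4 = {(0, 0, 0), (1, 1, 1)}
R5 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)}

checks = {
    # (u XOR x = 1) AND (x XOR y XOR z = 0); ^ is XOR on 0/1 integers.
    "R1": R1 == relation_of(4, lambda u, x, y, z: (u ^ x) == 1 and (x ^ y ^ z) == 0),
    # (NOT x OR y) AND (NOT y OR NOT z)
    "R2": R2 == relation_of(3, lambda x, y, z: (not x or y) and (not y or not z)),
    # Both the CNF form and the linear form of R3.
    "R3": R3 == relation_of(2, lambda x, y: (x or y) and (not x or not y))
          and R3 == relation_of(2, lambda x, y: (x ^ y) == 1),
    # Both the equivalence form and the CNF form of R4.
    "R4": R4 == relation_of(3, lambda x, y, z: x == y and y == z)
          and R4 == relation_of(
              3, lambda x, y, z: (x or not y) and (y or not x) and (y or not z) and (z or not y)),
    # R5 is the complement of R4 within {0,1}^3.
    "R5": R5 == set(product((0, 1), repeat=3)) - R4,
}
```

Each entry of `checks` compares the tuples listed in the text with the truth table of the corresponding formula, so a transcription error in either would show up as a `False` value.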

Method of Proof

The key question on which the proof of the Dichotomy Theorem centers is: For a given S, what relations are definable by existentially quantified S-formulas? For example, if $S=\{R\}$, where R is the relation "exactly one of x,y,z," then the existentially quantified S-formula $(\exists u_1,u_2,u_3)(\hat{R}(x,u_1,u_3) \land \hat{R}(y,u_2,u_3) \land \hat{R}(u_1,u_2,z))$ defines the relation $\{(1,1,1),(1,0,0),(0,1,0),(0,0,1)\}$, which in the notation of Section 3 could be written $[x \oplus y \oplus z = 1]$.

Moreover, for this particular S, it turns out that every logical relation is definable by some existentially quantified S-formula. This fact readily implies the NP-completeness of SAT(S).
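The example of an existentially quantified S-formula can be verified by brute force. This is only a sketch: it assumes the three clauses are $\hat{R}(x,u_1,u_3)$, $\hat{R}(y,u_2,u_3)$ and $\hat{R}(u_1,u_2,z)$, and the function name is hypothetical.

```python
from itertools import product

# The relation "exactly one of three".
R = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

def defined_relation():
    """Relation on (x, y, z) defined by
    (EXISTS u1,u2,u3)(R(x,u1,u3) AND R(y,u2,u3) AND R(u1,u2,z))."""
    result = set()
    for x, y, z in product((0, 1), repeat=3):
        # Keep (x, y, z) if some choice of the quantified variables works.
        if any((x, u1, u3) in R and (y, u2, u3) in R and (u1, u2, z) in R
               for u1, u2, u3 in product((0, 1), repeat=3)):
            result.add((x, y, z))
    return result

# The relation [x XOR y XOR z = 1]: tuples with an odd number of ones.
odd_parity = {t for t in product((0, 1), repeat=3) if (t[0] ^ t[1] ^ t[2]) == 1}
```

The projected relation coincides with the odd-parity relation, which is affine even though R itself is not; existential quantification can therefore produce relations outside the class of the generating relation.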

Another way to state this fact is as a closure property: The smallest set of relations which contains S and is closed under certain operations (conjunction and existential quantification) is the set of all logical relations. From this point of view, the general problem can be phrased as follows: What sets of logical relations are closed under these operations? If we can obtain a reasonably succinct classification of the sets of relations that are closed in this way, then this may serve as a basis for classifying the complexity of SAT(S) for various S.

We do in fact obtain a classification theorem along these lines. Section 3 is devoted to its statement and proof, and a refinement of it is given in Section 5. This theorem classifies the sets of logical relations that are closed under composition, substitution of constants for variables, and existential quantification. Although the classification is not so thorough as to give a complete enumeration of the sets having this closure property, it does permit the complexity of the corresponding satisfiability problem to be determined up to log-space equivalence in all cases.

The closure of the set S under these three operations is denoted Rep(S). It is interesting to note that the corresponding satisfiability-with-constants problem, $\mathrm{SAT}_c(S)$, is NP-complete just when Rep(S) is the set of all logical relations. Thus, NP-completeness is closely tied to a kind of functional completeness. (In Section 3 of [Sch] we observed and exploited a similar "logical completeness property" which is probably exhibited in some form by all known NP-complete problems.)

Relation to Earlier Work

The work presented here is similar in spirit to the classification by Post [P] of the sets of logical functions that are closed under functional composition. In both cases, it is shown that "functional completeness" holds provided that the generating set is not included in one of a finite number of restricted classes of functions or relations. But the generating operations are quite different, and to the best of our knowledge, none of the particulars of Post's proof carry over to this work.

Our generalized satisfiability problem embraces, as particular cases, a number of previously studied problems. Of the NP-complete cases, so far as we know, only the standard CNF satisfiability problem with 3 literals per clause has appeared in the literature [C]. Of the polynomial-time decidable cases, all are either trivial or previously known. The satisfiability problem for weakly negative formulas is essentially identical to the problem called UNIT which is shown to be complete in P in [JL]. A restricted form of weakly positive satisfiability is equivalent (under complement) to the digraph reachability problem, a complete problem in nondeterministic log space [Sav]. Our work makes use of all these earlier completeness results.


3. CLASSIFICATION OF LOGICAL RELATIONS

This section presents the classification theorem (Theorem 3.0) which is the essential part of the proof of the Dichotomy Theorem. This theorem classifies the sets of logical relations that are closed under certain operations (conjunction, substitution of constants for variables, and existential quantification), showing that any such set consists exclusively of relations which are in one of the four classes weakly positive, weakly negative, affine or bijunctive, or else is the set of all logical relations.

A key part of the proof, which is also of independent interest, is a series of lemmas (Lemmas 3.1A, 3.1B and 3.1W) which characterize these four classes of relations in semantic terms, that is, in terms of what elements are in the relation, rather than in terms of defining formulas as in the definitions.

The results of this section deal purely with logical relations; no complexity-theoretic notions are involved.

Definitions

The definition of S-formula was given in Section 1. We use the term formula in a larger sense, to mean any well-formed formula, formed from variables, constants, logical connectives, parentheses, logical relation symbols and existential and universal quantifiers -- the intent here is to include whatever notation is handy for expressing a relation among propositional variables.

To clarify these terms: (a) A variable, for purposes of this paper, is an element of the set $\{x, x_0, x_1, \ldots, y, y_0, y_1, \ldots, z, z_0, \ldots, u, u_0, \ldots, v, v_0, v_1, \ldots\}$. Variables, like formulas, are strings of symbols; and we construe, e.g., the variable $x_{18}$ to be a string of length 3. (b) A constant is one of the symbols 0, 1 (1 = true, 0 = false). (c) A logical connective is one of the symbols $\lnot, \land, \lor, \to, \equiv, \not\equiv$, which have their usual meanings of "not", "and", "or", "implies", "equals", "does not equal." (d) Each logical relation R has associated with it a logical relation symbol, denoted $\hat{R}$, as in Section 1. (e) The quantifiers $(\exists x)$ and $(\forall x)$ are interpreted to mean "for some $x \in \{0,1\}$" and "for all $x \in \{0,1\}$".

A literal is a variable or a negated variable, i.e., $\xi$ or $\lnot\xi$ for some variable $\xi$.

The notation $\hat{R}(x_1,\ldots)$ is shorthand for $\hat{R}(x_1,\ldots,x_k)$ where k is the rank of R.

If A is a formula, then Var(A) denotes the set of free (i.e., unquantified) variables occurring in A. An assignment for A is a function $s : \mathrm{Var}(A) \to \{0,1\}$. We say the assignment s satisfies A if s makes A true under the usual rules of interpretation.

We define Sat(A) to be the set of all assignments $s : \mathrm{Var}(A) \to \{0,1\}$ which satisfy A. Two formulas A and B are logically equivalent if Var(A) = Var(B) and Sat(A) = Sat(B).

Let A be a formula, $V \subseteq \mathrm{Var}(A)$, and $i \in \{0,1\}$. Then $K^A_{i,V}$ denotes the assignment $s : \mathrm{Var}(A) \to \{0,1\}$ defined by $s(\xi) = i$ iff $\xi \in V$. Usually we write just $K_{i,V}$, and let the domain be inferred from context. $K_i$ denotes the assignment which has the constant value i; again the domain is inferred from context.

If A is a formula, $\xi$ is a variable, and $\omega$ is a literal, then $A\bigl[{\xi \atop \omega}\bigr]$ denotes the formula formed from A by replacing each occurrence of $\xi$ by $\omega$. If V is a set of variables, then $A\bigl[{V \atop \omega}\bigr]$ denotes the result of substituting $\omega$ for every occurrence of every variable in V. Multiple substitutions are denoted by expressions such as $A\bigl[{V \atop \omega},{V' \atop \omega'},{V'' \atop \omega''}\bigr]$ with obvious meaning.

The set of existentially quantified S-formulas with constants is denoted Gen(S). Specifically, Gen(S) is the smallest set of formulas such that (a) for all $R \in S$, $\hat{R}(x_1,\ldots) \in \mathrm{Gen}(S)$, and (b) for all $A,B \in \mathrm{Gen}(S)$ and all variables $\xi,\eta$, the following are all in Gen(S): $A \land B$, $A\bigl[{\xi \atop \eta}\bigr]$, $A\bigl[{\xi \atop 0}\bigr]$, $A\bigl[{\xi \atop 1}\bigr]$ and $(\exists\xi)A$.

Gen+(S) denotes the set of all formulas which are logically equivalent to some formula in Gen(S).

If A is a formula, then we denote by [A] the logical relation defined by A, when the variables are taken in lexicographic order. For example, $[z \not\equiv (x \lor y)]$ is the 3-place relation $\{(0,0,1),(0,1,0),(1,0,0),(1,1,0)\}$.

Finally we define $\mathrm{Rep}(S) := \{[A] : A \in \mathrm{Gen}(S)\}$. Rep(S) is the set of relations that are "representable" by quantified S-formulas with constants. Observe that if $S \subseteq S'$, then $\mathrm{Rep}(S) \subseteq \mathrm{Rep}(S')$.

Classification Theorem for Rep(S)

Theorem 3.0. Let S be any set of logical relations. If S satisfies one of the conditions (a)-(d) below, then Rep(S) satisfies the same condition. Otherwise, Rep(S) is the set of all logical relations.

(a) Every relation in S is weakly positive.

(b) Every relation in S is weakly negative.

(c) Every relation in S is affine.

(d) Every relation in S is bijunctive.

The remainder of this section is devoted to the proof of Theorem 3.0.

Lemma 3.1A. Let R be a logical relation and let $A := \hat{R}(x_1,\ldots)$. Then R is affine if and only if for all $s_1,s_2,s_3 \in \mathrm{Sat}(A)$, $s_1 \oplus s_2 \oplus s_3 \in \mathrm{Sat}(A)$.

Proof. We use the following fact, which can be proved using elementary linear algebra. If K is a field, then a subset $D \subseteq K^n$ is the solution set of a system of linear equations over K iff for all $b_1,b_2,b_3 \in D$ and all $c_1,c_2,c_3 \in K$ with $c_1+c_2+c_3=1$, $c_1 b_1 + c_2 b_2 + c_3 b_3 \in D$. In case K is the field $\{0,1\}$, this condition is equivalent to "the sum of any three elements of D is in D." Since R is affine iff A is equivalent to a system of linear equations over $\{0,1\}$, the lemma follows from this fact. □

Remark. The cardinality of an affine relation is always a power of 2. (This follows from standard results in linear algebra.) This fact is often of use for showing that a relation is not affine.
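The criterion of Lemma 3.1A translates directly into a test on the tuples of a relation. A minimal sketch follows, with relations represented as sets of 0/1 tuples and with `is_affine` a hypothetical name; the relations are those of the Examples in Section 2.

```python
def is_affine(R):
    """Lemma 3.1A criterion: R is closed under s1 XOR s2 XOR s3, taken componentwise."""
    return all(
        tuple(a ^ b ^ c for a, b, c in zip(s1, s2, s3)) in R
        for s1 in R for s2 in R for s3 in R
    )

R1 = {(1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 1), (1, 0, 1, 1)}
R3 = {(0, 1), (1, 0)}
R4 = {(0, 0, 0), (1, 1, 1)}
R5 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)}
R6 = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
```

For $R_6$ the test fails because the sum of its three tuples is (1,1,1), which is not in $R_6$. The Remark gives a quicker argument in this case, since 3 is not a power of 2; the same applies to $R_5$, which has 6 elements.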

We now define some terminology for the next lemma.

If $\xi$ is a variable, we use the notation $\langle\xi,i\rangle$ to denote the literal $\xi$ if i=1 and $\lnot\xi$ if i=0. As is customary, $\bar{\alpha}$ denotes the complementary literal of $\alpha$, that is, the literal $\langle\xi,1-i\rangle$ where $\alpha=\langle\xi,i\rangle$. We say the literal $\alpha=\langle\xi,i\rangle$ is consistent with a formula A if $s(\xi)=i$ for some $s \in \mathrm{Sat}(A)$. We say the assignment s agrees with the literal $\alpha$ if $\alpha = \langle\xi,s(\xi)\rangle$ for some variable $\xi$. A set of literals is consistent if it does not contain $\alpha$ and $\bar{\alpha}$ for any literal $\alpha$.

If s is an assignment and Q is a consistent set of literals, we denote by s#Q the assignment which differs from s just on the set $\{\xi : \langle\xi,1-s(\xi)\rangle \in Q\}$.

Let $\alpha$ be a literal and A a formula. We define $\mathrm{Imp}_A(\alpha)$ to be the set of all literals $\beta$ such that every $s \in \mathrm{Sat}(A)$ which agrees with $\alpha$ also agrees with $\beta$. Thus, $\mathrm{Imp}_A(\alpha)$ is the set of literals which are "implied" by the literal $\alpha$. For example, if $A = x \lor y$, then $\langle y,1\rangle \in \mathrm{Imp}_A(\langle x,0\rangle)$.

Let A be a formula and $s \in \mathrm{Sat}(A)$. A change set for (A,s) is any set $V \subseteq \mathrm{Var}(A)$ such that $s \oplus K_{1,V} \in \mathrm{Sat}(A)$. That is, V is a change set for (A,s) iff the assignment which differs from s just on V satisfies A.
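The notions of agreement, $\mathrm{Imp}_A(\alpha)$ and change sets can be illustrated on the formula $A = x \lor y$ used in the text. In this sketch a literal $\langle\xi,i\rangle$ is encoded as the pair `(xi, i)` and Sat(A) as a list of dictionaries; this encoding and all function names are assumptions made for the example.

```python
from itertools import product

variables = ("x", "y")
# Sat(A) for A = x OR y, as a list of assignments (dicts from variable to 0/1).
sat_A = [dict(zip(variables, t)) for t in product((0, 1), repeat=2) if t[0] or t[1]]

def agrees(s, literal):
    """An assignment s agrees with the literal <xi, i> when s(xi) = i."""
    xi, i = literal
    return s[xi] == i

def imp(sat, variables, alpha):
    """Imp_A(alpha): literals agreed with by every satisfying assignment
    that agrees with alpha."""
    supporters = [s for s in sat if agrees(s, alpha)]
    return {(xi, i) for xi in variables for i in (0, 1)
            if all(agrees(s, (xi, i)) for s in supporters)}

def is_change_set(sat, s, V):
    """V is a change set for (A, s) if flipping s exactly on V still satisfies A."""
    flipped = {xi: (1 - v if xi in V else v) for xi, v in s.items()}
    return flipped in sat

implied = imp(sat_A, variables, ("x", 0))
```

With x set to 0 the only satisfying assignment has y=1, so `implied` contains the literal $\langle y,1\rangle$ as stated in the text (together with $\langle x,0\rangle$ itself). Starting from the assignment x=1, y=1, the set {x} is a change set but {x,y} is not, since the all-zeros assignment falsifies A.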

Lemma 3.1B. Let R be a logical relation and let $A := \hat{R}(x_1,\ldots)$. Then the following are equivalent:

(a) R is bijunctive.

(b) For every $s \in \mathrm{Sat}(A)$, if $V_1$ and $V_2$ are change sets for (A,s) then so is $V_1 \cap V_2$.

(c) For every $s \in \mathrm{Sat}(A)$ and every literal $\alpha$ which is consistent with A, $s\#\mathrm{Imp}_A(\alpha) \in \mathrm{Sat}(A)$. (See Note on last page of this section.)

Proof. (a) $\Rightarrow$ (b): Assume R is bijunctive. Thus, A is logically equivalent to some formula B which is a conjunction of clauses of the form $(\alpha \to \beta)$, where $\alpha,\beta$ are literals. Let $s \in \mathrm{Sat}(A)$ be given, and let $V_1,V_2$ be change sets for (A,s). Let Q be the smallest set of literals such that (a) $\{\langle\eta,1-s(\eta)\rangle : \eta \in V_1 \cap V_2\} \subseteq Q$, and (b) whenever $\alpha \in Q$ and $(\alpha \to \beta)$ or $(\bar{\beta} \to \bar{\alpha})$ is a conjunct of B for some literal $\beta$, then $\beta \in Q$. Clearly, any assignment $t \in \mathrm{Sat}(B)$ which differs from s on all of $V_1 \cap V_2$ must agree with every literal in Q. Since $s \oplus K_{1,V_1}$ is such an assignment, Q is consistent and Q cannot contain any literal $\langle\eta,1-s(\eta)\rangle$ with $\eta \notin V_1$. Similarly, Q cannot contain any literal $\langle\eta,1-s(\eta)\rangle$ with $\eta \notin V_2$. Hence, $s\#Q = s \oplus K_{1,V_1 \cap V_2}$. It is straightforward to show that s#Q satisfies every conjunct of B; hence $s \oplus K_{1,V_1 \cap V_2} \in \mathrm{Sat}(B) = \mathrm{Sat}(A)$. Hence $V_1 \cap V_2$ is a change set for (A,s).

(b) $\Rightarrow$ (c): Assume that (b) holds. Let $s \in \mathrm{Sat}(A)$, and let $\alpha$ be a literal consistent with A. We want to show that $s\#\mathrm{Imp}_A(\alpha) \in \mathrm{Sat}(A)$. Assume that s disagrees with $\alpha$; that is, $\alpha = \langle\xi,1-s(\xi)\rangle$ for some variable $\xi$. Let $W := \{\eta : \langle\eta,1-s(\eta)\rangle \in \mathrm{Imp}_A(\alpha)\}$; that is, W is the set of variables on which $\mathrm{Imp}_A(\alpha)$ clashes with s. We claim that $W = \bigcap\{V : V \text{ is a change set for } (A,s) \text{ and } \xi \in V\}$. To prove this claim, first note that any change set containing $\xi$ must also contain all of W, since W consists of all variables which are forced to change as a result of changing $\xi$. On the other hand, if some variable $\eta$ is not contained in W, then there is some assignment t such that $t(\xi) \neq s(\xi)$ but $t(\eta) = s(\eta)$, so that $\eta$ is not contained in the change set $\{\zeta : t(\zeta) \neq s(\zeta)\}$. This proves the claim.

Now by multiple application of hypothesis (b), W is itself a change set for (A,s). Thus, $s \oplus K_{1,W} = s\#\mathrm{Imp}_A(\alpha) \in \mathrm{Sat}(A)$, as was to be shown.

(c) $\Rightarrow$ (a): Assume that for every $s \in \mathrm{Sat}(A)$ and every literal $\alpha$ which is consistent with A, $s\#\mathrm{Imp}_A(\alpha) \in \mathrm{Sat}(A)$. Define B to be the conjunction of $\{(\alpha \to \beta) : \alpha,\beta \text{ are literals and } \beta \in \mathrm{Imp}_A(\alpha)\}$. Note that Var(B) = Var(A), since B has the conjunct $(\xi \to \xi)$ for each $\xi \in \mathrm{Var}(A)$. We claim that B is logically equivalent to A, and hence R is bijunctive.

We must show that Sat(B)=Sat(A). Clearly, Sat(A) ⊆ Sat(B), since any assignment satisfying A must satisfy each conjunct of B.

It remains to show Sat(B) ⊆ Sat(A). Suppose, for sake of contradiction, that s_1 ∈ Sat(B) − Sat(A). Choose s_2 ∈ Sat(A) such that |W| is maximum, where W = {η : s_1(η)=s_2(η)}. Choose ξ ∈ Var(A)−W, and let α := ⟨ξ,s_1(ξ)⟩. The literal α is consistent with A, because if α were inconsistent with A, then B would have a conjunct asserting this fact (that is, if α = ⟨ξ,0⟩ is inconsistent with A, then B has a conjunct (¬ξ→ξ), and if α = ⟨ξ,1⟩ is inconsistent with A, then B has a conjunct (ξ→¬ξ)), and this would force s_2(ξ)=s_1(ξ). Let s_3 := s_2#Imp_A(α). By hypothesis s_3 satisfies A.

We claim that for all η ∈ W, s_3(η)=s_2(η). To see this, suppose η ∈ W with s_3(η)≠s_2(η). Since now ⟨η,1−s_2(η)⟩ ∈ Imp_A(α), B has a conjunct (α→⟨η,1−s_2(η)⟩), or equivalently (⟨ξ,s_1(ξ)⟩ → ⟨η,1−s_1(η)⟩). This conjunct is not satisfied by s_1, contradicting the assumption that s_1 satisfies B. This proves the claim.

Thus, s_3 agrees with s_1 on all of W∪{ξ}. This contradicts the fact that s_2 was chosen to maximize |W|. The contradiction completes the proof. □

Let A be a formula, let i ∈ {0,1}, and let V ⊆ Var(A). Define the i-closure of V with respect to A to be the set Cl_{i,A}(V) := {ξ ∈ Var(A) : for all s ∈ Sat(A) such that s|V ≡ i, s(ξ)=i}. In other words, Cl_{0,A}(V) (resp. Cl_{1,A}(V)) is the set of variables which are forced to be false (resp. true) by all variables of V being false (resp. true).

It is easy to see that V ⊆ Cl_{i,A}(V), and that V ⊆ V' implies Cl_{i,A}(V) ⊆ Cl_{i,A}(V'), for all V,V' ⊆ Var(A), i ∈ {0,1}. Call the set V ⊆ Var(A) i-closed for A if V = Cl_{i,A}(V). Also, call V i-consistent for A if there is some s ∈ Sat(A) such that s|V ≡ i. We say V is i-nonclosed (resp. i-inconsistent) for A if V is not i-closed (resp. i-consistent) for A.

Lemma 3.1W. Let R be a logical relation and let A := R(x_1,...). Then (a) R is weakly positive if and only if whenever V ⊆ Var(A) is 0-consistent and 0-closed for A, K_{0,V} ∈ Sat(A); and (b) R is weakly negative if and only if whenever V ⊆ Var(A) is 1-consistent and 1-closed for A, K_{1,V} ∈ Sat(A).

Proof. We just prove part (a). The proof of (b) is similar. If R is empty, the lemma holds trivially, so assume R is nonempty.

(⇒): Assume that R is weakly positive. Thus A is logically equivalent to some CNF formula A' having at most one negated variable per conjunct. It suffices to show that if V ⊆ Var(A') is 0-consistent and 0-closed for A', then K_{0,V} ∈ Sat(A'). Let V be such a set and suppose to the contrary that K_{0,V} ∉ Sat(A'). Let C be a conjunct of A' on which K_{0,V} fails. Let U be the set of unnegated variables of C. Since K_{0,V} fails on C, U ⊆ V. If C has no negated variable, then this contradicts the fact that V is 0-consistent for A'. Otherwise, let η be the unique negated variable of C. It can be seen that η ∈ Cl_{0,A'}(U). Also, since K_{0,V} fails on C, η ∉ V. This contradicts the fact that V is 0-closed for A'. This proves that in fact K_{0,V} ∈ Sat(A').

†(Footnote to the proof of (b)⇒(c) in Lemma 3.1B.) If s agrees with α, the conclusion is immediate, since then s#Imp_A(α) = s.

(⇐): Assume that K_{0,V} ∈ Sat(A) for all 0-closed, 0-consistent sets V ⊆ Var(A). Let A' be the conjunction of all the clauses {(ξ_1∨...∨ξ_n) : {ξ_1,...,ξ_n} is 0-inconsistent for A} ∪ {(¬η∨ξ_1∨...∨ξ_n) : η ∈ Cl_{0,A}({ξ_1,...,ξ_n})}. Since every variable is contained in its own 0-closure, A' has a conjunct (¬ξ∨ξ) for each ξ ∈ Var(A); hence, Var(A')=Var(A). We claim that A' is logically equivalent to A.

To show Sat(A) ⊆ Sat(A'), suppose that s ∉ Sat(A'). Let C be a conjunct of A' on which s fails. If C is of the form (ξ_1∨...∨ξ_n), then s(ξ_1)=...=s(ξ_n)=0 and, by the definition of A', {ξ_1,...,ξ_n} is 0-inconsistent for A; hence, s ∉ Sat(A). Otherwise C is of the form (¬η∨ξ_1∨...∨ξ_n), and so s(η)=1, s(ξ_1)=...=s(ξ_n)=0. Then by the definition of A', η ∈ Cl_{0,A}({ξ_1,...,ξ_n}), hence s ∉ Sat(A). This proves that Sat(A) ⊆ Sat(A').

Next we show that Sat(A') ⊆ Sat(A). Suppose s ∉ Sat(A). Let V := {ξ : s(ξ)=0}. By the property assumed for A, V is either 0-inconsistent or 0-nonclosed for A. If it is 0-inconsistent, then (ξ_1∨...∨ξ_n) is a conjunct of A', where V = {ξ_1,...,ξ_n}, and hence s ∉ Sat(A'). If V is 0-nonclosed, let η ∈ Cl_{0,A}(V) − V. Then (¬η∨ξ_1∨...∨ξ_n) is a conjunct of A', and hence s ∉ Sat(A'). This proves that Sat(A') ⊆ Sat(A).

Thus Sat(A)=Sat(A') and so A' is logically equivalent to A. Hence R is weakly positive. □
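The criterion of Lemma 3.1W(a) can be checked mechanically for a relation given as a list of its elements. The following Python sketch does so by brute force over all subsets V, so it takes time exponential in the rank and is meant only as an illustration. The function name `is_weakly_positive` and the tuple encoding are hypothetical, and the sketch assumes that K_{0,V} denotes the assignment that is 0 on V and 1 elsewhere.

```python
from itertools import combinations

def is_weakly_positive(relation, arity):
    """Brute-force check of the criterion in Lemma 3.1W(a).

    `relation` is a collection of 0/1 tuples of length `arity`.
    Assumption: K_{0,V} is the assignment that is 0 on V and 1 elsewhere.
    """
    sat = set(relation)
    positions = range(arity)
    for size in range(arity + 1):
        for V in combinations(positions, size):
            # Satisfying assignments that are identically 0 on V.
            zero_on_V = [s for s in sat if all(s[i] == 0 for i in V)]
            if not zero_on_V:
                continue  # V is 0-inconsistent
            # Cl_{0,A}(V): positions forced to 0 whenever V is all 0.
            closure = {i for i in positions
                       if all(s[i] == 0 for s in zero_on_V)}
            if closure != set(V):
                continue  # V is not 0-closed
            k0v = tuple(0 if i in closure else 1 for i in positions)
            if k0v not in sat:
                return False
    return True
```

For example, the relation [x∨y] passes this check, while [¬x∨¬y] and "exactly one of x,y,z" fail it already at V = ∅, because the all-ones assignment does not satisfy them.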

Lemma 3.2. Let R be a logical relation. If R is not weakly negative, then Rep({R}) ∩ {[x≢y],[x∨y]} ≠ ∅. If R is not weakly positive, then Rep({R}) ∩ {[x≢y],[¬x∨¬y]} ≠ ∅.

Corollary 3.2.1. If S contains some relation which is not weakly positive and some relation which is not weakly negative, then [x≢y] ∈ Rep(S).

Proof of Corollary. Assume R,R' ∈ S with R not weakly positive and R' not weakly negative. Suppose, for sake of contradiction, that [x≢y] ∉ Rep(S). Then, by Lemma 3.2, [x∨y] and [¬x∨¬y] are in Rep(S). Hence, Rep(S) contains [(x∨y)∧(¬x∨¬y)], which is just [x≢y], contrary to assumption. □

Proof of Lemma 3.2. Let R be a logical relation which is not weakly negative, and let A := R(x_1,...). By Lemma 3.1W, there is a set V ⊆ Var(A) which is 1-consistent and 1-closed such that K_{1,V} ∉ Sat(A). Let V̄ := Var(A) − V. Choose W ⊆ V̄ of maximum cardinality such that K_{0,W} ∈ Sat(A). It can be seen from the definition that 1 ≤ |W| < |V̄|. Choose ξ ∈ V̄−W, and let s be an assignment satisfying A such that s|V ≡ 1 and s(ξ)=0. Such an s exists because V is 1-closed and 1-consistent.

For j=0,1 define W_j := {η ∈ W : s(η)=j}, W_j' := {η ∈ Var(A)−W : s(η)=j}, and let

B := A[W_0/0, W_1/x, W_0'/y, W_1'/1]

(each set of variables is replaced by the constant or variable written after the slash). W_1 is nonempty by maximality of |W|. W_0' is nonempty since it contains ξ. Thus x and y both actually occur in B.

Clearly [B] ∈ Rep({R}). Also, [B] contains (0,1) and (1,0), because A is satisfied by K_{0,W} and s respectively. And [B] does not contain (0,0), by maximality of |W|. Thus, depending on whether (1,1) is in [B], [B] is either [x≢y] or [x∨y]. This proves the lemma for the case where R is not weakly negative.

The proof for the case where R is not weakly positive is similar. □

The following lemma is of frequent use in what follows.

Negated Substitution Lemma. Assume [x≢y] ∈ Rep(S). Then Gen+(S) is closed under negated substitution; that is, if A ∈ Gen+(S) and ξ,η are variables, A[ξ/¬η] ∈ Gen+(S).

Proof. By hypothesis Gen+(S) contains the formula x≢y. Observe that A[ξ/¬η] is logically equivalent to (∃u)(A[ξ/u] ∧ η≢u), where u is a variable not occurring in A. Hence, A[ξ/¬η] ∈ Gen+(S). □

By a "3-element binary logical relation" we mean a 2-place logical relation having exactly 3 elements. It is easy to verify that there are exactly four such relations, and that these are [x∨y], [¬x∨y], [x∨¬y] and [¬x∨¬y].

Lemma 3.3. Let R be a relation which is not affine. Then Rep({R,[x≢y]}) contains all 3-element binary logical relations.

Proof. It suffices to show that Rep({R,[x≢y]}) contains some 3-element binary relation, since the others can then be obtained by use of the Negated Substitution Lemma.

Let A := R(x_1,...). Using Lemma 3.1A, let s_0,s_1,s_2 be assignments satisfying A such that s_0⊕s_1⊕s_2 does not satisfy A. Form A' from A by negating all occurrences of variables in the set {η : s_0(η)=1}. By the Negated Substitution Lemma, A' ∈ Gen+({R,[x≢y]}). Define s_i' := s_i⊕s_0, for i=1,2. Observe that an assignment t satisfies A' iff t⊕s_0 satisfies A. Thus, K_0 (the all-zero assignment), s_1', s_2' all satisfy A', but s_1'⊕s_2' does not.

For i,j = 0,1, let V_{i,j} := {ξ ∈ Var(A') : s_1'(ξ)=i & s_2'(ξ)=j}, and let

B := A'[V_{0,0}/0, V_{0,1}/x, V_{1,0}/y, V_{1,1}/z].

Clearly, B ∈ Gen+({R,[x≢y]}). Assume without loss of generality that x,y,z all actually occur in B. (For example, if x does not occur, one can add a conjunct (∃w)(w≢x) just to make it occur.)

By the statement just made about satisfaction of A', [B] contains (0,0,0), (0,1,1) and (1,0,1), but not (1,1,0).

Assume, for sake of contradiction, that Rep({R,[x≢y]}) does not contain any 3-element binary relation. Then [B] must contain (0,1,0), or else [(∃x)B] is {(0,0),(1,1),(0,1)}. Also, [B] must contain (1,0,0), or else [(∃y)B] is {(0,0),(0,1),(1,1)}. But then [B[z/0]] is {(0,0),(0,1),(1,0)}, and this contradiction completes the proof. □


Lemma 3.4. Let R be a logical relation which is not bijunctive. Then Rep({R,[x≢y],[x∨y]}) contains the relation "exactly one of x,y,z."

Proof. Let A := R(x_1,...). By part (b) of Lemma 3.1B there exist s_0 ∈ Sat(A) and U,V ⊆ Var(A) such that U and V are change sets for (A,s_0), but U∩V is not a change set for (A,s_0).

Form A' from A by negating all occurrences of each variable in the set {η : s_0(η)=1}. By the Negated Substitution Lemma, A' ∈ Gen+({R,[x≢y]}). Observe that U and V are change sets for (A',K_0), where K_0 is the all-zero assignment, but U∩V is not. (Also, K_0 satisfies A'.) Define

B := A'[(Var(A')−(U∪V))/0, (U∩V)/x, (U−V)/y, (V−U)/z].

By the above remarks about change sets for (A',K_0), [B] contains (0,0,0), (1,1,0) and (1,0,1), but not (1,0,0). Now define

B' := B[x/¬x] ∧ (¬x∨¬y) ∧ (¬y∨¬z) ∧ (¬z∨¬x).

By the Negated Substitution Lemma, B' ∈ Gen+({R,[x≢y],[x∨y]}). It is easy to check that [B'] = {(1,0,0),(0,1,0),(0,0,1)}. That is, [B'] is the relation "exactly one of x,y,z." □

Lemma 3.5. Let R be the logical relation "exactly one of x,y,z." Then Rep({R}) is the set of all logical relations.

Proof. Define

A := (∃u_1,u_2,u_3,u_4,u_5,u_6)(R(x,u_1,u_4) ∧ R(y,u_2,u_4) ∧ R(u_1,u_2,u_5) ∧ R(u_3,u_4,u_6) ∧ R(z,u_3,0))

B := R(x,y,0)

It is straightforward to verify that A is logically equivalent to (x∨y∨z) and that B is logically equivalent to x≢y.
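The two equivalences just stated can be confirmed by exhaustive enumeration. The following Python sketch computes the relation defined by A (projecting away the quantified variables u_1,...,u_6) and the relation defined by B; the helper names `R`, `relation_of_A` and `relation_of_B` are hypothetical and introduced only for this illustration.

```python
from itertools import product

# The relation "exactly one of x,y,z".
ONE_IN_THREE = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

def R(a, b, c):
    return (a, b, c) in ONE_IN_THREE

def relation_of_A():
    """Triples (x, y, z) for which some u1..u6 satisfies the body of A."""
    result = set()
    for x, y, z in product((0, 1), repeat=3):
        for u1, u2, u3, u4, u5, u6 in product((0, 1), repeat=6):
            if (R(x, u1, u4) and R(y, u2, u4) and R(u1, u2, u5)
                    and R(u3, u4, u6) and R(z, u3, 0)):
                result.add((x, y, z))
                break  # one witness for the quantified variables suffices
    return result

def relation_of_B():
    """Pairs (x, y) satisfying R(x, y, 0)."""
    return {(x, y) for x, y in product((0, 1), repeat=2) if R(x, y, 0)}
```

Under this encoding, `relation_of_A()` is every triple except (0,0,0), which is the relation [x∨y∨z], and `relation_of_B()` is {(0,1),(1,0)}, which is [x≢y].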

Let a logical relation Q be given and let Q=[C] for some standard propositional formula C. By introducing a new existentially quantified variable for each binary logical connective of C, one can form a formula (∃y_1,...,y_m)D, equivalent to C, where D is a conjunction of clauses each involving at most 3 variables (and hence D can be expanded to CNF form with at most 3 literals per conjunct). Details of this process can be found in [St, Lemma 6.4] or [BBFMP]. It is now straightforward, using the formulas A and B, to convert D into an equivalent formula in Gen({R}). It follows that Q ∈ Rep({R}).

Thus Rep({R}) is the set of all logical relations. □

Note. Condition (b) of Lemma 3.1B can also be expressed in the following pleasantly symmetric form:

(b') For all s_1,s_2,s_3 ∈ Sat(A), (s_1∨s_2)∧(s_2∨s_3)∧(s_3∨s_1) ∈ Sat(A).

This is derived from condition (b) by setting s_1=s, s_2=s⊕K_{1,V_1}, s_3=s⊕K_{1,V_2} and observing that ((s_2⊕s_1)∧(s_3⊕s_1))⊕s_1 is equivalent to (s_1∨s_2)∧(s_2∨s_3)∧(s_3∨s_1).
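Condition (b') lends itself to a direct test when a relation is presented as a list of its elements. The following Python sketch checks closure under the componentwise operation (s_1∨s_2)∧(s_2∨s_3)∧(s_3∨s_1) over all triples, which takes time cubic in the number of elements. The names `combine` and `is_bijunctive` are hypothetical.

```python
from itertools import product

def combine(s1, s2, s3):
    """Componentwise (s1 OR s2) AND (s2 OR s3) AND (s3 OR s1)."""
    return tuple((a | b) & (b | c) & (c | a) for a, b, c in zip(s1, s2, s3))

def is_bijunctive(relation):
    """Check condition (b') on a relation given as a collection of 0/1 tuples."""
    sat = set(relation)
    return all(combine(s1, s2, s3) in sat
               for s1, s2, s3 in product(sat, repeat=3))
```

For instance, [x∨y] satisfies (b'), while "exactly one of x,y,z" does not: combining its three elements yields (0,0,0), which is not in the relation.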

Proof of Theorem 3.0. First we show that if S does not satisfy any of the conditions (a)-(d) of Theorem 3.0, Rep(S) is the set of all logical relations.

Assume that S does not satisfy any of (a)-(d). Then S contains some relation R_1 which is not weakly positive, some relation R_2 which is not weakly negative, some relation R_3 which is not affine, and some relation R_4 which is not bijunctive. By Corollary 3.2.1, [x≢y] ∈ Rep({R_1,R_2}). Now by Lemma 3.3, [x∨y] ∈ Rep({R_1,R_2,R_3}). Hence, by Lemma 3.4, Rep({R_1,R_2,R_3,R_4}) contains the relation "exactly one of x,y,z," and hence is the set of all logical relations, by Lemma 3.5. Thus, Rep(S) is the set of all logical relations.

It remains to show that if S satisfies one of the conditions (a)-(d), so does Rep(S). The proof of this part does not involve any new techniques, and we leave it as an exercise for the reader. (This part is not needed in the proof of the Dichotomy Theorem.) □

4. PROOF OF DICHOTOMY THEOREM

This section finishes the proof of the Dichotomy Theorem (Theorem 2.1).

Lemma 4.1. (Dichotomy Theorem for Satisfiability-with-Constants) Let S be a finite set of logical relations. If S satisfies one of the conditions (a)-(d) of Theorem 3.0, then SAT_C(S) is polynomial-time decidable. Otherwise, SAT_C(S) is log-complete in NP.

Proof. (a) Suppose that every relation in S is weakly positive. Then SAT(S) is decidable using the following algorithm:

1. Given an S-formula A, replace each conjunct of A by an equivalent CNF formula having at most one negated variable in each conjunct. Call the resulting formula A'.

2. If every conjunct of A' contains an unnegated variable, ACCEPT.

3. Otherwise, let (¬ξ) be a conjunct of A'. If (ξ) is also a conjunct of A', REJECT. Otherwise, drop every conjunct in which ¬ξ occurs and drop ξ from every conjunct in which ξ occurs unnegated. (If A' becomes empty, ACCEPT.)

4. Go to step 2.

We leave verification to the reader.
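Steps 2-4 of the algorithm above can be sketched in Python as follows, assuming step 1 has already been carried out and the CNF formula A' is given as a list of clauses. The encoding is an assumption of this sketch: a literal is a nonzero integer, v for an unnegated variable and -v for a negated one, and the function name `weakly_positive_sat` is hypothetical.

```python
def weakly_positive_sat(clauses):
    """Decide satisfiability of a CNF formula in which every clause has
    at most one negated variable (steps 2-4 of the algorithm above).

    Literals are nonzero ints: v is unnegated, -v is negated (assumed encoding).
    """
    clauses = [frozenset(c) for c in clauses]
    for c in clauses:
        if sum(1 for lit in c if lit < 0) > 1:
            raise ValueError("clause has more than one negated variable")
    if frozenset() in clauses:
        return False  # an empty clause in the input is unsatisfiable
    while True:
        # Step 2: look for a clause with no unnegated variable; by the
        # restriction on the input it must be a single negated variable.
        unit = next((c for c in clauses if all(lit < 0 for lit in c)), None)
        if unit is None:
            return True  # ACCEPT: setting all remaining variables to 1 works
        (lit,) = unit
        v = -lit
        # Step 3: (not v) and (v) together are contradictory.
        if frozenset([v]) in clauses:
            return False  # REJECT
        # Set v to 0: drop clauses containing (not v), delete v elsewhere.
        clauses = [c - {v} for c in clauses if lit not in c]
```

Each pass through the loop removes at least one clause, so the number of iterations is bounded by the number of clauses. For example, `weakly_positive_sat([[-1], [1, 2], [-2, 1]])` rejects, since x_1 = 0 forces x_2 = 1 through the second clause and x_2 = 0 through the third.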

(b) The case where every relation in S is weakly negative is similar to (a).

(c) Suppose that every relation in S is affine. Then to decide whether a given S-formula A is satisfiable, convert A to an equivalent system of linear equations over {0,1} and solve the system by Gaussian elimination. (Eliminate one variable at a time until either all variables have been eliminated or 0=1 has been deduced.) This is a well-known polynomial-time algorithm.
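As an illustration of case (c), here is a minimal Python sketch of Gaussian elimination over the field {0,1}, where addition is exclusive-or. It assumes the S-formula has already been converted to a system of equations; the function name `solve_gf2` and the (coefficients, right-hand side) encoding are hypothetical.

```python
def solve_gf2(equations, num_vars):
    """Solve a linear system over {0,1} by Gaussian elimination.

    `equations` is a list of (coeffs, rhs) pairs, where `coeffs` is a list of
    num_vars bits and the equation reads: XOR of selected variables = rhs.
    Returns one solution as a list of bits, or None if 0=1 is deduced.
    """
    rows = [list(coeffs) + [rhs] for coeffs, rhs in equations]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for col in range(num_vars):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # this variable is not constrained by remaining rows
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the variable from every other equation.
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    # A row with all-zero coefficients and right-hand side 1 reads 0=1.
    if any(row[-1] and not any(row[:-1]) for row in rows):
        return None
    solution = [0] * num_vars  # free variables are set to 0
    for i, col in enumerate(pivot_cols):
        solution[col] = rows[i][-1]
    return solution
```

For example, the system x⊕y=1, y⊕z=1, x⊕z=1 is inconsistent (the three equations sum to 0=1), whereas replacing the last equation by x⊕z=0 gives a consistent system.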

(d) Suppose that every relation in S is bijunctive. Then to decide whether a given S-formula is satisfiable, convert it to an equivalent CNF formula with at most 2 literals per conjunct and use the Davis-Putnam procedure [DP], which as noted in [C] decides satisfiability of such formulas in polynomial time.
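To see why formulas with at most 2 literals per conjunct are tractable, one can saturate the clause set under resolution: the resolvent of two such clauses again has at most 2 literals, so only polynomially many clauses can ever arise. The Python sketch below uses this plain saturation strategy; it is not the procedure of [DP] verbatim, and the integer literal encoding (v unnegated, -v negated) and the name `two_sat_by_resolution` are assumptions of the sketch.

```python
def two_sat_by_resolution(clauses):
    """Decide satisfiability of a CNF formula with at most 2 literals per
    clause by saturating the clause set under resolution.

    Literals are nonzero ints: v is unnegated, -v is negated (assumed encoding).
    """
    known = {frozenset(c) for c in clauses}
    if any(len(c) > 2 for c in known):
        raise ValueError("clause has more than 2 literals")
    if frozenset() in known:
        return False
    frontier = list(known)
    while frontier:
        c = frontier.pop()
        for d in list(known):
            for lit in c:
                if -lit not in d:
                    continue
                resolvent = (c - {lit}) | (d - {-lit})
                if not resolvent:
                    return False  # derived the empty clause
                if any(-l in resolvent for l in resolvent):
                    continue  # tautology, carries no information
                if resolvent not in known:
                    known.add(resolvent)
                    frontier.append(resolvent)
    return True  # saturated without deriving the empty clause
```

Over n variables there are O(n^2) clauses of at most 2 literals, so the loop terminates after polynomially many steps.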


In (a)-(d) above, we have sketched polynomial-time algorithms for SAT(S). For SAT_C(S), that is, if the formulas contain constants, it is obvious how to modify the algorithms.

Assume now that S does not satisfy any of the conditions (a)-(d). We will show that SAT_C(S) is NP-complete by showing SAT3 ≤_log SAT_C(S), where SAT3 is the set of satisfiable CNF formulas having at most 3 literals per conjunct, a known NP-complete problem [C].

Let R_0,R_1,R_2,R_3 be the 3-place logical relations defined by R_0(x,y,z) ≡ (x∨y∨z), R_1(x,y,z) ≡ (¬x∨y∨z), R_2(x,y,z) ≡ (¬x∨¬y∨z), R_3(x,y,z) ≡ (¬x∨¬y∨¬z). For i=0,1,2,3, let F_i(x,y,z) be a formula in Gen(S) which is logically equivalent to R_i(x,y,z). Such formulas exist by Theorem 3.0.

Given a CNF formula A with at most 3 literals in each conjunct, form an equivalent formula A' by replacing each conjunct of A by one of the formulas F_i, with appropriate variables substituted. Form A'' from A' by deleting all quantifiers, after making sure that all quantified variables are distinct from each other and from all free variables. Observe that A'' is satisfiable iff A is satisfiable. It is not hard to show that A'' is log-space computable from A. This proves that SAT3 ≤_log SAT_C(S).

Since clearly SAT_C(S) ∈ NP, it follows that the problem SAT_C(S) is log-complete in NP. □

We define "no-constants" analogues of Gen(S) and Rep(S), as follows. Let Gen_NC(S) := {A ∈ Gen(S) : no constants occur in A}, Rep_NC(S) := {[A] : A ∈ Gen_NC(S)}.

The logical relation R is complementive if it is closed under complement, that is, if for all (a_1,...,a_m) ∈ R, (1−a_1,...,1−a_m) ∈ R.

The following easily-proved lemma clarifies the relation between SAT(S) and SAT_C(S).

Lemma 4.2. If Rep_NC(S) contains [x] and [¬x], then Rep(S)=Rep_NC(S), and hence SAT_C(S) and SAT(S) have the same complexity.

As the next lemma shows, the hypothesis of Lemma 4.2 fails only under very restricted conditions. Thus, for "most" sets S, SAT(S) and SAT_C(S) have the same complexity.

Lemma 4.3. Let S be a set of nonempty logical relations. Then at least one of the following holds:

(a) Every relation in S is 0-valid.
(b) Every relation in S is 1-valid.
(c) [x] and [¬x] are contained in Rep_NC(S).
(d) [x≢y] ∈ Rep_NC(S).

Moreover, if (c) fails and (d) holds, every relation in S is complementive.

Proof. Assume as noted above that all relations in S are nonempty. Assume that (a) and (b) fail. We will show that (c) or (d) holds.

CASE 1. Every relation in S is 0-valid or 1-valid. In this case, since (a) and (b) fail, there is some R_0 ∈ S that is 0-valid but not 1-valid, and some R_1 ∈ S that is 1-valid but not 0-valid. Letting A_i := R_i(x_1,...), we have [A_i[Var(A_i)/x]] = {(i)} for i=0,1; hence, [x] and [¬x] are in Rep_NC(S), so (c) holds.

CASE 2. S contains some nonempty relation R that is neither 0-valid nor 1-valid. In this case, let A := R(x_1,...) and choose s ∈ Sat(A). Set

B := A[s^{-1}({0})/x_0, s^{-1}({1})/x_1].

Since R is neither 0-valid nor 1-valid, x_0 and x_1 both occur in B and [B] is either {(0,1)} or {(0,1),(1,0)}. In the former case, [(∃x_0)B] = [x] and [(∃x_1)B] = [¬x], so (c) holds. In the latter case, [B] = [x≢y], so (d) holds.

Thus (c) or (d) holds in all cases.

Now assume (c) fails and (d) holds. We claim that [x] ∉ Rep_NC(S). For if [x] ∈ Rep_NC(S), we would have [(∃x)(x ∧ x≢y)] = [¬y], contradicting (c) failing. Similarly, [¬x] ∉ Rep_NC(S). Assume for sake of contradiction that R ∈ S and R is not complementive. Let A := R(x_1,...) and choose s ∈ Sat(A) such that s̄ ∉ Sat(A), where s̄ is the complementary assignment of s. Define the formula B as in CASE 2. Now x_0 must occur in B, or else [B]=[x]. Similarly x_1 must occur in B. Thus Var(B)={x_0,x_1} and [B] contains (0,1) but not (1,0). Now [B] must contain (0,0), or else [(∃x_0)B] is [x]. And [B] must contain (1,1), or else [(∃x_1)B] is [¬x]. Thus, [B] is [x→y]. But then [(∃x)((x→y) ∧ (x≢y))] is [y], and this contradiction completes the proof that every relation in S is complementive. □

Finally, we are able to prove the Dichotomy Theorem for SAT(S).

Proof of Theorem 2.1. Assume that not every relation in S is 0-valid and not every relation in S is 1-valid (that is, conditions (a) and (b) of Theorem 2.1 fail). By Lemma 4.3 there are two cases to consider.

CASE 1. [x] and [¬x] are in Rep_NC(S).

In this case, it is easy to replace any given S-formula with constants by an equivalent S-formula without constants, so the conclusion follows by Lemma 4.1.

CASE 2. [x≢y] ∈ Rep_NC(S), and every relation in S is complementive.

In this case, let an S-formula with constants, A, be given. Let A' be A'' ∧ (y_0≢y_1), where A'' is formed from A by replacing each occurrence of 0 by y_0 and each occurrence of 1 by y_1, and y_0,y_1 are new variables. Thus, A' is an S-formula without constants, and, by complementarity, A' is satisfiable iff A is satisfiable. So again the conclusion follows by Lemma 4.1. □
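The constant-elimination step of CASE 2 can be tried out on a small complementive relation. The Python sketch below uses the NOT-ALL-EQUAL relation on three arguments and compares, by brute force, satisfiability of a formula with constants against satisfiability of the formula obtained by the substitution described above. The encoding is an assumption of the sketch: a conjunct is a tuple of arguments, variables are strings, constants are the integers 0 and 1, and the names "y0" and "y1" are assumed not to occur in the input.

```python
from itertools import product

# NOT-ALL-EQUAL on three arguments: a complementive relation.
NAE = {t for t in product((0, 1), repeat=3) if len(set(t)) == 2}

def holds(clauses, relation, s):
    """Does assignment s (a dict) put every conjunct's arguments in relation?"""
    value = lambda a: a if a in (0, 1) else s[a]
    return all(tuple(value(a) for a in clause) in relation
               for clause in clauses)

def sat_with_constants(clauses, relation):
    variables = sorted({a for c in clauses for a in c if a not in (0, 1)})
    return any(holds(clauses, relation, dict(zip(variables, vals)))
               for vals in product((0, 1), repeat=len(variables)))

def sat_after_elimination(clauses, relation):
    """Satisfiability of A' = A'' AND (y0 differs from y1)."""
    rename = {0: "y0", 1: "y1"}  # hypothetical fresh variable names
    renamed = [tuple(rename.get(a, a) for a in clause) for clause in clauses]
    variables = sorted({a for c in renamed for a in c} | {"y0", "y1"})
    for vals in product((0, 1), repeat=len(variables)):
        s = dict(zip(variables, vals))
        if s["y0"] != s["y1"] and holds(renamed, relation, s):
            return True
    return False
```

The two functions agree on every input when the relation is complementive: a satisfying assignment of A' with y0=1, y1=0 can be complemented to obtain one with y0=0, y1=1, which then satisfies A.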

5. REFINEMENT TO LOG-SPACE EQUIVALENCE

This section classifies the complexity of SAT_C(S) up to log-space equivalence. Results are presented without proof. The definition of (≤k)-weakly positive is found later in this section.

Classification Theorem for SAT_C(S)

Theorem 5.1. Let S be a finite set of logical relations. Then SAT_C(S) lies in one of seven log-space equivalence classes, as follows: (co-SAT_C(S) denotes the complement of SAT_C(S).)

L1. If every relation in S is (≤0)-weakly positive, then SAT_C(S) is decidable deterministically in log space.

L2. If every relation in S is (≤1)-weakly positive, and S contains some non-(≤0)-weakly positive relation, then either

(a) co-SAT_C(S) is log-equivalent to the graph reachability problem (given a graph G and nodes s,t, do s and t lie in the same connected component of G?),

or (b) co-SAT_C(S) is log-equivalent to the digraph reachability problem (given a digraph G and nodes s,t, is there a directed path from s to t?), and hence is log-complete in nondeterministic log space.

L3. If every relation in S is weakly positive, and S contains some relation that is not (≤1)-weakly positive, then Rep(S) is the set of all weakly positive relations, and SAT_C(S) is log-complete in P.

Statements L1-L3 also hold when "negative" is substituted for "positive."

L4. If S contains some relation that is not weakly positive and some relation that is not weakly negative, and every relation in S is affine and bijunctive, then Rep(S)=Rep([x≢y]), and SAT_C(S) is log-equivalent to the problem of deciding whether a graph is bipartite.

L5. If S contains some relation that is not weakly positive, some relation that is not weakly negative, and some relation that is not bijunctive, and every relation in S is affine, then Rep(S) = Rep([x⊕y⊕z=0]), and hence SAT_C(S) is log-equivalent to the problem of deciding whether an arbitrary system of linear equations over the field {0,1} is consistent.

Remark. The latter problem is log-equivalent to its own complement, since a set of linear equations is inconsistent iff there is some subset of them which sums to 0=1, a condition which itself can be written as a set of linear equations. Thus for this case, any class in which SAT_C(S) is complete is closed under complement.

L6. If S contains some relation that is not weakly positive, some relation that is not weakly negative, and some relation that is not affine, and every relation in S is bijunctive, then Rep(S)=Rep([x→y]) and co-SAT_C(S) is log-complete in nondeterministic log space.

L7. If S contains some relation that is not weakly positive, some relation that is not weakly negative, some relation that is not bijunctive, and some relation that is not affine, then Rep(S) is the set of all logical relations, and SAT_C(S) is log-complete in NP.

Corollary 5.2. SAT3W and NOT-EXACTLY-ONE SATISFIABILITY (defined in Section 1) are log-complete in P. (This follows from statement L3.)

The proof of L3 relies on the result of Jones and Laaser [JL] that the problem UNIT (the set of CNF formulas that yield a contradiction from unit resolution) is log-complete in P. Although [JL] does not give any explicit syntactic characterization, we believe that UNIT is just the set of unsatisfiable CNF formulas having at most one positive variable in each conjunct.

Actually, the proof that SAT3W is complete in P can be given in a manner that completely parallels Cook's proof [C] that SAT3 (the set of satisfiable CNF formulas with 3 literals per conjunct) is complete in NP. Cook's proof takes a sequence of instantaneous machine descriptions and writes a CNF formula that describes the sequence. All that is necessary to get a completeness result in P is to observe that, if the machine involved is deterministic, the CNF formula constructed has at most one positive variable in each conjunct. (This requires certain modifications of Cook's argument, which are carried out in the UNIT proof of [JL].) The reduction to formulas with 3 literals per conjunct again completely parallels Cook's proof -- one has just to observe that this reduction preserves the property of having at most one positive variable per conjunct. (One can then reverse the negativity of all variables so as to get at most one negated variable per conjunct, i.e. SAT3W.)

The proof of statement L2(b) uses the result of Savitch [Sav] (later refined by Jones [J]) that the digraph reachability problem is complete in nondeterministic log space. Jones [J] asks whether the undirected graph reachability problem is also complete in nondeterministic log space.

The completeness of the satisfiability problem for bijunctive formulas (cf. L6), which follows from the completeness of the digraph reachability problem, was noted in [JL].

Definitions

Let k be a nonnegative integer or ∞. The relation R is (≤k)-weakly positive iff R(x_1,...) is equivalent to some formula B ∧ (ξ_1≡ξ_1) ∧ ... ∧ (ξ_n≡ξ_n), where ξ_1,...,ξ_n are variables and B is a CNF formula in which each conjunct has at most one negated variable and every conjunct which has a negated variable has at most k unnegated variables. (Note: The clauses (ξ_i≡ξ_i) serve no function except to make the variable ξ_i occur vacuously in the formula. Actually this is necessary only if k=0.)

Note that "weakly positive" is the same as "(≤∞)-weakly positive."

Let A be a formula, V ⊆ Var(A), k as above. We define

Cl^{≤k}_{0,A}(V) := {ξ : ∃W ⊆ V, |W| ≤ k, such that ∀s ∈ Sat(A), s|W ≡ 0 ⇒ s(ξ)=0}.

Thus Cl^{≤∞}_{0,A}(V) is the same as Cl_{0,A}(V). We say V is (0,≤k)-closed for A if Cl^{≤k}_{0,A}(V) = V. (Similar definitions are given with "negative" instead of "positive" and "1" instead of "0".)

The analysis of weakly positive relations in Theorem 5.1 is based on the following lemma, which generalizes Lemma 3.1W.

Lemma 5.3. Let R be a logical relation and let A := R(x_1,...) and let k be a nonnegative integer or ∞. Then R is (≤k)-weakly positive iff whenever V ⊆ Var(A) is 0-consistent and (0,≤k)-closed for A, K_{0,V} ∈ Sat(A).

Lemma 5.4. If R is affine and bijunctive, then R(x_1,...) is logically equivalent to a CNF formula composed of conjuncts of the forms (x≡y), (x≢y), x and ¬x.


6. EXTENSION TO POLYNOMIAL SPACE

This section gives a polynomial-space analogue of the Dichotomy Theorem, involving quantified formulas. The result is presented without proof.

A quantified S-formula with constants is a member of the smallest set of formulas T such that (a) for each R ∈ S, R(x_1,...) ∈ T, and (b) whenever A,B ∈ T and ξ,η are variables, the following are in T: A∧B, (∃ξ)A, (∀ξ)A, A[ξ/η], A[ξ/0], A[ξ/1]. Define

QF_C(S) := {A : A is a quantified S-formula with constants, Var(A)=∅, and A is true}.

Theorem 6.1. Let S be a finite set of logical relations. If one of the four conditions (c)-(f) of Theorem 2.1 holds, QF_C(S) is polynomial-time decidable. Otherwise, QF_C(S) is log-complete in polynomial space.

The proof relies on the result of Stockmeyer and Meyer [StM] that the problem B_ω ∩ 3CNF (decide the truth of a quantified CNF formula with 3 literals per conjunct) is log-complete in polynomial space. (See also [St].)

7. APPLICATIONS

The results presented here are potentially very useful in expediting NP-completeness proofs, for the reason that they give one a much broader "target cross-section" for use in reductions. Traditionally, a researcher has had to aim his reduction at a specific NP-complete problem, such as the CNF satisfiability problem. By virtue of Theorem 2.1, the researcher's aim no longer has to be so specific. Once he has set up the framework for simulating conjunctions of clauses, he has great latitude regarding the specific content of those clauses.

To illustrate this idea, we prove that the TWO-COLORABLE PERFECT MATCHING problem, defined in Section 1, is NP-complete. With the help of Theorem 2.1, the proof is rather simple, whereas previously the author had tried without success to prove this problem NP-complete.

Theorem 7.1. TWO-COLORABLE PERFECT MATCHING is log-complete in NP.

Comment. With additional arguments, which we do not give here, it can be shown that this problem restricted to planar cubic graphs is also NP-complete.

Proof (sketch). Consider the graph shown in Figure 1(a), three of whose nodes are labeled with the variables x,y,z. Any coloring of this graph with the colors "0" and "1" can be interpreted as assigning truth values to the variables x,y,z. The requirement that the coloring be a solution to the 2-colorable perfect matching problem is thus interpreted as imposing a certain relation on these 3 variables. It is straightforward to verify that this relation is [(x∨y∨z)∧(¬x∨¬y∨¬z)] -- the only values the triple (x,y,z) cannot assume are (0,0,0) and (1,1,1). Call this relation R and observe that SAT({R}) is the NOT-ALL-EQUAL SATISFIABILITY problem, which, as noted in Section 2, is NP-complete as a consequence of Theorem 2.1.

[Figure 1(a), Figure 1(b) and Figure 1(c) appear here. Figure 1(c) shows the construction for A = R(x,x,y) ∧ R(y,z,u).]

We will reduce SAT({R}) to the 2-colorable perfect matching problem.

Let an {R}-formula A be given. Construct a graph G as follows. Let G_0 be the union of disjoint copies of Figure 1(a), one copy for each conjunct of A. On each copy label the nodes "x","y" and "z" with the names of the variables occurring in the corresponding conjunct of A. Then, for each pair n_1,n_2 of nodes that are labeled with the same variable, join n_1 to n_2 by means of the structure shown in Figure 1(b). It may be verified that this structure forces n_1 and n_2 to have the same color. Call the resulting graph G. Figure 1(c) shows a simple example of this construction. It can be seen that G has a two-colorable perfect matching iff A is satisfiable. Hence, TWO-COLORABLE PERFECT MATCHING is NP-complete. □

The point we wish to make is that, in the above proof, "almost any" graph could have been used for Figure 1(a). We suspect that if one simply randomly generated a graph having 10 or 15 nodes, within a certain range of arc probability, the result would be, with very high probability, a graph representing a relation satisfying the conditions of Theorem 2.1 for NP-completeness, which would therefore serve just as Figure 1(a) for a proof of NP-completeness.

This raises the intriguing possibility of computer-assisted NP-completeness proofs. Once the researcher has established the basic framework for simulating conjunctions of clauses, the relational complexity could be explored with the help of a computer. The computer would be instructed to randomly generate various input configurations and test whether the defined relation was non-affine, non-bijunctive, etc. The fruitfulness of such an approach remains to be proved: the enumeration of the elements of a relation on 10 or 15 variables is surely not a light computational task.


8. PROBLEMS FOR FURTHER RESEARCH

1. Generalize the Dichotomy Theorem from 2-valued variables to k-valued variables (i.e., a relation of rank n is a subset of {0,1,...,k-1}^n). An analogue for k=3 would imply the (already known) NP-completeness of the graph 3-colorability problem. It would be interesting to see if any new polynomial-time decidable cases arise, which are not obvious generalizations of the case k=2.

2. Study the complexity of deciding whether a relation is affine, bijunctive, or weakly positive. From Lemmas 3.1A and 3.1B, it can be seen that it can be decided in polynomial time whether a given relation, presented as a list of its elements, is affine or bijunctive. But we do not know of any efficient algorithm for recognizing weakly positive relations.
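The remark in problem 1 about k=3 can be made concrete with a small sketch. It treats a conjunction of clauses over k-valued variables as a list of (relation, variables) pairs and decides satisfiability by exhaustive search, which takes exponential time and is meant only to fix the definitions. Graph 3-colorability then appears as the satisfiability problem for the single binary relation "not equal" on {0,1,2}, with one clause per edge. The names `satisfiable`, `NEQ3` and `three_colorable` are hypothetical and chosen for this illustration.

```python
from itertools import product


def satisfiable(variables, k, constraints):
    """Brute-force search over all assignments of values 0..k-1.

    constraints is a list of (relation, scope) pairs, where relation is a
    set of tuples over {0,...,k-1} and scope is a tuple of variable names.
    Returns a satisfying assignment as a dict, or None if there is none.
    """
    for values in product(range(k), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            tuple(assignment[v] for v in scope) in relation
            for relation, scope in constraints
        ):
            return assignment
    return None


# The binary "not equal" relation on the three values {0, 1, 2}.
NEQ3 = {(a, b) for a in range(3) for b in range(3) if a != b}


def three_colorable(nodes, edges):
    """A graph is 3-colorable iff the conjunction of NEQ3(u, v) over its
    edges (u, v) is satisfiable."""
    return satisfiable(nodes, 3, [(NEQ3, edge) for edge in edges]) is not None
```

The same `satisfiable` routine with k=2 handles the two-valued problems of this paper, for example a conjunction of ONE-IN-THREE clauses.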

9. CONCLUSION

We have studied the complexity of an infinite class of satisfiability problems and obtained classification theorems which include and extend various previous complexity results, thereby unifying these earlier results within a larger framework.

By way of exploring this problem, we were led to a fairly rich theory of classification of logical relations, which is independent of, although motivated by, complexity-theoretic notions. It seems likely that this theory will lead to a heightened understanding of the inherent complexity of various classes of logical relations.

Since Cook's NP-completeness proof [C], the standard CNF satisfiability problem has become a kind of canonical NP-complete problem, being probably used more widely in reductions than any other NP-complete problem. We feel that the usefulness of this problem for reductions is a property which is probably shared to some degree by other conjunctive satisfiability problems, such as those we have considered. Thus we feel that problems such as ONE-IN-THREE SATISFIABILITY and NOT-ALL-EQUAL SATISFIABILITY will likewise prove to have wide applicability in completeness proofs.

APPENDIX

Complexity-Theoretic Definitions

Inputs to all decision problems are assumed to be presented as strings of symbols from some fixed finite alphabet Σ.

A log-space bounded Turing machine is a Turing machine, having a two-way read-only input tape, a one-way write-only output tape, and a single work tape, which, on any input w ∈ Σ*, never visits more than c log(|w|) frames of its work tape, for some constant c depending on the machine. The machine is assumed deterministic unless otherwise stated.

A function f : Σ* → Σ* is log-space computable if there is some log-space bounded Turing machine which on input w ∈ Σ* outputs f(w) and halts.

Let A, B ⊆ Σ*. Then A ≤_log B ("A is log-space reducible to B") iff there is a log-space computable function f such that for all w ∈ Σ*, w ∈ A iff f(w) ∈ B. This is a transitive relation. A is log-equivalent to B if A ≤_log B and B ≤_log A. A set B is log-complete in a class C if B ∈ C and for all A ∈ C, A ≤_log B.

Log space (resp. nondeterministic log space) is the class of all sets A ⊆ Σ* such that there is some deterministic (resp. nondeterministic) log-space bounded Turing machine that recognizes A.

Polynomial time (denoted P), nondeterministic polynomial time (denoted NP) and polynomial space (denoted PSPACE) are the classes of sets that are recognized by deterministic and nondeterministic polynomial-time bounded and polynomial-tape bounded Turing machines respectively. (These are standard one-tape Turing machines, and the bounds are expressed as a function of the length of the input string.) In this paper, "NP-complete" means "log-complete in NP."

More detailed explanations can be found in [K], [StM], [St].

ACKNOWLEDGEMENT

I am indebted to my thesis advisor, Professor Richard M. Karp, for discussing parts of this work with me.

REFERENCES

[BBFMP] M. Bauer, D. Brand, M. Fischer, A. Meyer and M. Paterson, A note on disjunctive form tautologies, SIGACT News, April 1973, pp. 17-20.

[C] S.A. Cook, The complexity of theorem-proving procedures, Third Annual ACM Symposium on Theory of Computing, 1971, pp. 151-158.

[DP] M. Davis and H. Putnam, A computing procedure for quantification theory, JACM 7 (1960), pp. 201-215.

[J] N.D. Jones, Space-bounded reducibility among combinatorial problems, J. Comp. Sys. Sci. 11 (1975), pp. 68-85.

[JL] N.D. Jones and W.T. Laaser, Complete problems for deterministic polynomial time, Sixth Ann. ACM Symp. on Theory of Comp., 1974, pp. 40-46.

[K] R.M. Karp, Reducibility among combinatorial problems, in Complexity of Computer Computations, R.W. Miller and J.W. Thatcher eds., Plenum Press, New York, 1972, pp. 85-104.

[P] E.L. Post, The two-valued iterative systems of mathematical logic, Annals of Math. Studies, vol. 5, Princeton, 1941.

[Sav] W. Savitch, Relationships between nondeterministic and deterministic tape complexities, J. Comp. Sys. Sci. 4 (1970), pp. 177-192.

[Sch] T.J. Schaefer, Complexity of decision problems based on finite two-person perfect-information games, 8th Ann. ACM Symp. on Theory of Comp., 1976, pp. 41-49.

[St] L.J. Stockmeyer, The polynomial-time hierarchy, Theor. Comp. Sci. 3 (1976), pp. 1-22.

[StM] L.J. Stockmeyer and A.R. Meyer, Word problems requiring exponential time: preliminary report, 5th Annual ACM Symp. on Theory of Computing, 1973, pp. 1-9.
