Lower Bounds for Property Testing
Luca Trevisan
U.C. Berkeley
Joint work with Andrej Bogdanov and Kenji Obata
Sub-linear Time Algorithms
• Want to design algorithms that run in less than linear time (and so cannot read the entire input).
– Must be probabilistic and approximate
• For optimization problems:
– Compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
• For decision problems:
– What is approximation for decision problems?
(Graph) Property Testing
Testing a property P with accuracy ε in the adjacency matrix representation:
• Given a graph G that has property P, accept with probability > 3/4
• Given a graph G that is ε-far from property P, accept with probability < 1/4
ε-far = must change more than an ε fraction of the adjacency matrix to get property P (add/remove > εn² edges)
Example [GGR,AK]
Testing bipartiteness of a given graph G
• Pick (1/ε) polylog(1/ε) vertices, and check if they induce a bipartite graph; if so accept, otherwise reject
• If G is bipartite then the algorithm accepts with probability 1
• If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
• Running time: O((1/ε²) polylog(1/ε))
• We will discuss a matching lower bound if time allows
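The sampling test above can be sketched in Python. This is a minimal illustration, not the [GGR,AK] implementation: the function names and the `sample_size` parameter are assumptions, and the caller is expected to choose `sample_size` on the order of (1/ε) polylog(1/ε).

```python
import random
from collections import deque

def induced_is_bipartite(adj, sample):
    """Try to 2-color the subgraph induced by `sample` using BFS.

    `adj` is an n x n 0/1 adjacency matrix. Returns False as soon as an
    edge joins two sampled vertices of the same color (an odd cycle).
    """
    color = {}
    for root in sample:
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v != u and adj[u][v]:      # one adjacency-matrix query
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False
    return True

def dense_bipartiteness_tester(adj, sample_size, rng=random):
    """Accept (True) iff a uniformly random induced subgraph is bipartite."""
    n = len(adj)
    sample = rng.sample(range(n), min(sample_size, n))
    return induced_is_bipartite(adj, sample)
```

Because an induced subgraph of a bipartite graph is bipartite, this sketch never rejects a bipartite input, which matches the one-sided error stated on the slide.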
Paleontologist’s approach
Bounded Degree Graphs
Testing a property P with accuracy ε in the adjacency lists representation:
• Given a graph G that has property P, accept with probability > 3/4
• Given a graph G that is ε-far from property P, accept with probability < 1/4
ε-far = must change more than an ε fraction of the adjacency list entries to get property P (add/remove > εdn edges)
Bipartiteness [GR]
Testing bipartiteness
• Repeat polylog n times:
– Start at a random vertex and take sqrt(n) random walks of length polylog n; if two of them combine to form an odd cycle, reject
• If no repetition rejects, accept
• Analysis:
– in a graph where you need to remove a constant fraction of the edges to make it bipartite, the algorithm finds an odd cycle
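A rough Python sketch of the random-walk test follows. It is an illustration under assumptions, not the [GR] algorithm as analyzed: the function names and the parameters `repetitions`, `num_walks` and `walk_len` are hypothetical, and the odd-cycle check is simplified to "some vertex is reached from the start at both an even and an odd step", which certifies a closed walk of odd length.

```python
import random

def odd_cycle_from_walks(neighbors, start, num_walks, walk_len, rng=random):
    """Run random walks from `start` and record the step parities at which
    each vertex is reached. Reaching one vertex at both parities closes an
    odd-length walk, so the graph contains an odd cycle."""
    parities = {start: {0}}
    for _ in range(num_walks):
        v = start
        for step in range(1, walk_len + 1):
            if not neighbors[v]:
                break                          # isolated vertex, walk is stuck
            v = rng.choice(neighbors[v])       # one adjacency-list query
            seen = parities.setdefault(v, set())
            seen.add(step % 2)
            if len(seen) == 2:
                return True
    return False

def bounded_degree_bipartiteness_tester(neighbors, repetitions, num_walks,
                                        walk_len, rng=random):
    """Accept (True) unless some repetition finds an odd cycle."""
    n = len(neighbors)
    for _ in range(repetitions):
        start = rng.randrange(n)
        if odd_cycle_from_walks(neighbors, start, num_walks, walk_len, rng):
            return False
    return True
```

In the slide's setting one would take `num_walks` around sqrt(n) and `repetitions` and `walk_len` polylogarithmic in n; the sketch leaves those choices to the caller. A bipartite graph is never rejected, since every vertex is reachable from the start only at steps of one parity.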
Matching Lower Bound [GR]
• Define two distributions of graphs:
– Gfar: a random Hamiltonian circuit, plus a random matching (whp 1/100-far from bipartite)
– Gbip: a random Hamiltonian circuit, plus a random matching conditioned on making the graph bipartite
• Gfar and Gbip are indistinguishable to algorithms of query complexity o(sqrt(n)).
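One way to sample from the two distributions is sketched below, assuming n is even and allowing the matching to repeat a circuit edge (a multigraph). The function names are hypothetical. For Gbip the sketch uses the fact that an even Hamiltonian circuit has a forced 2-coloring (alternate along the circuit), so conditioning on bipartiteness amounts to matching even positions with odd positions.

```python
import random

def random_hamiltonian_circuit(n, rng):
    """Return (order, edges) for a uniformly random cyclic ordering."""
    order = list(range(n))
    rng.shuffle(order)
    edges = [(order[i], order[(i + 1) % n]) for i in range(n)]
    return order, edges

def sample_gfar(n, rng=random):
    """Random Hamiltonian circuit plus an independent random perfect matching."""
    _, circuit = random_hamiltonian_circuit(n, rng)
    verts = list(range(n))
    rng.shuffle(verts)
    matching = [(verts[i], verts[i + 1]) for i in range(0, n, 2)]
    return circuit, matching

def sample_gbip(n, rng=random):
    """Random Hamiltonian circuit plus a random matching that only joins
    vertices at even positions with vertices at odd positions."""
    order, circuit = random_hamiltonian_circuit(n, rng)
    even, odd = order[0::2], order[1::2]
    rng.shuffle(odd)
    return circuit, list(zip(even, odd))

def is_bipartite(n, edges):
    """Check 2-colorability of an edge list by depth-first search."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in nbrs[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return False
    return True
```

Both samplers give every vertex degree 3 (two circuit edges and one matching edge), so a tester cannot tell them apart from local degree information alone.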
Sub-linear Time Approximation
• Minimum spanning tree
– given a connected weighted graph of degree d with weights in range {1,…,w}, can approximate MST weight within (1+ε) in time about O(dw/ε²) [Chazelle, Rubinfeld, T]
• Max SAT
– Given a CNF where every variable occurs at most d times, can approximate Max SAT optimum within .618, presumably also 2/3, in O(d) time [work in progress, hopefully will get 3/4 − ε]
Sublinear Time Approximation
• Problems restricted to dense instances:
– Max CUT and other graph problems can be approximated within (1+ε) in graphs with at least εn² edges in time 2^poly(1/ε) [GGR]
– Max 3SAT can be approximated within (1+ε) in instances with at least εn³ clauses in time 2^poly(1/ε), and similar results hold for other satisfiability problems [AFKK]
General Goals
• When looking for polynomial-time algorithms:
– Several algorithmic techniques of general applicability
– A general technique to “prove” impossibility (NP-completeness)
• For sublinear-time algorithms:
– General algorithmic techniques?
– Impossibility results?
Testing 3-Colorability
• Easy in adjacency matrix representation
• NP-hard in adjacency list representation
• Only for small enough ε
– Can find 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
– NP-hard to find 3-coloring good for 98% (?) fraction of edges
• Non-tight, and conditional, lower bound for query complexity
Other problems
• The query complexity of the following problems is equivalent to the query complexity of testing 3-colorability
– Testing satisfiability of a 3SAT instance
• Every variable occurs in O(1) clauses, “adjacency list” representation
– Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs
– Approximating Max SAT, Max 2SAT, . . .
• Lower bound of sqrt(n) for all problems
– Nothing better except with complexity assumptions
Our Results
• For one-sided error algorithms: Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 − ε)-far
– Lower bound applies to testing problems that are solvable in polynomial time
• For two-sided error algorithms:
– For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.
Additional Results
• Unconditionally, algorithms running in time o(n) cannot:
– Approximate Max 3SAT better than 7/8
– Approximate Max Cut in bounded-degree graphs better than 16/17
– . . .
• Håstad ’97 proved the above problems are NP-hard
The 3-Coloring Lower Bound
• Consider first one-sided error algorithms
• It’s enough to find a graph G that is (1/3 − ε)-far from 3-colorable, but every subgraph of size < δn is 3-colorable
– (for every ε there is a δ such that . . .)
• Then an algorithm of query complexity < δn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)
The Graph
• Pick a graph of degree O(1/ε²) at random (pick so many random matchings)
• Then it is (1/3 − ε)-far whp
• But, for some δ, whp, every subgraph induced by k < δn vertices contains < 1.5k edges
• In a minimal non-3-colorable graph, every vertex has degree at least 3
• Every subgraph induced by < δn vertices is 3-colorable
[Erdos]
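The step from local sparsity to local 3-colorability can be made concrete. A subgraph on k vertices with fewer than 1.5k edges has average degree below 3, so it has a vertex of degree at most 2; removing it leaves a subgraph with the same property. The sketch below (a hypothetical helper, not taken from the talk) peels such vertices off and then colors them in reverse order.

```python
def three_color_sparse(adj):
    """3-color a graph in which every subgraph has a vertex of degree <= 2.

    `adj` maps each vertex to the set of its neighbours (no self-loops).
    Returns a dict vertex -> color in {0, 1, 2}, or None if peeling gets
    stuck, i.e. some subgraph has minimum degree at least 3.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    order = []
    while remaining:
        # Find a vertex with at most two neighbours still present.
        v = next((u for u, ns in remaining.items() if len(ns) <= 2), None)
        if v is None:
            return None
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        order.append(v)
    color = {}
    for v in reversed(order):
        # At most two neighbours of v are already colored at this point.
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min({0, 1, 2} - used)
    return color
```

Returning None does not mean the graph is not 3-colorable; it only means the sparsity condition from the slide fails for some subgraph.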
Explicit Construction
• Can the previous construction be derandomized?
• For constants d, ε, δ, and for every sufficiently large n, we can explicitly construct a graph on n vertices, of max degree d, ε-far from 3-colorable, and such that every subset of δn vertices induces a 3-colorable subgraph.
Explicit Construction
• We construct a 3SAT formula such that for constants k, ε’, δ’
– Every variable occurs k times
– No assignment satisfies more than a 1 − ε’ fraction of clauses
– Every δ’ fraction of clauses is satisfiable
– Then we use a (slightly new) reduction from 3SAT to 3-Coloring
The Formula
• Fix a degree-d expander graph G=(V,E) such that for every cut (S,V−S) at least min{|S|,|V−S|} edges cross the cut (d=14 is enough)
• Have two variables x_uv and x_vu for each edge (u,v)
• For every vertex v have the (3SAT equivalent of) the constraint Σ_u x_uv = 1 + Σ_w x_vw
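A short worked equation shows why the vertex constraints cannot all hold at once. It reads the sums over the integers, which is an assumption (the slide does not say which arithmetic is meant), and takes u and w to range over the neighbours of v.

```latex
\[
\sum_{v \in V} \sum_{u : (u,v) \in E} x_{uv}
  \;=\; \sum_{v \in V} \Bigl( 1 + \sum_{w : (v,w) \in E} x_{vw} \Bigr)
  \;=\; |V| + \sum_{v \in V} \sum_{w : (v,w) \in E} x_{vw}.
\]
% Each variable x_{ab} appears exactly once on the left (in the constraint
% of b) and exactly once on the right (in the constraint of a), so both
% double sums equal the same total S, which gives S = |V| + S, i.e. 0 = |V|.
```

Viewing x_uv as one unit of flow from u to v, each constraint asks vertex v to absorb one unit, and a closed network has nowhere to get that flow from; this is the picture the later flow argument builds on.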
Structure of the Analysis
• Impossible to satisfy more than a fraction 1/(d+1) of the constraints
• Can always satisfy half of the constraints
– define an auxiliary network
– show that the auxiliary network has no small cut because of expansion
– then there is a large flow
– use the large flow to find an assignment for a subset of the constraints
Flow Argument
• Want to satisfy constraints corresponding to vertices in C, with |C| < |V|/2
• Construct a flow network with a new source s, a sink t obtained by collapsing V−C, and the vertices in C
[Figure: flow network with source s, sink t, and the sets V−C and C]
Flow Argument
[Figure: network with source s, sets A and C−A, and sink t; edge groups labelled “|A| edges” and “|C−A| edges”]
• Every cut has size at least |C|
• There is a 0/1 flow of value at least |C|
• Interpreted as an assignment, it satisfies all constraints in C
Two-Sided Error Algorithms
• Need to define two distributions of graphs Gcol and Gfar such that
• Graphs in Gcol are (almost) always 3-colorable
• Graphs in Gfar are (almost) always far from 3-colorable
• To an algorithm of bounded query complexity, Gcol and Gfar look (almost) the same
Main Step
• Define two distributions Dsat and Dfar of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation)
– Systems in Dsat are always satisfiable
– Systems in Dfar are (almost) always (1/2 − ε)-far from satisfiable
– To an algorithm of bounded query complexity, Dsat and Dfar look the same
• We get Gcol and Gfar using a reduction from approximate E3LIN-2 to approximate 3-coloring
E3LIN-2
X1 + X3 + X10 = 0 mod 2
X2 + X3 + X4 = 1 mod 2
X1 + X2 + X9 = 0 mod 2
. . .
Main Building Block
• We show that for every c there is a δ such that there exists a left-hand side with
– n variables, cn equations, 3 variables per equation, every variable occurs in 3c equations
– every δn equations are linearly independent
• Pick the left-hand side at random
– repeat 3c times: pick at random a set of n/3 disjoint triples of variables
• Explicit construction?
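The random left-hand side can be sketched as follows, assuming n is divisible by 3. The function names are hypothetical, and `gf2_rank` is an added helper for checking linear independence of a chosen set of equations over GF(2); the talk does not prescribe how to check it.

```python
import random

def random_left_hand_side(n, c, rng=random):
    """Repeat 3c times: split the n variables into n/3 disjoint triples.

    Returns c*n equations, each a sorted triple of variable indices;
    every variable occurs in exactly 3c of them.
    """
    assert n % 3 == 0, "sketch assumes n is a multiple of 3"
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations.extend(tuple(sorted(perm[i:i + 3])) for i in range(0, n, 3))
    return equations

def gf2_rank(equations):
    """Rank over GF(2) of the 0/1 rows given as tuples of variable indices."""
    basis = {}                       # leading bit -> reduced row (bitmask)
    for eq in equations:
        row = 0
        for var in eq:
            row ^= 1 << var
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]       # eliminate the leading bit
    return len(basis)
```

A set S of equations is linearly independent exactly when `gf2_rank(S) == len(S)`. Checking every set of δn equations this way is infeasible; the helper is only meant for spot checks on small instances.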
Distributions
• The left-hand side is always as before
• In Dsat, we pick a random assignment to the variables, and set the right-hand side consistently
– always satisfiable
• In Dfar, we pick the right-hand side uniformly at random
– With high probability, (1/2 − O(1/sqrt c))-far
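Given a fixed left-hand side (a list of variable triples), the two right-hand-side distributions can be sampled as in this sketch. The function names and the returned hidden assignment are illustrative assumptions.

```python
import random

def sample_dsat(lhs, n, rng=random):
    """Pick a random assignment x and set each right-hand side to the
    XOR of its three variables, so x satisfies every equation."""
    x = [rng.randrange(2) for _ in range(n)]
    rhs = [x[a] ^ x[b] ^ x[c] for a, b, c in lhs]
    return rhs, x

def sample_dfar(lhs, rng=random):
    """Pick every right-hand side bit uniformly and independently."""
    return [rng.randrange(2) for _ in lhs]

def satisfied_fraction(lhs, rhs, x):
    """Fraction of equations x_a + x_b + x_c = rhs (mod 2) satisfied by x."""
    good = sum((x[a] ^ x[b] ^ x[c]) == r for (a, b, c), r in zip(lhs, rhs))
    return good / len(lhs)
```

For a Dsat sample, `satisfied_fraction(lhs, rhs, x)` is 1 by construction; for a Dfar sample the slide's claim is that, with high probability, no assignment does much better than 1/2.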
Indistinguishability
• Two distributions differ only in right-hand side
• In Dfar uniformly distributed
• In Dsat, δn-wise independent
– Linear independence implies statistical independence
• Look the same to an algorithm that sees fewer than δn equations
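The claim that linear independence implies statistical independence can be checked by brute force on a toy system. The sketch below (a hypothetical helper, exponential in n, for small examples only) enumerates all assignments and counts how often each right-hand-side pattern occurs.

```python
from collections import Counter
from itertools import product

def rhs_distribution(equations, n):
    """Count, over all 2^n assignments x, how often each right-hand-side
    vector (one parity bit per equation) is produced."""
    counts = Counter()
    for x in product((0, 1), repeat=n):
        rhs = tuple(sum(x[v] for v in eq) % 2 for eq in equations)
        counts[rhs] += 1
    return counts
```

For the linearly independent triples (0,1,2), (2,3,4), (1,3,5) on 6 variables, all 8 patterns occur equally often, so the right-hand side of a Dsat sample restricted to these equations is uniform, exactly as in Dfar. For the dependent set (0,1,2), (0,3,4), (1,3,5), (2,4,5), whose rows sum to zero, only patterns of even parity occur, which is the kind of correlation a tester would need to see in order to distinguish.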
Conclusion of the Argument
• No algorithm of “query complexity” o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2 − ε)-far from satisfiable
• For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable.
• No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .
Open Questions
• Show that distinguishing 3-colorable graphs from (1/3 − ε)-far graphs requires query complexity Ω(n)
– we can only prove it for one-sided error
• Show that approximating Max SAT better than ¾ and Max CUT better than ½ requires query complexity Ω(n)
– we only know Ω(sqrt(n)) [implicit in GR]
– would “explain” why we need SDP
Back to Dense Graphs
• Recall the Alon-Krivelevich bipartiteness test for the adjacency matrix representation:
– pick (1/ε) polylog(1/ε) vertices and look at the induced subgraph
– if you see an odd cycle reject, otherwise accept
• Running time (1/ε²) polylog(1/ε)
• We prove:
– Ω(1/ε²) for non-adaptive algorithms
– Ω(1/ε^1.5) for adaptive algorithms
Two Distributions
• Gfar: every edge exists with probability ε
– whp it is ε/3-far from bipartite
• Gbip: pick a random partition, then every edge that crosses the partition exists with probability 2ε
• Thm1: look the same to non-adaptive algorithms making o(1/ε²) queries
• Thm2: look the same to adaptive algorithms making o(1/ε^1.5) queries
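The two dense distributions can be sampled as in this sketch. The function names are hypothetical, and "random partition" is read as assigning each vertex to a side independently and uniformly, which is an assumption.

```python
import random

def sample_gfar_dense(n, eps, rng=random):
    """Every pair of distinct vertices is an edge with probability eps."""
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < eps:
                adj[u][v] = adj[v][u] = 1
    return adj

def sample_gbip_dense(n, eps, rng=random):
    """Pick a random side for each vertex; every pair that crosses the
    partition is an edge with probability 2*eps, other pairs never are."""
    side = [rng.randrange(2) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if side[u] != side[v] and rng.random() < 2 * eps:
                adj[u][v] = adj[v][u] = 1
    return adj, side
```

Under this reading a fixed pair crosses the partition with probability 1/2, so it is an edge with probability ε in both distributions; a single query carries no information, and only correlations among several queried pairs (such as an odd cycle) can separate the two.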
Proof of a Weaker Statement
• Thm1 (weaker): a non-adaptive algorithm making q = o(1/ε²) queries in Gfar is unlikely to see an odd cycle
• Proof:
– a non-adaptive algorithm asks about some subgraph with q edges.
– There are at most about q^(t/2) cycles of length t, and each one exists with probability ε^t, so the expected number of cycles of length t is at most about ε^t q^(t/2), exponentially small in t.
– Summing over all t, it’s still unlikely that there is a cycle
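The counting in this proof can be written out as a geometric series. The derivation below ignores the constants hidden in "about", so it is a sketch of the calculation rather than a full proof.

```latex
\[
\mathbb{E}\bigl[\#\{\text{cycles of length } t\}\bigr]
  \;\lesssim\; q^{t/2}\,\varepsilon^{t}
  \;=\; \bigl(\varepsilon^{2} q\bigr)^{t/2},
\qquad
\sum_{t \ge 3} \bigl(\varepsilon^{2} q\bigr)^{t/2}
  \;=\; \frac{\bigl(\varepsilon^{2} q\bigr)^{3/2}}{1 - \bigl(\varepsilon^{2} q\bigr)^{1/2}}
  \;=\; o(1)
\quad \text{when } q = o\bigl(1/\varepsilon^{2}\bigr).
\]
```

By Markov's inequality the probability of seeing any cycle, odd or even, is then o(1).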
Proof of a Weaker Statement
• Thm2 (weaker): an adaptive algorithm making q = o(1/ε^1.5) queries in Gfar is unlikely to see an odd cycle
• Proof:
– the algorithm sees an edge only once in 1/ε queries
– the algorithm sees a cycle only after querying a pair that it already sees as connected
– It takes 1/ε^0.5 edges to have 1/ε pairs of connected vertices
– It takes 1/ε^1.5 queries to have so many edges
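The arithmetic behind the exponent 1.5 can be spelled out as follows. Here m denotes the number of edges the algorithm has seen so far, a symbol introduced only for this sketch, and constants are ignored.

```latex
\[
\#\{\text{connected pairs}\} \;\le\; \binom{m+1}{2} \;\approx\; m^{2},
\qquad
m^{2} \;\approx\; \frac{1}{\varepsilon}
  \;\Longrightarrow\; m \;\approx\; \varepsilon^{-1/2},
\qquad
q \;\approx\; m \cdot \frac{1}{\varepsilon} \;=\; \varepsilon^{-3/2}.
\]
% m edges span at most m+1 vertices in one component, hence about m^2
% connected pairs; roughly 1/eps such pairs must be queried before one is
% likely to be an edge; each seen edge costs about 1/eps queries.
```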
Some more open questions
• In adjacency matrix representation, most interesting problems are solvable in constant (in n) time
• For some problems (e.g. testing triangle-freeness) the analysis uses Szemerédi’s regularity lemma, and the constant is hyper-exponential in 1/ε
• Lower bound (1/ε)^(log 1/ε), and only for one-sided error
• Alternative analysis / stronger lower bounds?