Oblivious and Non-Oblivious Local Search for Combinatorial Optimization
by
Justin Ward
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
© Copyright 2012 by Justin Ward
Abstract
Oblivious and Non-Oblivious Local Search for Combinatorial Optimization
Justin Ward
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2012
Standard local search algorithms for combinatorial optimization problems repeatedly apply
small changes to a current solution to improve the problem’s given objective function. In
contrast, non-oblivious local search algorithms are guided by an auxiliary potential function,
which is distinct from the problem’s objective. In this thesis, we compare the standard and
non-oblivious approaches for a variety of problems, and derive new, improved non-oblivious
local search algorithms for several problems in the area of constrained linear and monotone
submodular maximization.
First, we give a new, randomized approximation algorithm for maximizing a monotone sub-
modular function subject to a matroid constraint. Our algorithm’s approximation ratio matches
both the known hardness of approximation bounds for the problem and the performance of the
recent “continuous greedy” algorithm. Unlike the continuous greedy algorithm, our algorithm
is straightforward and combinatorial. In the case that the monotone submodular function is a
coverage function, we can obtain a further simplified, deterministic algorithm with improved
running time.
Moving beyond the case of single matroid constraints, we then consider general classes of set
systems that capture problems that can be approximated well. While previous such classes have
focused primarily on greedy algorithms, we give a new class that captures problems amenable
to optimization by local search algorithms. We show that several combinatorial optimization
problems can be placed in this class, and give a non-oblivious local search algorithm that
delivers improved approximations for a variety of specific problems. In contrast, we show that
standard local search algorithms give no improvement over known approximation results for
these problems, even when allowed to search larger neighborhoods than their non-oblivious
counterparts.
Finally, we expand on these results by considering standard local search algorithms for con-
straint satisfaction problems. We develop conditions under which the approximation ratio of
standard local search remains limited even for super-polynomial or exponential local neighbor-
hoods. In the special case of MaxCut, we further show that a variety of techniques including
random or greedy initialization, large neighborhoods, and best-improvement pivot rules cannot
improve the approximation performance of standard local search.
Acknowledgements
I thank the members of both my supervisory and final oral exam committees—Charles Rack-
off, Toni Pitassi, Avner Magen, Alasdair Urquhart, and Derek Corneil—as well as my external
examiner Anupam Gupta. They provided many useful suggestions, insightful observations, and
supportive comments that have helped shape this thesis. Additionally, I would like to thank
Maxim Sviridenko for many useful discussions and advice on simplifying some of the proofs presented
in the thesis, and Julian Mestre for comments on an initial draft of some results presented in
Chapters 5 and 6.
I thank various colleagues and friends at the University—especially, Siavosh Benabbas,
George Dahl, Golnaz Elahi, Michalis Famelis, Yuval Filmus, Wesley George, Abe Heifets, Dai
Tri Man Le, Joel Oren, Jocelyn Simmonds, Colin Stewart, Rory Tulk, and Dustin Wehr—for
providing not only intellectual insights and technical help but also levity, camaraderie, and
empathy. In addition, I thank Yuval Filmus for his many and considerable contributions to the
proofs presented in Chapters 3 and 4. Finally, I thank Lila Fontes for aiding in the translation
of relevant portions of Pade’s thesis, and for being a close friend and confidant throughout my
studies.
I thank the Department of Computer Science and the School of Graduate Studies for pro-
viding financial support for my studies at the University of Toronto.
I thank my supervisor Allan Borodin for his sound advice and unwavering support. My
research initially involved a significant change of direction for me and so was accompanied
by occasionally daunting periods of self-doubt and uncertainty. Most of all I thank him for
maintaining faith in my abilities and prospects, even when I had none. His support—whether
intellectual, emotional, moral, or merely financial—gave me the confidence to complete this
work, and I hope above all else that he finds the end result to be worthy of his considerable
investment.
Finally, I thank my best friend and wife Amy Miller who has been and remains a steadfast
advocate, constant inspiration, and insoluble mystery to me.
Contents
1 Introduction 1
1.1 A Generic Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Theoretical Results for General Local Search . . . . . . . . . . . . . . . . . . . . 4
1.3 Our Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Preliminaries 10
2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.3 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Linear and Submodular Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Independence Systems and Matroids . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 The Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 The Continuous Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Partial Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Maximum Coverage 25
3.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 A Non-Oblivious Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Analysis of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Obtaining the α Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Monotone Submodular Maximization 43
4.1 A Non-Oblivious Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Analysis of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.1 Properties of the Sequences γ . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2 Locality Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.3 Computing g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.4 Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Obtaining the Coefficient Sequences . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Further Properties of g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Set Systems for Local Search 78
5.1 Set Systems for Greedy Approximation Algorithms . . . . . . . . . . . . . . . . . 78
5.2 Weak k-Exchange Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3 Strong k-Exchange Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4.1 Independent Set in (k + 1)-Claw Free Graphs . . . . . . . . . . . . . . . . 86
5.4.2 k-Matroid Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4.3 k-Uniform Hypergraph b-Matching . . . . . . . . . . . . . . . . . . . . . . 89
5.4.4 Matroid k-Parity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.5 Maximum Asymmetric Traveling Salesman . . . . . . . . . . . . . . . . . 92
6 Algorithms for Strong k-Exchange Systems 94
6.1 Linear Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Monotone Submodular Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 101
7 Limitations of Oblivious Local Search for CSPs 111
7.1 Large Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 Random and Greedy Initial Solutions and Best Improvement Pivot Rules . . . . 119
8 Conclusion 128
8.1 Monotone Submodular Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.2 Set Systems for Local Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.3 Negative Results for CSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Bibliography 132
List of Algorithms 141
Chapter 1
Introduction
Local search is one of the simplest algorithmic approaches to combinatorial optimization prob-
lems. Despite its simplicity, local search has been successful in both practical and theoretical
settings. It is widely used as a heuristic for solving NP-hard problems, and appears as a key
component of such classic algorithms as Edmonds’ matching algorithms, the Ford-Fulkerson
algorithm, and Dantzig’s simplex algorithm, as well as many state-of-the-art approximation
algorithms. In the non-oblivious variant of local search, the algorithm is guided by an auxiliary
potential function instead of the problem’s given objective. This technique was first formalized
by Alimonti [2, 3, 4] and Khanna, Motwani, Sudan, and Vazirani [63, 64] but has seen limited
application since its introduction. Here, we reconsider non-oblivious local search and give sev-
eral new applications of the technique. We show that standard, oblivious local search algorithms
have limited approximation performance for these applications, and thereby demonstrate the
relative power of non-oblivious algorithms.
1.1 A Generic Local Search Algorithm
We now describe more formally what we mean by “local search.” First, we describe the general
class of optimization problems considered in this thesis. We restrict our attention to combina-
torial optimization problems of the following form:1
Definition 1.1 (Combinatorial Optimization Problem). A combinatorial optimization problem
consists of:
• a goal in {max, min} that specifies whether it is a maximization or minimization problem.
• a ground set X.
• a collection F of subsets of X called feasible solutions.
1See Ausiello, Crescenzi, and Protasi [7] for a survey on the theory of NP-Optimization problems. Note that our definition varies slightly from the standard in that we do not require f to assign integer values to solutions. Our notion of a “combinatorial optimization problem” is not intended to capture all problems in the field of combinatorial optimization, but is general enough to capture all those problems that we consider.
• a function f : 2X → R≥0 assigning a value to each subset of the ground set.
The goal of the problem is to find a set S ∈ F that either maximizes or minimizes the function
f (depending on the stated goal).
All of the problems that we consider will be maximization problems, so we shall not specify
the goal explicitly. Note that, in general, there may not be a succinct representation for either
F or f . Generally, we shall assume that F is given as a membership oracle (also called an
independence oracle, for reasons that will be made clear in Section 2.3) that answers whether
a given set is in F or not. Similarly, we generally suppose that f is given as a value oracle
that, given a subset S of X, returns its value f(S). Notable exceptions are linear functions
(described in Section 2.2) and coverage functions (described in Section 3.1). In these cases, f
has a succinct representation that we shall exploit in our algorithms.
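To make the oracle model concrete, here is a minimal Python sketch (the class and names are our own illustration, not notation from the thesis) in which F is represented by a membership oracle and f by a value oracle, both ordinary callables:

```python
from typing import Callable, FrozenSet

class Problem:
    """A combinatorial optimization problem given by oracle access
    (an illustrative sketch; the names here are our own)."""
    def __init__(self,
                 ground_set: FrozenSet[int],
                 is_feasible: Callable[[FrozenSet[int]], bool],  # membership oracle for F
                 value: Callable[[FrozenSet[int]], float]):      # value oracle for f
        self.ground_set = ground_set
        self.is_feasible = is_feasible
        self.value = value

# Example instance: a linear objective under the constraint |S| <= 2.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
prob = Problem(frozenset(weights),
               is_feasible=lambda S: len(S) <= 2,
               value=lambda S: sum(weights[x] for x in S))
```

An algorithm interacting with `prob` can only query `is_feasible` and `value`; it never sees an explicit listing of F or a formula for f, which is exactly the assumption made above for general instances.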
Because the combinatorial optimization problems we consider are NP-hard, we focus on the
problem of obtaining approximate solutions. Given an instance of a combinatorial optimization
problem, we say that an algorithm is an r-approximation algorithm for that instance, for r ∈ [0, 1], if the value of the solution S produced by the algorithm is at least r times the value of the optimal
solution O. We call the value r the approximation ratio for the algorithm on the given instance,
and define the approximation ratio of an algorithm for a problem to be the infimum of the
approximation ratios of its instances. Again, here we consider only maximization problems (a
similar definition can be obtained for minimization by reversing the role of S and O in the
definition). Note that since we use approximation ratios in the range [0, 1], larger values reflect
better approximations.
The primary concern of this thesis is the application of local search to particular combi-
natorial optimization problems. Our general notion of a local search algorithm is captured in
Algorithm 1. The generic local search algorithm GenLocalSearch is parameterized by several
component functions, which together define a particular local search algorithm for a combina-
torial optimization problem.
Definition 1.2 (Generic Local Search Algorithm). Let I be some instance of a combinatorial
optimization problem with solution space S, feasible solutions F , and objective function f . The
generic local search algorithm for I has the form shown in Algorithm 1, and is specified by the
following component functions:
• A potential function g assigning each solution S ∈ S a value g(S) ∈ R≥0.
• A neighborhood structure N associating a set of nearby solutions N(S) ⊆ S with each
solution S ∈ S.
• A pivot rule pivot selecting a solution pivot(C) from the set of improved, feasible2
solutions C = {T ∈ N(S) : T ∈ F and g(T) > g(S)} whenever this set is non-empty.
2While there are local search algorithms that consider infeasible solutions during the search process, all of the algorithms considered in this thesis only consider feasible solutions.
• An initial solution Sinit ∈ F .
Algorithm 1: GenLocalSearch
S ← Sinit
repeat
    C ← ∅
    foreach T ∈ N(S) do
        if T ∈ F and g(T) > g(S) then
            C ← C ∪ {T}
    if C ≠ ∅ then
        S ← pivot(C)
until S does not change
return S
Note that each of the functions in Definition 1.2 depends implicitly on the instance I. Intuitively, the local search algorithm proceeds by first finding an initial feasible solution Sinit, then
repeatedly searching in the neighborhood N(S) of the current solution S for a set of feasible
candidate solutions, each of which improves the potential function g. After a set C of candidate
solutions is found, the pivot rule pivot selects a new current solution from it. When no im-
proved solutions are found in the neighborhood of the current solution, the algorithm returns
S.
In most of the algorithms we present, the pivot rule simply returns the first improved feasible
solution encountered when searching N(S). In this case, it is unnecessary to build the entire set of
candidate solutions C, and so we omit this step from the algorithm. One notable exception is
in Chapter 7 where we examine the effect of the pivot rule on the approximation performance
of Algorithm 1.
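As a concrete illustration, the first-improvement variant of Algorithm 1 can be sketched in a few lines of Python (the function and variable names are ours; this mirrors the pseudocode rather than reproducing it exactly):

```python
def gen_local_search(s_init, neighborhood, is_feasible, g, pivot=None):
    """Generic local search: repeatedly move to an improving feasible
    neighbor until none exists.  pivot=None gives the first-improvement
    rule; otherwise pivot selects a solution from the candidate set C."""
    S = s_init
    while True:
        C = [T for T in neighborhood(S) if is_feasible(T) and g(T) > g(S)]
        if not C:
            return S  # S is a local optimum of g
        S = C[0] if pivot is None else pivot(C)

# Tiny example: maximize a linear f over sets of size at most 2, with
# the oblivious potential g = f and single add/remove/swap moves.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
X = frozenset(weights)
f = lambda S: sum(weights[x] for x in S)

def neighborhood(S):
    adds = [S | {x} for x in X - S]
    removes = [S - {x} for x in S]
    swaps = [(S - {y}) | {x} for x in X - S for y in S]
    return adds + removes + swaps

S = gen_local_search(frozenset(), neighborhood, lambda S: len(S) <= 2, f)
# The only local optimum here is {1, 3}, the two heaviest elements.
```

Since g strictly increases with every accepted move and the solution space is finite, the loop always terminates; the convergence-time question discussed in Section 1.2 is how many such moves may be needed.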
In general, there are no global guarantees on the quality of the solution S returned by
GenLocalSearch even in terms of the potential function g. However, we can say that any S
returned by GenLocalSearch is a local optimum of g in the following sense:
Definition 1.3 (Local Optimum). Let N be a neighborhood structure and g be a potential
function. Then, a solution S ∈ F is a local optimum with respect to g and N if we have
g(T ) ≤ g(S)
for all T ∈ N(S).
Note that the notion of local optimality depends on both the neighborhood N and the
potential function g used in GenLocalSearch. Thus, whenever we refer to “local optima” we mean
local optima with respect to some understood, previously fixed neighborhood and potential
function.
1.2 Theoretical Results for General Local Search
We now present an overview of the major theoretical approaches to local search as a general
algorithmic paradigm. The first such approach concerns the time required for Algorithm 1 to
converge to a local optimum. Because its runtime depends on the number of improvements
that it applies, Algorithm 1 could require exponential time to find a local optimum even if
each improvement can be calculated efficiently. Motivated by this general question, Johnson,
Papadimitriou, and Yannakakis [60] define the class PLS of polynomial local search problems. A
search problem is in PLS if the initial solution and each iteration of the local search algorithm can
be carried out in polynomial time. Hence, the primary complexity-theoretic questions regarding
the class PLS pertain to the number of improvements that the local search algorithm can make.
All of the problems in this class are search problems, in which the goal is to find a solution to
an NP-optimization problem that is locally optimal with respect to some given neighborhood
and objective function. Thus, when we speak about the PLS completeness of some problem, we
are referring specifically to the problem of finding any locally optimal solution with respect to
some stated neighborhood.
Johnson et al. provide an appropriate reduction for the class PLS, which they use to prove
the completeness of a variety of problems. They prove that if any PLS problem is NP-hard
then NP = coNP. In contrast, they show that the “standard algorithm problem,” in which
we must find the specific local optimum returned by GenLocalSearch for some particular initial
starting solution Sinit, is NP-hard for all PLS-complete problems. Using “tight” PLS-reductions,
Papadimitriou, Yannakakis, and Schäffer [79] give a general method for showing that a variety
of local search problems do in fact have exponential worst-case behavior. Moreover, they show
that the standard algorithm problem is in fact PSPACE-complete. Thus, the problem of finding
some local optimum appears to be easier than that of finding the particular local optimum
produced by a local search algorithm. In this and subsequent work [83], they demonstrate the
PLS-completeness of several well-known local search problems, including Lin and Kernighan’s
heuristics for the traveling salesman and graph bipartition problems, the problem of finding a
stable configuration in an undirected neural network, and various local search algorithms for
MaxCut.
In practice, we can eschew the difficulties posed by PLS-completeness by weakening our
notion of local optimality. In many situations, it is sufficient to find a solution that is only
approximately locally optimal, in the following sense.
Definition 1.4 (ε-Approximate Local Optimum). Let N be a neighborhood structure and g be
a potential function. Then, a solution S ∈ F is an ε-approximate local optimum with respect
to g and N if we have
g(T ) ≤ (1 + ε)g(S)
for all T ∈ N(S).
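The payoff of Definition 1.4 is a convergence bound: if every accepted move improves g by a factor of at least 1 + ε, then the number of moves from a solution of positive potential is at most log(g_max/g_init)/log(1 + ε), which is polynomial for fixed ε whenever this ratio of potentials is polynomially bounded. A hedged Python sketch (our own naming, not the thesis's pseudocode):

```python
def approx_local_search(s_init, neighborhood, is_feasible, g, eps):
    """First-improvement local search that only accepts moves improving
    the potential by a (1 + eps) factor.  The returned solution is an
    eps-approximate local optimum in the sense of Definition 1.4."""
    S, steps = s_init, 0
    while True:
        improved = next((T for T in neighborhood(S)
                         if is_feasible(T) and g(T) > (1 + eps) * g(S)), None)
        if improved is None:
            return S, steps
        S, steps = improved, steps + 1

# On a tiny instance (linear f, |S| <= 2, add/remove/swap moves), the
# search still reaches the optimum {1, 3} despite the coarser criterion.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
X = frozenset(weights)
f = lambda S: sum(weights[x] for x in S)

def neighborhood(S):
    return ([S | {x} for x in X - S] + [S - {x} for x in S]
            + [(S - {y}) | {x} for x in X - S for y in S])

S, steps = approx_local_search(frozenset({2}), neighborhood,
                               lambda S: len(S) <= 2, f, eps=0.1)
```

Here the search starts from {2} with potential 1 and can reach potential at most 5, so the bound guarantees at most log(5)/log(1.1) ≈ 17 accepted moves regardless of the pivot order.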
This idea has been used in a variety of contexts to yield polynomial-time local search
algorithms. A variant for linear objective functions is described by Arkin and Hassin [5]. They
round all the weights used to define the objective function down to integer multiples of some
well-chosen value, thus requiring that each step of the local search algorithm must make a
constant additive improvement to the problem’s objective function. Orlin, Punnen, and Schulz
[77] consider the general difficulty of finding approximate local optima for linear combinatorial
optimization problems, and show that this can be accomplished in polynomial time for any
problem in PLS.
While the general theory of PLS-completeness gives non-trivial bounds on the convergence
time of the standard local search algorithm, it says nothing about the relative quality of the
local optima produced by local search. In order to study such questions, Ausiello and Protasi [8]
introduce the class GLO of those NP-Optimization problems that have guaranteed local optima
with respect to a given neighborhood mapping. A problem has guaranteed local optima with
respect to N if there is some constant k ∈ R≥0 such that any solution S that is locally optimal
with respect to N has objective value at least 1/k times that of a globally optimal solution.
The constant k for a problem in GLO gives a natural bound on the approximation performance
of a local search algorithm. We define the locality ratio of a local search algorithm on a given
instance of a combinatorial optimization problem to be the largest value r ∈ [0, 1] such that for
any local optimum S we have f(S) ≥ r ·f(O), where O is a global optimum. Then, the locality
ratio r corresponds to the value 1/k in the definition of GLO. By analogy with approximation
ratios, we define the locality ratio for a problem to be the infimum of the locality ratios of its
instances.
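For a single small instance, the locality ratio can be computed by brute force: enumerate all solutions, keep those with no improving neighbor, and compare the worst of them to the global optimum. The sketch below (our own example, not from the thesis) does this for MaxCut on the 4-cycle under the 1-flip neighborhood, where the split {0, 1} vs. {2, 3} is a local optimum of value 2 against an optimum of 4:

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the 4-cycle C4
n = 4

def cut_value(x):
    """Number of edges crossing the cut for a 0/1 side assignment x."""
    return sum(1 for u, v in edges if x[u] != x[v])

def flips(x):
    """1-flip neighborhood: move a single vertex to the other side."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n)]

solutions = list(product((0, 1), repeat=n))
opt = max(cut_value(x) for x in solutions)
local_opts = [x for x in solutions
              if all(cut_value(y) <= cut_value(x) for y in flips(x))]
locality_ratio = min(cut_value(x) for x in local_opts) / opt
```

On this instance the ratio comes out to 1/2: from the bad split, moving any single vertex leaves exactly two edges cut, so no 1-flip is improving even though the optimum cuts all four edges.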
There are several advantages to working with locality ratios. An algorithm’s locality ratio
is determined solely by the potential function g and the neighborhood structure N ; in particular,
it does not depend on the initial solution Sinit or the pivot rule pivot. The locality ratio hence
allows us to compute a lower bound on the approximation ratio of GenLocalSearch without con-
sidering the dynamic behavior of the algorithm, which can be extremely difficult to determine.
For this reason, virtually all analyses of particular local search algorithms are based on the
algorithms’ locality ratios. One noteworthy exception is Chandra and Halldorsson’s analysis
[22] of a greedy local search algorithm for maximum weight independent sets in (k+1)-claw free
graphs. They show that a local search algorithm in which Sinit is chosen greedily and pivot
always chooses the best improved solution attains an approximation ratio of almost 3/2 times
the locality ratio for the problem.
Thus far, we have been using the terms “potential function” and “objective function”
interchangeably in our discussion of local optimality; that is, we have been assuming that the
potential function g used by GenLocalSearch is simply the problem’s given objective function f .
In independent work, Alimonti [2, 3, 4] and Khanna et al. [63, 64] introduce the notion3 of non-
oblivious local search, in which the potential function used to guide the local search procedure
is different from the problem’s stated objective function, f (in contrast, they call variants of
3The (unfortunate) terminology “non-oblivious local search” is due to Khanna et al.
local search in which g = f oblivious local search algorithms). They show that non-oblivious
techniques yield improved locality ratios for a variety of problems. Note that we always require
local optimality with respect to g but state locality ratios in terms of f . That is, a problem
has locality ratio r for some non-oblivious local search algorithm of the form GenLocalSearch if
every local optimum S with respect to g satisfies f(S) ≥ r · f(O).
By analogy with GLO, Khanna et al. formulate the class NonObliviousGLO of problems that
have non-zero locality ratios for some non-oblivious potential function. They show that GLO
is a strict subset of NonObliviousGLO; that is, there are problems that have locality ratio 0 for
oblivious local search but some positive locality ratio for non-oblivious local search. Khanna
et al. further prove that every problem in MaxSNP can be approximated to within some con-
stant factor by some non-oblivious local search in which the neighborhood relation N satisfies
d(S, T ) = 1 for all S and all T ∈ N(S), where d is the Hamming distance between solutions S
and T .
Despite the apparent relative power of non-oblivious local search, there has been little
application or systematic study of it since these first results. Berman [11] gives a non-oblivious
local search algorithm for the weighted independent set problem in (k+1)-claw free graphs (we
discuss this algorithm further in Chapters 5 and 6). Berman and Krysta [12] further consider
a generalization of this algorithm in which the weights are raised to some power between 1
and 2. Finally, some of the local search approaches to facility location problems [6, 23] make
use of weight scaling to improve the approximation performance of the algorithm. Although it
is not presented as such, the resulting algorithm essentially employs a non-oblivious potential
function.
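To make the oblivious/non-oblivious distinction concrete, consider weighted independent set: the oblivious potential is just the total weight of the current solution, while Berman's approach [11] locally optimizes the sum of squared weights instead. The toy example below (the numbers are ours, purely illustrative) shows that the two potentials can disagree about whether the same exchange is an improvement:

```python
def oblivious_g(S, w):
    """The problem's own objective: total weight of the solution."""
    return sum(w[v] for v in S)

def nonoblivious_g(S, w):
    """Berman-style potential: sum of squared weights."""
    return sum(w[v] ** 2 for v in S)

w = {'a': 3.0, 'b': 2.0, 'c': 2.0}
# Exchanging {a} for {b, c} raises the objective (2 + 2 = 4 > 3) but
# lowers the squared potential (4 + 4 = 8 < 9 = 3^2): squaring weighs
# heavy vertices more, which changes which local moves are accepted and
# is what drives the improved locality ratio discussed in Chapters 5-6.
```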
In this thesis, we revisit non-oblivious local search. We apply the technique in several new
areas, including submodular maximization, and obtain improved approximation for a variety
of problems. Even in cases where non-oblivious techniques merely match the performance of
existing approximation algorithms, they yield combinatorial algorithms that are significantly
simpler than existing approaches. In contrast, we show that variants of oblivious local search
for these problems give diminishing returns even when they are allowed to consider much larger
neighborhoods than their non-oblivious counterparts.
1.3 Our Contributions
We now outline the main contributions presented in the thesis.
• In Chapter 3, we give a new, combinatorial algorithm for the problem of maximizing a
coverage function subject to a matroid constraint. Coverage functions, defined in Section
3.1, are a particular class of submodular functions possessing a succinct, explicit represen-
tation. Our non-oblivious algorithm makes use of a special, weighted potential function,
whose weights were derived by solving a family of linear programs. In addition to stating
the general formula for this function and proving that it yields an improved locality ratio,
we give some details of the experimental approach used to derive it.
Our non-oblivious algorithm is a (1 − 1/e)-approximation, which is optimal under the assumption P ≠ NP as well as in the value oracle setting. Our algorithm matches the approximation performance of the continuous greedy algorithm described in Section 2.5, which
applies in the more general setting of maximizing any monotone submodular function
subject to a matroid constraint. However, our algorithm is simpler and more straightfor-
ward than the continuous greedy algorithm, and is completely combinatorial. In contrast
to our non-oblivious local search algorithm, we show that oblivious local search has a
locality ratio of only 1/2 + ε, even when allowed to search much larger neighborhoods than
our algorithm.
This chapter is based on joint work with Yuval Filmus, appearing in [40].
• In Chapter 4, we turn to the general problem of maximizing any monotone submodu-
lar function subject to a matroid constraint. We expand the non-oblivious approach of
Chapter 3 to this setting, matching the general applicability of the continuous greedy
algorithm, as well as its approximation performance. Again, our algorithm is simple
to state and combinatorial. The results of Chapter 3 make crucial use of the succinct
representation available for coverage functions, and here we do not have access to such
a representation. Thus, the techniques required for the general submodular case are a
non-trivial extension of the maximum coverage case. Again, we provide a complete con-
struction and analysis for our non-oblivious potential function as well as the details of its
derivation. Additionally, we show that our construction produces the same non-oblivious
potential function as that in Chapter 3 when applied to a submodular function that is
a coverage function. Unlike the algorithm of Chapter 3, however, our general algorithm
requires randomization. Specifically, it employs a sampling procedure to compute our
potential function efficiently.
Our algorithm is a (1 − 1/e)-approximation for monotone submodular maximization subject
to a matroid constraint. Moreover, if the total curvature of the submodular function is at
most c, our algorithm is a ((1 − e^{−c})/c)-approximation. Even in this specific case, our result
matches the performance of the continuous greedy algorithm, and is the best possible in the
value oracle model.
value oracle model.
This chapter is based on joint work with Yuval Filmus, appearing in [41, 42].
• In Chapters 5 and 6, we consider the problem of linear and monotone submodular maxi-
mization in larger classes of set systems. There is a wealth of research deriving set systems
that capture problems for which the greedy algorithm attains some constant approxima-
tion ratio, but there are no such results for local search algorithms.
In Chapter 5, we introduce a new class of set systems called k-exchange systems, and
show that they capture combinatorial optimization problems for which oblivious local
search attains a 1/k-approximation. We prove several results relating k-exchange systems to
existing classes of set systems. Finally, we show that a variety of well-known combinatorial
optimization problems give rise to k-exchange systems.
In Chapter 6, we consider non-oblivious local search for k-exchange systems. We extend
a simple algorithm based on an existing approach [11] for the weighted independent set
problem in (k+ 1)-claw free graphs to all k-exchange systems. Moreover, we show how to
generalize this approach to the case of monotone submodular objective functions. Because
the approach for the weighted case makes crucial use of the objective function’s weighted
representation, our generalization is non-trivial. We obtain 2/(k+1)- and 2/(k+3)-approximations
for maximizing linear and monotone submodular objective functions, respectively, in k-
exchange systems. This provides improved approximations in both the general case and
for several specific problems.
These chapters are based on work appearing in [97] and joint work with Feldman, Naor, and
Schwartz in [39]. This latter paper was merged from separate submissions by myself
and the listed authors. Unless otherwise noted, I present only my own independent
contributions here.
• In Chapter 7, we prove a variety of negative results for oblivious local search in the
general setting of Boolean constraint satisfaction problems. Specifically, we consider the
performance of oblivious local search both when the neighborhood size is increased, and
when the initial solution is chosen randomly or via a simple, greedy algorithm.
The first set of results considers the h(n)-local search algorithm that at each step changes
the assignment to at most h(n) variables for some function h depending on the total
number n of variables. We show that if a constraint satisfaction problem possesses an
instance with a particular kind of local optimum under 1-local search, then this instance’s
locality ratio is an upper bound on the locality ratio of h(n)-local search for all h = o(n).
Moreover, even if h(n) = cn for some small value c, the locality ratio for the problem
remains strictly less than 1. Note that in this case we are allowing the local search
algorithm to examine an exponential number of solutions in each iteration.
The second set of results considers the particular CSP MaxCut. The bounds in this prob-
lem differ from our other results in that we consider the effects of the initial solution Sinit
and the pivot rule pivot used to define the oblivious local search algorithm. Thus, we
consider the actual dynamic behavior of the algorithm and directly bound its approxi-
mation ratio, rather than simply considering its locality ratio. We show that there are
instances for which a local search algorithm that chooses its initial solution Sinit uniformly
at random has expected approximation ratio at most 3/4. This bound is less than the
0.878-approximation produced by Goemans and Williamson [45] and holds even in the
case that the local search algorithm has access to an arbitrarily powerful oracle for the
function pivot that chooses an improved solution at each step. If pivot is implemented
Chapter 1. Introduction 9
by a greedy rule that always chooses the best available improvement, we can improve our
bound to 1/2, showing that the randomly initialized best-improvement local search has
an expected approximation ratio no better than the locality ratio for deterministic 1-local
search. All of our results hold generally for any h(n)-local search in which h = o(n).
Moreover, we derive non-trivial bounds even in the case that h(n) = cn for c < 1/2. Fi-
nally, we show that choosing Sinit by using the greedy algorithm can result in a worst-case
local optimum and so cannot attain an approximation ratio beyond the locality ratio for
the problem.
Chapter 2
Preliminaries
In this chapter, we review some definitions regarding linear and submodular functions, inde-
pendence systems, matroids, and related algorithms. We present relevant work in the area, and
give some general theorems that will prove useful in later sections. We begin by establishing
some standard notational conventions in Section 2.1. In Section 2.2 we consider two particular
classes of objective functions f and identify some useful extra properties of such functions. In
Section 2.3, we examine restrictions on the structure of the collection F of feasible solutions.
Finally, in Sections 2.4, 2.5, and 2.6, we review known algorithms for solving the resulting
classes of combinatorial optimization problems.
2.1 Notation
We begin by describing the notational conventions used in the thesis.
2.1.1 Sets
Throughout, we shall (with a few exceptions) use lowercase letters to denote single values
or elements, uppercase letters to denote sets of elements, and calligraphic letters to denote
collections of sets. We use the following special notations related to sets:
• R≥0 denotes the set of non-negative real numbers.
• N denotes the set of natural numbers.
• For an integer n, [n] denotes the set {1, . . . , n}.
• For a set S and element x, we use the shorthand S + x for the set S ∪ {x} and the
shorthand S − x for the set S \ {x}.
• For a set S, 2S denotes the set of all subsets of S.
• For a set S and an integer k, we denote by \binom{S}{k} the collection of all subsets of S
containing exactly k elements.
Chapter 2. Preliminaries 11
2.1.2 Probability
• For a condition (or event) C, 1(C) denotes the indicator that is 1 when C is true and 0
otherwise.
• For a random event E and some explicitly given probability distribution, Pr [E] denotes
the probability that E will occur.
• For a variable x, a function f , and a set of values S, E_{x∈S}[f(x)] denotes the expected
value of f(x) when the value x is chosen uniformly at random from S.
• When the probability distribution of a random variable X has been explicitly stated, and
there is no chance of confusion, we shall write E[X] for the expected value of X with respect
to this distribution.
2.1.3 Miscellaneous
• We use H_k to denote the kth harmonic number, given by

H_k = ∑_{i=1}^{k} 1/i .
Note that the sequence H1, H2, . . . is increasing. We shall also make use of the well-known
fact that Hk = Θ(log k).
• We use the notation f = Õ(g) to indicate that f has the same asymptotic rate of growth
as g when poly-logarithmic factors are ignored.
2.2 Linear and Submodular Functions
Perhaps the simplest useful class of objective functions f are linear functions.
Definition 2.1 (Linear Function). A function f : 2X → R≥0 is linear if
f(A) + f(B) = f(A ∪B) + f(A ∩B)
for all A,B ⊆ X.
A linear function f can always be represented in terms of a weight function w : X → R≥0
that assigns each element x ∈ X a non-negative weight w(x). The value f(S) is then given by
the total weight ∑_{x∈S} w(x) of all elements in S.
If we relax the equality in Definition 2.1 we obtain the class of submodular functions.
Definition 2.2 (Submodular Function). A function f : 2X → R≥0 is submodular if

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B)

for all A,B ⊆ X.
For a submodular function f and a set S ⊆ X, we define the marginal gain with respect to S
of an element x ∈ X \ S as

f_S(x) = f(S + x) − f(S) .
The notion of an element’s marginal gain is roughly analogous to the notion of an element’s
weight in the linear case. However, in the submodular case, an element can have different
marginal gains with respect to different sets.
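As a concrete illustration (not from the thesis), coverage functions make this tangible: each element of the ground set covers a subset of a universe, and f(S) counts the covered points. The sketch below, with hypothetical data, shows the marginal gain of one element shrinking as the base set grows:

```python
# Coverage function: f(S) = number of universe points covered by the
# chosen subsets. Coverage functions are monotone submodular, so the
# marginal gain of an element can only shrink as the base set grows.
subsets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def f(S):
    """Value of a set S of keys: size of the union of their coverage."""
    covered = set()
    for x in S:
        covered |= subsets[x]
    return len(covered)

def marginal(S, x):
    """Marginal gain f_S(x) = f(S + x) - f(S)."""
    return f(S | {x}) - f(S)

print(marginal(set(), "b"))       # 2: covers {3, 4}
print(marginal({"a"}, "b"))       # 1: point 3 is already covered
print(marginal({"a", "c"}, "b"))  # 0: points 3 and 4 are both covered
```

Unlike a weight, the gain of "b" depends on the current set: 2, 1, or 0 new points.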
Nemhauser, Wolsey, and Fisher study various aspects of combinatorial optimization prob-
lems in a pair of papers [76, 43]. Among other things, they show that the following properties
are equivalent characterizations of submodularity:
Definition 2.3 (Submodular Function (Alternative Characterizations)). Consider a function
f : 2X → R≥0. Then, the following statements are each equivalent to the statement that f is
submodular:
(i) f_A(x) ≤ f_B(x) whenever B ⊆ A ⊆ X and x ∉ A.
(ii) f(A + x) + f(A + y) ≥ f(A ∪ {x, y}) + f(A) for all A ⊆ X and x, y ∉ A.
These characterizations essentially state that submodular functions are characterized by
decreasing marginal gains. Thus, submodularity can be viewed as a discrete analogue of con-
cavity. Furthermore, the concept of decreasing marginal gains lends itself naturally to many
economic and combinatorial settings, as shown by Nemhauser, Wolsey, and Fisher.
We shall assume that all submodular objective functions f are normalized so that f(∅) = 0.
Note that this condition holds trivially for linear functions. We restrict ourselves further to the
class of monotone submodular functions.
Definition 2.4 (Monotone Function). A function f : 2X → R≥0 is monotone if f(B) ≤ f(A)
for all B ⊆ A ⊆ X.
Note that in a monotone submodular function, all marginal gains are non-negative. In fact,
this property provides an alternative characterization of the class of monotone functions.
Another natural restriction involves how much the marginals of a submodular function are
allowed to decrease. This notion is captured by the curvature of a submodular function.
Definition 2.5 (Total Curvature). A monotone submodular function f has total curvature c
if and only if

f(A ∪ B) ≥ f(A) + (1 − c)f(B)

for any two disjoint sets A,B.
For c = 1 the definition is equivalent to a statement of monotonicity. In the case that
c = 0, the statement implies that f(A) + f(B) ≤ f(A ∪ B) and from submodularity we have
f(A) + f(B) ≥ f(A∪B) since A and B are disjoint, so in fact f(A∪B) = f(A) + f(B) for all
disjoint A and B. That is, the case c = 0 corresponds to the case in which f is linear. Thus,
the parameter c ∈ [0, 1] smoothly interpolates between the class of all monotone submodular
functions and linear functions.
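To make Definition 2.5 concrete, the smallest valid c for a small instance can be found by brute force over disjoint pairs. The coverage function below is a hypothetical example invented for illustration:

```python
from itertools import combinations

# Hypothetical coverage function; "a" and "b" overlap on point 2.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
ground = set(subsets)

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def powerset(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Smallest c with f(A | B) >= f(A) + (1 - c) * f(B) for all disjoint A, B;
# rearranged: c >= 1 - (f(A | B) - f(A)) / f(B) whenever f(B) > 0.
c = 0.0
for A in powerset(ground):
    for B in powerset(ground - A):
        if f(B) > 0:
            c = max(c, 1.0 - (f(A | B) - f(A)) / f(B))
print(c)  # 0.5: adding "b" after "a" gains only 1 of its 2 points
```

Here the overlap between "a" and "b" forces c = 1 − 1/2; a pairwise-disjoint family would give c = 0 (a linear function).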
Finally, we give three useful theorems regarding monotone submodular functions. The first
two are very slight modifications of Lemmas 1.1 and 1.2 in Lee, Sviridenko, and Vondrak [71].
Theorem 2.6. Let f be a monotone submodular function on X. Let C, S ⊆ X, and let {T_i}_{i=1}^{l}
be a collection of subsets of C \ S such that each element of C \ S appears in at least k of the
subsets T_i. Then,

∑_{i=1}^{l} [f(S ∪ T_i) − f(S)] ≥ k [f(S ∪ C) − f(S)] .
Proof. Fix an arbitrary ordering ≺ on X. For any x ∈ C \ S let C^x be the set of elements in
C that precede x in the ordering. Similarly, let T_i^x be the set of elements of T_i that precede x.
Then, we have T_i^x ⊆ C^x for all x. Thus,

∑_{x∈T_i} f_{S∪C^x}(x) ≤ ∑_{x∈T_i} f_{S∪T_i^x}(x) = ∑_{x∈T_i} [f(S ∪ T_i^x + x) − f(S ∪ T_i^x)] = f(S ∪ T_i) − f(S) ,

where the first inequality follows from submodularity and the last equality from telescoping the
summation. Now, we have:

∑_{i=1}^{l} [f(S ∪ T_i) − f(S)] ≥ ∑_{i=1}^{l} ∑_{x∈T_i} f_{S∪C^x}(x) ≥ ∑_{x∈C\S} k · f_{S∪C^x}(x) = k [f(S ∪ C) − f(S)] ,

where the second inequality follows from the fact that each x ∈ C \ S occurs in at least k of the
sets T_i and f_{S∪C^x}(x) ≥ 0 since f is monotone.
Theorem 2.7. Let f be a monotone submodular function on X. Let C, S ⊆ X, and let {T_i}_{i=1}^{l}
be a collection of subsets of S \ C such that each element of S \ C appears in at most k of the
subsets. Then,

∑_{i=1}^{l} [f(S) − f(S \ T_i)] ≤ k [f(S) − f(S ∩ C)] .
Proof. Fix an arbitrary ordering ≺ on S \ C. For any x ∈ S \ C let S^x be the set containing x
and all the elements from S \ C that precede x in the ordering. Similarly, let T_i^x contain x and
all the elements from T_i preceding x. Then, we have T_i^x ⊆ S^x for all x, and so S \ S^x ⊆ S \ T_i^x
for all x. Thus,

∑_{x∈T_i} f_{S\S^x}(x) ≥ ∑_{x∈T_i} f_{S\T_i^x}(x) = ∑_{x∈T_i} [f((S \ T_i^x) + x) − f(S \ T_i^x)] = f(S) − f(S \ T_i) ,

where the first inequality follows from submodularity and the last equality from telescoping the
summation. Now, we have:

∑_{i=1}^{l} [f(S) − f(S \ T_i)] ≤ ∑_{i=1}^{l} ∑_{x∈T_i} f_{S\S^x}(x) ≤ ∑_{x∈S\C} k · f_{S\S^x}(x) = k [f(S) − f(S ∩ C)] ,

where the second inequality follows from the fact that each x ∈ S \ C occurs in at most k of the
sets T_i and f_{S\S^x}(x) ≥ 0 since f is monotone.
We primarily use Theorem 2.6 in the restricted setting in which k = 1 and {T_i}_{i=1}^{l} is a partition
of C \ S. The next theorem is another application of Theorem 2.6, involving the average value
of a submodular function on subsets of a particular size.
Theorem 2.8. Let f be a non-negative submodular function, and let S be a set of size m. For
k in the range 1 ≤ k ≤ m,

(1/\binom{m}{k}) ∑_{T ∈ \binom{S}{k}} f(T) ≥ (k/m) f(S) .
Proof. Each element x ∈ S appears in exactly \binom{m−1}{k−1} = (k/m)\binom{m}{k} of the sets in \binom{S}{k}. From
Theorem 2.6, and the assumption that f(∅) = 0, we then have:

∑_{T ∈ \binom{S}{k}} f(T) = ∑_{T ∈ \binom{S}{k}} [f(T) − f(∅)] ≥ (k/m)\binom{m}{k} [f(S) − f(∅)] = (k/m)\binom{m}{k} f(S) .
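Theorem 2.8 is easy to sanity-check numerically. The sketch below (hypothetical coverage function with m = 4) verifies the averaging inequality for every k:

```python
from itertools import combinations

# Hypothetical coverage function on a ground set of size m = 4.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

S = set(subsets)
m = len(S)                     # here f(S) = 5
for k in range(1, m + 1):
    ksubs = [set(T) for T in combinations(S, k)]
    avg = sum(f(T) for T in ksubs) / len(ksubs)
    # Theorem 2.8: the average over k-subsets is at least (k/m) f(S).
    assert avg >= (k / m) * f(S)
    print(k, round(avg, 3), (k / m) * f(S))
```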
2.3 Independence Systems and Matroids
In this section, we consider various classes of feasible solutions for combinatorial optimization
problems. Our classes are all built on the notion of an independence system. An independence
system is given by a ground set X and a non-empty, downward closed collection I of subsets
of X:
Definition 2.9 (Independence System). Let X be a set of elements and I ⊆ 2X . Then the set
system (X, I) is an independence system if and only if I 6= ∅ and for all A,B ⊆ X, A ∈ I and
B ⊆ A implies B ∈ I.
We refer to the sets in I as independent sets. For a given set A ⊆ X we call the inclusion-
wise maximal independent subsets of A bases of A, or, when A is understood to be X, simply
bases. Finally, when dealing with independence systems we assume that every element x ∈ X is
contained in at least one independent set A ∈ I. This assumption is without loss of generality
since if some element does not occur in any independent set of I, we can remove it from
the ground set X without affecting the set I of feasible solutions. Furthermore, because I is
downward closed, this assumption is equivalent to the assumption that {x} ∈ I for all x ∈ X.
While the class of all independence systems is too general to give rise to interesting algorith-
mic and combinatorial properties, it does serve as the basis for several more restricted classes of
set systems which do exhibit interesting properties. Probably the best-known such class
is the class of matroids.
Matroids were first axiomatized by Whitney [98] as a generalization of the notion of linear
independence in vector spaces.
Definition 2.10 (Matroid [98] (also [85, (39.1)])). An independence system (X, I) is a matroid
if and only if for all A,B ∈ I if |A| > |B| then there exists x ∈ A \B such that B + x ∈ I.
The following are some simple classes of matroids that we will refer to later in the thesis. In
a uniform matroid of rank k, I consists of precisely those sets of size at most k. In a partition
matroid we are given a partition of X into p sets X1, . . . , Xp, and integers k1, . . . , kp. Then, I
contains precisely those sets S for which |S ∩ Xi| ≤ ki for all 1 ≤ i ≤ p. In a graphic matroid,
we are given an undirected graph G = (V,E). The ground set is E and I contains precisely
those sets of edges that do not contain a cycle.
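Each of these classes is naturally presented as an independence oracle. A minimal sketch (hypothetical instances; the graphic-matroid check uses union-find to detect cycles):

```python
def uniform_indep(S, k):
    """Uniform matroid of rank k: S is independent iff |S| <= k."""
    return len(S) <= k

def partition_indep(S, parts, caps):
    """Partition matroid: at most caps[i] elements from block parts[i]."""
    return all(len(S & P) <= c for P, c in zip(parts, caps))

def graphic_indep(edges):
    """Graphic matroid: a set of edges is independent iff it is acyclic;
    checked with union-find over the endpoints."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # this edge would close a cycle
            return False
        parent[ru] = rv
    return True

print(uniform_indep({1, 2}, k=2))                       # True
print(partition_indep({1, 2}, [{1, 2}, {3}], [1, 1]))   # False
print(graphic_indep([(0, 1), (1, 2), (2, 0)]))          # False (triangle)
```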
There are various alternate characterizations for the class of matroids. Two that shall be
useful are the following, which are given in terms of bases.
Theorem 2.11 ([85, (39.2)]). An independence system (X, I) is a matroid if and only if for
all E ⊆ X, all bases of E have the same size.
The common size of all bases of a set E is called the rank of E, denoted rank(E). The rank
of the matroid (X, I) is then simply the rank of X.
Whitney [98] also gave the following alternate characterization of matroids in terms of bases.
Theorem 2.12 ([98] (also [85, Theorem 39.6])). Let B be a non-empty collection of subsets of
X. Then, B is the collection of bases of a matroid if and only if:
(i) For any A,B ∈ B and x ∈ A \B, there exists y ∈ B \A such that A− x+ y ∈ B.
(ii) For any A,B ∈ B and x ∈ A \B, there exists y ∈ B \A such that B − y + x ∈ B.
Brualdi [16] shows that matroids exhibit the following stronger exchange properties.
Theorem 2.13 ([16, Theorem 1] (also [85, Corollary 39.12a])). Let A,B be bases of a matroid
M. Then, there exists a bijection π : A → B such that B − π(x) + x is a base of M for all
x ∈ A. Furthermore, π(x) = x for all x ∈ A ∩B.
Theorem 2.14 ([16, Theorem 2] (also [85, Theorem 39.12])). Let A,B be bases of a matroid
M. Then, for any x ∈ A there exists some y ∈ B such that A− x+ y and B − y + x are both
bases of M.
Theorems 2.11 and 2.13 (i.e. the fact that all bases of a matroid have equal size and the
existence of the bijection π) are typically all that we need from the structure of a matroid in
order to derive our results.
Ideally, we would like for the bijection π from Theorem 2.13 to satisfy the stronger conditions
of Theorem 2.14 (i.e. to have a single bijection π : A → B such that both B − π(x) + x and A − x + π(x)
are bases). Brualdi [16] gives an example of a matroid for which this cannot be done. Later
work by Brualdi and Scrimger [19, 17, 18] generalizes the base exchange characterization of
Theorem 2.12 to consider weakly base orderable matroids, in which a bijection π satisfying this
stronger property does exist. They also define the following class of matroids, that exhibit an
even stronger sort of exchange property.
Definition 2.15 (Strongly Base Orderable Matroid). A matroid is strongly base orderable if
for any pair A,B of its bases there exists a bijection π : A → B such that for all C ⊆ A,
(B \ π(C)) ∪ C is a base (where π(C) = {π(x) : x ∈ C}).
That is, in a strongly base orderable matroid, the bijection π : A → B between any two bases
A and B can be extended to subsets of A and B. Essentially, this means that in any strongly base
orderable matroid, several swaps (each exchanging some x for π(x)) can be performed simultaneously.
The class of strongly base orderable matroids is quite large, containing gammoids, transversal
matroids, and partition matroids.1 An example of a matroid that is not strongly base orderable
is the graphic matroid on K4.
A final useful characterization of matroids is the following, which relates them to submodular
functions:
Theorem 2.16 ([98]; also [85, Theorem 39.8]). Let rank : 2X → Z≥0. Then rank is the rank
function of a matroid if and only if:
(i) rank(T ) ≤ rank(U) ≤ |U |, for all T ⊆ U ⊆ X.
(ii) rank(T ) + rank(U) ≥ rank(T ∪ U) + rank(T ∩ U), for all T,U ⊆ X.
That is, rank(·) is the rank function of a matroid if and only if rank(·) is monotone
submodular. A rank function on X implicitly specifies the independence system (X, I), with
I = {S ⊆ X : |S| ≤ rank(S)}.
2.4 The Greedy Algorithm
We now examine in more detail combinatorial optimization problems whose feasible sets are
given by independence systems. In such problems, we are given an independence system (X, I)
and a function f : 2X → R≥0. The goal is to find a set S ∈ I that maximizes the value f(S).
First, let us consider the case in which f is a linear function, given as a weight function
w : X → R≥0. Then, the related combinatorial optimization problem is equivalent to the
problem of finding an independent set in I of maximum total weight. Rado [81] showed that
the standard greedy algorithm, shown in Algorithm 2, is optimal for all linear functions f
whenever (X, I) is a matroid. Conversely, Edmonds [33] showed that if the standard greedy
algorithm provides an optimal solution in I for every linear function f on 2X , then the system
1 Each of these classes contains the next. We do not provide definitions for gammoids and transversal matroids here.
Algorithm 2: Greedy
Input: Independence system (X, I); weight function w : X → R≥0
S ← ∅; T ← X;
while T ≠ ∅ do
    x ← arg max_{t∈T} w(t);
    T ← T − x;
    if S + x ∈ I then
        S ← S + x;
return S;
(X, I) must be a matroid. In this sense, matroids are exactly those independence systems for
which the standard greedy algorithm is optimal with respect to all linear functions.
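As an illustration of this optimality, running Algorithm 2 on a graphic matroid with edge weights computes a maximum-weight spanning forest (Kruskal's algorithm). The sketch below uses a hypothetical instance and a generic independence oracle:

```python
# Hypothetical instance: graphic matroid (independent = acyclic edge sets)
# on a triangle plus a pendant edge, with a weight per edge.
edges = {(0, 1): 5, (1, 2): 4, (0, 2): 3, (2, 3): 2}

def acyclic(es):
    """Independence oracle: union-find cycle detection."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in es:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def greedy(elements, weight, indep):
    """Algorithm 2: scan elements by non-increasing weight, keeping each
    element whose addition preserves independence."""
    S = []
    for x in sorted(elements, key=weight, reverse=True):
        if indep(S + [x]):
            S.append(x)
    return S

S = greedy(edges, edges.get, acyclic)
print(S, sum(edges[e] for e in S))   # a weight-11 spanning forest
```

The edge (0, 2) is skipped because it would close the triangle; on a matroid this greedy choice is always safe.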
Now, let us examine the more general case in which f is any monotone submodular function.
The earliest reference to the problem of maximizing a submodular set function subject to
a matroid constraint seems to be Cornuejols, Fisher, and Nemhauser [30]. They consider
a constrained maximization variant of a facility location problem that is a special case of
monotone submodular maximization subject to a uniform matroid constraint. They show that
a greedy algorithm is a (1 − 1/e)-approximation algorithm for this problem, while a simple local
search algorithm is only a 1/2-approximation. Fisher, Nemhauser, and Wolsey [43] consider the
general case of maximizing an arbitrary monotone submodular function subject to an arbitrary
matroid constraint. The standard greedy algorithm that they consider for the problem is
shown in Algorithm 3. Algorithm 3 is obtained naturally by modifying Algorithm 2 to use
Algorithm 3: SubmodularGreedy
Input: Independence system (X, I); submodular function f : 2X → R≥0
S ← ∅; T ← X;
while T ≠ ∅ do
    x ← arg max_{t∈T} f_S(t);
    T ← T − x;
    if S + x ∈ I then
        S ← S + x;
return S;
the marginal gains fS with respect to the current solution in place of the weight function w.
Fisher et al. show that SubmodularGreedy is a 1/2-approximation and that this bound is tight.
They also show that a simple 1-local search algorithm is a 1/2-approximation (again, they give
an example showing that this bound is tight). Many results pertaining to the greedy algorithm
for submodular maximization are summarized in a survey of Goundan and Schulz [46].
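For illustration, the sketch below runs Algorithm 3 on a hypothetical maximum 2-coverage instance (coverage objective, uniform matroid of rank 2); the names and data are invented for the example:

```python
# Hypothetical maximum 2-coverage instance: pick at most two subsets
# (a uniform matroid of rank 2) to cover as many points as possible.
subsets = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4, 5}, "d": {3, 4}}

def cover(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def submodular_greedy(X, f, indep):
    """Algorithm 3: repeatedly take the element of largest marginal gain,
    keeping it only if the solution stays independent."""
    S, T = set(), set(X)
    while T:
        x = max(T, key=lambda t: f(S | {t}) - f(S))
        T.remove(x)
        if indep(S | {x}):
            S.add(x)
    return S

S = submodular_greedy(subsets, cover, lambda A: len(A) <= 2)
print(sorted(S), cover(S))   # ['a', 'c'] covers all 5 points
```

On this instance the greedy choice happens to be optimal; the 1/2 bound is only met by carefully constructed worst cases.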
From a hardness perspective, Feige [35] showed that unless P = NP, it is impossible to
approximate the following maximum k-coverage problem beyond a factor of 1− 1/e. In maxi-
mum k-coverage, we are given a family of subsets of some universe U and must select k subsets
that cover as much of U as possible. The coverage function is monotone submodular and the
constraint that we can take at most k sets can be formulated as a uniform matroid of rank k.
Thus, maximum k-coverage is a special case of maximizing a monotone submodular function
subject to a matroid constraint.
Nemhauser and Wolsey [75] considered the problem of monotone submodular maximization
subject to a matroid constraint in the value oracle model. In this model we are given f via
an oracle that provides its value on any given set. Nemhauser and Wolsey show that attaining
any approximation better than 1− 1/e requires an exponential number of value queries to the
oracle for f .
2.5 The Continuous Greedy Algorithm
Calinescu, Chekuri, Pal and Vondrak [20, 93, 21] improved on the long-standing 1/2-approximation,
giving an algorithm that attains the optimal approximation ratio of 1−1/e for the general prob-
lem of maximizing any monotone submodular function subject to a single matroid constraint.
Their algorithm, called the continuous greedy algorithm, consists of two phases. In the first
phase, they solve a particular relaxation of the combinatorial optimization problem to obtain
an approximate fractional solution. In the second phase, this fractional solution is rounded to
an integral solution of the original problem by using the pipage rounding framework of Ageev
and Sviridenko [1].
We now consider the continuous greedy algorithm in more detail. Let f be a monotone
submodular function on X and let M = (X, I) be a matroid on X. The continuous greedy
algorithm considers the following continuous, multilinear extension of f , where ~x ∈ [0, 1]^X is a
vector with a component x_i ∈ [0, 1] for each i ∈ X:

F(~x) = ∑_{R⊆X} f(R) ∏_{i∈R} x_i ∏_{i∉R} (1 − x_i) . (2.1)
In the general setting, in which f is given by a value oracle, F cannot be computed in polynomial
time, but it can be efficiently estimated by random sampling.
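A Monte-Carlo sketch of this estimate (hypothetical coverage function): since F(~x) is the expected value of f on a random set containing each i independently with probability x_i, averaging f over sampled sets approximates F.

```python
import random

# Hypothetical coverage function on a three-element ground set.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def estimate_F(x, samples=20000, seed=0):
    """Monte-Carlo estimate of the multilinear extension F(x): draw a
    random set R containing each i independently with probability x[i]
    and average f(R)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        R = {i for i in x if rng.random() < x[i]}
        total += f(R)
    return total / samples

x = {"a": 0.5, "b": 0.5, "c": 1.0}
print(estimate_F(x))   # close to the exact value 2.75
```

For this tiny ground set F could of course be computed exactly by expanding (2.1); sampling is what makes the approach feasible when |X| is large.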
The value F(~x) can be viewed as the expected value of f on a random subset of X in which each
element i appears independently with probability x_i. We identify an integral vector ~z ∈ {0, 1}^X with the set
Z whose indicator vector is χ_Z = ~z. Then, for any vector ~z ∈ {0, 1}^X, we have F(~z) = f(Z),
and so F is indeed a relaxation of f . Furthermore, (as shown by Calinescu et al. [20]) the
monotonicity and submodularity of f imply, respectively, that

∂F/∂x_i ≥ 0 for all i ∈ X , (2.2)
∂²F/(∂x_i ∂x_j) ≤ 0 for all i, j ∈ X with i ≠ j . (2.3)
The continuous greedy algorithm considers the relaxed problem of maximizing F (~x) subject
to the constraint that ~x is in the polytope P (M) of M:
P(M) = {~x ∈ [0, 1]^X : ~x(S) ≤ rank(S), ∀S ⊆ X} .

Here, ~x(S) = ∑_{i∈S} x_i. Alternatively (as shown in Edmonds [33]), the polytope P(M) is simply
the convex hull of the characteristic vectors of sets in I. The continuous greedy algorithm solves
the problem
max {F(~x) : ~x ∈ P(M)} .
Due to the structure of F , this is a non-linear optimization problem. However, a (1 − 1/e)-
approximation is attained by a simple continuous algorithm. The algorithm follows a trajectory
~x(t) for t ∈ [0, 1], with ~x(0) = 0 initially. The trajectory satisfies the differential equation

d~x(t)/dt = ~vmax(~x(t)) ,

where ~vmax(~x(t)) = arg max_{~v∈P(M)} (~v · ∇F(~x(t))). Intuitively, the equation simply states that at
each instant, the trajectory ~x(t) moves in the direction of the vector that maximizes the increase
in F (given by the gradient vector ∇F (~x(t))) subject to the constraint that this vector is in the
polytope P . The algorithm returns the final value ~y = ~x(1).
In practice, the continuous algorithm is implemented on a suitably discretized time scale.
That is, we split [0, 1] into integral multiples of some ε, so that t takes ε^{−1} discrete values. At
each discrete time step t, the problem of finding the direction ~vmax with which to update ~x can
be accomplished by the following procedure. First, the value of ∇F is estimated via a sampling
procedure. For each i ∈ X, we set a weight w(i) equal to the partial derivative ∂F/∂xi given
by the ith coordinate of ∇F. Then, ~vmax can be computed by finding a maximum weight
independent set in the matroid M with weights given by w. This is done via the standard
greedy algorithm. The algorithm updates ~x(t+ ε)← ~x+ ε · ~vmax.
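The discretized phase can be sketched as follows for a uniform matroid of rank k, where the maximum-weight independent set is simply the k elements of largest estimated weight. The data, sample count, and step size below are hypothetical, and the gradient estimator is a crude sketch rather than the carefully analyzed sampling procedure of [20]:

```python
import random

# Hypothetical instance: coverage function, uniform matroid of rank k.
subsets = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4, 5}, "d": {3, 4}}
rng = random.Random(0)

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def grad_estimate(x, samples=300):
    """Crude Monte-Carlo estimate of the weights w(i) ~ dF/dx_i, using
    dF/dx_i = E[f(R + i) - f(R - i)] for R drawn according to x."""
    w = dict.fromkeys(x, 0.0)
    for _ in range(samples):
        R = {i for i in x if rng.random() < x[i]}
        for i in x:
            w[i] += f(R | {i}) - f(R - {i})
    return {i: w[i] / samples for i in w}

def continuous_greedy(k, eps=0.1):
    """Discretized continuous greedy: at each of 1/eps steps, move eps
    towards the indicator of the k elements of largest estimated weight
    (a maximum-weight independent set of the rank-k uniform matroid)."""
    x = dict.fromkeys(subsets, 0.0)
    for _ in range(round(1 / eps)):
        w = grad_estimate(x)
        for i in sorted(x, key=lambda e: w[e], reverse=True)[:k]:
            x[i] = min(1.0, x[i] + eps)
    return x

x = continuous_greedy(k=2)
print(x)   # a fractional point of the rank-2 matroid polytope
```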
Next, the final fractional solution ~y produced by the first phase is rounded to obtain an
integral solution to the original problem. This is accomplished by a pipage rounding procedure.
From (2.2) we have that F must be increasing in each coordinate. Hence, we can assume that
~y is in fact in the base polytope B(M) of the matroid,

B(M) = {~x ∈ P(M) : ~x(X) = rank(X)} ,
which has as extreme points vectors corresponding to the characteristic vectors of bases of M.
The pipage rounding procedure obtains an integral solution by repeatedly altering the co-
ordinates of ~y. Let e_i denote the incidence vector χ_{{i}} (that is, e_i is 1 in the coordinate
corresponding to i and 0 in all other coordinates). Each alteration is of the form
~y + t(ei − ej) , (2.4)
where t may be either positive or negative. Thus, each rounding step consists of increasing
one coordinate and decreasing another by the same amount, while leaving all other coordinates
unchanged.
The pipage rounding approach depends on two facts, upon which we shall now elaborate.
First, while the function

F^{~y}_{~v}(t) = F(~y + t~v)

is not necessarily convex in every direction ~v, it is convex in all directions e_i − e_j corresponding
to adding an element i and removing an element j. This follows directly from (2.3), which
itself follows from submodularity of f . Second, because M is a matroid, we can move from a
fractional solution ~y to a vertex of B(M) by moving only in these directions. Combined, these
facts imply that the pipage rounding procedure never decreases the value of F , as we shall see.
Unfortunately, this means that pipage rounding does not generalize to other independence sys-
tems, such as those given by intersection of multiple matroids, in which the rounding procedure
must consider directions along which f is not concave.
The rounding maintains the invariant that ~y ∈ B(M), and never decreases F(~y). We call a
set A ⊆ X tight if its corresponding constraint in the matroid polytope P (M) is tight, so that
x(A) = rank(A). Since rank(A) is an integer, any tight set that contains some fractional variable
must contain at least 2 fractional variables. An iteration of the pipage rounding procedure
begins with any pair of fractional coordinates yi, yj, and then alters ~y by first increasing t from
0 in (2.4) until some set A+ becomes tight or either yi or yj becomes integral (we denote by t+
the value of t ≥ 0 at which this happens). Next, the procedure decreases t from 0 until some
set A− becomes tight or yi or yj becomes integral (we denote by t− the value of t ≤ 0 for which
this happens). Thus, the vector given by (2.4) is in P(M) for all t ∈ [t−, t+]. Furthermore, it
follows from (2.3) that the function

F^{~y}_{ij}(t) = F(~y + t(e_i − e_j))

is convex for any i ≠ j. Thus, F^{~y}_{ij} attains its maximum value over t ∈ [t−, t+] at either t− or
t+. One of F^{~y}_{ij}(t+) and F^{~y}_{ij}(t−) is greater than or equal to F(~y), and the rounding procedure's
choice of t can be made to guarantee that F does not decrease.2
It remains to show how to choose i and j so this procedure will eventually terminate with an
2In [21] a randomized rounding procedure is used that only ensures that the expected value of F is non-decreasing. This procedure has the advantage of not needing to compute the value of F .
integral vector. As shown by Calinescu et al. [20, 21], this depends intimately on the fact that
M is a matroid. We omit the full proof here, but give some intuition in terms of the geometry
of B(M) and a sketch of the proof in terms of the structure of tight sets.
For our geometric intuition, we note that a strengthening of the exchange characterization
of Theorem 2.12 (see Brualdi [16], and also Schrijver [85, Theorem 39.12]) implies that the
adjacent vertices A, B of the base polytope B(M) have the property that both A − i + j and
B − j + i are bases for some i ∈ A \ B and j ∈ B \ A [85, Theorem 40.6]. This implies that
it is possible to move from an initial fractional solution ~y ∈ B(M) to any vertex of B(M) by
following only vectors of the form t(ei − ej). These vectors are precisely the ones considered
by the pipage rounding algorithm.
More formally, Theorem 2.16 states that the rank function rank of M must be submodular.
It follows (see [21] for a proof) that if A and B are two tight sets then so are A∩B and A∪B.
The pipage rounding procedure begins by choosing any pair of fractional values yi, yj and a
tight set T containing i and j (initially T = X). Suppose that after one iteration neither yi
nor yj has become integral. Then, there is some new set A (equal to either A+ or A−) which is
tight with respect to the updated vector ~y + t(ei − ej) corresponding to the algorithm’s choice
of t. But A and T are both tight with A ≠ T, so the set T ∩ A is also tight. Furthermore,
T ∩A contains either i or j, and yi and yj are fractional. Thus, T ∩A must contain at least 2
fractional variables. The pipage rounding procedure updates T to be the set T ∩A and chooses
a pair i and j from T ∩A such that yi and yj are fractional, then continues. The new tight set
T ∩A is smaller in the next iteration and so eventually the algorithm must find a set containing
only i and j, in which case it can make one of yi, yj integral. The procedure then repeats until all
variables are integral.
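For a uniform matroid the tight-set bookkeeping is trivial (the only relevant constraint is ~x(X) ≤ k), so pipage rounding reduces to repeatedly picking two fractional coordinates and moving along e_i − e_j to whichever extreme point does not decrease F. A sketch under these simplifying assumptions (hypothetical coverage instance; F is computed exactly by enumeration since the ground set is tiny):

```python
from itertools import combinations

# Hypothetical coverage instance on three elements.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def F(x):
    """Exact multilinear extension by enumerating all 2^|X| subsets."""
    elems = list(x)
    total = 0.0
    for r in range(len(elems) + 1):
        for R in combinations(elems, r):
            p = 1.0
            for i in elems:
                p *= x[i] if i in R else 1.0 - x[i]
            total += p * f(R)
    return total

def pipage_round(x, tol=1e-9):
    """Pipage rounding under a cardinality (uniform matroid) constraint:
    F is convex along e_i - e_j, so the better of the two extreme moves
    never decreases F, and each move makes a coordinate integral."""
    x = dict(x)
    while True:
        frac = [i for i in x if tol < x[i] < 1.0 - tol]
        if len(frac) < 2:
            return x
        i, j = frac[0], frac[1]
        t_plus = min(1.0 - x[i], x[j])     # raise x_i, lower x_j
        t_minus = -min(x[i], 1.0 - x[j])   # lower x_i, raise x_j
        y1, y2 = dict(x), dict(x)
        y1[i] += t_plus; y1[j] -= t_plus
        y2[i] += t_minus; y2[j] -= t_minus
        x = y1 if F(y1) >= F(y2) else y2

x = {"a": 0.5, "b": 0.5, "c": 1.0}   # fractional point with x(X) = 2
z = pipage_round(x)
print(z, F(z))
```

Each move keeps the total mass x(X) fixed, so the result is the indicator of a size-2 set, and convexity guarantees F never drops below its starting value of 3.25.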
In practice, finding a tight set can be implemented as the submodular minimization problem:
arg min_{A∈A} (rank(A) − ~y(A)) (2.5)

where A = {S ⊆ X : i ∈ S, j ∉ S}. In contrast to submodular maximization, submodular
minimization can be accomplished in polynomial time [84, 56, 96]. Because the submodular
minimization problem (2.5) is of a particular restricted sort, a faster algorithm by Cunningham
[31] can be applied.
In further work, Vondrak [95] considers the problem of maximizing a monotone submodular
function whose total curvature is at most c. He shows that the continuous greedy algorithm is
a (1 − e^{−c})/c-approximation, and that it is impossible to attain a better approximation with a
polynomial number of queries in the value oracle model. Previously, Conforti and Cornuejols [29]
had shown that the standard greedy algorithm attains an approximation ratio of (1 − e^{−c})/c for
monotone submodular maximization in uniform matroids when the function has total curvature
at most c. In the case of an arbitrary matroid, however, they showed that the standard greedy
algorithm is only a 1/(1 + c)-approximation.
In a series of papers, Chekuri, Vondrak, and Zenklusen [25, 26, 27] develop a new rounding
technique called swap rounding, which can be used in place of pipage rounding in the continuous
greedy algorithm. By using the multilinear relaxation F together with their new rounding
procedure they derive improved approximations for a variety of problems involving matroid
constraints together with various other constraints. They also consider the case of matroid
intersection and give an improved approximation algorithm for budgeted linear variants of the
problem and some restricted types of monotone submodular functions.
The multilinear relaxation F has been used in a variety of other settings as well, including
non-monotone [69] and multi-budgeted maximization [47]. Vondrak [94] uses F to formulate the
“symmetry gap” of a problem, which he uses to unify several negative results in the value oracle
model. Oveis-Gharan and Vondrak [44] use the relaxation F to design an algorithm based on
simulated annealing. They show that the resulting algorithm gives improved approximations
for non-monotone submodular maximization both in the unconstrained and matroid settings.
Finally, Feldman, Naor, and Schwartz give an adaptation of the continuous greedy algorithm
that gives further improved approximations for non-monotone submodular maximization [36].
In follow up work [37], they unify this and a variety of other applications of the general contin-
uous greedy approach.
2.6 Partial Enumeration
In this section, we give a meta-algorithm that can be used to obtain a small improvement in
the approximation performance of any algorithm A for maximizing a submodular function in
an independence system. We use a partial enumeration technique similar to that used by Sahni
[82] for the 0/1 knapsack problem. This approach is further described by Khuller et al. [65]
and Sviridenko [89] in the context of budgeted maximum coverage and Calinescu et al. [20] for
the general problem of submodular maximization subject to a matroid constraint. Effectively,
we “guess” a single element of the optimal solution, and then run A on a modified instance in
which all solutions contain this element. We then iterate over all possible guesses.
The technique is based on the notion of contraction, which we now define. Our definition
is inspired by the standard notion of contraction in matroids [85, Section 39.3]. Note that here
we consider contraction in an arbitrary independence system (not necessarily a matroid).
Definition 2.17 (Contraction). Let (X, I) be an independence system, and x ∈ X. We define
the set system (X − x, I_x) by I_x = {A ⊆ X − x : A + x ∈ I} (i.e. a set A is in I_x if and only
if A + x is independent in the original set system).
In order to use the partial enumeration technique for a class of independence systems, we
require that the class be closed under contraction. First, we show that the contraction of an
independence system is an independence system.
Theorem 2.18. Suppose (X, I) is an independence system and x ∈ X. Then, (X − x, Ix) is
also an independence system.
Chapter 2. Preliminaries 23
Proof. Recall that for any set system (X, I), we assume that {x} ∈ I for all x ∈ X. Thus,
∅ ∈ Ix for all x ∈ X. Furthermore, for any x ∈ X, if A ∈ Ix and B ⊆ A ⊆ X − x, then
B + x ⊆ A + x ∈ I. Because I is downward-closed, B + x ∈ I and so B ∈ Ix.
We now show that the subclass of matroids is also closed under contraction.
Theorem 2.19. Suppose that (X, I) is a matroid and x ∈ X. Then (X − x, Ix) is a matroid.
Proof. Theorem 2.18 shows that (X − x, Ix) must be an independence system. Thus, it suffices
to show that (X − x, Ix) satisfies the condition of Definition 2.10. Let A, B ∈ Ix with |A| > |B|.
Then, A + x and B + x must be in I. We have |A + x| = |A| + 1 and |B + x| = |B| + 1, so
|A + x| > |B + x|. Because (X, I) is a matroid, there must be some element y ∈ (A + x) \ (B + x) = A \ B
such that (B + x) + y ∈ I. But this means that we must have B + y ∈ Ix, and so (X − x, Ix) is a matroid.
We also need a similar notion of contraction for functions.
Definition 2.20 (Function Contraction). Let f be a submodular function on ground set X
and let x ∈ X. We define (with slight abuse of notation) the contracted function fx on X − x as

fx(A) = f(A + x) − f(x) .
The next theorem shows that the class of monotone submodular functions is closed under
contraction.
Theorem 2.21. If f is (monotone) submodular, then so is fx.
Proof. Suppose f is submodular and let A,B ⊆ X − x. Then,
fx(A) + fx(B) = f(A+ x) + f(B + x)− 2f(x)
≥ f((A+ x) ∪ (B + x)) + f((A+ x) ∩ (B + x))− 2f(x)
= f((A ∪B) + x) + f((A ∩B) + x)− 2f(x)
= fx(A ∪B) + fx(A ∩B).
Suppose that f is monotone and let A ⊆ B ⊆ X − x. Then, since A+ x ⊆ B + x we have
fx(A) = f(A+ x)− f(x) ≤ f(B + x)− f(x) = fx(B).
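Both contraction operations are small enough to check by brute force on toy examples. The following sketch is our own illustration (the helper names are hypothetical, not from the text): it encodes Definitions 2.17 and 2.20 for a rank-2 uniform matroid and a small coverage-style function, and confirms the conclusions of Theorems 2.18 and 2.21 on that instance.

```python
from itertools import combinations

def is_independence_system(ground, indep):
    """Check: every singleton is independent and the family is downward-closed."""
    indep = {frozenset(s) for s in indep}
    if any(frozenset({x}) not in indep for x in ground):
        return False
    return all(s - {y} in indep for s in indep for y in s)

def contract_system(ground, indep, x):
    """(X - x, I_x): A is independent iff A + x was independent (Definition 2.17)."""
    return ground - {x}, {frozenset(a) - {x} for a in indep if x in a}

def contract_function(f, x):
    """f_x(A) = f(A + x) - f({x}) (Definition 2.20)."""
    return lambda a: f(a | {x}) - f({x})

# Toy example: the rank-2 uniform matroid on {1, 2, 3} and a small coverage function.
ground = {1, 2, 3}
indep = {frozenset(s) for k in range(3) for s in combinations(ground, k)}
sets = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}
def f(s):
    return len(set().union(*(sets[i] for i in s))) if s else 0

g2, i2 = contract_system(ground, indep, 1)
f1 = contract_function(f, 1)
print(is_independence_system(g2, i2))  # True: the contraction is again an independence system
print(f1({2}), f1({3}))                # 0 1: marginal values relative to element 1
```

Here f1({2}) = 0 because the set indexed by 2 adds nothing beyond what element 1 already covers, while f1({3}) = 1.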
Now, we are ready to formulate this section’s main algorithmic result. Suppose that A is an
algorithm for monotone submodular maximization in some class of independence systems that
is closed under contraction. We consider the meta-algorithm PartEnum(A), shown in Algorithm
4. When applied to an instance (X, I) the meta-algorithm simply runs A on the contracted
instance (X − x, Ix), fx for each x ∈ X, and returns the best resulting solution.
Algorithm 4: PartEnum(A)
Input: independence system (X, I), submodular function f
foreach x ∈ X do
    Let Sx be the result of running A on (X − x, Ix), fx;
Let y = arg max_{x∈X} f(Sx ∪ x);
return Sy ∪ y;
Theorem 2.22. Suppose that Algorithm A is a θ-approximation for submodular maximization
over some class of independence systems closed under contraction. Then PartEnum(A) is a
(1/n + ((n − 1)/n)·θ)-approximation for the same problem, where n is the maximum size of an
independent set in the given set system.
Proof. Consider an instance (X, I), f . Let O be an optimal solution for this instance, and
y = arg max_{x∈O} f(x). Submodularity implies that

f(O) ≤ ∑_{x∈O} f(x) ≤ |O| f(y) ≤ n f(y) .

Furthermore, since A is a θ-approximation algorithm,

fy(Sy) ≥ θ fy(O − y) = θ (f(O) − f(y)) .
Let S be the solution produced by Algorithm 4 on the instance (X, I), f . Then,

f(S) = f(y) + fy(Sy)
     ≥ f(y) + θ fy(O − y)
     = f(y) + θ (f(O) − f(y))
     = (1 − θ) f(y) + θ f(O)
     ≥ ((1 − θ)/n + θ) f(O) = (1/n + ((n − 1)/n)·θ) f(O) .
Chapter 3
Maximum Coverage
In this chapter, we derive a deterministic, combinatorial algorithm for maximizing a special
type of monotone submodular function subject to a single matroid constraint. We consider the
case of arbitrary monotone submodular functions in the next chapter. As this special case is
significantly simpler, we present it first in the hope that it will provide some intuition for the
general submodular case. Also, our algorithm in this case is faster and deterministic.
3.1 The Problem
First, let us define the special class of monotone submodular functions that we consider.
Definition 3.1 (Coverage Function). Suppose we are given:

• a universe U of elements

• a weight function w : U → R≥0 assigning each element of U a weight

• a collection¹ F = {F1, . . . , Fn} ⊆ 2^U of subsets of U

Then, the coverage function f : 2^[n] → R≥0 associated with U , w, and F is defined by

f(S) = w(⋃_{i∈S} Fi) ,

where, as usual, we denote by w(A) the total weight ∑_{x∈A} w(x). Then, f(S) is the total
weight of elements covered by the sets Fi with i ∈ S. We write f = (U,w,F) to state that f is
the coverage function specified by U , w, and F .
¹We allow for the case that some set may appear multiple times in F . Such sets are obviously redundant in the unconstrained setting, but not when we consider maximum coverage subject to a matroid constraint, as we shall describe shortly. Nonetheless, all of the negative results that we state hold even in the case that F contains only disjoint sets.
We consider the problem of matroid maximum coverage, in which we are given a coverage
function f of the form in Definition 3.1, and a matroid M = ([n], I) over [n]. We seek an
independent collection of indices S ∈ I such that f(S) is maximized (i.e. so that the total
weight of elements of U covered by the sets {Fi : i ∈ S} corresponding to S is maximized).
We shall use the following terms to measure the complexity of our algorithm:

n = |F| ,    u = max_{A∈F} |A| ,    r = rank(M) .

That is, n is the number of sets defining f , and also the size of the ground set of M, u is the
maximum size of a set in F (note that u ≤ |U |), and r is the rank of M (note that r ≤ n).
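As a concrete illustration (our own toy data, not an instance from the text), a matroid maximum coverage input consists of the collection F, the weights w, and an independence oracle; for a partition matroid the oracle reduces to a pair of counting checks.

```python
from itertools import chain

# A small matroid maximum coverage instance: the sets F are indexed by
# [n] = {0, 1, 2, 3}; a partition matroid allows at most one index from
# {0, 1} and at most one from {2, 3}.
U = {"u1", "u2", "u3", "u4"}
w = {"u1": 2.0, "u2": 1.0, "u3": 1.0, "u4": 3.0}
F = [{"u1", "u2"}, {"u2", "u3"}, {"u3"}, {"u4"}]

def f(S):
    """Coverage function: total weight of the elements covered by {F_i : i in S}."""
    covered = set(chain.from_iterable(F[i] for i in S))
    return sum(w[u] for u in covered)

def independent(S):
    """Independence oracle for the partition matroid."""
    return len(S & {0, 1}) <= 1 and len(S & {2, 3}) <= 1

print(f({0, 3}))  # 6.0: covers u1, u2, u4
```

In the notation above, this instance has n = 4, u = 2, and r = 2.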
Maximum coverage was first defined by Hochbaum and Pathria [54] in 1998. Maximum
coverage over a partition matroid was considered by Chekuri and Kumar [24] under the name
maximum coverage with group budget constraints. Algorithms for maximum coverage with
group budget constraints with the optimal approximation ratio 1 − 1/e were presented by
Srinivasan [88] and by Ageev and Sviridenko [1]. Both algorithms first formulate the problem
as a linear programming relaxation, and then round the solution. Other generalizations of
maximum coverage appear in the literature. In budgeted maximum coverage, each element is
provided with a cost, and the sets chosen by the algorithm are restricted by the total cost of
the elements they cover. Khuller, Moss and Naor [65] show that a greedy approach yields an
optimal (1−1/e)-approximation algorithm. Cohen and Katzir [28] generalize this even further,
obtaining the generalized maximum coverage problem. In this problem, elements have different
costs and profits depending on which set they are covered with and the sets themselves also
have costs. The goal is to maximize the total profit subject to a single budget constraint. Cohen
and Katzir provide an optimal (1 − 1/e − ε)-approximation for generalized maximum coverage via
a greedy-like algorithm. As noted in Chapter 2, the (1− 1/e) inapproximability result of Feige
[35] applies even in the special case of maximum coverage subject to a cardinality constraint.
Our algorithmic starting point is the oblivious k-local search algorithm, k-LocalCoverage,
shown in Algorithm 5. The algorithm starts from a base S obtained by the standard greedy
algorithm and repeatedly attempts to improve the total weight of all elements covered by S by
adding up to k sets to S and removing up to k sets from S, maintaining the matroid constraint.
We call a pair (A,R) a k-exchange for S if A ⊆ [n] \ S and R ⊆ S with |A| ≤ k and |R| ≤ k.
When no single k-exchange improves the weight of all elements covered by S, the algorithm
returns S.
We begin by showing that the locality ratio of k-LocalCoverage is no better than the ap-
proximation ratio attained by the greedy algorithm (as r →∞).
Theorem 3.2. For all r > k and ε > 0, there exists an instance I(r, k, ε) of matroid maximum
coverage for which the locality ratio of k-LocalCoverage is at most (1 + ε)(r − 1)/(2r − k − 1). Furthermore,
the greedy solution produced by SubmodularGreedy is a local optimum attaining this bound.
Proof. Let the coverage function f = (U,w,F) for instance I(r, k, ε) be as follows. The universe
Algorithm 5: k-LocalCoverage
Input: coverage function f = (U,w,F), where |F| = n; matroid M = ([n], I), given as an independence oracle for I
Let Sinit be the result of running SubmodularGreedy on f , M;
S ← Sinit;
repeat
    foreach A ⊆ [n] \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k do
            if (S \ R) ∪ A ∈ I and f((S \ R) ∪ A) > f(S) then
                S ← (S \ R) ∪ A;
until no exchange is applied to S;
return S;
U consists of r − 1 elements x1, . . . , xr−1 and r − k elements y1, . . . , yr−k, all of weight 1,
together with r − 1 elements ε1, . . . , εr−1 of arbitrarily small weight ε > 0. For each 1 ≤ i ≤ r,
we suppose that the ground set [2r] contains 2 distinct elements that we label ai and bi. Then,
F is made up of the sets Ai = Fai and Bi = Fbi for each 1 ≤ i ≤ r, defined as follows:

Ai = {εi} for 1 ≤ i ≤ r − 1,    Ar = {x1, . . . , xr−1},
Bi = {xi} for 1 ≤ i ≤ r − 1,    Br = {y1, . . . , yr−k}.

Then, |F| = 2r. The matroid for I(r, k, ε) is a partition matroid of rank r, whose independent
sets contain at most one of {ai, bi} for each i ∈ [r]. That is, we can choose at most one of
the sets Ai, Bi from F for each i ∈ [r]. The globally optimal solution is B = {bi : i ∈ [r]},
corresponding to choosing all of the sets Bi for i ∈ [r]. These sets cover elements of total weight
r − 1 + r − k = 2r − k − 1. On the other hand, we consider the solution A = {ai : i ∈ [r]},
corresponding to choosing all of the sets Ai for i ∈ [r]. These sets cover elements of total weight
only (r − 1)ε + (r − 1) = (r − 1)(1 + ε).
Consider an arbitrary k-exchange operation applied to A. This operation exchanges ai for
bi for at most k values i ∈ [r]. If we exchange ar for br, then we reduce the total weight of
elements covered by (r − 1) − (r − k) = k − 1. At most k − 1 exchanges remain, and each
increases the total weight of the covered elements by only 1 − ε, so the total weight covered
must decrease by at least (k − 1)ε. If, on the other hand, we do not replace ar with br, then each
replacement performed decreases the value of the solution by ε, so the total weight covered again
decreases. Thus, A is a local optimum for Algorithm k-LocalCoverage, and so the locality ratio for
k-LocalCoverage is at most

f(A)/f(B) = (r − 1)(1 + ε)/(2r − k − 1) .
Additionally, we consider the standard greedy algorithm SubmodularGreedy applied to this
instance. It begins by choosing the set Ar, since this set covers elements of maximum total
weight. Then for each 1 ≤ i ≤ r − 1, it must choose the set Ai which covers additional weight
ε, instead of the set Bi, which covers no additional weight. Thus, the greedy algorithm must
terminate with the solution A.
The second part of Theorem 3.2 shows that initializing the local search procedure with
the greedy algorithm does not give any increase in performance beyond the locality ratio of
k-LocalCoverage. That is, for any constant k, the approximation ratio of k-LocalCoverage is only

(r − 1)/(2r − k − 1) = 1/2 + O(r^{−1}) .
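The lower-bound instance I(r, k, ε) is easy to build programmatically, which makes the values of the two candidate solutions simple to confirm. The following sketch is our own (the string element labels such as "x1" are hypothetical names); it constructs the universe and the families A and B exactly as in the proof of Theorem 3.2.

```python
def bad_instance(r, k, eps):
    """The instance I(r, k, eps) from the proof of Theorem 3.2 (a sketch).
    Returns a coverage-weight helper and the two set families A and B."""
    w = {}
    for i in range(1, r):            # x_1..x_{r-1} (weight 1) and eps_1..eps_{r-1} (weight eps)
        w[f"x{i}"] = 1.0
        w[f"e{i}"] = eps
    for i in range(1, r - k + 1):    # y_1..y_{r-k} (weight 1)
        w[f"y{i}"] = 1.0
    A = [{f"e{i}"} for i in range(1, r)] + [{f"x{i}" for i in range(1, r)}]
    B = [{f"x{i}"} for i in range(1, r)] + [{f"y{i}" for i in range(1, r - k + 1)}]
    def cover(sets):
        return sum(w[u] for u in set().union(*sets))
    return cover, A, B

cover, A, B = bad_instance(r=10, k=2, eps=0.01)
print(cover(A))  # (r-1)(1+eps) = 9.09, up to floating point
print(cover(B))  # 2r-k-1 = 17.0
```

The ratio cover(A)/cover(B) ≈ 0.535 here, already close to the 1/2 + O(r⁻¹) bound above.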
3.2 A Non-Oblivious Local Search Algorithm
Let us examine in more detail the instance I(2, 1, ε) from the perspective of the algorithm
1-LocalCoverage. The sets F for this instance are given by:
Fa1 = A1 = {ε1}    Fa2 = A2 = {x1}
Fb1 = B1 = {x1}    Fb2 = B2 = {y1}
In particular, let us consider the bases corresponding to {B1, A2} and {A1, A2}. The set
{B1, A2} covers slightly fewer elements than {A1, A2}, but it is intuitively “better” than
{A1, A2}, since it is of almost equal value to {A1, A2} but additionally covers the element x1 twice.
From our perspective, this is advantageous, since it ensures that x1 will stay covered regardless
of how the next exchange is chosen.
Following this intuition, we develop a non-oblivious local search algorithm guided by a
potential function that gives preference to “robust” solutions, which cover elements multiple
times, even if these solutions have slightly lower objective value. The same approach appears
in Alimonti [2] and Khanna et al. [64], in the context of the maximum satisfiability problem
and its variants. There, the idea is to give extra weight to clauses which are satisfied by more
than one variable, because these clauses will remain satisfied after flipping the next variable in
the search procedure.
We modify Algorithm 5 by replacing the oblivious objective function f with a potential
function g of the general form:
g(S) = ∑_{u∈U} α_{#(u,S)} w(u) ,    (3.1)

where

#(u, S) = |{i ∈ S : u ∈ Fi}|

is the number of sets specified by S that contain the element u, and we define α0 = 0, so that
elements not included in any set chosen by S are not counted in the sum. Then, by setting,
for example, αi > α1 for all i > 1, we can give extra value to solutions that cover some element
more than once. Additionally, note that if we set αi = 1 for all i > 0, we recover the problem’s
given objective function f .
Now we are faced with the problem of how to set the αi. In order to gain some intuition, we
return to the instance I(2, 1, ε). We can write the values of f and g on all bases of this instance
in terms of the parameters ε, α1, and α2.
f({a1, a2}) = 1 + ε        g({a1, a2}) = (1 + ε)α1
f({a1, b2}) = 1 + ε        g({a1, b2}) = (1 + ε)α1
f({b1, a2}) = 1            g({b1, a2}) = α2
f({b1, b2}) = 2            g({b1, b2}) = 2α1 .
Now, we turn to the problem of setting α1 and α2. We shall see that we must balance two
concerns. We want to set α2 to be large enough that the algorithm prefers robust solutions in
the short term, allowing it to escape from bad local optima. However, we want to set α2 to be
small enough that the algorithm still makes progress with respect to the objective function f
in the long term.
Suppose that we set α2 to be slightly larger than α1. Then, the solution {a1, a2} will remain
a 1-local optimum of g if

ε = α2/α1 − 1 .
In this case, the locality ratio will be at most

f({a1, a2})/f({b1, b2}) = (1 + ε)/2 = α2/(2α1) .
This confirms our intuition that setting α2 to be larger than α1 will improve the locality ratio.
Now suppose that, guided by the observation that larger values of α2 improve the locality
ratio, we set α2 > 2α1. Then, the solution {b1, a2} will become a local optimum of g. However,
in this case the locality ratio will be at most

f({b1, a2})/f({b1, b2}) = 1/2 .
This confirms our intuition that it is bad to give too much extra weight to elements covered
multiple times.
This analysis has been tied closely to the specifics of the instance I(2, 1, ε). In Section 3.4 we
give a general procedure that can be used to infer the optimum value of the α coefficients for all
problems of rank r. The procedure generates a family of linear programs (one for each value of
r ≥ 1), that encode the problem of choosing the optimal values of the α sequence on instances of
rank r. Now, however, we simply state the resulting general formula for the α values obtained in
this fashion and show that these values result in an improved (1−1/e)-approximation algorithm.
We define the values

α0 = 0 ,    α1 = 1 − 1/e ,    α_{i+1} = (i + 1)αi − iα_{i−1} − 1/e .    (3.2)
We now derive various properties of the α sequence, which we shall use in our analysis. In our
proofs it will be useful to consider the sequence δ of differences between consecutive members
of the α sequence.
Lemma 3.3. Let δi = α_{i+1} − αi. Then, the δi satisfy the recurrence relation

δ0 = 1 − 1/e ,    δi = iδ_{i−1} − 1/e .    (3.3)

Furthermore, they are in fact equal to

δi = i! (1 − (1/e) ∑_{k=0}^{i} 1/k!) .    (3.4)
Proof. Clearly δ0 = α1 − α0 = 1 − 1/e. In general, we have

δi = α_{i+1} − αi = (i + 1)αi − iα_{i−1} − 1/e − αi = i(αi − α_{i−1}) − 1/e = iδ_{i−1} − 1/e .

Using this recurrence, we can prove formula (3.4) by induction. It is immediate in the case
i = 0. For i > 0, we have

δi = iδ_{i−1} − 1/e = i(i − 1)! (1 − (1/e) ∑_{k=0}^{i−1} 1/k!) − i! · 1/(i!e)
   = i! (1 − (1/e) ∑_{k=0}^{i} 1/k!) .
By examining the sequence δ, we now show that the α sequence is increasing and concave.
Lemma 3.4. For all i ≥ 0, δi > 0 and δi > δ_{i+1}.
Proof. The inequality δi > 0 follows directly from (3.4) and the fact that ∑_{k=0}^{i} 1/k! < e for all
i < ∞. For the second claim, we first define

e[i] = 1/(i! · i) + ∑_{ℓ=0}^{i} 1/ℓ!
and note that

e[i] − e = 1/(i! · i) + ∑_{ℓ=0}^{i} 1/ℓ! − ∑_{ℓ=0}^{∞} 1/ℓ! = 1/(i! · i) − ∑_{ℓ=i+1}^{∞} 1/ℓ!
         = 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=i+1}^{∞} (i + 1)!/ℓ! = 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=0}^{∞} (i + 1)!/(ℓ + i + 1)!
         > 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=0}^{∞} 1/(i + 1)^ℓ = 1/(i! · i) − (1/(i + 1)!) · (i + 1)/i = 0 .
Then, for all i ≥ 0, (3.3) and (3.4) give

δ_{i+1} − δi = iδi − 1/e = i! · i (1 − (1/e) ∑_{k=0}^{i} 1/k!) − 1/e < i! · i (1 − (1/e[i]) ∑_{k=0}^{i} 1/k!) − 1/e[i] ,

where the last inequality follows from e < e[i]. Now, set a = 1/(i · i!) and b = ∑_{k=0}^{i} 1/k!. We have
e[i] = a + b, and the final term of the inequality above is equal to:

1/a − (1/a) · b/(a + b) − 1/(a + b) = (a + b − b − a)/(a(a + b)) = 0 .
Finally, we show that the α sequence is bounded.
Lemma 3.5. For all i, αi ≤ Hi.
Proof. The recurrence (3.3) and Lemma 3.4 imply that for any k we have:

δ_{k−1} = (1/k)(δk + 1/e) < (1/k)(δ0 + 1/e) = 1/k .

We obtain the bound on αi by summing this inequality:

αi = ∑_{k=1}^{i} δ_{k−1} < ∑_{k=1}^{i} 1/k = Hi .
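The properties established in Lemmas 3.3–3.5 can be confirmed numerically from the recurrence (3.2) alone. The short check below is our own illustration; the tolerance is deliberately loose because the recurrence amplifies floating-point error by roughly a factor of i! at index i.

```python
from math import e, factorial

def alphas(n):
    """The coefficients of (3.2): a0 = 0, a1 = 1 - 1/e,
    a_{i+1} = (i+1) a_i - i a_{i-1} - 1/e."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

a = alphas(12)
d = [a[i + 1] - a[i] for i in range(12)]           # the difference sequence delta

# Lemma 3.3: closed form d_i = i! (1 - (1/e) sum_{k<=i} 1/k!)
closed = [factorial(i) * (1 - sum(1 / factorial(k) for k in range(i + 1)) / e)
          for i in range(12)]
print(all(abs(d[i] - closed[i]) < 1e-6 for i in range(12)))              # True

# Lemma 3.4: delta is positive and strictly decreasing (alpha increasing, concave)
print(all(x > 0 for x in d) and all(d[i] > d[i + 1] for i in range(11)))  # True

# Lemma 3.5: alpha_i <= H_i
H = [sum(1 / k for k in range(1, i + 1)) for i in range(13)]
print(all(a[i] <= H[i] for i in range(13)))                               # True
```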
The fact that the α sequence is increasing and concave implies that the non-oblivious po-
tential function g is also monotone submodular.
Theorem 3.6. Let f = (U,w,F) be a coverage function, and let g be defined by (3.1)
and (3.2). Then g is monotone submodular.
Proof. Suppose that T ⊆ S ⊆ [n] and i ∈ [n] \ S. Then, note that for all a ∈ Fi we have
#(a, T ) ≤ #(a, S) since T ⊆ S. Then,

gS(i) = g(S + i) − g(S) = ∑_{a∈Fi} (α_{#(a,S)+1} − α_{#(a,S)}) w(a) = ∑_{a∈Fi} δ_{#(a,S)} w(a)
      ≤ ∑_{a∈Fi} δ_{#(a,T)} w(a) = ∑_{a∈Fi} (α_{#(a,T)+1} − α_{#(a,T)}) w(a) = g(T + i) − g(T ) = gT (i) ,

where the inequality follows from #(a, T ) ≤ #(a, S) and the fact that the δ sequence is decreasing (Lemma 3.4).
For monotonicity, we note that

gS(i) = ∑_{a∈Fi} δ_{#(a,S)} w(a) ≥ 0 ,

where the last inequality follows from the fact that the δ sequence is positive (Lemma 3.4).
Using g, we formulate a local search algorithm MatroidCoverage for matroid maximum cov-
erage, shown in Algorithm 6. The algorithm is parameterized by an approximation parameter
ε > 0, which governs how large an improvement must be in order for a new solution to be
accepted. We describe this aspect of the algorithm in more detail in the next section.
MatroidCoverage first computes a greedy solution to the non-oblivious potential function g. It
then uses a simple approximate local search procedure guided by g that exchanges 1 element
in the current solution for one element not in the current solution.
Algorithm 6: MatroidCoverage
Input: approximation parameter ε > 0; coverage function f = (U,w,F), where |F| = n; matroid M = ([n], I), given as an independence oracle for I
Let g be the potential function given by (3.1) and (3.2);
S ← the result of running SubmodularGreedy on g, M;
Set r = |S| and ε0 = (ε/(rHr)) · (1 − 1/e − ε)^{−1};
repeat
    foreach a ∈ [n] \ S do
        foreach b ∈ S do
            if S − b + a ∈ I and g(S − b + a) > (1 + ε0)g(S) then
                S ← S − b + a;
                break;
until no exchange is applied to S;
return S;
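A compact rendering of the algorithm may be useful. The sketch below is our own simplification: it recomputes g from scratch rather than maintaining per-element coverage counts as the runtime analysis in Section 3.3 suggests, and it assumes the greedy phase returns a nonempty solution. Run on the I(2, 1, ε) instance of Section 3.2, the greedy phase already lands on the robust base {A2, B1}, and the local search phase then escapes to the true optimum {B1, B2}.

```python
from math import e

def make_alpha(n):
    """Coefficients from (3.2)."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

def matroid_coverage(F, w, independent, r, eps=0.1):
    """Sketch of Algorithm 6: greedy guided by the potential g, then
    1-exchanges accepted only when g improves by a (1 + eps0) factor."""
    alpha = make_alpha(2 * r + 2)

    def g(S):
        count = {}
        for i in S:
            for u in F[i]:
                count[u] = count.get(u, 0) + 1
        return sum(alpha[c] * w[u] for u, c in count.items())

    n = len(F)
    S = set()
    while True:  # greedy phase on g
        cands = [i for i in range(n) if i not in S and independent(S | {i})]
        if not cands:
            break
        best = max(cands, key=lambda i: g(S | {i}) - g(S))
        if g(S | {best}) - g(S) <= 0:
            break
        S.add(best)
    H = sum(1 / k for k in range(1, len(S) + 1))
    eps0 = eps / (len(S) * H) / (1 - 1 / e - eps)
    improved = True
    while improved:  # local search phase: single-element exchanges
        improved = False
        for i in set(range(n)) - S:
            for j in S:
                T = (S - {j}) | {i}
                if independent(T) and g(T) > (1 + eps0) * g(S):
                    S, improved = T, True
                    break
            if improved:
                break
    return S

# The instance I(2, 1, eps) of Section 3.2: indices 0..3 are a1, a2, b1, b2.
w = {"e1": 0.05, "x1": 1.0, "y1": 1.0}
F = [{"e1"}, {"x1"}, {"x1"}, {"y1"}]
indep = lambda S: len(S & {0, 2}) <= 1 and len(S & {1, 3}) <= 1
S = matroid_coverage(F, w, indep, r=2, eps=0.1)
print(sorted(S))  # [2, 3], i.e. the optimum {B1, B2}
```

Note that the oblivious algorithm (Algorithm 5 with k = 1) started from greedy can get stuck at {A1, A2} on this instance, while the potential g rewards covering x1 twice and so steers the search to the optimum.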
3.3 Analysis of the Algorithm
We now analyze the approximation performance and runtime of MatroidCoverage. We examine
an arbitrary instance f = (U,F , w), M = ([n], I) of matroid maximum coverage. The solution
S returned by MatroidCoverage for this instance is an ε0-approximate local optimum of g. We
consider some global optimum O of f . Because f is monotone, we can assume without loss of
generality that O is a base of M. As shown in Theorem 3.6, our potential function g is also
monotone. Thus, we can also assume that the initial greedy solution computed by Matroid-
Coverage is a base of M. Furthermore, each improvement step in MatroidCoverage maintains
the cardinality of the solution. Thus, the final solution S produced by MatroidCoverage
must also be a base of M.
From Theorem 2.13, there must exist a bijection π : O → S such that S − π(i) + i is a base
of M for all i ∈ O. Let O = {o1, . . . , or} and then number the elements of S as {s1, . . . , sr}
so that si = π(oi) for all i ∈ [r]. Then, we consider only the exchange operations that remove
π(oi) = si from S and add oi to the result. Each such operation is considered by the algorithm
at each iteration, and each one of them results in an independent set in M. Thus, ε0-local
optimality implies that we have

(1 + ε0)g(S) ≥ g(S − si + oi)

for each i ∈ [r]. Summing over all r such inequalities we obtain the inequality

r(1 + ε0)g(S) ≥ ∑_{i=1}^{r} g(S − si + oi) .    (3.5)
The main difficulty of the analysis lies in the fact that inequality (3.5) is given in terms of
the non-oblivious potential function g, but we wish to derive an approximation guarantee for
the original objective function f . In order to put g and f on common ground, we introduce the
following symmetric notation.
For any two subsets A, B ⊆ [r], we define EA,B to be the set of elements that belong to all
of the sets Fsi for i ∈ A and Foj for j ∈ B and to no other sets in F . The sets EA,B thus form a
partition of U . Then, for all integers a, x, b ≥ 0 such that 1 ≤ a + x + b ≤ r, we define

wa,x,b = ∑_{A,B⊆[r] : |A\B|=a, |B\A|=b, |A∩B|=x} w(EA,B) .

In words, wa,x,b is the total weight of elements that belong to exactly a + x of the sets Fsi
specified by S and exactly b + x of the sets Foj specified by O, where exactly x of the sets have
the same index in both S and O (that is, x of the sets have i = j). We call the variables wa,x,b
symmetric variables.
We can express all the quantities we are interested in using the symmetric variables wa,x,b:

f(S) = ∑_{a,x,b≥0, a+x≥1} wa,x,b    (3.6)

f(O) = ∑_{a,x,b≥0, b+x≥1} wa,x,b    (3.7)

g(S) = ∑_{a,x,b≥0, a+x≥1} α_{a+x} wa,x,b = ∑_{a,x,b≥0} α_{a+x} wa,x,b  (since α0 = 0)    (3.8)

∑_{i=1}^{r} g(S − si + oi) = ∑_{a,x,b≥0} (a·α_{a+x−1} + b·α_{a+x+1} + (r − a − b)·α_{a+x}) wa,x,b .    (3.9)
The only expression which is not immediate is the one for ∑_{i=1}^{r} g(S − si + oi). We derive it as
follows: consider an arbitrary element u and let A, B ⊆ [r] be the unique sets of indices such
that u ∈ EA,B. Then, u's weight is counted in (only) the quantity wa,x,b where x = |A ∩ B|,
a = |A \ B|, and b = |B \ A|. In S, the element u appears in |A| = a + x sets. If i ∈ A \ B, it
appears in |A| − 1 = a + x − 1 sets in (S − si + oi). If i ∈ B \ A, it appears in |A| + 1 = a + x + 1
sets in (S − si + oi). Finally, if i ∈ A ∩ B or i ∉ A ∪ B, it also appears in |A| = a + x sets in
(S − si + oi). For each such element, there are precisely a values of i leading to exchanges of the
first type, b values of i leading to exchanges of the second type, and r − a − b values of i leading to
exchanges of the third type.
We are now ready to state the main technical result of this section, which bounds the locality
ratio of MatroidCoverage.
Lemma 3.7. If S is an ε0-approximate local optimum for the function g, then:

(1 + ε0rHr) f(S) ≥ (1 − 1/e) f(O) .

Proof. Because S is an ε0-approximate local optimum, (3.5) holds. We use (3.8) and (3.9) to
reformulate inequality (3.5) in terms of our symmetric notation, obtaining

0 ≤ ∑_{a,x,b≥0} ((a + b + ε0r)·α_{a+x} − a·α_{a+x−1} − b·α_{a+x+1}) wa,x,b .    (3.10)
Similarly, we use (3.6) and (3.7) to reformulate our goal in terms of our symmetric notation,
obtaining
0 ≤ (1 + ε0rHr) ∑_{a,x,b≥0, a+x≥1} wa,x,b − (1 − 1/e) ∑_{a,x,b≥0, b+x≥1} wa,x,b .    (3.11)
To prove the lemma, we show that (3.10) implies (3.11). Since we have wa,x,b ≥ 0 for all a, x, b,
it suffices to show that the coefficient of any term wa,x,b in (3.10) is at most its coefficient
in (3.11). We consider three cases, simplifying expressions throughout by using the fact that
α0 = 0.
In the first case, suppose that b = x = 0. We must show that for all 1 ≤ a ≤ r,

(a + ε0r)αa − aα_{a−1} ≤ 1 + ε0rHr .

We have

(a + ε0r)αa − aα_{a−1} = aδ_{a−1} + ε0rαa    (by definition of δ)
                       = δa + 1/e + ε0rαa    (by recurrence (3.3))
                       ≤ δ0 + 1/e + ε0rαa    (by Lemma 3.4)
                       = 1 + ε0rαa           (by definition of δ)
                       ≤ 1 + ε0rHr           (by Lemma 3.5) .
In the next case, suppose that a = x = 0. We must show that for all 1 ≤ b ≤ r,

−bα1 ≤ −(1 − 1/e) .

This follows directly from the fact that b ≥ 1 and α1 = 1 − 1/e.
Finally, we must show for a, x, b such that a + x ≠ 0, b + x ≠ 0, and 1 ≤ a + x + b ≤ r,

(a + b + ε0r)α_{a+x} − aα_{a+x−1} − bα_{a+x+1} ≤ 1/e + ε0rHr .    (3.12)

For this inequality, we consider two subcases. In the first, suppose b = 0 and so x ≥ 1. Then,
we have

(a + ε0r)α_{a+x} − aα_{a+x−1} = aδ_{a+x−1} + ε0rα_{a+x}                  (by definition of δ)
                              = δ_{a+x} + 1/e − xδ_{a+x−1} + ε0rα_{a+x}  (by recurrence (3.3))
                              ≤ 1/e + ε0rα_{a+x}                         (by Lemma 3.4, since x ≥ 1)
                              ≤ 1/e + ε0rHr                              (by Lemma 3.5) .
In the second subcase, we have b ≥ 1. Then

(a + b + ε0r)α_{a+x} − aα_{a+x−1} − bα_{a+x+1}
    = aδ_{a+x−1} − bδ_{a+x} + ε0rα_{a+x}       (by definition of δ)
    ≤ aδ_{a+x−1} − δ_{a+x} + ε0rα_{a+x}        (because b ≥ 1)
    = 1/e − xδ_{a+x−1} + ε0rα_{a+x}            (by recurrence (3.3))
    ≤ 1/e + ε0rHr                              (by Lemma 3.5 and δ_{a+x−1} ≥ 0) .
This completes the proof of the Lemma.
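The coefficient-wise comparison at the heart of Lemma 3.7 can also be verified numerically for small ranks. The following check is our own illustration (with ε0 = 0 for simplicity): it confirms that, for every triple with 1 ≤ a + x + b ≤ r, the coefficient of wa,x,b in (3.10) is at most its coefficient in (3.11).

```python
from math import e

def alpha_seq(n):
    """Coefficients from (3.2)."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

def check(r, tol=1e-6):
    """Coefficient-wise comparison of (3.10) and (3.11) with eps0 = 0:
    (a+b)*alpha_{a+x} - a*alpha_{a+x-1} - b*alpha_{a+x+1}
      <= [a+x >= 1] - (1 - 1/e)*[b+x >= 1]   for all 1 <= a+x+b <= r."""
    al = alpha_seq(r + 2)
    for a in range(r + 1):
        for x in range(r + 1 - a):
            for b in range(r + 1 - a - x):
                if a + x + b < 1:
                    continue
                lhs = ((a + b) * al[a + x]
                       - (a * al[a + x - 1] if a else 0.0)
                       - b * al[a + x + 1])
                rhs = (1 if a + x >= 1 else 0) - (1 - 1 / e) * (1 if b + x >= 1 else 0)
                if lhs > rhs + tol:
                    return False
    return True

print(check(10))  # True
```

The tolerance absorbs the floating-point error of the recurrence; the only tight cases (e.g. a = x = 0, b = 1) hold with equality, exactly as in the proof above.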
Now, we consider the run-time of MatroidCoverage. By keeping track of how many times
each element of U is covered by the current solution, we can compute the change in g due to
adding or removing a set from the solution using only time O(u); we simply increment the count
of each element in the added set and decrement the count of each element in the removed set,
and also compute the change in g due to each such increment or decrement. The initial greedy
phase takes time O(rnu). Each iteration of the local search phase examines O(rn) potential
exchanges, each of which requires a single call to the independence oracle for M and a single
evaluation of the change in g. Thus, each iteration can be completed in time O(rnu). We now
bound the total number of iterations that the algorithm can make.
Lemma 3.8. Algorithm MatroidCoverage makes at most O(ε0^{−1}) improvements.

Proof. Consider the solution Sinit produced by the initial greedy phase of the algorithm MatroidCoverage,
and let Og = arg max_{S∈I} g(S). Then, MatroidCoverage makes at most log_{1+ε0}(g(Og)/g(Sinit))
improvements. As shown in Theorem 3.6, g is monotone submodular. The classical result of Fisher,
Nemhauser, and Wolsey [43] implies that the greedy algorithm is then a 1/2-approximation
for maximizing g, and so g(Sinit) ≥ (1/2)g(Og). Algorithm MatroidCoverage can therefore make at
most

log_{1+ε0}(g(Og)/g(Sinit)) ≤ log_{1+ε0} 2 = log 2 / log(1 + ε0) = O(ε0^{−1})

improvements.
We are now ready to state our main result.
Theorem 3.9. For all ε > 0, MatroidCoverage is a (1 − 1/e − ε)-approximation algorithm for
matroid maximum coverage, running in time O(ε^{−1} r^2 nu log r).

Proof. The theorem follows directly from Lemmas 3.7 and 3.8 after expanding the definition

ε0 = ε / (rHr (1 − 1/e − ε)) = O(ε / (r log r)) .
Finally, by using the partial enumeration technique described in Section 2.6 we can remove
the small loss of ε from the approximation ratio of MatroidCoverage, obtaining a clean
(1 − 1/e)-approximation.
Theorem 3.10. Suppose that we set ε = 1/(3r) in MatroidCoverage. Then, PartEnum(MatroidCoverage)
is a (1 − 1/e)-approximation algorithm running in time O(r^3 n^2 u log r).
Proof. Theorem 3.9 shows that MatroidCoverage has an approximation ratio of 1 − 1/e − 1/(3r) when ε is set to 1/(3r).
Theorems 2.22 and 2.19 imply that the approximation ratio of PartEnum(MatroidCoverage) is
hence

1/r + (1 − 1/r)(1 − 1/e − 1/(3r)) = 1/r + 1 − 1/e − 1/(3r) − 1/r + 1/(er) + 1/(3r^2)
                                  = 1 − 1/e + 1/(er) − 1/(3r) + 1/(3r^2)
                                  > 1 − 1/e ,

where in the last line we have used the fact that e ≤ 3. The algorithm PartEnum(MatroidCoverage)
makes n calls to MatroidCoverage. From Theorem 3.9, each of these takes time
O(r^3 nu log r), so the total runtime is O(r^3 n^2 u log r).
3.4 Obtaining the α Sequence
In Section 3.2 we gave an equation for the sequence α = α0, α1, . . . of coefficients for defining a
non-oblivious potential function and showed that the resulting function g yields an algorithm
with an improved approximation ratio. Here we describe the method by which the α sequence
was obtained. Our approach uses an idea similar to the factor revealing linear program applied
by Jain et al. [57] for the analysis of greedy algorithms for a facility location problem. Specifically,
we formulate a linear program that determines, for an instance of rank r and a function of
the general form (3.1), the best attainable locality ratio, the value of the coefficients α1, . . . , αr
attaining this ratio (as usual, we shall assume α0 = 0), and a tight instance for the resulting
algorithm. We now show how to derive the program for some fixed value of r.
For any sequence α = α1, . . . , αr we denote by gα,f the function g obtained from f and α
by using (3.1). We consider the problem of choosing values for α1, . . . , αr that maximize the
locality ratio over all instances of rank r. For now, we restrict ourselves to the following simpler
problem. Suppose that we are given a sequence α = α1, . . . , αr and want to find the worst-case
locality ratio² of the algorithm MatroidCoverage over all instances M, f with rank r, when gα,f
is used as the non-oblivious potential function in the algorithm.
Let S = {s1, . . . , sr} and O = {o1, . . . , or} be two sets of r indices. We want to find a
coverage function f on the ground set S ∪ O that minimizes f(S)/f(O), subject to the constraint
²Here, for ease of presentation, we consider exact local optima in our program, setting ε to 0 in Algorithm 6. Our approach generalizes easily to the case in which ε > 0.
that S is a local optimum for the function gα,f . The resulting problem can be formulated as
min_{F={Fi}_{i∈O∪S}, f=(U,w,F)}  f(S)/f(O)
s.t.  gα,f(S) − gα,f(S − si + oi) ≥ 0 ,   1 ≤ i ≤ r    (3.13)
The r constraints in (3.13) enforce the requirement that S is a local optimum with respect to
gα,f . Note that the constraints only consider exchanges of the form S− si + oi. As we noted in
Section 3.3, Theorem 2.13 implies that we can always index the elements of S and O so that
S − si + oi is independent for all 1 ≤ i ≤ r. Thus, there is always some such set of exchanges.
Moreover, there are instances (e.g. those instances considered in Section 3.2) in which the r exchanges
of the form guaranteed by Theorem 2.13 are the only ones that lead to a valid solution, and we
are interested in the worst-case locality ratio over all possible instances. Thus, it is sufficient
to consider only exchanges of this form.
We need not worry about the size of S ∩ O in our program. That is, in formulating (3.13)
and our following programs, we can assume that S and O are disjoint, but in our analysis and
related discussion, we allow for S ∩ O ≠ ∅. This is because considering the general case in which
there is some element i ∈ S ∩ O is equivalent to considering a coverage function containing 2
copies of Fi: one with index si ∈ S and one with index oi ∈ O. Hence it is sufficient
to minimize over all coverage functions in our programs, as this will implicitly minimize over
all configurations of S and O, as well.
Program (3.13) has a non-linear objective. However, we note that the ratio f(S)/f(O) and
local optimality with respect to gα,f (regardless of the specific sequence α) are both unchanged
if we scale all of the weights w(u) by a constant. Thus, we may assume without loss of generality
that f(O) = 1, obtaining the following program
min_{F={Fi}_{i∈O∪S}, f=(U,w,F)}  f(S)
s.t.  gα,f(S) − gα,f(S − si + oi) ≥ 0 ,   1 ≤ i ≤ r
      f(O) = 1    (3.14)
We are now faced with the question of how to implement the minimization over an arbitrary
coverage function f = (U,w,F) on S ∪ O in (3.14). More concretely, we seek a succinct parameterization
that yields all possible such coverage functions. Here, we make use of the symmetric
variables wa,x,b introduced in Section 3.3. Equations (3.6) and (3.7) show that for any coverage
function f , we can express f(S) and f(O) in terms of wa,x,b. However, it remains unclear how to
represent the first set of constraints in (3.14). Equation (3.10) shows how to use the symmetric
variables to represent the weaker statement given in (3.5), here equivalent to

∑_{1≤i≤r} [gα,f(S) − gα,f(S − si + oi)] ≥ 0 .    (3.15)
If we replace the r constraints in (3.14) by the single constraint (3.15), and use (3.6) and (3.7)
to represent f(S) and f(O) we obtain the concrete linear program:
min_{wa,x,b}  ∑_{a+x≥1} wa,x,b
s.t.  ∑_{a,x,b} ((a + b)·α_{a+x} − a·α_{a+x−1} − b·α_{a+x+1}) wa,x,b ≥ 0
      ∑_{b+x≥1} wa,x,b = 1
      wa,x,b ≥ 0    (3.16)
The program (3.16) has a variable w_{a,x,b} for each triple of non-negative values
a, x, b satisfying a + x + b ≤ r. If the variables w_{a,x,b} are the symmetric variables for some
underlying coverage function f, then they must satisfy all of the constraints of (3.16), and
moreover the resulting value of (3.16) corresponds to f(S). Thus, for every feasible solution
of (3.14) there is a corresponding feasible solution to (3.16) of equal value. However, it is not
clear that the converse must hold. Since the constraint (3.15) is strictly weaker than the r
constraints in (3.14), the program (3.16) could have solutions of smaller value than (3.14). We
now show that this is not the case, and that the converse must, in fact, hold.
Theorem 3.11. Let w_{a,x,b}, where a, x, b ≥ 0 and a + x + b ≤ r, be a feasible solution of (3.16)
with value v. Then, there exists a coverage function f = (U, w, F) on S ∪ O that is a feasible
solution of (3.14) with value v.
Proof. Let S = {s(1), …, s(r)} and O = {o(1), …, o(r)} be two sets of indices.³ We define
f = (U, w, F) on S ∪ O as follows. For each a, x, b ≥ 0 satisfying a + x + b ≤ r, and each
i ∈ [r], we create an element u(a, x, b, i) ∈ U, and define w(u(a, x, b, i)) = w_{a,x,b}/r for all i. The
collection F contains r sets F_{s(i)} and r sets F_{o(i)}, constructed as follows:
\[ F_{s(i)} = \bigcup_{j=1}^{r} \{ u(a, x, b, j) : a + x \ge i + j \bmod r \}, \qquad
F_{o(i)} = \bigcup_{j=1}^{r} \{ u(a, x, b, j) : b + x \ge i + j \bmod r \}. \]
³We adopt the notation s(i) and o(i) instead of s_i and o_i here to avoid an unsightly proliferation of subscripts in the proof.
Then,
\[
\begin{aligned}
f(S) &= w\left( \bigcup_{i=1}^{r} \bigcup_{j=1}^{r} \{ u(a,x,b,j) : a + x \ge i + j \bmod r \} \right)
= w\left( \bigcup_{j=1}^{r} \{ u(a,x,b,j) : a + x \ge 1 \} \right) \\
&= \frac{1}{r} \sum_{j=1}^{r} \sum_{\substack{a,x,b \ge 0 \\ a+x \ge 1}} w_{a,x,b}
= \sum_{\substack{a,x,b \ge 0 \\ a+x \ge 1}} w_{a,x,b} = v,
\end{aligned}
\]
and similarly
\[
\begin{aligned}
f(O) &= w\left( \bigcup_{i=1}^{r} \bigcup_{j=1}^{r} \{ u(a,x,b,j) : b + x \ge i + j \bmod r \} \right)
= w\left( \bigcup_{j=1}^{r} \{ u(a,x,b,j) : b + x \ge 1 \} \right) \\
&= \frac{1}{r} \sum_{j=1}^{r} \sum_{\substack{a,x,b \ge 0 \\ b+x \ge 1}} w_{a,x,b}
= \sum_{\substack{a,x,b \ge 0 \\ b+x \ge 1}} w_{a,x,b} = 1,
\end{aligned}
\]
where the last equality follows from the fact that the w_{a,x,b} are a feasible solution for (3.16).
Finally, we show that S is a local optimum with respect to the non-oblivious function g_{α,f}.
Specifically, we show that g_{α,f}(S) − g_{α,f}(S − s(i) + o(i)) ≥ 0 for each i ∈ [r]. We fix an arbitrary
i ∈ [r], and then consider how many times each element is covered by S versus how many times it
is covered by S − s(i) + o(i). This will allow us to compute the value g_{α,f}(S) − g_{α,f}(S − s(i) + o(i)).
Specifically, for each a, x, b ≥ 0 and j ∈ [r], we consider how many times the elements of the
form u(a, x, b, j) are covered, and then compute how many such elements there are in total.
Fix a, x, b ≥ 0 and consider the r elements u(a, x, b, j) for j ∈ [r]. In S, each element
u(a, x, b, j) is covered a + x times, once for each of the a + x values of i ∈ [r] such
that i + j mod r ≤ a + x. On the other hand, each element of the form u(a, x, b, j) that appears in F_{s(i)}
but not in F_{o(i)} is covered once less in S − s(i) + o(i) than it is in S. Similarly, each element of the
form u(a, x, b, j) that appears in F_{o(i)} but not in F_{s(i)} is covered once more in S − s(i) + o(i) than it
is in S.
We now count how many elements of the latter two kinds there are. The set F_{s(i)}
contains the element u(a, x, b, j) for each of the a + x values of j ∈ [r] such that i + j mod r ≤ a + x.
Similarly, the set F_{o(i)} contains the element u(a, x, b, j) for each of the b + x values of j ∈ [r] such
that i + j mod r ≤ b + x. Finally, the element u(a, x, b, j) appears in both F_{s(i)} and F_{o(i)} for each
of the x values of j ∈ [r] such that i + j mod r ≤ x.
Thus, for each a, x, b ≥ 0 there are a elements covered once less (that is, a + x − 1 times in total)
in S − s(i) + o(i) and b elements covered once more (a + x + 1 times in total) in S − s(i) + o(i).
All other elements are covered the same number of times in both S and S − s(i) + o(i). So,
\[
\begin{aligned}
g(S) - g(S - s(i) + o(i))
&= \sum_{a,x,b \ge 0} r \alpha_{a+x} \frac{w_{a,x,b}}{r} - \sum_{a,x,b \ge 0} \left( a\alpha_{a+x-1} + b\alpha_{a+x+1} + (r-a-b)\alpha_{a+x} \right) \frac{w_{a,x,b}}{r} \\
&= \frac{1}{r} \left[ \sum_{a,x,b \ge 0} r\alpha_{a+x} w_{a,x,b} - \sum_{a,x,b \ge 0} \left( a\alpha_{a+x-1} + b\alpha_{a+x+1} + (r-a-b)\alpha_{a+x} \right) w_{a,x,b} \right] \\
&\ge 0,
\end{aligned}
\]
where the last inequality follows from the fact that the w_{a,x,b} are a feasible solution for (3.16).
Thus, f is a feasible solution for (3.14) with value f(S) = v.
The value of the optimal solution to (3.16) is therefore the same as the value of the optimal
solution to (3.14). This value corresponds to the locality ratio of MatroidCoverage when g is
given by g_{α,f}. Furthermore, the solution attaining this value gives a tight (symmetric) instance
for the objective function g_{α,f}. Now we consider the problem of obtaining the optimal values
α_1, …, α_r. We want to maximize the value of (3.16) over all choices of α_1, …, α_r. Although
this results in a max-min problem, linear programming duality implies that the value of the
optimal solution to (3.16) is the same as the value of the optimal solution to the dual:
\[
\begin{aligned}
\max_{\theta, \rho} \quad & \theta \\
\text{s.t.} \quad & \theta \cdot \mathbf{1}(b + x \ge 1) + \rho \left( (a+b)\alpha_{a+x} - a\alpha_{a+x-1} - b\alpha_{a+x+1} \right) \le \mathbf{1}(a + x \ge 1), \\
& \qquad a, x, b \ge 0,\ a + x + b \le r \\
& \rho \ge 0
\end{aligned} \tag{3.17}
\]
The dual has only two variables, ρ and θ. The variable ρ appears only in terms of the form ρα_i
for 1 ≤ i ≤ r. Hence, fixing an arbitrary value of ρ corresponds to multiplying each of the α_i
by some constant. The local optimality conditions from the primal (3.16) are invariant under
this operation, and so we can set ρ = 1 without affecting the value of the solutions of (3.17).
Doing so, we obtain the final linear program:
\[
\begin{aligned}
\max_{\alpha_1, \ldots, \alpha_r, \theta} \quad & \theta \\
\text{s.t.} \quad & \theta \cdot \mathbf{1}(b + x \ge 1) + (a+b)\alpha_{a+x} - a\alpha_{a+x-1} - b\alpha_{a+x+1} \le \mathbf{1}(a + x \ge 1), \\
& \qquad a, x, b \ge 0,\ a + x + b \le r
\end{aligned} \tag{3.18}
\]
A solution to (3.18) gives us the optimal setting for α_1, …, α_r as well as the approximation
ratio θ attained by this setting. By experimentally examining the solutions of the programs
corresponding to increasing values of r, we deduced the general pattern:
\[ \theta = 1 - \frac{1}{e_{[r]}}, \qquad \alpha_1 = \theta, \qquad \alpha_{i+1} = (i+1)\alpha_i - i\alpha_{i-1} + \frac{1}{e_{[r]}}. \]
As r → ∞ the value 1/e_{[r]} approaches 1/e, and these equations give the equations stated in (3.2). In
fact, if g is defined using the value 1/e_{[r]} in place of 1/e (as above), then the analysis of Section 3.3
can be modified to show that the resulting algorithm attains an approximation ratio of 1 − 1/e_{[r]}
on instances of rank r. However, this requires that the coefficients used to define g depend on
the rank of the given instance.
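The experimental search described above can be reproduced computationally. As a minimal sketch (not the thesis's actual code), the function below checks a candidate pair (α, θ) against every constraint of (3.18), under the assumed convention α_0 = 0 (elements covered zero times receive no weight); note that α_{−1} and α_{r+1} only ever appear with zero multipliers. The name `feasible_318` is illustrative.

```python
def feasible_318(alpha, theta, r, tol=1e-12):
    """Check a candidate (alpha, theta) against every constraint of (3.18).

    alpha[i] holds alpha_i for i = 1..r; we adopt the assumed convention
    alpha_0 = 0, and indices outside [1, r] only occur with zero multipliers.
    """
    def al(i):
        return alpha[i] if 1 <= i <= r else 0.0
    for a in range(r + 1):
        for x in range(r + 1 - a):
            for b in range(r + 1 - a - x):
                lhs = theta * (1 if b + x >= 1 else 0)
                lhs += (a + b) * al(a + x) - a * al(a + x - 1) - b * al(a + x + 1)
                rhs = 1 if a + x >= 1 else 0
                if lhs > rhs + tol:
                    return False
    return True

r = 5
alpha = {i: 0.5 for i in range(1, r + 1)}
print(feasible_318(alpha, 0.5, r))  # the simple point alpha_i = theta = 1/2 is feasible
```

Any LP solver maximizing θ over such constraints therefore returns θ ≥ 1/2; the pattern above reports the optimal value as 1 − 1/e_{[r]}.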
Chapter 4
Monotone Submodular Maximization
In this chapter, we extend the algorithmic techniques of Chapter 3 beyond the context of
coverage functions to the general problem of maximizing an arbitrary monotone submodular
function subject to a single matroid constraint. We present an optimal (1 − 1/e)-approximation
algorithm for this problem. Unlike the continuous greedy algorithm described in Section 2.5,
our algorithm is combinatorial, in the sense that it considers only integral solutions and hence
does not require any rounding procedures. Furthermore, our algorithm is extremely simple to
state: it simply runs the standard greedy algorithm followed by 1-local search. Both phases are
run not on the actual objective function, but on a related non-oblivious potential function,
which is itself monotone submodular.
Our approach generalizes to the case in which the monotone submodular function has restricted
total curvature c. For any c ∈ (0, 1], we give an algorithm that produces a (1 − e^{−c})/c-approximation.
This result matches that of Vondrák [95], who showed that the continuous greedy algorithm
produces a (1 − e^{−c})/c-approximation when the objective function has total curvature c. Vondrák
also showed that this result is tight when f is given by a value oracle, in the sense that no better
approximation can be attained with a polynomial number of queries. Unlike the continuous
greedy algorithm, our algorithm requires knowledge of the value of c. By enumerating over
a sufficiently large set of values, we can obtain a ((1 − e^{−c})/c − ε)-approximation even when c is
unknown.
Compared to the continuous greedy algorithm, our algorithm's runtime has improved dependence
on the size n of the ground set, at the expense of increased dependence on the rank
r of the given matroid. In the general case in which f has unrestricted curvature, we obtain
a (1 − 1/e − ε)-approximation in time O(ε^{−3} r^4 n) and a clean (1 − 1/e)-approximation in time
O(r^7 n^2), where n is the size of the ground set and r is the rank of the given matroid. Calinescu
et al. state that the continuous greedy algorithm runs in time O(n^8), but that this can
be improved by careful analysis. A more detailed analysis of the continuous greedy algorithm,
including a variant in which swap rounding is used instead of pipage rounding, appears in [41].
There it is shown that the continuous greedy algorithm using swap rounding can be implemented
in time O(n^6), although it may be possible to employ swap rounding at each stage of
the continuous algorithm to further decrease the dependence on n.
4.1 A Non-Oblivious Local Search Algorithm
In Chapter 3 we developed a non-oblivious local search algorithm for the special case in which the
monotone submodular function f is given as a coverage function. There, our potential function
gave extra weight to elements covered multiple times. The main challenge in extending that
approach to the general monotone submodular case is that we must find a way to define the
non-oblivious potential function without reference to a coverage representation. Specifically, we
must construct a variant of the potential function that does not refer to elements.
Our purpose for giving extra weight to elements covered several times was to cause the local
search algorithm to prefer solutions that were “robust,” in the sense that they would retain high
objective values even if some sets were removed in future iterations. Here, we accomplish this
goal in a more direct fashion. We consider the average weight of all solutions obtained from the
current solution by deleting 1, 2, . . . elements. Our potential function aggregates the information
obtained by applying the objective function to all subsets of the input, weighted according to
their size. Intuitively, the resulting potential function gives extra weight to solutions that
contain a large number of good sub-solutions, or equivalently, remain good solutions on average
even when elements are randomly removed. Here, the weight that we give to the average
value of all solutions obtained from S by deleting i elements corresponds roughly to the extra
weight given to elements covered i+ 1 times in the coverage case, in the sense that both govern
how much value we place on the robustness of the solution to the removal of i elements. We
formalize this intuition further in Section 4.3, where we show that the definition of the non-oblivious
potential function from this section coincides with that of Section 3.2 when f is a
coverage function.
With this approach in mind, we define g(S) to be a combination of the values f(T ) for all
T ⊆ S. The extra weight that a subset T contributes will depend on both its size and the size
of the solution S on which we are evaluating g. Our function g has the general form
\[
g(S) = \sum_{k=0}^{|S|} \sum_{T \in \binom{S}{k}} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} f(T)
= \sum_{k=1}^{|S|} \beta^{(|S|)}_k \mathop{\mathbb{E}}_{T \in \binom{S}{k}} [f(T)]. \tag{4.1}
\]
Here, β^{(|S|)}_k / \binom{|S|}{k} ≥ 0 is the weight given to the value f(T) of any particular subset T of size k,
so β^{(|S|)}_k is the weight given to the average value of f(T) over all subsets T of size k.
Alternatively, we can think of g in terms of the following random process. First, we choose
a value k between 1 and |S| with probability proportional to β^{(|S|)}_k. Then, we choose k items
randomly from S to obtain a subset T. The value of g is then precisely the expected value of
f on the resulting random subset T, scaled by a multiplicative constant.
Note that the coefficients β depend on the size |S|; that is, we have a separate sequence
of coefficients β^{(m)}_0, …, β^{(m)}_m for each value of m (where m corresponds to the size of the set S
on which g is being evaluated). This turns out to be necessary in order to obtain a function g
that is monotone submodular, as well as to relate g to the potential function defined in Section
3.2, which we do in Section 4.3.
Finally, we note that evaluating g(S) requires evaluating f on every subset of S. Thus,
exactly evaluating g requires an exponential number of calls to the value oracle for f. We shall
show in Section 4.2.3 that g(S) can in fact be estimated efficiently using random sampling.
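The random-process view of g suggests a straightforward Monte Carlo estimator, sketched below on a toy weighted coverage function with arbitrary placeholder weights β (their specific values are fixed later in this section; the identity being exercised holds for any non-negative weights, and all names here are illustrative).

```python
import itertools, math, random

random.seed(1)

# Toy monotone submodular f: weighted coverage on a small universe.
U = range(8)
w = {u: random.random() for u in U}
sets = [set(random.sample(list(U), 4)) for _ in range(5)]

def f(T):
    cov = set().union(*(sets[i] for i in T)) if T else set()
    return sum(w[u] for u in cov)

S = list(range(5))
beta = [0.0, 0.3, 0.9, 1.2, 0.8, 0.5]   # placeholder beta[k] for k = 0..|S|

def g_exact(S):
    # Direct evaluation of (4.1): exponentially many calls to f.
    m = len(S)
    return sum(beta[k] / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(S, k))
               for k in range(m + 1))

def g_sampled(S, n_samples=200000):
    # Random-process view: pick k with probability proportional to beta[k],
    # then a uniformly random k-subset T; g(S) = (sum_k beta[k]) * E[f(T)].
    m = len(S)
    total_beta = sum(beta[1:m + 1])
    ks = random.choices(range(1, m + 1), weights=beta[1:m + 1], k=n_samples)
    acc = sum(f(random.sample(S, k)) for k in ks)
    return total_beta * acc / n_samples

ge, gs = g_exact(S), g_sampled(S)
print(ge, gs)
```

With a polynomial number of samples the estimate concentrates around the exact value, which is the content of the sampling analysis in Section 4.2.3.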
In order to complete our definition of the non-oblivious local search algorithm, we must
"only" specify appropriate values for the coefficients γ^{(m)}_ℓ introduced below. We now turn to this task.
First, we note that because f(∅) = 0, the value of g does not depend on the coefficients
β^{(m)}_0. We adopt the convention that β^{(m)}_0 = 0 for all m. For ℓ > 0 we define β^{(m)}_ℓ in terms of
the auxiliary values γ^{(m)}_ℓ, defined as:
\[ \gamma^{(m)}_\ell = \ell \beta^{(m)}_\ell, \qquad \text{for } \ell > 0. \]
These coefficients γ^{(m)}_ℓ, and hence also the function g, depend on the value of the parameter
c giving the total curvature of f. Thus, we obtain a different potential function g_c for each
value of c. To avoid further cluttering our notation, we will assume that c has been specified
at this point, and leave the dependence implicit. For technical reasons we shall assume that
c ≠ 0, so that c ∈ (0, 1]. Note that the case c = 0 corresponds to the linear case, for which the
greedy algorithm is already optimal.
We now define the values γ^{(m)}_ℓ for 1 ≤ ℓ ≤ m. Our definition is recursive and makes use
of two auxiliary terms γ^{(m)}_0 and γ^{(m)}_{m+1} used only in the analysis. Specifically, we define the γ
terms by the initial conditions
\[ \gamma^{(m)}_0 = 1, \qquad \gamma^{(m)}_{m+1} = e^c \qquad \text{for all } m \ge 0 \tag{γ-base} \]
and the upward recurrence
\[ \gamma^{(m)}_\ell = c^{-1} m \left( \gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1} \right) \qquad \text{for } m \ge 1 \text{ and } 1 \le \ell \le m. \tag{γ-up} \]
Note that for all m we have γ^{(m)}_0 = 1 but β^{(m)}_0 = 0, while for all m and 0 < ℓ ≤ m, we use the
definition γ^{(m)}_ℓ = ℓβ^{(m)}_ℓ, which specifies β^{(m)}_ℓ.
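The definitions (γ-base) and (γ-up) translate directly into a small table-building routine. The following sketch (with illustrative names) builds the γ and β coefficients for a given curvature bound c:

```python
import math

def gamma_table(m_max, c):
    # gamma[(m, l)] = gamma^{(m)}_l, built from (gamma-base) and (gamma-up).
    g = {(0, 0): 1.0, (0, 1): math.exp(c)}
    for m in range(1, m_max + 1):
        g[(m, 0)] = 1.0                  # (gamma-base)
        g[(m, m + 1)] = math.exp(c)      # (gamma-base)
        for l in range(1, m + 1):        # (gamma-up)
            g[(m, l)] = (m / c) * (g[(m - 1, l)] - g[(m - 1, l - 1)])
    return g

def beta_coef(m, l, g):
    # beta^{(m)}_0 = 0; otherwise gamma^{(m)}_l = l * beta^{(m)}_l.
    return 0.0 if l == 0 else g[(m, l)] / l

c = 0.7
gam = gamma_table(8, c)
print([round(gam[(4, l)], 4) for l in range(6)])
```

The resulting sequences can be checked numerically against the properties proved later in this chapter (the recurrence of Lemma 4.2 and the monotonicity of Lemma 4.6).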
In Section 4.3 we give an account of how these coefficients were obtained and expand on
their relation to the potential function for the coverage case. For now we shall simply state our
non-oblivious local search algorithm, shown in Algorithm 7, and in the following section prove
that it is a (1 − 1/e − ε)-approximation. It is very similar to the algorithm MatroidCoverage
given in Algorithm 6 in Section 3.2, except that it uses the new potential function g that we have
just defined. In addition to the approximation parameter ε, the algorithm is parameterized
by c ∈ (0, 1], which is an upper bound on the total curvature of the function f. Another difference
between the algorithms is that the initial greedy phase of MatroidSubmodular is performed on
the objective function f, while MatroidCoverage uses the non-oblivious potential function g.
This change will simplify the analysis of the algorithm, increasing its runtime by a factor of
only log H_r = O(log log r). In Section 4.4, we briefly discuss how this extra cost can be removed,
by sketching the analysis in the case that the initial greedy phase is executed on g.
Algorithm 7: MatroidSubmodular
Input: Approximation parameter ε > 0; curvature bound c ∈ (0, 1]; monotone submodular function f with total curvature at most c, given as a value oracle; matroid M = (X, I), given as an independence oracle for I.

Let g be the potential function given by (4.1), (γ-base), and (γ-up) for the given value of c.
Let S_init be the result of running SubmodularGreedy on f, M.
Set r = |S_init| and ε₁ = (ε / (7rH_r)) · ((1 − e^{−c})/c − ε)^{−1}.
S ← S_init
repeat
    foreach a ∈ X \ S do
        foreach b ∈ S do
            if S − b + a ∈ I and g(S − b + a) > (1 + ε₁)g(S) then
                S ← S − b + a
                break
until no exchange is applied to S
return S
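As a minimal, self-contained sketch of this procedure, the following runs the greedy-then-local-search pipeline on a toy instance: a weighted coverage function (monotone submodular, total curvature at most c = 1) over a partition matroid. All instance data and helper names are illustrative, and g is evaluated exactly by brute force rather than by sampling.

```python
import itertools, math, random

random.seed(2)

U = range(10)
wt = {u: random.random() for u in U}
elems = list(range(8))
covers = [set(random.sample(list(U), 3)) for _ in elems]
part = {e: e % 4 for e in elems}          # partition matroid: 4 parts, capacity 1

def f(T):
    cov = set().union(*(covers[e] for e in T)) if T else set()
    return sum(wt[u] for u in cov)

def independent(T):
    parts = [part[e] for e in T]
    return len(parts) == len(set(parts))

# gamma coefficients from (gamma-base) and (gamma-up), with c = 1.
c = 1.0
gam = {(0, 0): 1.0, (0, 1): math.exp(c)}
for m in range(1, 9):
    gam[(m, 0)], gam[(m, m + 1)] = 1.0, math.exp(c)
    for l in range(1, m + 1):
        gam[(m, l)] = (m / c) * (gam[(m - 1, l)] - gam[(m - 1, l - 1)])

def g(S):
    # Exact brute-force evaluation of (4.1) with beta^{(m)}_k = gamma^{(m)}_k / k.
    S = list(S); m = len(S)
    return sum((gam[(m, k)] / k) / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(S, k))
               for k in range(1, m + 1))

# Greedy phase, run on the objective f.
S = []
while True:
    cands = [e for e in elems if e not in S and independent(S + [e])]
    if not cands:
        break
    S.append(max(cands, key=lambda e: f(S + [e]) - f(S)))

# Local-search phase, run on g with improvement threshold (1 + eps1).
eps1 = 1e-6
improved = True
while improved:
    improved = False
    for a in elems:
        if a in S:
            continue
        for b in S:
            T = [e for e in S if e != b] + [a]
            if independent(T) and g(T) > (1 + eps1) * g(S):
                S, improved = T, True
                break
        if improved:
            break

print(sorted(S), f(S))
```

On an instance this small the optimum can be brute-forced, which makes it easy to confirm that the returned base is within the guaranteed factor of optimal.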
4.2 Analysis of the Algorithm
We now analyze the runtime and approximation performance of MatroidSubmodular. While
the algorithm is simple to state, its analysis is quite involved. We give the necessary technical
proofs in the following subsections, and provide a high-level overview here.
As in the case of maximum coverage, we examine an arbitrary instance consisting of a
monotone submodular function f with total curvature at most c and a matroid M = (X, I).
Let ε > 0, and suppose that S is an ε-locally optimal solution and O is a globally optimal solution
for this instance. Because f is monotone, we can assume without loss of generality that O
is a base of M. We can further assume that the initial greedy solution computed by
MatroidSubmodular is a base of M. Furthermore, each improvement step in MatroidSubmodular
maintains the cardinality of the solution. Thus, the final solution S produced by
MatroidSubmodular must also be a base of M. By Theorem 2.13, there exists a bijection π : O → S
such that S − π(i) + i is a base of M for all i ∈ O. Let O = {o_1, …, o_r}, and number
the elements of S as s_1, …, s_r so that s_i = π(o_i) for all i. Furthermore, for any element
x ∈ S ∩ O, we have x = s_i = o_i for some i. Then, we consider only the exchange operations that
remove π(o_i) = s_i from S and add o_i to the result. Each such operation is considered by the
algorithm at each iteration, and each one results in an independent set in M. Thus,
ε-local optimality implies that we have
\[ (1 + \varepsilon) g(S) \ge g(S - s_i + o_i) \]
for each i ∈ [r]. Summing over all r such inequalities, we obtain the inequality
\[ r(1 + \varepsilon) g(S) \ge \sum_{i=1}^{r} g(S - s_i + o_i). \tag{4.2} \]
Now, we must relate (4.2), which is stated in terms of g, to the objective function f. In
order to do this, we introduce a symmetric notation, similar to that utilized in Chapter 3. For
a set of indices I ⊆ [r], we denote by S_I the set {s_i ∈ S : i ∈ I} of all elements of S with
indices in I, and similarly we define O_I = {o_i ∈ O : i ∈ I}. Let a, x, b be non-negative integers
satisfying a + x + b ≤ r. Then, we define X_{a,x,b} to be the collection of sets containing¹ S_A ⊎ O_B
for all distinct pairs (A, B) satisfying |A| = a + x, |B| = b + x, and |A ∩ B| = x. That is, X_{a,x,b} is the
collection of all sets containing a + x elements from S and b + x elements from O, where x of
the elements have the same index in both S and O. Then,
\[ |X_{a,x,b}| = \binom{r}{a} \binom{r-a}{b} \binom{r-a-b}{x}. \]
We define F_{a,x,b} to be the expected value of f on a uniformly random set in X_{a,x,b}:
\[ F_{a,x,b} = \frac{1}{|X_{a,x,b}|} \sum_{T \in X_{a,x,b}} f(T). \]
To simplify expressions, we adopt the convention that F_{a,x,b} = 0 if any of a, x, b is negative.
We can now express all of the quantities that we are interested in using the symmetric
¹Here, we use S_A ⊎ O_B to denote the disjoint, or tagged, union of S_A and O_B. That is, for any element s_i = o_i ∈ S ∩ O, the disjoint union has distinct sets in it corresponding to the case in which the element was chosen as an element of S or as an element of O. This greatly simplifies our analysis, as it allows us to effectively assume that S and O are disjoint.
variables F_{a,x,b}:
\[ f(S) = F_{r,0,0} \tag{4.3} \]
\[ f(O) = F_{0,0,r} \tag{4.4} \]
\[ g(S) = \sum_{k=0}^{r} \beta^{(r)}_k F_{k,0,0} \tag{4.5} \]
\[ \sum_{i=1}^{r} g(S - s_i + o_i) = \sum_{k=0}^{r} \left[ (r-k)\beta^{(r)}_k F_{k,0,0} + k\beta^{(r)}_k F_{k-1,0,1} \right]. \tag{4.6} \]
Equations (4.3), (4.4), and (4.5) follow immediately from the definition of the symmetric
variables. We now show that (4.6) is valid. We shall show that each set A ⊆ X contributes
f(A) with the same total weight to the left and right sides of (4.6). Expanding the definition of g on
the left, we obtain
\[ \sum_{i=1}^{r} \sum_{k=0}^{r} \frac{\beta^{(r)}_k}{\binom{r}{k}} \sum_{T \in \binom{S - s_i + o_i}{k}} f(T). \]
We note that the only sets that appear in (4.6) with non-zero weight are of the form S_I or
S_J + o_i, where I, J ⊆ [r]. We consider each of these types of sets in turn.
First, consider an arbitrary set S_I, where |I| = k for some 1 ≤ k ≤ r. Then, S_I is in the
collection X_{k,0,0} and not in any other collection in (4.6). Hence, f(S_I) appears on the right
hand side of (4.6) with total weight
\[ (r-k) \frac{\beta^{(r)}_k}{|X_{k,0,0}|} = (r-k) \frac{\beta^{(r)}_k}{\binom{r}{k}}. \]
For the left hand side, we note that S_I appears as a subset T of S − s_i + o_i for each value i ∉ I.
Thus, it appears on the left hand side of (4.6) with total weight (r − k)β^{(r)}_k / \binom{r}{k}.
Next, consider an arbitrary set S_J + o_ℓ, where |J| = k − 1 for some 1 ≤ k ≤ r and ℓ ∉ J.
Then, S_J + o_ℓ appears in the collection X_{k−1,0,1} and not in any other collection in (4.6). It
appears on the right hand side of (4.6) with total weight
\[ k \frac{\beta^{(r)}_k}{|X_{k-1,0,1}|} = k \frac{\beta^{(r)}_k}{\binom{r}{k-1}\binom{r-k+1}{1}} = \frac{\beta^{(r)}_k}{\frac{r-k+1}{k}\binom{r}{k-1}} = \frac{\beta^{(r)}_k}{\binom{r}{k}}. \]
For the left hand side, we note that S_J + o_ℓ appears as a subset of S − s_i + o_i only for the
single value i = ℓ. Thus, it appears on the left hand side of (4.6) with total weight β^{(r)}_k / \binom{r}{k}.
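Identity (4.6) is a purely combinatorial fact: the counting argument above works for any set function f and any non-negative coefficients β^{(r)}_k. The following sketch (toy coverage f, arbitrary β values, illustrative names) verifies it by brute force for r = 3, treating S and O as disjoint tagged sets as in the footnote above.

```python
import itertools, math, random

random.seed(0)

r = 3
U = list(range(6))
wt = {u: random.random() for u in U}
cov = {('s', i): set(random.sample(U, 3)) for i in range(r)}
cov.update({('o', i): set(random.sample(U, 3)) for i in range(r)})

def f(T):
    covered = set().union(*(cov[e] for e in T)) if T else set()
    return sum(wt[u] for u in covered)

S = [('s', i) for i in range(r)]
beta = [0.0, 0.4, 1.1, 0.7]          # arbitrary non-negative beta^{(r)}_k

def g(A):
    m = len(A)
    return sum(beta[k] / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(A, k))
               for k in range(m + 1))

def F(a, x, b):
    # Average of f over X_{a,x,b}: index sets A, B with |A|=a+x, |B|=b+x, |A∩B|=x.
    if min(a, x, b) < 0:
        return 0.0
    vals = [f([('s', i) for i in A] + [('o', i) for i in B])
            for A in itertools.combinations(range(r), a + x)
            for B in itertools.combinations(range(r), b + x)
            if len(set(A) & set(B)) == x]
    return sum(vals) / len(vals) if vals else 0.0

lhs = sum(g([e for e in S if e != ('s', i)] + [('o', i)]) for i in range(r))
rhs = sum((r - k) * beta[k] * F(k, 0, 0) + k * beta[k] * F(k - 1, 0, 1)
          for k in range(r + 1))
print(lhs, rhs)
```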
Reformulating (4.2), which follows from the ε-approximate local optimality of S with respect
to g, in terms of our symmetric notation, we obtain
\[ (r\varepsilon) g(S) + r \sum_{k=0}^{r} \beta^{(r)}_k F_{k,0,0} \ge \sum_{k=0}^{r} \left[ (r-k)\beta^{(r)}_k F_{k,0,0} + k\beta^{(r)}_k F_{k-1,0,1} \right]. \]
We rearrange this to obtain
\[ (r\varepsilon) g(S) + \sum_{k=0}^{r} k \beta^{(r)}_k \left( F_{k,0,0} - F_{k-1,0,1} \right) \ge 0. \tag{4.7} \]
For k ≥ 1, we have kβ^{(r)}_k = γ^{(r)}_k, and in the case k = 0 we have both F_{0,0,0} = 0 and F_{−1,0,1} = 0,
so the k = 0 terms of (4.7) and (4.8) both vanish. Thus, (4.7) is equivalent to
\[ (r\varepsilon) g(S) + \sum_{k=0}^{r} \gamma^{(r)}_k \left( F_{k,0,0} - F_{k-1,0,1} \right) \ge 0. \tag{4.8} \]
We now turn to the task of translating (4.8) into a statement about the locality ratio of
MatroidSubmodular. Our proof will use the submodularity of f only in the form of the following
general inequality, stated in our symmetric notation.

Lemma 4.1. For ℓ satisfying 0 ≤ ℓ ≤ r,
\[ (r-\ell)\left( F_{\ell,0,1} - F_{\ell,0,0} \right) + \ell \left( F_{\ell-1,0,1} - F_{\ell-1,0,0} \right) + c F_{\ell,0,0} \ge f(O). \]
Proof. In order to prove Lemma 4.1, we first prove three smaller inequalities. Consider a set
S_L, where |L| = ℓ. From Theorem 2.6, we have
\[ \sum_{i \in [r] \setminus L} \left[ f(S_L + o_i) - f(S_L) \right] \ge f(S_L \cup O_{[r] \setminus L}) - f(S_L). \tag{4.9} \]
Now, suppose that M = {i ∈ L : s_i = o_i}. Then, we have
\[ \sum_{i \in M} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right] = \sum_{i \in M} \left[ f(S_L) - f(S_L - s_i) \right]
\ge \sum_{i \in M} (1-c) f(s_i) \ge (1-c) f(S_M), \tag{4.10} \]
where the first inequality follows from the fact that f has total curvature at most c, and the
second from Theorem 2.6.
Furthermore,
\[ \sum_{i \in L \setminus M} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right]
\ge \sum_{i \in L \setminus M} \left[ f(S_L + o_i) - f(S_L) \right]
\ge f(S_L \cup O_{L \setminus M}) - f(S_L) = f(S_L \cup O_L) - f(S_L), \tag{4.11} \]
where the first inequality follows from the decreasing marginals characterization of submodularity,
the second from Theorem 2.6, and the final equality from s_i = o_i for all i ∈ M ⊆ L.
Adding (4.9), (4.10), and (4.11), we obtain:
\[
\begin{aligned}
\sum_{i \notin L} \left[ f(S_L + o_i) - f(S_L) \right] &+ \sum_{i \in L} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right] \\
&\ge f(S_L \cup O_{[r] \setminus L}) + f(S_L \cup O_L) + (1-c) f(S_M) - 2 f(S_L) \\
&\ge f(S_L \cup O) + f(S_L) + (1-c) f(S_M) - 2 f(S_L) && \text{by submodularity of } f \\
&\ge f(O) + (1-c) f(S_{L \setminus M}) + f(S_L) + (1-c) f(S_M) - 2 f(S_L) && \text{by curvature of } f \\
&\ge f(O) + (1-c) f(S_L) + f(S_L) - 2 f(S_L) && \text{by submodularity of } f \\
&= f(O) - c f(S_L).
\end{aligned}
\]
This inequality is valid for any particular assignment of values from [r] to the indices of S and
O. Averaging over all possible such assignments, we obtain the inequality
\[ (r-\ell)\left( F_{\ell,0,1} - F_{\ell,0,0} \right) + \ell \left( F_{\ell-1,0,1} - F_{\ell-1,0,0} \right) \ge f(O) - c F_{\ell,0,0}. \]
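Lemma 4.1 can be exercised numerically. The sketch below (illustrative names, toy data) builds a small weighted coverage function over disjoint tagged sets S and O, computes its total curvature c directly from the definition, evaluates the symmetric averages F_{a,x,b} by enumeration, and checks the inequality for every ℓ:

```python
import itertools, random

random.seed(3)

r = 3
U = list(range(10))
wt = {u: random.random() for u in U}
cov = {('s', i): set(random.sample(U, 4)) for i in range(r)}
cov.update({('o', i): set(random.sample(U, 4)) for i in range(r)})
X = list(cov)

def f(T):
    covered = set().union(*(cov[e] for e in T)) if T else set()
    return sum(wt[u] for u in covered)

# Total curvature: c = 1 - min_e [f(X) - f(X - e)] / f({e}).
fX = f(X)
c = 1 - min((fX - f([e for e in X if e != x])) / f([x]) for x in X)

def F(a, x, b):
    if min(a, x, b) < 0:
        return 0.0
    vals = [f([('s', i) for i in A] + [('o', i) for i in B])
            for A in itertools.combinations(range(r), a + x)
            for B in itertools.combinations(range(r), b + x)
            if len(set(A) & set(B)) == x]
    return sum(vals) / len(vals) if vals else 0.0

fO = f([('o', i) for i in range(r)])
for l in range(r + 1):
    lhs = ((r - l) * (F(l, 0, 1) - F(l, 0, 0))
           + l * (F(l - 1, 0, 1) - F(l - 1, 0, 0)) + c * F(l, 0, 0))
    print(l, round(lhs, 4), round(fO, 4))
```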
Lemma 4.1 gives us r + 1 inequalities, one for each value of ℓ ∈ {0, …, r}. We combine these
inequalities with (4.8) to derive an inequality relating f(S) to f(O). The following subsections
give the technical details necessary for completing the proof, which we now describe at a very
high level. In Section 4.2.2 we combine the r + 1 inequalities from Lemma 4.1 to obtain a single
inequality. This is accomplished by multiplying each inequality by γ^{(r)}_{ℓ+1} − γ^{(r)}_ℓ and adding the
resulting inequalities. This approach requires that each of the values γ^{(r)}_{ℓ+1} − γ^{(r)}_ℓ is non-negative
and that the sequence γ^{(r)} = γ^{(r)}_0, …, γ^{(r)}_{r+1} satisfies a particular recurrence relation, which we
use to telescope the resulting summation. In Section 4.2.1, we prove these two properties of the
γ sequences and additionally derive an upper and lower bound on g in terms of f, which we
shall use in several of our proofs.
of MatroidSubmodular, beginning with the proof of Lemma 4.8. Our proofs assume that g can
be computed exactly. In Section 4.2.3 we remove this assumption and turn to the problem of
estimating g efficiently. We give a random sampling procedure for estimating g that requires
only a polynomial number of samples. In Lemmas 4.10 and 4.11 we bound the error introduced
into our analysis by the use of our sampled estimates for g. Section 4.2.4 completes the analysis
of MatroidSubmodular. Our main results are presented in Theorems 4.12–4.14.
4.2.1 Properties of the Sequences γ
In this section, we prove three properties of the γ sequences. Specifically, we show that each
sequence γ^{(m)} = γ^{(m)}_0, γ^{(m)}_1, …, γ^{(m)}_{m+1} satisfies a particular recurrence (Lemma 4.2) and is non-decreasing
(Lemma 4.6). These facts will be used to combine the inequalities from Lemma 4.1
into a single inequality in Lemma 4.8. We also derive an upper bound on the value Σ_{k=1}^{m} β^{(m)}_k
(Lemma 4.7) that will be used in Lemma 4.9, where we bound the locality ratio of
MatroidSubmodular, as well as in Section 4.2.3, where we give a sampling procedure used to compute g.
Lemma 4.2. For each m ≥ 1, the sequence γ^{(m)} satisfies the recurrence:
\[ \ell \gamma^{(m)}_{\ell+1} = (2\ell - m + c - 1)\gamma^{(m)}_\ell + (m - \ell + 1)\gamma^{(m)}_{\ell-1}, \qquad 1 \le \ell \le m. \tag{γ-rec} \]
Proof. We proceed by induction on m. In the case that m = 1, we must only show that (γ-rec)
is valid for ℓ = 1. In this case, (γ-rec) becomes
\[ \gamma^{(1)}_2 = c\gamma^{(1)}_1 + \gamma^{(1)}_0. \]
By definition, we have γ^{(1)}_0 = 1, γ^{(1)}_2 = e^c, and γ^{(1)}_1 = c^{−1}(e^c − 1), so this is equivalent to
\[ e^c = c \cdot c^{-1}(e^c - 1) + 1, \]
and so (γ-rec) holds.
In the general case that m > 1, we consider three subcases based on the value of ℓ. When ℓ = 1,
we have
\[
\begin{aligned}
\gamma^{(m)}_2 &= c^{-1}m\left(\gamma^{(m-1)}_2 - \gamma^{(m-1)}_1\right) && \text{by (γ-up)} \\
&= c^{-1}m\left[(2-m+c)\gamma^{(m-1)}_1 + (m-1)\gamma^{(m-1)}_0 - \gamma^{(m-1)}_1\right] && \text{by induct. hyp. with } \ell = 1 \\
&= c^{-1}m\left[(1-m+c)\gamma^{(m-1)}_1 - (1-m+c)\gamma^{(m-1)}_0 + c\gamma^{(m-1)}_0\right] && \text{algebra} \\
&= (1-m+c)\gamma^{(m)}_1 + m\gamma^{(m-1)}_0 && \text{by (γ-up)} \\
&= (1-m+c)\gamma^{(m)}_1 + m\gamma^{(m)}_0 && \text{by (γ-base).}
\end{aligned}
\]
When ℓ = m, we have
\[
\begin{aligned}
m\gamma^{(m)}_{m+1} &= m\gamma^{(m-1)}_m && \text{by (γ-base)} \\
&= c^{-1}m\left[(m-1+c)\gamma^{(m-1)}_m - (m-1)\gamma^{(m-1)}_m\right] && \text{algebra} \\
&= c^{-1}m\left[(m-1+c)\gamma^{(m-1)}_m - (m-2+c)\gamma^{(m-1)}_{m-1} - \gamma^{(m-1)}_{m-2}\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(m-1+c)\left(\gamma^{(m-1)}_m - \gamma^{(m-1)}_{m-1}\right) + \gamma^{(m-1)}_{m-1} - \gamma^{(m-1)}_{m-2}\right] && \text{algebra} \\
&= (m-1+c)\gamma^{(m)}_m + \gamma^{(m)}_{m-1} && \text{by (γ-up).}
\end{aligned}
\]
When 2 ≤ ℓ ≤ m − 1, we have
\[
\begin{aligned}
\ell\gamma^{(m)}_{\ell+1} &= c^{-1}m\left(\ell\gamma^{(m-1)}_{\ell+1} - \ell\gamma^{(m-1)}_\ell\right) && \text{by (γ-up)} \\
&= c^{-1}m\left[(2\ell-m+c)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} - \ell\gamma^{(m-1)}_\ell\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(2\ell-m+c-1)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} - (\ell-1)\gamma^{(m-1)}_\ell\right] && \text{algebra} \\
&= c^{-1}m\left[(2\ell-m+c-1)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} \right. \\
&\qquad\qquad \left. - (2\ell-m+c-2)\gamma^{(m-1)}_{\ell-1} - (m-\ell+1)\gamma^{(m-1)}_{\ell-2}\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(2\ell-m+c-1)\left(\gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1}\right) + (m-\ell+1)\left(\gamma^{(m-1)}_{\ell-1} - \gamma^{(m-1)}_{\ell-2}\right)\right] && \text{algebra} \\
&= (2\ell-m+c-1)\gamma^{(m)}_\ell + (m-\ell+1)\gamma^{(m)}_{\ell-1} && \text{by (γ-up).}
\end{aligned}
\]
In order to prove our second lemma, which shows that the sequence γ^{(m)} = γ^{(m)}_0, …, γ^{(m)}_{m+1}
is non-decreasing, we make use of the theory of Padé approximants. This theory is quite extensive
(a comprehensive overview is given by Baker and Graves-Morris [10]), but we need only a few
fundamental results from the area, all of which appear in Padé's thesis [78]. We give several
more accessible references for each result as well. First, we give the basic definition of a Padé
approximant.
Definition 4.3 (Padé Approximant). Suppose that a function F is given by the power series
\[ F(x) = \sum_{i=0}^{\infty} c_i x^i. \]
Then, for μ, ν ∈ N, the [μ/ν] Padé approximant to F is the rational function, denoted by F_{[μ/ν]},
of the form
\[ F_{[\mu/\nu]}(x) = \frac{a_0 + a_1 x + \cdots + a_\mu x^\mu}{1 + b_1 x + \cdots + b_\nu x^\nu}, \]
whose Maclaurin expansion agrees with F for the first μ + ν terms.
We are primarily concerned with the Padé approximants of the function e^x. This was among
the first functions whose approximants were rigorously studied in the context of Padé
approximants. Indeed, a full treatment of the function e^x appears in Padé's thesis [78, Part
2, Section 4]. A more recent derivation of the approximants to e^x and their properties can be
found in Baker and Graves-Morris [10, Sections 1.2 and 10.3]. A concise statement of all the
properties we require appears in Underhill and Wragg [92].
The [μ/ν] Padé approximant to the function e^x is given by R_{[μ/ν]}(x) = P_{[μ/ν]}(x) / Q_{[μ/ν]}(x), whose
numerator and denominator are given by the polynomials:
\[ P_{[\mu/\nu]}(x) = \sum_{k=0}^{\mu} \frac{x^k (\mu+\nu-k)! \, \mu!}{k! (\mu+\nu)! (\mu-k)!}, \qquad
Q_{[\mu/\nu]}(x) = \sum_{k=0}^{\nu} \frac{(-x)^k (\mu+\nu-k)! \, \nu!}{k! (\mu+\nu)! (\nu-k)!}. \]
The following known² formula gives the error in the approximant R_{[μ/ν]}:
\[ e^x Q_{[\mu/\nu]}(x) - P_{[\mu/\nu]}(x) = \frac{(-1)^\nu x^{\mu+\nu+1}}{(\mu+\nu)!} \int_0^1 e^{xt} t^\nu (1-t)^\mu \, dt. \tag{4.12} \]
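The polynomials P, Q and the error formula (4.12) are easy to check numerically. A small sketch (illustrative names, with the integral evaluated by Simpson's rule):

```python
import math

def pade_P(mu, nu, x):
    return sum(x**k * math.factorial(mu + nu - k) * math.factorial(mu)
               / (math.factorial(k) * math.factorial(mu + nu) * math.factorial(mu - k))
               for k in range(mu + 1))

def pade_Q(mu, nu, x):
    return sum((-x)**k * math.factorial(mu + nu - k) * math.factorial(nu)
               / (math.factorial(k) * math.factorial(mu + nu) * math.factorial(nu - k))
               for k in range(nu + 1))

def error_integral(mu, nu, x, n=2000):
    # Right-hand side of (4.12), with the integral computed by Simpson's rule.
    h = 1.0 / n
    def integrand(t):
        return math.exp(x * t) * t**nu * (1 - t)**mu
    s = integrand(0) + integrand(1)
    s += 4 * sum(integrand((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(integrand(2 * i * h) for i in range(1, n // 2))
    integral = s * h / 3
    return (-1)**nu * x**(mu + nu + 1) / math.factorial(mu + nu) * integral

x = 0.8
for mu, nu in [(1, 1), (2, 1), (3, 2)]:
    lhs = math.exp(x) * pade_Q(mu, nu, x) - pade_P(mu, nu, x)
    print(mu, nu, lhs, error_integral(mu, nu, x))
```

For instance, the [1/1] approximant works out to (1 + x/2)/(1 − x/2), the classical diagonal approximant to e^x.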
In order to relate the γ coefficients to the theory of Padé approximants, we first give a closed
formula for them.

Lemma 4.4. For all m ≥ 1, the members of γ^{(m)} are given by the following formula:
\[ \gamma^{(m)}_\ell = (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right], \tag{γ-closed} \]
where 1 ≤ ℓ ≤ m.
Proof. We proceed by induction on m. When m = 1, the formula gives
\[ \gamma^{(1)}_1 = (-1)^0 \frac{1!}{0!} c^{-1} \left[ (-1)^0 \binom{0}{0} e^c - \binom{0}{0} \right] = c^{-1}(e^c - 1), \]
and the defining recurrence (γ-up) together with (γ-base) gives
\[ \gamma^{(1)}_1 = c^{-1}\left( \gamma^{(0)}_1 - \gamma^{(0)}_0 \right) = c^{-1}(e^c - 1). \]
If m > 1 then, by construction, the sequence γ^{(m)} is related to γ^{(m−1)} by (γ-up) and, by
the induction hypothesis, γ^{(m−1)}_ℓ is given by (γ-closed) for all 1 ≤ ℓ ≤ m − 1. Thus, γ^{(m)}_ℓ is
²Again, a derivation of this formula appears in Padé's thesis [78, (66)]. A modern derivation appears in Baker and Graves-Morris [10, Section 10.3] or (using integral representations for P_{[μ/ν]} and Q_{[μ/ν]}) Braess [15, Section V.7.A (5.2)].
given by
\[
\begin{aligned}
c^{-1}m&\left( \gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1} \right) \\
&= c^{-1}m(-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{\ell-1-k} e^c - \binom{m-2-k}{\ell-1} \right] \\
&\quad - c^{-1}m(-1)^{\ell-2} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{\ell-2-k} e^c - \binom{m-2-k}{\ell-2} \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \left[ \binom{m-2-k}{\ell-1-k} + \binom{m-2-k}{\ell-2-k} \right] e^c - \left[ \binom{m-2-k}{\ell-1} + \binom{m-2-k}{\ell-2} \right] \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right].
\end{aligned}
\]
If 2 ≤ ℓ ≤ m − 1, then we have both \binom{m-1-(m-1)}{\ell-1-(m-1)} = 0 and \binom{m-1-(m-1)}{\ell-1} = 0, and so:
\[
\begin{aligned}
\gamma^{(m)}_\ell &= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right].
\end{aligned}
\]
It remains to show that the formula (γ-closed) holds for ℓ = 1 and for ℓ = m. In the case of
ℓ = 1, the formula (γ-closed) gives
\[ \gamma^{(m)}_1 = (-1)^0 \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{-k} e^c - \binom{m-1-k}{0} \right]
= m! \, c^{-m} e^c - \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m}, \]
while using the recurrence (γ-up), (γ-base), and the induction hypothesis, we obtain
\[
\begin{aligned}
\gamma^{(m)}_1 &= c^{-1}m\left( \gamma^{(m-1)}_1 - \gamma^{(m-1)}_0 \right) = c^{-1}m\left( \gamma^{(m-1)}_1 - 1 \right) \\
&= c^{-1}m\left[ (m-1)! \, c^{-m+1} e^c - \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \right] - c^{-1}m \\
&= m! \, c^{-m} e^c - \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} - c^{-1}m \\
&= m! \, c^{-m} e^c - \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m}.
\end{aligned}
\]
For the case ℓ = m, the formula (γ-closed) gives
\[
\begin{aligned}
\gamma^{(m)}_m &= (-1)^{m-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{m-1-k} e^c - \binom{m-1-k}{m-1} \right] \\
&= (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right],
\end{aligned}
\]
while using the recurrence (γ-up), (γ-base), and the induction hypothesis, we obtain
\[
\begin{aligned}
\gamma^{(m)}_m &= c^{-1}m\left( \gamma^{(m-1)}_m - \gamma^{(m-1)}_{m-1} \right) = c^{-1}m\left( e^c - \gamma^{(m-1)}_{m-1} \right) \\
&= c^{-1}m\left[ e^c - (-1)^{m-2} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{m-2-k} e^c - \binom{m-2-k}{m-2} \right] \right] \\
&= c^{-1}m \, e^c + (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right] \\
&= (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right].
\end{aligned}
\]
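The closed formula can be confirmed against the defining recurrence by direct computation. A small sketch (illustrative names; note the convention that out-of-range binomial coefficients are zero):

```python
import math

def gamma_up(m_max, c):
    # gamma^{(m)}_l from (gamma-base) and (gamma-up).
    g = {(0, 0): 1.0, (0, 1): math.exp(c)}
    for m in range(1, m_max + 1):
        g[(m, 0)], g[(m, m + 1)] = 1.0, math.exp(c)
        for l in range(1, m + 1):
            g[(m, l)] = (m / c) * (g[(m - 1, l)] - g[(m - 1, l - 1)])
    return g

def binom(n, k):
    # binomial coefficient, zero when k is out of range
    return math.comb(n, k) if 0 <= k <= n else 0

def gamma_closed(m, l, c):
    # The formula (gamma-closed).
    return (-1)**(l - 1) * sum(
        math.factorial(m) / math.factorial(k) * c**(k - m)
        * ((-1)**k * binom(m - 1 - k, l - 1 - k) * math.exp(c)
           - binom(m - 1 - k, l - 1))
        for k in range(m))

c = 0.7
g = gamma_up(8, c)
print(g[(4, 2)], gamma_closed(4, 2, c))
```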
The formula (γ-closed) reveals a surprising relationship between the coefficients γ and the
Padé approximants to the exponential function e^x. The following lemma shows that we can
restate the explicit formula (γ-closed) in terms of Padé numerators and denominators.
Lemma 4.5. For all m ≥ 1 and 1 ≤ ℓ ≤ m:
\[ \gamma^{(m)}_\ell = (-1)^{\ell-1} m! \, c^{-m} \binom{m-1}{\ell-1} \left[ Q_{[m-\ell/\ell-1]}(c) \, e^c - P_{[m-\ell/\ell-1]}(c) \right]. \]
Proof. From (γ-closed) we have
\[
\begin{aligned}
\gamma^{(m)}_\ell &= (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \sum_{k=0}^{m-1} \frac{c^k}{k!} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \left[ \sum_{k=0}^{\ell-1} \frac{c^k}{k!} (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \sum_{k=0}^{m-\ell} \frac{c^k}{k!} \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \left[ \sum_{k=0}^{\ell-1} \frac{(-c)^k (m-1-k)!}{k! (\ell-1-k)! (m-\ell)!} e^c - \sum_{k=0}^{m-\ell} \frac{c^k (m-1-k)!}{k! (\ell-1)! (m-\ell-k)!} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \frac{(m-1)!}{(\ell-1)! (m-\ell)!} \left[ \sum_{k=0}^{\ell-1} \frac{(-c)^k (m-1-k)! \, (\ell-1)!}{k! (m-1)! (\ell-1-k)!} e^c - \sum_{k=0}^{m-\ell} \frac{c^k (m-1-k)! \, (m-\ell)!}{k! (m-1)! (m-\ell-k)!} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \binom{m-1}{\ell-1} \left[ Q_{[m-\ell/\ell-1]}(c) \, e^c - P_{[m-\ell/\ell-1]}(c) \right].
\end{aligned}
\]
Using Lemma 4.5, we can now prove our second main result regarding the γ sequences.
Lemma 4.6. For any $m \ge 0$, the sequence $\gamma^{(m)} = \gamma^{(m)}_0, \gamma^{(m)}_1, \ldots, \gamma^{(m)}_{m+1}$ is non-decreasing.
Proof. Consider $\gamma^{(m)}_\ell$ and $\gamma^{(m)}_{\ell-1}$ for some $m \ge 0$ and $1 \le \ell \le m+1$. By definition, we then have
$$\gamma^{(m+1)}_\ell = c^{-1}(m+1)\left(\gamma^{(m)}_\ell - \gamma^{(m)}_{\ell-1}\right),$$
and so $\gamma^{(m)}_\ell \ge \gamma^{(m)}_{\ell-1}$ if and only if $\gamma^{(m+1)}_\ell \ge 0$. We now show that this must be the case for all $m \ge 1$ and $1 \le \ell \le m+1$. From Lemma 4.5, we have:
$$\begin{aligned}
\gamma^{(m+1)}_\ell &= (-1)^{\ell-1}(m+1)!\,c^{-(m+1)}\binom{m}{\ell-1}\left[Q_{[m+1-\ell/\ell-1]}(c)\,e^c - P_{[m+1-\ell/\ell-1]}(c)\right]\\
&= (-1)^{\ell-1}(m+1)!\,c^{-(m+1)}\binom{m}{\ell-1}\left[(-1)^{\ell-1}\frac{c^{m+1}}{(m+1)!}\int_0^1 e^{ct}t^{\ell-1}(1-t)^{m+1-\ell}\,dt\right]\\
&= \binom{m}{\ell-1}\int_0^1 e^{ct}t^{\ell-1}(1-t)^{m+1-\ell}\,dt,
\end{aligned}$$
where we have used the formula (4.12) for the error $Q_{[\mu/\nu]}(x)e^x - P_{[\mu/\nu]}(x)$ in the second line. Now, we note that the integral above must be non-negative, since the integrand is non-negative on the interval $[0,1]$. Thus, $\gamma^{(m+1)}_\ell \ge 0$.
Finally, we derive an upper bound on the sum of each sequence $\beta^{(m)}$. This will be useful for bounding the value of g in later theorems.
Lemma 4.7. For all $m \ge 0$,
$$\sum_{k=0}^m \beta^{(m)}_k \le e^c H_m.$$
Furthermore, for any set S of size m,
$$f(S) \le g(S) \le e^c H_m f(S).$$
Proof. For the first claim, we note that
$$\sum_{k=0}^m \beta^{(m)}_k = \sum_{k=1}^m \frac{\gamma^{(m)}_k}{k} \le e^c\sum_{k=1}^m \frac{1}{k} = e^c H_m. \tag{4.13}$$
Here, the first equality follows from the fact that $\beta^{(m)}_0 = 0$, and the inequality from both Lemma 4.6, which shows that the sequence $\gamma^{(m)}$ is non-decreasing, and (γ-base), which shows that $\gamma^{(m)}_{m+1} = e^c$.
For the upper bound on g, we note that
$$g(S) = \sum_{k=0}^m \frac{\beta^{(m)}_k}{\binom{m}{k}}\sum_{T\in\binom{S}{k}} f(T) \le \sum_{k=0}^m \beta^{(m)}_k f(S) \le e^c H_m f(S),$$
where the first inequality follows from the monotonicity of f and the final inequality from (4.13).
For the lower bound on g, we note that
$$g(S) = \sum_{k=0}^m \frac{\beta^{(m)}_k}{\binom{m}{k}}\sum_{T\in\binom{S}{k}} f(T) \ge \sum_{k=0}^m \beta^{(m)}_k\,\frac{k}{m}\,f(S) = \sum_{k=1}^m \frac{\gamma^{(m)}_k}{m}f(S) \ge \sum_{k=1}^m \frac{1}{m}f(S) = f(S),$$
where the first inequality follows from Theorem 2.8, and the second inequality follows from Lemma 4.6 and (γ-base).
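As a sanity check on Lemma 4.7, the following sketch evaluates g by brute force on a small coverage function (coverage functions are monotone submodular with $f(\emptyset) = 0$) and verifies the sandwich $f(S) \le g(S) \le e^cH_m f(S)$. The coverage instance is made up purely for illustration:

```python
import math
import random
from itertools import combinations

def gamma_seq(m, c):
    """gamma^(m)_0 .. gamma^(m)_{m+1} via (gamma-base) and (gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return g

def g_value(S, f, c):
    """g(S) as in (4.1), with beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k."""
    m = len(S)
    gam = gamma_seq(m, c)
    beta = [0.0] + [gam[k] / k for k in range(1, m + 1)]
    return sum(beta[k] / math.comb(m, k) * sum(f(T) for T in combinations(S, k))
               for k in range(m + 1))

# a small random coverage function: each element covers a subset of a universe
random.seed(0)
cover = {e: frozenset(random.sample(range(12), 5)) for e in range(4)}

def f(T):
    return len(frozenset().union(*(cover[e] for e in T))) if T else 0
```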
4.2.2 Locality Ratio
We now have the necessary machinery to continue our proof of the locality ratio of MatroidSubmodular. Using the facts about the coefficients γ proved in Section 4.2.1, we combine the inequalities from Lemma 4.1 into a general inequality relating f(S), f(O), and the symmetric variables. Recall that r is the rank of the matroid $(X, \mathcal{I})$ for the instance we are considering, and $|O| = |S| = r$.
Lemma 4.8.
$$e^c f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right) \ge \frac{e^c-1}{c}f(O).$$
Proof. For each $\ell \in \{0,\ldots,r\}$ we multiply the corresponding inequality from Lemma 4.1 by $(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell)$. Lemma 4.6 implies that $(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell) \ge 0$, so this does not change the direction
of any of the inequalities. We then add the r + 1 resulting inequalities, to obtain:
$$\sum_{\ell=0}^r\left(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell\right)\left[(r-\ell)(F_{\ell,0,1} - F_{\ell,0,0}) + \ell(F_{\ell-1,0,1} - F_{\ell-1,0,0}) + cF_{\ell,0,0}\right] \ge \sum_{\ell=0}^r\left(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell\right)f(O). \tag{4.14}$$
We now consider the coefficients of all the symmetric values $F_{k,0,0}$ and $F_{k-1,0,1}$ occurring in (4.14), where $-1 \le k \le r$. Note that these are the only symmetric values occurring in (4.14). Each term $F_{k,0,0}$, for $0 \le k \le r-1$, appears in the inequality for $\ell = k$ with coefficient $(\gamma^{(r)}_{k+1}-\gamma^{(r)}_k)(k-r+c)$ and in the inequality for $\ell = k+1$ with coefficient $(\gamma^{(r)}_{k+2}-\gamma^{(r)}_{k+1})(-k-1)$. Thus, its coefficient in (4.14) is
$$\begin{aligned}
&-(k+1)\gamma^{(r)}_{k+2} + (2k-r+c+1)\gamma^{(r)}_{k+1} + (r-k-c)\gamma^{(r)}_k\\
&= -(k+1)\gamma^{(r)}_{k+2} + (2(k+1)-r+c-1)\gamma^{(r)}_{k+1} + (r-(k+1)+1)\gamma^{(r)}_k - c\gamma^{(r)}_k\\
&= -(k+1)\gamma^{(r)}_{k+2} + (k+1)\gamma^{(r)}_{k+2} - c\gamma^{(r)}_k\\
&= -c\gamma^{(r)}_k,
\end{aligned}$$
where we have used the recurrence (γ-rec) from Lemma 4.2 for the second equality. It remains to consider the terms $F_{-1,0,0}$ and $F_{r,0,0}$. By definition, $F_{-1,0,0} = 0$, so its coefficient does not affect the sum. We assign it coefficient 0, and omit it from the resulting sum. The term $F_{r,0,0}$ appears only in the inequality for $\ell = r$, with coefficient $(\gamma^{(r)}_{r+1} - \gamma^{(r)}_r)c$.
Similarly, each term $F_{k-1,0,1}$ for $1 \le k \le r$ appears in the inequality for $\ell = k$ with coefficient $(\gamma^{(r)}_{k+1}-\gamma^{(r)}_k)k$ and in the inequality for $\ell = k-1$ with coefficient $(\gamma^{(r)}_k-\gamma^{(r)}_{k-1})(r-k+1)$. Thus, its coefficient in (4.14) is
$$\begin{aligned}
&k\gamma^{(r)}_{k+1} - (2k-r-1)\gamma^{(r)}_k - (r-k+1)\gamma^{(r)}_{k-1}\\
&= k\gamma^{(r)}_{k+1} - (2k-r+c-1)\gamma^{(r)}_k - (r-k+1)\gamma^{(r)}_{k-1} + c\gamma^{(r)}_k\\
&= k\gamma^{(r)}_{k+1} - k\gamma^{(r)}_{k+1} + c\gamma^{(r)}_k\\
&= c\gamma^{(r)}_k,
\end{aligned}$$
where again we have used the recurrence (γ-rec) from Lemma 4.2 for the second equality. It remains to consider the terms $F_{-1,0,1}$ and $F_{r,0,1}$. By definition, both of these terms are 0, and so their coefficients do not affect the sum. For convenience, we assign $F_{-1,0,1}$ coefficient $c\gamma^{(r)}_0$. We assign $F_{r,0,1}$ coefficient 0 and omit it from the sum.
Considering all the coefficients, we find that the left side of inequality (4.14) is
$$\begin{aligned}
c\left(\gamma^{(r)}_{r+1} - \gamma^{(r)}_r\right)F_{r,0,0} - \sum_{k=0}^{r-1}c\,\gamma^{(r)}_k F_{k,0,0} + \sum_{k=0}^r c\,\gamma^{(r)}_k F_{k-1,0,1}
&= c\,\gamma^{(r)}_{r+1}F_{r,0,0} + c\sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right)\\
&= c\,e^c f(S) + c\sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right),
\end{aligned}$$
where we have used (4.3) and (γ-base) in the final line. The right side of (4.14) is
$$\sum_{k=0}^r\left(\gamma^{(r)}_{k+1} - \gamma^{(r)}_k\right)f(O) = \left(\gamma^{(r)}_{r+1} - \gamma^{(r)}_0\right)f(O) = (e^c - 1)f(O),$$
where we have used (γ-base) in the final equality. Thus, (4.14) is equivalent to
$$e^c f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right) \ge \frac{e^c-1}{c}f(O).$$
Using the inequality from Lemma 4.8, we now prove this section’s main result, bounding
the locality ratio of MatroidSubmodular.
Theorem 4.9. Suppose O is a global optimum of f and that S is an ε-approximate local optimum for g. Then,
$$(1 + \varepsilon\,rH_r)\,f(S) \ge \frac{1-e^{-c}}{c}f(O).$$
Proof. We recall the inequality (4.8), which follows from the ε-approximate local optimality of S:
$$(r\varepsilon)\,g(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0.$$
Applying the upper bound on g(S) from Lemma 4.7 to the first term, we obtain
$$r\varepsilon\,e^c H_r f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0. \tag{4.15}$$
Adding (4.15) to the inequality from Lemma 4.8, we obtain
$$r\varepsilon\,e^c H_r f(S) + e^c f(S) \ge \frac{e^c-1}{c}f(O).$$
Multiplying by $e^{-c}$ then completes the proof.
4.2.3 Computing g
In all of our analysis thus far, we have supposed that we could compute g exactly, even though
this required an exponential number of calls to the value oracle for f . Now we address the
issue of computing g(S) efficiently. When we first introduced g in Section 4.1, we gave an
interpretation of the value g(S) in terms of a random process. We now elaborate further on
this interpretation to give a randomized sampling scheme for computing g(S).
Let $m = |S|$ and define the normalization constant
$$B_m = \sum_{i=0}^m \beta^{(m)}_i.$$
From Lemma 4.7 in Section 4.2.1, we have
$$B_m \le e^c H_m. \tag{4.16}$$
We define a random set X using the following two-step experiment. First, let L be a random variable taking value k with probability $\beta^{(m)}_k/B_m$, for $1 \le k \le m$. Then let X be a uniformly random subset of S of size L. From the linearity of expectation and the definition (4.1) of g, we have $B_m\,\mathbf{E}[f(X)] = g(S)$. We now show, via standard concentration bounds, that this expectation can be estimated well by sampling from the distribution defining X.
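The two-step experiment translates directly into a sampling routine. The sketch below is illustrative only; `g_exact` recomputes g by brute force so that the estimate can be compared against it:

```python
import math
import random
from itertools import combinations

def beta_seq(m, c):
    """beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k, via (gamma-base)/(gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return [0.0] + [g[k] / k for k in range(1, m + 1)]

def g_exact(S, f, c):
    """Brute-force evaluation of g(S) as defined in (4.1)."""
    m, beta = len(S), beta_seq(len(S), c)
    return sum(beta[k] / math.comb(m, k) * sum(f(T) for T in combinations(S, k))
               for k in range(m + 1))

def g_sampled(S, f, c, N, rng):
    """B_m * (1/N) * sum f(X_i), where each X_i follows the two-step experiment."""
    m, beta = len(S), beta_seq(len(S), c)
    B = sum(beta)
    total = 0.0
    for _ in range(N):
        L = rng.choices(range(1, m + 1), weights=beta[1:])[0]  # P[L=k] = beta_k / B_m
        total += f(tuple(rng.sample(list(S), L)))              # uniform L-subset of S
    return B * total / N
```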
Lemma 4.10. Let S be a set of size m. Let N be a positive integer, and $X_1,\ldots,X_N$ be N i.i.d. random samples drawn from the distribution for X. Define $\bar g(S) = B_m\frac{1}{N}\sum_{i=1}^N f(X_i)$. Then, for any $\varepsilon > 0$,
$$\Pr\left[|\bar g(S) - g(S)| > \varepsilon g(S)\right] \le 2\exp\left(-\frac{2\varepsilon^2 N}{e^{2c}H_m^2}\right).$$
Proof. Note that $\mathbf{E}[\bar g(S)] = g(S)$. Moreover, because f is monotone and normalized, we have $0 \le B_m f(X_i) \le B_m f(S)$ for every sample $X_i$. Hoeffding's bound then gives that $\Pr[|\bar g(S) - g(S)| > \varepsilon g(S)]$ is at most:
$$2\exp\left(-\frac{2\varepsilon^2 g(S)^2 N}{B_m^2 f(S)^2}\right) \le 2\exp\left(-\frac{2\varepsilon^2 N}{B_m^2}\right) \le 2\exp\left(-\frac{2\varepsilon^2 N}{(e^c H_m)^2}\right),$$
where the first inequality follows from the bound $f(S) \le g(S)$ from Lemma 4.7 and the final inequality from (4.16).
The next lemma allows us to relate the error introduced by estimating g to our analysis of the locality ratio of MatroidSubmodular.

Lemma 4.11. Let $0 < \varepsilon \le 1/2$ and suppose that $|\bar g(A) - g(A)| \le \varepsilon g(A)$, $|\bar g(B) - g(B)| \le \varepsilon g(B)$, and $(1+\varepsilon)\bar g(A) \ge \bar g(B)$. Then, $(1+7\varepsilon)g(A) \ge g(B)$.
Proof. The premises imply that
$$(1+\varepsilon)^2 g(A) \ge (1+\varepsilon)\bar g(A) \ge \bar g(B) \ge (1-\varepsilon)g(B).$$
Therefore,
$$\frac{(1+\varepsilon)^2}{1-\varepsilon}g(A) \ge g(B).$$
The lemma follows from the fact that
$$\frac{(1+\varepsilon)^2}{1-\varepsilon} = 1 + \frac{\varepsilon(3+\varepsilon)}{1-\varepsilon} \le 1 + 7\varepsilon,$$
since $\varepsilon(3+\varepsilon)/(1-\varepsilon)$ is increasing and $\varepsilon \le 1/2$.
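The elementary bound at the end of the proof is easy to check directly; a minimal sketch:

```python
def distortion(eps):
    # (1 + eps)^2 / (1 - eps) = 1 + eps * (3 + eps) / (1 - eps)
    return (1 + eps) ** 2 / (1 - eps)
```

At $\varepsilon = 1/2$ the bound is tight, since $(3/2)^2/(1/2) = 4.5 = 1 + 7\cdot\frac{1}{2}$.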
4.2.4 Main Results
We now assemble the technical details from the previous sections to prove our main theorems,
which regard the runtime and approximation performance of MatroidSubmodular.
Theorem 4.12. Let $0 < \varepsilon \le 1$ and $c \in (0,1]$, and set
$$G = C\varepsilon_1^{-1}rn\log\left(6e^cH_r\right), \qquad N = \varepsilon_1^{-2}e^{2c}H_r^2\ln G,$$
for an appropriate universal constant C. Suppose that we compute g in MatroidSubmodular by using the procedure described in Section 4.2.3 with N samples. Then, with probability $1-o(1)$, Algorithm MatroidSubmodular is a $\left(\frac{1-e^{-c}}{c} - \varepsilon\right)$-approximation algorithm, running in time $O(\varepsilon^{-3}r^4n)$.
Proof. First, we note that³ $\varepsilon_1 < 1/2$.
Let $\bar g(A)$ denote the value obtained when estimating g(A) by using N samples, as in Section 4.2.3. We analyze the algorithm under the assumption that every value $\bar g(A)$ estimated by the algorithm satisfies
$$(1-\varepsilon_1)g(A) \le \bar g(A) \le (1+\varepsilon_1)g(A).$$
We will then show that this happens with probability $1-o(1)$.
As usual, we fix an instance $(f, \mathcal{M} = (X,\mathcal{I}))$ and let O be the optimal solution for this instance and S be the solution produced by MatroidSubmodular. Then, S must be an $\varepsilon_1$-approximate local optimum of $\bar g$. Lemma 4.11 shows that S is a $7\varepsilon_1$-approximate local optimum of g. Theorem 4.9 then implies that
$$(1 + 7\varepsilon_1 rH_r)f(S) \ge \frac{1-e^{-c}}{c}f(O).$$
Expanding the definition
$$\varepsilon_1 = \frac{\varepsilon}{7rH_r}\left(\frac{1-e^{-c}}{c} - \varepsilon\right)^{-1},$$
³This bound is intentionally loose, to simplify the proof.
we obtain
$$f(S) \ge \left(\frac{1-e^{-c}}{c} - \varepsilon\right)f(O),$$
as required.
We now bound the number of improvements that the algorithm makes. Consider the solution $S_{\text{init}}$ produced by SubmodularGreedy when applied to the instance $(f,\mathcal{M})$, and let $O_f = \arg\max_{A\in\mathcal{I}} f(A)$ and $O_g = \arg\max_{A\in\mathcal{I}} g(A)$. Then,
$$f(S_{\text{init}}) \ge \frac{1}{2}f(O_f) \ge \frac{1}{2}f(O_g).$$
The first inequality follows from the fact that SubmodularGreedy is a 1/2-approximation for maximizing a monotone submodular function subject to a matroid constraint. Applying the upper and lower bounds on g from Lemma 4.7 to this inequality, we find
$$g(S_{\text{init}}) \ge f(S_{\text{init}}) \ge \frac{1}{2}f(O_g) \ge \frac{1}{2e^cH_r}g(O_g). \tag{4.17}$$
Every time the algorithm applies an improvement, it must improve $\bar g(S)$ by at least a factor of $(1+\varepsilon_1)$. Thus, the number of improvements MatroidSubmodular can make is at most
$$\log_{1+\varepsilon_1}\frac{\bar g(O_g)}{\bar g(S_{\text{init}})} \le \log_{1+\varepsilon_1}\left(\frac{1+\varepsilon_1}{1-\varepsilon_1}\cdot 2e^cH_r\right) \le \log_{1+\varepsilon_1}\left(6e^cH_r\right) \le C\varepsilon_1^{-1}\log\left(6e^cH_r\right),$$
for some universal constant C. Here, the first inequality follows from our assumption about evaluations of g and (4.17), and the second from the fact that $\frac{1+\varepsilon_1}{1-\varepsilon_1}$ is increasing and $\varepsilon_1 < 1/2$.
Now, we show that with probability $1-o(1)$ we do indeed have $|\bar g(A) - g(A)| \le \varepsilon_1 g(A)$ for all sets considered by the algorithm. Each improvement step requires at most rn evaluations of $\bar g$, and the algorithm makes at most $C\varepsilon_1^{-1}\log(6e^cH_r)$ improvements. Thus, the algorithm requires at most
$$G = C\varepsilon_1^{-1}rn\log\left(6e^cH_r\right)$$
total evaluations of $\bar g$. We set:
$$N = \varepsilon_1^{-2}e^{2c}H_r^2\ln G.$$
Lemma 4.10 then shows that the probability that, for a given set A, $|\bar g(A) - g(A)| > \varepsilon_1 g(A)$ is at most $2/G^2$. Hence the probability that no set considered by the algorithm has
$$|\bar g(A) - g(A)| > \varepsilon_1 g(A)$$
is at least $1 - 2/G = 1 - o(1)$.
The final algorithm requires a total of
$$GN = O(\varepsilon_1^{-3}rn) = O(\varepsilon^{-3}r^4n)$$
calls to the value oracle for f and $G = O(\varepsilon^{-1}r^2n\log r)$ calls to the independence oracle for $\mathcal{M}$. Its runtime is dominated by the total number of calls to the value oracle for f.
As in the case of MatroidCoverage, we can eliminate the ε from the approximation ratio by employing the partial enumeration technique described in Section 2.6. Our next two theorems will use the definition
$$\rho(c) = \frac{1-e^{-c}}{c}.$$
Additionally, in the following theorems, we suppose that we terminate all calls to MatroidSubmodular after $O(\varepsilon^{-3}r^4n)$ steps and return the current solution. Even in this case, Theorem 4.12 still guarantees that with probability $1-o(1)$, any given call to MatroidSubmodular is a $(\rho(c)-\varepsilon)$-approximation.
Theorem 4.13. Suppose that we set $\varepsilon = \frac{1-\rho(c)}{r}$ in MatroidSubmodular. Then, with probability $1-o(1)$, PartEnum(MatroidSubmodular) is a $\rho(c)$-approximation algorithm running in time $O(r^7n^2)$.
Proof. We recall from Section 2.6 that PartEnum(MatroidSubmodular) makes n calls to MatroidSubmodular, one for each element in X. Moreover, the proof of Theorem 2.22 shows that as long as the call to MatroidSubmodular for the element $y = \arg\max_{x\in O} f(x)$ is a $(\rho(c)-\varepsilon)$-approximation, then the approximation ratio of PartEnum(MatroidSubmodular) is at least
$$\frac{1}{r} + \frac{r-1}{r}\left(\rho(c)-\varepsilon\right) = \frac{1}{r} + \frac{r-1}{r}\left(\rho(c) - \frac{1-\rho(c)}{r}\right) = \frac{1}{r} + \frac{r-1}{r}\cdot\frac{(r+1)\rho(c)-1}{r} = \frac{(r^2-1)\rho(c)+1}{r^2} = \rho(c) + \frac{1-\rho(c)}{r^2} \ge \rho(c).$$
Theorem 4.12 shows that with probability $1-o(1)$ the call to MatroidSubmodular corresponding to element y is indeed a $(\rho(c)-\varepsilon)$-approximation.
Each of the n calls to MatroidSubmodular requires time
$$O(\varepsilon^{-3}r^4n) = O((1-\rho(c))^{-3}r^7n).$$
Thus, the total runtime of PartEnum(MatroidSubmodular) is $O((1-\rho(c))^{-3}r^7n^2)$.
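The arithmetic in the proof of Theorem 4.13 can be verified numerically; a small sketch sweeping ρ and r:

```python
def part_enum_bound(rho, r):
    # 1/r + ((r-1)/r) * (rho - (1-rho)/r), the PartEnum ratio with eps = (1 - rho)/r;
    # algebraically this equals rho + (1 - rho)/r^2, which is at least rho
    return 1.0 / r + (r - 1.0) / r * (rho - (1.0 - rho) / r)
```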
One weakness of our algorithm, when compared to the continuous greedy algorithm, is that
it requires knowledge of the curvature of the function f . In our next theorem, we show how this
requirement can be eliminated by enumerating over values of c with enough granularity. Since
the function ρ(c) is continuous, a small error in an estimate of c will translate into a small error
in the approximation ratio. The resulting algorithm, CurvatureEnum, is shown in Algorithm 8.
Theorem 4.14. Consider an instance $(f,\mathcal{M})$ and suppose that f has total curvature c. With probability $1-o(1)$, CurvatureEnum is a $(\rho(c)-\varepsilon)$-approximation algorithm for $(f,\mathcal{M})$ running in time $O(\varepsilon^{-4}r^4n)$.
Algorithm 8: CurvatureEnum
Input: Approximation parameter ε > 0; monotone submodular function f, as a value oracle; matroid M = (X, I), as an independence oracle for I
Let C = {kε : 1 ≤ k ≤ ⌊1/ε⌋} ∪ {1};
foreach c ∈ C do
    Let S_c be the result of running MatroidSubmodular on (ε/2, c, X, I);
return arg max_{c∈C} f(S_c);
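Algorithm 8 can be sketched as follows. Here `matroid_submodular` is a hypothetical callable standing in for MatroidSubmodular, taking the approximation parameter, the curvature guess c, and the instance, and returning a solution:

```python
def curvature_enum(eps, f, instance, matroid_submodular):
    # C = {k*eps : 1 <= k <= floor(1/eps)} union {1}
    C = sorted({k * eps for k in range(1, int(1 / eps) + 1)} | {1.0})
    best = None
    for c in C:
        # hypothetical call: run MatroidSubmodular with parameters (eps/2, c)
        S_c = matroid_submodular(eps / 2, c, instance)
        if best is None or f(S_c) > f(best):
            best = S_c
    return best
```

A toy stub for `matroid_submodular` suffices to exercise the enumeration: the wrapper simply returns the candidate solution maximizing f over the grid of curvature guesses.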
Proof. Let S be the solution produced by CurvatureEnum on the instance $(f,\mathcal{M})$ and let O be the optimal solution for this instance. Define $c_0 = \min\{x \in C : c \le x\}$. Then, $c \le c_0 < c + \varepsilon$, and $f(S) \ge f(S_{c_0})$. Since f has curvature at most $c_0$, Theorem 4.12 shows that $f(S_{c_0}) \ge (\rho(c_0) - \varepsilon/2)f(O)$ with probability $1-o(1)$.
We now bound the rate at which ρ(c) can decrease. We have
$$\rho''(c) = \frac{2e^{-c}}{c^3}\left[e^c - \left(1 + c + \frac{c^2}{2}\right)\right].$$
Using the series expansion for $e^c$, we find that the bracketed expression must be positive, and so $\rho''(c) > 0$ on (0,1]. Hence, $\rho'(c)$ is an increasing function on this interval and so its minimum on (0,1] is bounded from below by $\lim_{x\to 0^+}\rho'(x) = -1/2$. Therefore, the ratio f(S)/f(O) is at least
$$\rho(c_0) - \varepsilon/2 \ge \rho(c+\varepsilon) - \varepsilon/2 \ge \rho(c) - \varepsilon,$$
where the first inequality follows from the fact that ρ(c) is decreasing on (0,1], and the second from the bound $\rho'(c) \ge -1/2$. It follows that CurvatureEnum is at least a $(\rho(c)-\varepsilon)$-approximation algorithm.
The algorithm CurvatureEnum makes $O(\varepsilon^{-1})$ calls to MatroidSubmodular, each taking time $O(\varepsilon^{-3}r^4n)$. The runtime of CurvatureEnum is thus $O(\varepsilon^{-4}r^4n)$.
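The analytic facts used above (ρ is decreasing on (0,1] and ρ' is bounded below by −1/2) are easy to check numerically; a small sketch using a finite-difference estimate of ρ':

```python
import math

def rho(c):
    return (1.0 - math.exp(-c)) / c

def rho_prime(c, h=1e-6):
    # central finite-difference estimate of rho'(c)
    return (rho(c + h) - rho(c - h)) / (2 * h)
```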
In order to completely match the generality of the continuous greedy algorithm, we would like to combine the results of Theorems 4.13 and 4.14 to obtain a clean $\frac{1-e^{-c}}{c}$-approximation without knowledge of c. Unfortunately, the runtime of our partial enumeration algorithm in Theorem 4.13, and hence the runtime of the combined approach, depends on the value of c. Thus, the resulting algorithm runs in pseudo-polynomial time, rather than polynomial time. Indeed, as c becomes arbitrarily close to 0, the time required by the partial enumeration algorithm diverges. Practically, this means that the combined approach would require that some nontrivial, constant lower bound on the value of c be specified.
4.3 Obtaining the Coefficient Sequences
Here, we provide an account of how the recurrences and formulas for the γ sequences were derived. The overall approach is similar to that described in Section 3.4. That is, it makes use of a series of linear programs, one for each value of r. For instances of rank r, a solution to the corresponding program gives the optimal locality ratio attainable on instances of rank r when using a function of the general form given by (4.1), as well as the values of the coefficients $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$ defining this function. We now show how the program for each value of r is formulated. For simplicity, we assume that c = 1 and ε = 0. That is, we do not consider the curvature of f and consider only exact local optima.
We follow the same general approach as in Section 3.4. For a sequence $\beta = \beta^{(r)}_0,\ldots,\beta^{(r)}_r$, we let $g_{\beta,f}$ denote the function g obtained from f and β by using (4.1). Then, the function $g_{\beta,f}$ is defined only on sets of size r, but this will suffice for our purposes. We suppose that we are given a sequence $\beta = \beta^{(r)}_0,\ldots,\beta^{(r)}_r$ and want to find the worst-case locality ratio of MatroidSubmodular over all instances $\mathcal{M}, f$ with rank r, when $g_{\beta,f}$ is used as the non-oblivious potential function in the algorithm.
Let $S = \{s_1,\ldots,s_r\}$ and $O = \{o_1,\ldots,o_r\}$ be two sets of size r. Following the same line of reasoning as in Section 3.4, we can restrict ourselves to the case in which S and O are disjoint and consider the following program.
$$\begin{aligned}
\min_{\text{submodular } f}\quad & f(S)\\
\text{s.t.}\quad & g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) \ge 0 \qquad 1 \le i \le r\\
& f(O) = 1
\end{aligned} \tag{4.18}$$
Now, we must somehow implement the minimization over all submodular functions f. A naïve, direct formulation involves introducing a variable representing the value of f(A) for each subset A of $S \cup O$, then adding constraints to the resulting set of variables corresponding to the conditions that f is monotone and submodular. The number of variables and constraints required by this approach is exponential in r. We can do better by using the symmetric variables $F_{a,x,b}$ introduced in Section 4.2. We have shown how to express the values f(S) and f(O) in equations (4.3) and (4.4). While we cannot express the r local optimality constraints of (4.18) directly, we can represent their sum
$$\sum_{i=1}^r\left[g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i)\right] \ge 0,$$
by using equations (4.5) and (4.6).
Now, we consider how to represent the condition that f is monotone submodular. We do this by adding a set of constraints to our linear program. These constraints must also be expressed in terms of the symmetric variables $F_{a,x,b}$. For monotonicity, we have the constraint
$f(A + y) \ge f(A)$ for every $A \subseteq X$ and $y \notin A$. Consider some sets of indices $I, J \subseteq [r]$, with $|I \setminus J| = a$, $|J \setminus I| = b$, and $|I \cap J| = x$. Then, we have 4 separate cases, depending on whether y is of the form $s_i$ or $o_i$ and whether $i \in I \cup J$ or not. In each case, we average over all possible assignments of values from [r] to indices and obtain the following 4 inequalities:
$$\begin{aligned}
F_{a+1,x,b} &\ge F_{a,x,b} && y = s_i \text{ and } i \notin J\\
F_{a-1,x+1,b} &\ge F_{a,x,b} && y = o_i \text{ and } i \in I\\
F_{a,x,b+1} &\ge F_{a,x,b} && y = o_i \text{ and } i \notin I\\
F_{a,x+1,b-1} &\ge F_{a,x,b} && y = s_i \text{ and } i \in J
\end{aligned}$$
As an example, let us give a derivation of the second inequality above, for $a = 1$, $x = 0$, $b = 1$. Every set in $X_{1,0,1}$ has the form $\{s_i, o_j\}$ for some pair of indices $i \ne j \in [r]$. Similarly, each set in $X_{0,1,1}$ has the form $\{s_i, o_i, o_j\}$ for some pair of indices $i \ne j \in [r]$. From monotonicity of f, for any $i \ne j \in [r]$, we have:
$$f(\{s_i, o_i, o_j\}) \ge f(\{s_i, o_j\}).$$
Averaging this inequality over all possible assignments of values from [r] to i and j satisfying $i \ne j$, we obtain
$$F_{0,1,1} \ge F_{1,0,1}.$$
This approach extends in a straightforward way to the remaining 3 inequalities, and all values of a, x, b.
For submodularity, we have the constraint $f(A+y) + f(A+z) \ge f(A+y+z) + f(A)$ for every $A \subseteq X$ and $y, z \notin A$. Again, by considering each separate case in terms of our symmetric notation we obtain the following separate inequalities:
$$\begin{aligned}
F_{a,x+1,b-1} + F_{a,x+1,b-1} &\ge F_{a,x+2,b-2} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \in J, j \in J\\
F_{a+1,x,b} + F_{a,x+1,b-1} &\ge F_{a+1,x+1,b-1} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \notin J, j \in J\\
F_{a+1,x,b} + F_{a+1,x,b} &\ge F_{a+2,x,b} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \notin J, j \notin J\\
F_{a-1,x+1,b} + F_{a-1,x+1,b} &\ge F_{a-2,x+2,b} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \in I, j \in I\\
F_{a,x,b+1} + F_{a-1,x+1,b} &\ge F_{a-1,x+1,b+1} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \notin I, j \in I\\
F_{a,x,b+1} + F_{a,x,b+1} &\ge F_{a,x,b+2} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \notin I, j \notin I\\
F_{a-1,x+1,b} + F_{a,x+1,b-1} &\ge F_{a-1,x+2,b-1} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \in I, j \in J\\
F_{a-1,x+1,b} + F_{a+1,x,b} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \in I, j \notin J\\
F_{a,x,b+1} + F_{a,x+1,b-1} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \notin I, j \in J\\
F_{a,x,b+1} + F_{a+1,x,b} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j,\ i = j \text{ and } i \notin I, j \notin J\\
F_{a,x,b+1} + F_{a+1,x,b} &\ge F_{a+1,x,b+1} + F_{a,x,b} && y = o_i, z = s_j,\ i \ne j \text{ and } i \notin I, j \notin J
\end{aligned}$$
Putting everything together, we obtain the linear program:
$$\begin{aligned}
\min_{F_{a,x,b}}\quad & F_{r,0,0}\\
\text{s.t.}\quad & \sum_{k=0}^r k\beta^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0 && \text{(local optimality constraint)}\\
& F_{0,0,r} = 1\\
& F_{a+1,x,b} \ge F_{a,x,b}\\
& F_{a-1,x+1,b} \ge F_{a,x,b}\\
& F_{a,x,b+1} \ge F_{a,x,b}\\
& F_{a,x+1,b-1} \ge F_{a,x,b} && \text{(monotonicity constraints, } a,x,b \ge 0,\ a+x+b \le r\text{)}\\
& F_{a+1,x,b} + F_{a+1,x,b} \ge F_{a+2,x,b} + F_{a,x,b}\\
& F_{a+1,x,b} + F_{a,x+1,b-1} \ge F_{a+1,x+1,b-1} + F_{a,x,b}\\
& F_{a,x+1,b-1} + F_{a,x+1,b-1} \ge F_{a,x+2,b-2} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a,x,b+1} \ge F_{a,x,b+2} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a-1,x+1,b} \ge F_{a-1,x+1,b+1} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a-1,x+1,b} \ge F_{a-2,x+2,b} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a,x+1,b-1} \ge F_{a-1,x+2,b-1} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a+1,x,b} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a,x+1,b-1} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a+1,x,b} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a+1,x,b} \ge F_{a+1,x,b+1} + F_{a,x,b} && \text{(submodularity constraints, } a,x,b \ge 0,\ a+x+b \le r\text{)}\\
& F_{a,x,b} \ge 0
\end{aligned} \tag{4.19}$$
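As a concrete check that the symmetric (averaged) constraints are valid for an actual monotone submodular function, the sketch below builds the variables $F_{a,x,b}$ for a small random coverage instance (made up for illustration) and verifies several monotonicity and submodularity families:

```python
import random
from itertools import combinations
from statistics import mean

r = 3
random.seed(1)
# ground set: elements 0..r-1 play the role of s_1..s_r, elements r..2r-1 of o_1..o_r
cover = {e: frozenset(random.sample(range(12), 4)) for e in range(2 * r)}

def f(A):
    return len(frozenset().union(*(cover[e] for e in A))) if A else 0

def F(a, x, b):
    """Average of f over all sets with |I\\J| = a, |I cap J| = x, |J\\I| = b."""
    vals = []
    for P in combinations(range(r), x):            # indices i with both s_i and o_i
        rest = [i for i in range(r) if i not in P]
        for Ai in combinations(rest, a):           # indices with s_i only
            rest2 = [i for i in rest if i not in Ai]
            for Bi in combinations(rest2, b):      # indices with o_i only
                A = list(P) + list(Ai) + [r + i for i in P] + [r + i for i in Bi]
                vals.append(f(A))
    return mean(vals)
```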
If the variables $F_{a,x,b}$ are the symmetric variables for some underlying monotone submodular function f, then they must satisfy all of the constraints of (4.19), and moreover $F_{r,0,0} = f(S)$. Thus, for every feasible solution of (4.18) there is a corresponding feasible solution to (4.19) of equal value. However, it is not clear that the converse is true, since the symmetric constraints are, in general, weaker than the individual constraints we summed to obtain them. We now show that the converse does, in fact, hold. The following theorem is the direct analogue of Theorem 3.11 from Section 3.4.
Theorem 4.15. Let F be the set of values $F_{a,x,b}$, where $a, x, b \ge 0$ and $a+x+b \le r$, comprising a feasible solution of (4.19), and suppose that $F_{r,0,0} = v$, so that F has value v. Then, there exists a monotone submodular function f on X that is a feasible solution of (4.18) with value $f(S) = v$.
Proof. The proof is similar to the proof of Theorem 3.11 but is easier, as here we need not give a particular representation for the function f. Let $S = \{s_1,\ldots,s_r\}$ and $O = \{o_1,\ldots,o_r\}$ be two subsets of X. For some $A \subseteq O \cup S$, let $I(A) = \{i \in [r] : s_i \in A\}$ and $J(A) = \{i \in [r] : o_i \in A\}$ be the indices of those elements from S and O, respectively, contained in A. Then, we define
$$f(A) = F_{a,x,b}, \quad\text{where } a = |I(A) \setminus J(A)|,\ b = |J(A) \setminus I(A)|,\ x = |I(A) \cap J(A)|.$$
It is immediate from the definition that $f(S) = F_{r,0,0} = v$ and that $f(O) = F_{0,0,r} = 1$, since F is a feasible solution of (4.19).
Now, we show that f must be monotone submodular. Let A be some subset of $S \cup O$, and y some element of $(S \cup O) \setminus A$, and suppose that $|I(A) \setminus J(A)| = a$, $|J(A) \setminus I(A)| = b$ and $|I(A) \cap J(A)| = x$, so that we have $f(A) = F_{a,x,b}$. Then, there are four possible values for $f(A+y)$. Specifically,
$$\begin{aligned}
f(A+y) &= F_{a+1,x,b} && \text{if } y = s_i \text{ and } i \notin J(A)\\
f(A+y) &= F_{a,x+1,b-1} && \text{if } y = s_i \text{ and } i \in J(A)\\
f(A+y) &= F_{a,x,b+1} && \text{if } y = o_i \text{ and } i \notin I(A)\\
f(A+y) &= F_{a-1,x+1,b} && \text{if } y = o_i \text{ and } i \in I(A)
\end{aligned}$$
These cases correspond exactly to the cases considered in the monotonicity constraints in (4.19). Thus, we have $f(A+y) \ge f(A) = F_{a,x,b}$ in each case, since F is a feasible solution of (4.19). Similarly, for submodularity we consider a subset A of $S \cup O$ and $y, z \in (S \cup O) \setminus A$. We obtain 11 possible values for $f(A+y) + f(A+z)$ corresponding to the 11 cases considered in the submodularity constraints in (4.19), and so the submodularity of f also follows directly from the feasibility of F.
Next, we consider the value $g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i)$. The value $g_{\beta,f}(S)$ is given by
$$g_{\beta,f}(S) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S}{k}} f(T).$$
For each set $T \in \binom{S}{k}$ we have $T \subseteq S$ and so $|I(T)| = k$ and $|J(T)| = 0$. Thus, $f(T) = F_{k,0,0}$ for each of the $\binom{r}{k}$ sets $T \in \binom{S}{k}$. Thus,
$$g_{\beta,f}(S) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S}{k}} F_{k,0,0} = \sum_{k=0}^r \beta^{(r)}_k F_{k,0,0}.$$
Similarly, the value $g_{\beta,f}(S - s_i + o_i)$ is given by
$$g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S-s_i+o_i}{k}} f(T) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\left[\sum_{T\in\binom{S-s_i}{k-1}} f(T + o_i) + \sum_{T\in\binom{S-s_i}{k}} f(T)\right]. \tag{4.20}$$
For each set $T \in \binom{S-s_i}{k-1}$ we have $|I(T)| = k-1$ and $|J(T)| = 0$. Thus, $f(T + o_i) = F_{k-1,0,1}$ for each of the $\binom{r-1}{k-1} = \frac{k}{r}\binom{r}{k}$ sets $T \in \binom{S-s_i}{k-1}$. Similarly, for each set $T \in \binom{S-s_i}{k}$, we have $|I(T)| = k$ and $|J(T)| = 0$. Thus, $f(T) = F_{k,0,0}$ for each of the $\binom{r-1}{k} = \frac{r-k}{r}\binom{r}{k}$ sets $T \in \binom{S-s_i}{k}$. Thus,
$$g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r\left[\beta^{(r)}_k\left(\frac{r-k}{r}F_{k,0,0} + \frac{k}{r}F_{k-1,0,1}\right)\right] \tag{4.21}$$
Combining the above expression for $g_{\beta,f}(S)$ with (4.21) for $g_{\beta,f}(S - s_i + o_i)$, we obtain
$$g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r \beta^{(r)}_k\left[F_{k,0,0} - \frac{r-k}{r}F_{k,0,0} - \frac{k}{r}F_{k-1,0,1}\right] = \frac{1}{r}\sum_{k=0}^r k\beta^{(r)}_k\left[F_{k,0,0} - F_{k-1,0,1}\right] \ge 0,$$
where the final inequality comes from the fact that F is a feasible solution of (4.19) and so must satisfy the local optimality constraint.
We have shown that f must be monotone submodular with $f(O) = 1$ and $g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) \ge 0$ for each $i \in [r]$. Hence, f is a feasible solution for (4.18). It has value $f(S) = F_{r,0,0} = v$.
As in the coverage case, the value of the optimal solution to (4.19) is therefore the same as the value of the optimal solution to (4.18). This value corresponds to the locality ratio of MatroidSubmodular when g is given by $g_{\beta,f}$. Furthermore, the solution attaining this value gives a tight instance for the potential function $g_{\beta,f}$. In order to obtain the optimal values of $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$ we follow the same approach as in Section 3.4. That is, we take the dual of the linear program (4.19), which is a maximization program, and then maximize over the resulting dual variables and the values $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$.
We do not show the dual program here, but we note that, as in the coverage case, it is linear except for non-linear terms of the form $\rho F_{a,x,b}$ corresponding to the local optimality constraint. As in the coverage case, the local optimality condition in the primal program is invariant under scaling by a constant, so we can set the dual variable $\rho = 1$ arbitrarily to obtain a linear program.
The final dual program has a variable for each of the coefficients $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$, a variable θ corresponding to the constraint $F_{0,0,r} = 1$, and a variable corresponding to each monotonicity and submodularity constraint. The variable θ is also the objective maximized by the dual program, and hence the optimal locality ratio of MatroidSubmodular. We have seen that the primal variables define a tight instance. Here, the dual variables are useful as well: the dual variables give a proof that the locality ratio of MatroidSubmodular is at least θ for the given values of $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$. Specifically, we obtain a proof of the inequality $f(S) \ge \theta = \theta f(O)$ (recall that we have set $f(O) = 1$) by combining (i.e. summing) the symmetric monotonicity and submodularity inequalities for instances of size r. The weight with which each inequality is included in this combination is given by the value of its corresponding dual variable.
We now give a brief account of how the programs led us to the coefficient sequences β and γ. We solved the (dual) linear programs for several values of r and then, by examining the dual variables for values of r = 2 through r = 10, we deduced a general proof technique for deriving the desired inequality. The technique required only the general inequality
$$(r-\ell)(F_{\ell,0,1} - F_{\ell,0,0}) + \ell(F_{\ell-1,0,1} - F_{\ell-1,0,0}) + cF_{\ell,0,0} \ge f(O),$$
given in Lemma 4.1. We then observed that the sum of these inequalities telescoped into an inequality relating $f(S) = F_{r,0,0}$ and $f(O) = F_{0,0,r}$ precisely when the sequence $\gamma^{(r)}$ satisfied the recurrence (γ-rec) (as we proved in (4.14)). Using this recurrence, we extended the sequence $\gamma^{(r)}$ to obtain one extra value $\gamma^{(r)}_{r+1}$, and noted that the locality ratio of MatroidSubmodular was then given by:
$$\frac{\gamma^{(r)}_{r+1} - \gamma^{(r)}_0}{\gamma^{(r)}_{r+1}}.$$
The general technique required multiplying the inequality from Lemma 4.1 by $\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell$ for each value of $\ell$. This required that the sequence $\gamma^{(r)}$ be non-decreasing, to prevent the direction of the inequalities from being reversed. The main difficulty was then determining a set of initial conditions for (γ-rec) that ensured this, while giving the best possible locality ratio.
After determining the proper value for the sequence $\gamma^{(r)}$, we then derived the sequences $\gamma^{(m)}$ for $m \ne r$ in a fashion consistent with the requirement that g be monotone submodular, as we prove in the next section. Finally, the effect of the curvature c of f was deduced by using an augmented version of the linear program.
4.4 Further Properties of g
We now turn to proving various properties of the potential function g defined by (4.1), (γ-base),
and (γ-up). In this section we show that g is monotone submodular, and also show that if f is
a coverage function, then the g given by (4.1) in fact corresponds to the non-oblivious potential
function g used in Chapter 3.
Our proofs require two small identities and an ancillary lemma about g. The identities
follow from the recurrence (γ-rec), stated in Lemma 4.2.
Lemma 4.16. For all $1 \le \ell \le m$ we have the identities:
$$\frac{\beta^{(m+1)}_\ell}{\binom{m+1}{\ell}} + \frac{\beta^{(m+1)}_{\ell+1}}{\binom{m+1}{\ell+1}} = \frac{\beta^{(m)}_\ell}{\binom{m}{\ell}} \tag{4.22}$$
$$\frac{\beta^{(m)}_k}{\binom{m}{k}} + \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} = 2\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}. \tag{4.23}$$
Proof. The sequences $\gamma^{(m)}$ and $\gamma^{(m+1)}$ are related by (γ-up), and satisfy (γ-rec) as shown in Lemma 4.2. Multiplying (4.22) by $\binom{m}{\ell}(m+1)\ell$ we obtain the equivalent identity:
$$(m+1-\ell)\ell\,\beta^{(m+1)}_\ell + \ell(\ell+1)\beta^{(m+1)}_{\ell+1} = (m+1)\ell\,\beta^{(m)}_\ell.$$
Using the definition $\gamma^{(i)}_k = k\beta^{(i)}_k$ for $k > 0$, this is equivalent to
$$(m+1-\ell)\gamma^{(m+1)}_\ell + \ell\gamma^{(m+1)}_{\ell+1} = (m+1)\gamma^{(m)}_\ell.$$
Now, we have
$$\begin{aligned}
(m+1-\ell)\gamma^{(m+1)}_\ell + \ell\gamma^{(m+1)}_{\ell+1}
&= c^{-1}(m+1)\left[(m+1-\ell)\left(\gamma^{(m)}_\ell - \gamma^{(m)}_{\ell-1}\right) + \ell\left(\gamma^{(m)}_{\ell+1} - \gamma^{(m)}_\ell\right)\right] && \text{by (γ-up)}\\
&= c^{-1}(m+1)\left[\ell\gamma^{(m)}_{\ell+1} + (m-2\ell+1)\gamma^{(m)}_\ell - (m-\ell+1)\gamma^{(m)}_{\ell-1}\right] && \text{algebra}\\
&= (m+1)\gamma^{(m)}_\ell, && \text{by (γ-rec)}
\end{aligned}$$
completing the proof of identity (4.22).
We use identity (4.22) to prove (4.23). We have
$$\begin{aligned}
2\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}
&= \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} && \text{by (4.22) applied to 1st term}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} && \text{by (4.22) applied to 2nd and 4th terms}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m)}_k}{\binom{m}{k}} && \text{by (4.22) applied to 2nd and 3rd terms}
\end{aligned}$$
completing the proof of identity (4.23).
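Both identities can be checked numerically against the recurrence defining γ; a small illustrative sketch:

```python
import math

def gamma_seq(m, c):
    """gamma^(m)_0 .. gamma^(m)_{m+1} via (gamma-base) and (gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return g

def bb(m, k, c):
    """beta^(m)_k / C(m, k), with beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k."""
    beta = 0.0 if k == 0 else gamma_seq(m, c)[k] / k
    return beta / math.comb(m, k)
```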
Lemma 4.17. For any $S \subseteq X$ with $|S| = m$ and any $x \in X \setminus S$,
$$g(S+x) = \sum_{k=0}^m\sum_{T\in\binom{S}{k}}\left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}f(T+x)\right]$$
Proof.
$$\begin{aligned}
g(S+x) &= \sum_{k=0}^{m+1}\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S+x}{k}}f(T)\\
&= \sum_{k=0}^m\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k}}f(T) + \sum_{k=1}^{m+1}\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k-1}}f(T+x) && \text{splitting the sum}\\
&= \sum_{k=0}^m\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k}}f(T) + \sum_{k=0}^m\frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\sum_{T\in\binom{S}{k}}f(T+x) && \text{shifting indices}\\
&= \sum_{k=0}^m\sum_{T\in\binom{S}{k}}\left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}f(T+x)\right].
\end{aligned}$$
Now, we are ready to prove that g is monotone and submodular.

Theorem 4.18. For any set S of size $m \ge 0$ and $x \notin S$, $g(S) \le g(S+x)$. Moreover, if $f(T) = f(T+x)$ for each $T \subseteq S$ then $g(S) = g(S+x)$.
Proof.
$$\begin{aligned}
g(S+x) &= \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\,f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\,f(T+x)\right] && \text{by Lemma 4.17}\\
&\ge \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\,f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\,f(T)\right] && \text{by monotonicity of } f\\
&= \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \frac{\beta^{(m)}_k}{\binom{m}{k}}\,f(T) && \text{by identity (4.22) (for } k>0\text{) and } f(\emptyset)=0 \text{ (for } k=0\text{)}\\
&= g(S).
\end{aligned}$$
Note that if f(T ) = f(T + x) for all T ⊆ S then the inequality is tight.
Theorem 4.19. For any set S of size m − 1 ≥ 0 and distinct x, y ∉ S, we have
$$g(S+x) + g(S+y) \ge g(S+x+y) + g(S).$$
Proof.
$$\begin{aligned}
&g(S+x+y) + g(S)\\
&= \sum_{k=0}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S+x+y}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m)}_k}{\binom{m}{k}} \sum_{T\in\binom{S}{k}} f(T)\\
&= \sum_{k=0}^{m} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=1}^{m+1} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-1}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=2}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-2}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m)}_k}{\binom{m}{k}} \sum_{T\in\binom{S}{k}} f(T) && \text{splitting the sum}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=1}^{m+1} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-1}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=2}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-2}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T) && \text{by identity (4.23) (for } k>0\text{) and } f(\emptyset)=0 \text{ (for } k=0\text{)}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T) && \text{shifting indices}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} \big(f(T+x+y)+f(T)\big) && \text{algebra}\\
&\le \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big) && \text{by submodularity of } f\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big) && \text{by identity (4.22)}\\
&= g(S+x) + g(S+y) && \text{by Lemma 4.17}
\end{aligned}$$
Having shown that g is monotone submodular, we now give a brief sketch of how MatroidSubmodular can be modified to remove the extra factor of log log r from its runtime. As in MatroidCoverage, we run the greedy initial phase of MatroidSubmodular on g instead of f. Consider some instance (M = (X, I), f) of monotone submodular maximization subject to a matroid constraint. Let Sinit be the solution returned by SubmodularGreedy on (M, g), and let Og = arg max_{A∈I} g(A). Then, because g is monotone submodular, we have the bound
$$g(S_{\mathrm{init}}) \ge \tfrac{1}{2}\,g(O_g).$$
Unfortunately, we cannot compute g exactly, but can only estimate it. Calinescu et al. [21] and Goundan and Schulz [46] both show that the greedy algorithm remains an α/(α+1)-approximation for monotone submodular maximization when we have only an α-approximate incremental oracle for g. Lemma 4.10 shows how to obtain an approximate oracle for g by sampling. We need to turn this into an approximate oracle for the marginal value g_S(x), or modify the proof of Goundan and Schulz to use an approximate oracle for g instead of an approximate incremental oracle for g_S. Either approach reveals that it is sufficient to take n times as many samples as are required for an approximate oracle for g in Lemma 4.10. Using this result, we can obtain the bound
$$\frac{g(O_g)}{g(S_{\mathrm{init}})} = O(1)$$
in the proof of Theorem 4.12, improving on the bound given in (4.17). Then, the remainder of the analysis in Theorem 4.12 shows that the resulting algorithm can make at most O(ε⁻¹) improvements.
Finally, we show that if f is a coverage function, then the function g given by (4.1) corresponds (up to a scaling factor) to the non-oblivious potential function defined in (3.1) in Chapter 3. In this sense, MatroidSubmodular is a generalization of MatroidCoverage.
Theorem 4.20. Let f = (U,w,F) be a coverage function, and let G be the function obtained
from f by using (3.1). Let g be the function obtained from f by using (4.1) with c = 1. Then,
g(S) = e ·G(S) for all S.
Proof. First, we note that since f is a coverage function,
$$\begin{aligned}
g(S) &= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{T\in\binom{S}{k}} f(T)\\
&= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{T\in\binom{S}{k}} \sum_{\substack{x\in U\\ \text{s.t. } x\in F(T)}} w(x)\\
&= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{x\in U} \sum_{\substack{T\in\binom{S}{k}\\ \text{s.t. } x\in F(T)}} w(x)\\
&= \sum_{x\in U} w(x) \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \left|\left\{T\in\tbinom{S}{k} : x\in F(T)\right\}\right|.
\end{aligned}$$
The value
$$\sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \left|\left\{T\in\tbinom{S}{k} : x\in F(T)\right\}\right|$$
depends only on |S| and #(x, S) (the number of sets in S containing x). Hence, there must exist coefficients ζ^(|S|)_{#(x,S)} such that
$$g(S) = \sum_{x\in U} \zeta^{(|S|)}_{\#(x,S)}\,w(x).$$
We now show that the coefficients ζ do not in fact depend on |S|. We consider the following thought experiment. We add to F the single set F_{u+1} = ∅. Then, for every A ⊆ S we have f(A ∪ {u+1}) = f(A), since F_{u+1} covers no elements. From Theorem 4.18, we must then have g(S ∪ {u+1}) = g(S). Hence,
$$\sum_{x\in U} \zeta^{(|S|)}_{\#(x,S)}\,w(x) = g(S) = g(S \cup \{u+1\}) = \sum_{x\in U} \zeta^{(|S|+1)}_{\#(x,S)}\,w(x),$$
for every S, and so ζ^(|S|+1)_{#(x,S)} = ζ^(|S|)_{#(x,S)} for every S. We let ζ_i denote the common value of the terms ζ^(m)_i for all values of m. Then,
$$g(S) = \sum_{x\in U} \zeta_{\#(x,S)}\,w(x).$$
In order to derive a recurrence for the ζ terms, we consider another thought experiment. Suppose that the universe contains a single element x with w(x) = 1, and that F contains some number p of sets F_i = {x} for 1 ≤ i ≤ p and one set F_{p+1} = ∅. Suppose that S = [p]. Then,
$$g(S) = \zeta_{\#(x,S)}\,w(x) = \zeta_p.$$
Furthermore, from the definition of g, we have
$$g(S) = \sum_{k=0}^{p} \frac{\beta^{(p)}_k}{\binom{p}{k}} \sum_{T\in\binom{S}{k}} f(T) = \sum_{k=1}^{p} \beta^{(p)}_k,$$
since f(T) = 1 for all T ≠ ∅ and f(∅) = 0. Thus, we have:
$$\zeta_p = \sum_{k=1}^{p} \beta^{(p)}_k. \qquad (4.24)$$
Now, consider the solution S′ = S ∪ {p+1}, which, in addition to S, contains the index of the set ∅. We have
$$g(S') = \zeta_{\#(x,S')}\,w(x) = \zeta_p,$$
and, from the definition of g,
$$g(S') = \sum_{k=0}^{p+1} \frac{\beta^{(p+1)}_k}{\binom{p+1}{k}} \sum_{T\in\binom{S'}{k}} f(T) = \frac{p}{p+1}\,\beta^{(p+1)}_1 + \sum_{k=2}^{p+1} \beta^{(p+1)}_k,$$
since f(∅) = 0, f(T) = 0 for the set T = {p+1} and 1 for the other p sets T in $\binom{S'}{1}$, and finally f(T) = 1 for every set T containing at least 2 members of S′. Thus, we have:
$$\zeta_p = \frac{p}{p+1}\,\beta^{(p+1)}_1 + \sum_{k=2}^{p+1} \beta^{(p+1)}_k. \qquad (4.25)$$
Equations (4.24) and (4.25) are each valid for any value of p ≥ 1. Letting p = i + 1 in equation (4.24) and p = i in equation (4.25) we obtain the recurrence:
$$\zeta_{i+1} = \sum_{k=1}^{i+1} \beta^{(i+1)}_k = \frac{1}{i+1}\,\beta^{(i+1)}_1 + \frac{i}{i+1}\,\beta^{(i+1)}_1 + \sum_{k=2}^{i+1} \beta^{(i+1)}_k = \frac{1}{i+1}\,\beta^{(i+1)}_1 + \zeta_i. \qquad (4.26)$$
Now, we have:
$$\begin{aligned}
\zeta_{i+1} &= \frac{1}{i+1}\,\beta^{(i+1)}_1 + \zeta_i && \text{by (4.26)}\\
&= \frac{1}{i+1}\,\gamma^{(i+1)}_1 + \zeta_i && \text{since } \gamma^{(i+1)}_1 = 1\cdot\beta^{(i+1)}_1\\
&= \gamma^{(i)}_1 - \gamma^{(i)}_0 + \zeta_i && \text{by applying (γ-up) to } \gamma^{(i+1)}_1\\
&= \gamma^{(i)}_1 - 1 + \zeta_i && \text{by applying (γ-base) to } \gamma^{(i)}_0\\
&= i(\zeta_i - \zeta_{i-1}) - 1 + \zeta_i && \text{by applying (4.26) to } \gamma^{(i)}_1\\
&= (i+1)\zeta_i - i\zeta_{i-1} - 1.
\end{aligned}$$
For the base cases, (4.24) gives ζ_0 = 0 and
$$\zeta_1 = \beta^{(1)}_1 = \gamma^{(1)}_1 = \gamma^{(0)}_1 - \gamma^{(0)}_0 = e - 1,$$
using (γ-up) and (γ-base). Putting everything together, we have:
$$\zeta_0 = 0, \qquad \zeta_1 = e - 1, \qquad \zeta_{i+1} = (i+1)\zeta_i - i\zeta_{i-1} - 1.$$
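The recurrence just derived is easy to evaluate numerically; the following sketch (the function name is ours, and the closed forms in the comment come from unrolling the recurrence by hand) computes the first few ζ values:

```python
import math

def zeta(n):
    """Evaluate zeta_0, ..., zeta_n from the recurrence
    zeta_0 = 0, zeta_1 = e - 1, zeta_{i+1} = (i+1)*zeta_i - i*zeta_{i-1} - 1."""
    z = [0.0, math.e - 1.0]
    for i in range(1, n):
        z.append((i + 1) * z[i] - i * z[i - 1] - 1.0)
    return z[:n + 1]

# Unrolling by hand: zeta_2 = 2e - 3, zeta_3 = 4e - 8, zeta_4 = 10e - 24.
vals = zeta(4)
```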
Examining the recurrence (3.2) for α, we find that ζ_i = e · α_i for all i ≥ 0 and so
$$g(S) = \sum_{x\in U} \zeta_{\#(x,S)}\,w(x) = \sum_{x\in U} e\cdot\alpha_{\#(x,S)}\,w(x) = e\cdot G(S).$$
Chapter 5
Set Systems for Local Search
In the previous chapters, we have presented local search algorithms for the problem of maxi-
mizing coverage and general monotone submodular functions subject to a matroid constraint.
In this section, we formulate a general class of set systems amenable to approximation by local
search algorithms. We begin by reviewing some existing generalizations of matroids, in Section
5.1. In Section 5.2, we present a new class of independence systems called weak k-exchange
systems and show that they capture problems in which a simple local search algorithm is a 1k -
approximation for linear maximization. In Section 5.3, we strengthen this definition to obtain
the class of strong k-exchange systems, which admit more sophisticated local search techniques.
We describe these techniques in detail in Chapter 6. We relate both of these systems to the
existing hierarchy of set systems presented in Section 5.1. Finally, in Section 5.4, we show that
several combinatorial optimization problems are k-exchange systems.
5.1 Set Systems for Greedy Approximation Algorithms
Before defining our classes of independence systems for local search algorithms, we review some
of the more general independence systems related to the greedy algorithm. Matroids capture
those independence systems in which the greedy algorithm always returns an optimal solution
for any linear function on the ground set [81, 33]. A natural question is whether there is a
similar characterization of systems in which the greedy algorithm yields a 1/k-approximation for
some k > 1.
In order to address this question, Jenkyns [58] and Korte and Hausmann [66, 52] independently introduced the class of k-systems. For any independence system (X, I) and any set E ⊆ X, we let u(E) denote the size of the maximum base of I contained in E and l(E) denote the size of the minimum base of I contained in E (recall that a set A ⊆ E is a base of E if A + x ∉ I for all x ∈ E \ A). Then, k-systems are defined as follows.
Definition 5.1 (k-system [58, 66, 52]). An independence system (X, I) is a k-system, for some
(not necessarily integral) k ≥ 1 if:
$$\max_{E\subseteq X} \frac{u(E)}{l(E)} \le k.$$
If further
$$\max_{E\subseteq X} \frac{u(E)}{l(E)} = k,$$
then (X, I) is called an exact k-system.
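As a concrete illustration of Definition 5.1, the ratio max_E u(E)/l(E) can be computed by brute force on tiny ground sets. The sketch below is ours (helper names and the small example system are hypothetical) and is exponential in |X|:

```python
from itertools import chain, combinations

def subsets(E):
    """All subsets of E, as tuples."""
    E = list(E)
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

def k_of_system(X, indep):
    """Brute-force max_E u(E)/l(E) from Definition 5.1.  `indep` is an
    independence oracle on frozensets.  A base of E is an independent
    A <= E with A + x dependent for all x in E \\ A; sets E whose only
    base is empty are skipped."""
    best = 1.0
    for E in map(frozenset, subsets(X)):
        bases = [A for A in map(frozenset, subsets(E))
                 if indep(A) and all(not indep(A | {x}) for x in E - A)]
        sizes = [len(B) for B in bases]
        if sizes and min(sizes) > 0:
            best = max(best, max(sizes) / min(sizes))
    return best

# Hypothetical example: independent sets {}, {0}, {1}, {2}, {0,2} on {0,1,2}.
# For E = {0,1,2} the bases are {1} and {0,2}, so the system is a 2-system.
I_SETS = {frozenset(s) for s in [(), (0,), (1,), (2,), (0, 2)]}
k = k_of_system({0, 1, 2}, lambda A: A in I_SETS)
```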
Definition 5.1 can be viewed as a natural generalization of the matroid characterization
presented in Theorem 2.11. Specifically, in the case that k = 1, Definition 5.1 states that
l(E) = u(E) for all sets E, and so all bases of E have the same size. This is precisely the
condition given in Theorem 2.11, and so the 1-systems are precisely those independence systems
that are matroids. In fact, the relationship between matroids and k-systems extends even
further: the intersection of k matroids is a k-system [58, 52]. This allows many optimization
problems that can be formulated as matroid intersection problems to be placed in the hierarchy
of k-systems.
Jenkyns, Korte, and Hausmann [58, 66, 52] show that the standard greedy algorithm Greedy is a 1/k-approximation for the problem of maximizing a linear function in a k-system. Moreover, for every exact k-system, there exists a weight function for which the greedy algorithm returns a solution of value only 1/k times that of the optimal. Thus, k-systems are exactly those independence systems for which the greedy algorithm is a 1/k-approximation for linear maximization. In further work, Hausmann and Korte [51] consider the k-greedy algorithm, which at each step chooses a set of at most k elements that give the largest improvement in the objective function. They show that there are k-systems for which this algorithm is still only a 1/k-approximation.
Fisher, Nemhauser and Wolsey [43] examine the specific problem of maximizing a monotone submodular function subject to the intersection of k matroid constraints. They show that the greedy algorithm (which we presented as SubmodularGreedy in Algorithm 3) is a 1/(k+1)-approximation for this problem, and that this bound is tight (i.e. there is a monotone submodular function and k matroid constraints for which the greedy solution is only 1/(k+1) times the optimal solution). They remark that this proof can be extended to the general class of k-systems. A full proof of this claim appears in Calinescu et al. [21]. Finally, Gupta et al. [49] show that it is possible to attain a k/(3(k+1)²)-approximation for non-monotone submodular maximization in a k-independence system by combining multiple runs of the standard greedy algorithm with an algorithm for unconstrained non-monotone submodular maximization.¹
Unfortunately, k-independence can be difficult to establish for a given system, and may not
correspond to any useful algorithmic intuition. The class of k-extendible systems, introduced
by Mestre [74], accomplishes many of the same goals as k-systems but are defined by a more
direct, algorithmic property.
¹Here, we have simplified the approximation ratio by assuming that the algorithm of Gupta et al. uses a recent, tight 1/2-approximation algorithm of Feldman, Naor, and Schwartz [38] for unconstrained non-monotone submodular maximization.
Definition 5.2 (k-extendible system [74]). An independence system (X, I) is k-extendible if for any C ⊆ D ∈ I and all x ∉ C such that C + x ∈ I, there exists a Y ⊆ D \ C with |Y| ≤ k such that (D \ Y) + x ∈ I.
Intuitively, the definition states that if we can add an element x to some independent set C
in a k-extendible system, then we can add x to any superset D of C, after removing at most k
elements from D \ C.
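Definition 5.2 can likewise be checked by exhaustive search on small ground sets. The sketch below is ours (exponential; only for tiny examples), and it tests only x ∉ D, since for x ∈ D \ C the choice Y = ∅ always works:

```python
from itertools import chain, combinations

def subsets(E):
    """All subsets of E, as tuples."""
    E = list(E)
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

def is_k_extendible(X, indep, k):
    """Brute-force check of Definition 5.2: for every C <= D in I and every
    x not in D with C + x in I, some Y <= D \\ C with |Y| <= k must give
    (D \\ Y) + x in I.  (For x in D \\ C, Y = {} works trivially.)"""
    X = set(X)
    for D in map(frozenset, subsets(X)):
        if not indep(D):
            continue
        for C in map(frozenset, subsets(D)):
            for x in X - D:
                if not indep(C | {x}):
                    continue
                if not any(indep((D - frozenset(Y)) | {x})
                           for r in range(k + 1)
                           for Y in combinations(D - C, r)):
                    return False
    return True

# The uniform matroid of rank 2 on {0,1,2} is 1-extendible (it is a matroid);
# the hypothetical system with independent sets {}, {0}, {1}, {2}, {0,2} is
# 2-extendible but not 1-extendible (take D = {0,2}, C = {}, x = 1).
I_SETS = {frozenset(s) for s in [(), (0,), (1,), (2,), (0, 2)]}
```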
Mestre shows that the greedy algorithm is a 1/k-approximation for linear maximization in any k-extendible system. Thus, the k-extendible systems are a subset of the k-systems. Furthermore, as shown by Calinescu et al. [21], the greedy algorithm is a 1/(k+1)-approximation for monotone submodular maximization in a k-extendible system. There are k-systems that are
not k-extendible [21], so k-extendible systems do not provide an exact characterization of those
systems for which these positive results hold. However, in contrast to k-systems, k-extendible
systems are defined by a local combinatorial property, which is often easier to establish and
can provide additional algorithmic insight into a problem. Mestre shows that a variety of
natural combinatorial optimization problems—including maximum asymmetric traveling sales-
man, b-matching, and job interval scheduling—are k-extendible for appropriate values of k (the
definitions of many of these problems can be found in Section 5.4).
The case k = 1 deserves special attention. Mestre shows that the 1-extendible systems are
exactly the matroids, and so in the case that k = 1 Definitions 2.10, 5.1, and 5.2 all define the
same class of independence systems.
5.2 Weak k-Exchange Systems
All of the results from Section 5.1 pertained to the standard greedy algorithms Greedy and SubmodularGreedy. We now ask the same questions regarding a simple, “standard” local search procedure
(a, r)-OblLocalSearch, shown in Algorithm 9. The algorithm is an oblivious local search procedure. It takes an independence system (X, I) and a function f : 2^X → R≥0 and searches for a
set in I maximizing f .
More formally, suppose that S ∈ I is some feasible solution, A ⊆ X \ S is some set of
elements to add to S, and R ⊆ S is some set of elements to remove from S. Then, if |R| ≤ r,
|A| ≤ a and (S \R)∪A ∈ I, we say that (A,R) is a valid (a, r)-exchange for S.2 At each step,
(a, r)-OblLocalSearch searches for a valid (a, r)-exchange that improves the objective function
f , and terminates when no such exchange can be found.
For the rest of this section we focus on the special case in which a = 1, and r = k. Even
in this case, there are no known performance guarantees for Algorithm 9 for k-systems or even
²In the typical terminology related to matroids, this operation is actually a replacement, since it replaces elements in one set with elements from another. We use the terminology “exchange” in keeping with the terminology “exchange systems,” which were named after systems studied by Brualdi and Scrimger [19, 17, 18] which gave rise to strongly base orderable matroids, discussed in Section 2.3. We shall see that k-exchange systems have a close relationship with this class of matroids.
Algorithm 9: (a, r)-OblLocalSearch
Input: Independence system (X, I), given as an independence oracle;
       function f : 2^X → R≥0, given as a value oracle
S ← ∅;
repeat
    foreach A ⊆ X \ S with |A| ≤ a do
        foreach R ⊆ S with |R| ≤ r do
            if (S \ R) ∪ A ∈ I and f((S \ R) ∪ A) > f(S) then
                S ← (S \ R) ∪ A;
until no exchange is applied to S;
return S;
k-extendible systems.3
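For concreteness, the procedure can be sketched in Python as follows. This is an oracle-based sketch, exponential per step, with names and the toy instance of our own choosing (not from the text):

```python
from itertools import combinations

def improving_exchange(X, indep, f, S, a, r):
    """Return a valid (a, r)-exchange for S that improves f, or None."""
    for na in range(a + 1):
        for A in combinations(sorted(set(X) - S), na):
            for nr in range(r + 1):
                for R in combinations(sorted(S), nr):
                    T = (S - frozenset(R)) | frozenset(A)
                    if indep(T) and f(T) > f(S):
                        return T
    return None

def obl_local_search(X, indep, f, a, r):
    """(a, r)-OblLocalSearch: start from the empty set and apply improving
    valid (a, r)-exchanges until none exists (independence and value
    oracles, as in Algorithm 9)."""
    S = frozenset()
    while True:
        T = improving_exchange(X, indep, f, S, a, r)
        if T is None:
            return S
        S = T

# Toy run: uniform matroid of rank 2 with linear weights (example data ours).
w = {0: 1, 1: 2, 2: 3}
S = obl_local_search({0, 1, 2}, lambda A: len(A) <= 2,
                     lambda A: sum(w[x] for x in A), 1, 1)
```

On this instance the (1, 1)-search ends at the maximum-weight independent set {1, 2}.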
In order to study the behavior of (1, k)-OblLocalSearch, we introduce the following class of
set systems, which we call weak k-exchange systems.
Definition 5.3 (Weak k-Exchange System). An independence system (X, I) is a weak k-exchange system if, for all C and D in I, there exists a mapping Y assigning each element x ∈ C \ D a subset Y(x) of D \ C such that:

(WK1) |Y(x)| ≤ k for each x ∈ C \ D.
(WK2) |Y⁻¹(y)| ≤ k for each y ∈ D \ C.
(WK3) (D \ Y(x)) + x ∈ I for all x ∈ C \ D.

(where Y⁻¹(y) = {x ∈ C \ D : y ∈ Y(x)}).
Intuitively, the definition mandates the existence of a collection of valid (1, k)-exchanges
between C and D, one for each element of C \D. This is already implicit in the definition of
k-extendible systems, but here we further require that no element of D \C appear in too many
of the resulting exchanges. Indeed, we can show that every weak k-exchange system is k-extendible.
Theorem 5.4. Let (X, I) be a weak k-exchange system. Then, (X, I) is k-extendible.
Proof. Suppose that C ⊆ D ∈ I, and C + x ∈ I for some x ∉ C. Then, if x ∈ D, Definition 5.2 is trivially satisfied with Y = ∅. Thus, suppose that x ∉ D, so x ∈ (C + x) \ D. Then, let Y be the set Y(x) from Definition 5.3, applied to the sets C + x and D. From properties (WK1) and (WK3) we have Y ⊆ D \ C, with |Y| ≤ k and (D \ Y) + x ∈ I.
As in the case of 1-extendible systems, the class of weak 1-exchange systems behaves specially.
³Fisher, Nemhauser and Wolsey [43] consider the more restricted algorithm (1, 1)-OblLocalSearch, and show that it attains an approximation ratio of 1/2 for submodular maximization in a single matroid, but no constant factor for the intersection of 2 or more matroids.
Theorem 5.5. Let (X, I) be an independence system. Then, (X, I) is a matroid if and only
if it is a weak 1-exchange system.
Proof. Suppose that (X, I) is a weak 1-exchange system. We shall show that the condition of
Definition 2.10 must hold. Let C,D ∈ I, with |C| > |D|. If D ⊂ C, then for any element
x ∈ C \ D we have D + x ∈ I and Definition 2.10 is satisfied. Thus, suppose that D ⊄ C and consider the mapping Y from C \ D to subsets of D \ C from Definition 5.3. From (WK2), we have |Y⁻¹(y)| ≤ 1 for all y ∈ D \ C, and so the number of elements x ∈ C \ D for which Y(x) ≠ ∅ is at most
|D \ C| = |D| − |D ∩ C| = |D| − (|C| − |C \D|) = |D| − |C|+ |C \D| < |C \D| ,
where the last inequality follows from |C| > |D|. Hence, there must be at least one element
x ∈ C \D with Y (x) = ∅, and from (WK3) we have (D \ Y (x)) + x = D + x ∈ I.
Conversely, suppose that (X, I) is a matroid and let C, D ∈ I. We extend both C and D to bases C̄ and D̄ by adding arbitrary elements from X to each of them. Then, from Theorem 2.13 there must be a bijection π from C̄ to D̄ such that D̄ − π(x) + x ∈ I for all x ∈ C̄. Define Y(x) = {π(x)} ∩ D. Then, since π(x) = x for any x ∈ C̄ ∩ D̄, we have π(x) ∈ D̄ \ C̄ for all x ∈ C̄ \ D̄ and so Y(x) ⊆ D \ C̄ ⊆ D \ C for all x ∈ C \ D̄, as required. Clearly |Y(x)| ≤ 1 for all x ∈ C. Moreover, because π is a bijection, every element of D \ C appears in Y(x) for at most one x ∈ C \ D. Thus, properties (WK1) and (WK2) of Definition 5.3 are satisfied. Finally, consider the element π(x) for x ∈ C \ D. If π(x) ∈ D̄ \ D then we have Y(x) = ∅ and
$$(D \setminus Y(x)) + x = D + x \subseteq \bar{D} - \pi(x) + x \in \mathcal{I}.$$
If π(x) ∈ D then we have
$$(D \setminus Y(x)) + x \subseteq (\bar{D} \setminus Y(x)) + x = \bar{D} - \pi(x) + x \in \mathcal{I}.$$
In either case, Y satisfies property (WK3) of Definition 5.3.
We now state our main result, which relates the locality ratio of (1, k)-OblLocalSearch to
the notion of weak k-exchange systems.
Theorem 5.6. Let (X, I) be a weak k-exchange system and f : 2^X → R≥0 be a function. Then the locality ratio of (1, k)-OblLocalSearch on the instance ((X, I), f) is at least 1/(k+1) if f is monotone submodular and at least 1/k if f is linear.
Proof. Suppose that f is monotone submodular. Let S ∈ I be the locally optimal solution
produced by (1, k)-OblLocalSearch on the given instance and let O ∈ I be an optimal solution
for this instance. Consider the mapping Y assigning each element x ∈ O \ S a subset Y (x) of
S \O. From (WK1) and (WK3) we have |Y (x)| ≤ k and (S \ Y (x)) + x ∈ I for all x ∈ O \ S.
Chapter 5. Set Systems for Local Search 83
Thus, for all x ∈ O \ S, the pair ({x}, Y(x)) is a valid (1, k)-exchange for S. Because S is locally
optimal for all such exchanges, we have:
f((S \ Y (x)) + x) ≤ f(S) ,
for all x ∈ O \ S. Subtracting f(S \ Y (x)) from each side, we obtain
f((S \ Y (x)) + x)− f(S \ Y (x)) ≤ f(S)− f(S \ Y (x)) .
Applying submodularity on the left, we have
f(S + x)− f(S) ≤ f((S \ Y (x)) + x)− f(S \ Y (x)) ≤ f(S)− f(S \ Y (x)) , (5.1)
for each x ∈ O \ S. Summing (5.1) over all such x gives
$$\sum_{x\in O\setminus S} \big[f(S+x) - f(S)\big] \le \sum_{x\in O\setminus S} \big[f(S) - f(S\setminus Y(x))\big]. \qquad (5.2)$$
Each element x ∈ O \ S occurs in exactly one of the sets on the left of (5.2). Thus, Theorem 2.6 gives
$$\sum_{x\in O\setminus S} \big[f(S+x) - f(S)\big] \ge f(S \cup (O\setminus S)) - f(S) = f(S\cup O) - f(S). \qquad (5.3)$$
By property (WK2) of Y, each element of S \ O appears in at most k of the sets Y(x) on the right. Thus, Theorem 2.7 gives
$$\sum_{x\in O\setminus S} \big[f(S) - f(S\setminus Y(x))\big] \le k\big[f(S) - f(S\cap O)\big]. \qquad (5.4)$$
Combining (5.2), (5.3), and (5.4), we obtain
f(S ∪O)− f(S) ≤ k [f(S)− f(S ∩O)] ,
which is equivalent to
$$f(S\cup O) + k f(S\cap O) \le (k+1) f(S).$$
From the monotonicity of f , we have f(O) ≤ f(S ∪ O). Furthermore, we have f(S ∩ O) ≥ 0,
from non-negativity of f . Thus, for any monotone submodular function f , we have
f(O) ≤ f(S ∪O) + kf(S ∩O) ≤ (k + 1)f(S) .
If in addition to being monotone submodular, f is in fact linear, then we have
f(S) + f(O) = f(S ∪O) + f(S ∩O) ≤ f(S ∪O) + kf(S ∩O) ≤ (k + 1)f(S) ,
since k ≥ 1. Hence,
$$f(O) \le k f(S).$$
5.3 Strong k-Exchange Systems
Theorem 5.6 shows that the oblivious local search algorithm (1, k)-OblLocalSearch attains a 1/k-approximation for linear maximization on any weak k-exchange system. However, Theorem 5.4 shows that every such system must also be k-extendible, and so the standard greedy algorithm Greedy also provides a 1/k-approximation. This seems to suggest that local search algorithms for k-extendible set systems perform no better than their greedy counterparts, which are often much faster. However, this is not the case.
In fact, the best known approximation algorithms for several k-extendible problems are based on local search. Examples of k-extendible problems for which local search algorithms attain approximation ratios better than 1/k in the linear case or 1/(k+1) in the monotone submodular case include: matroid matching [70, 87], maximum independent set in (k + 1)-claw free graphs [55, 50, 22, 11, 12], and intersection of k matroids [71]. All of the local search algorithms used in these cases consider exchanges larger than the (1, k)-exchanges used by (1, k)-OblLocalSearch. With this in mind, we introduce the following strengthening of weak
k-exchange systems.
Definition 5.7 (strong k-exchange system). Let (X, I) be a weak k-exchange system. Then,
(X, I) is a strong k-exchange system if for all C,D ∈ I the mapping Y from Definition 5.3
additionally satisfies:
(SK3) (D \ Y(C′)) ∪ C′ ∈ I for all C′ ⊆ C \ D.

(where Y(C′) = ⋃_{x∈C′} Y(x)).
The main difference between weak and strong k-exchange systems is that in strong k-
exchange systems we can perform any number of (1, k)-exchanges (x, Y (x)) simultaneously.
This allows us to easily develop algorithms that consider significantly larger neighborhoods
than (1, k)-OblLocalSearch.
As in the case of weak 1-exchange systems, we can show that the strong 1-exchange systems
are equivalent to a known class of set systems: the class of strongly base orderable matroids,
discussed in Section 2.3.
Theorem 5.8. Let (X, I) be an independence system. Then, (X, I) is a strongly base orderable
matroid if and only if it is a strong 1-exchange system.
Proof. Let (X, I) be a strong 1-exchange system. Then, by Theorem 5.5, (X, I) is a matroid.
Let C,D be two bases of I, and let Y be the mapping from C \ D to subsets of D \ C from
Definition 5.7. Each element of D \ C appears in at most 1 of the sets Y (x) for x ∈ C \ D.
Furthermore, since (X, I) is a matroid we must have |C| = |D| and so
|C \D| = |C| − |C ∩D| = |D| − |C ∩D| = |D \ C| .
Thus, we must in fact have each element of D \ C in exactly one set Y (x) for x ∈ C \ D.
We construct a bijection π : C → D by setting π(x) to be the single element in Y (x) for all
x ∈ C \ D and letting π(x) = x for all x ∈ C ∩ D. Then, for any C′ ⊆ C we have
$$(D \setminus \pi(C')) \cup C' = \big(D \setminus Y(C'\setminus D)\big) \cup (C'\setminus D) \cup (C'\cap D) \in \mathcal{I},$$
from (SK3), since C′ \ D ⊆ C \ D. Thus, (X, I) is strongly base orderable.
Conversely, suppose that (X, I) is a strongly base orderable matroid. Let C and D be sets in I and extend them to bases C̄ and D̄ by adding some arbitrary elements to them. Because (X, I) is strongly base orderable there exists a bijection π : C̄ → D̄ such that (D̄ \ π(C′)) ∪ C′ ∈ I for all C′ ⊆ C̄. As in Theorem 5.5, set Y(x) = {π(x)} ∩ (D \ C) for all x ∈ C \ D. Then, as shown in Theorem 5.5, Y satisfies (WK1) and (WK2). Additionally, for any C′ ⊆ C \ D we have
$$\begin{aligned}
D \setminus Y(C') &= (\bar{D} \setminus Y(C')) \cap (C \cup D)\\
&= \big[\bar{D} \setminus (\pi(C') \cap D)\big] \cap (C \cup D)\\
&= \big[(\bar{D} \setminus \pi(C')) \cup (\bar{D} \setminus D)\big] \cap (C \cup D)\\
&= (\bar{D} \setminus \pi(C')) \cap (C \cup D) \subseteq \bar{D} \setminus \pi(C').
\end{aligned}$$
So, we have (D \ Y(C′)) ∪ C′ ⊆ (D̄ \ π(C′)) ∪ C′ ∈ I. Thus (SK3) is satisfied and so (X, I) is a strong 1-exchange system.
Finally, we note that the classes of strong and weak k-exchange systems are closed under
contraction.
Theorem 5.9. Let (X, I) be a weak k-exchange system and define I_x = {Y ⊆ X − x : Y + x ∈ I}. Then, (X − x, I_x) is a weak k-exchange system. Furthermore, if (X, I) is a strong k-exchange system, then so is (X − x, I_x).

Proof. Consider two sets C′, D′ ∈ I_x. Then, we must have C′ + x and D′ + x in I. Let Y be the mapping for the sets C = C′ + x and D = D′ + x in Definition 5.3. Then, Y assigns each element of C \ D a subset of D \ C and satisfies (WK1), (WK2), and (WK3). But C \ D = C′ \ D′ and D \ C = D′ \ C′, so Y in fact assigns each element of C′ \ D′ a subset of D′ \ C′. Set Y_x = Y. Then, since Y satisfies (WK1), (WK2), and (WK3) with respect to C and D, Y_x satisfies (WK1), (WK2), and (WK3) with respect to C′ and D′. Thus, (X − x, I_x) is a weak k-exchange system. Moreover, if (X, I) is a strong k-exchange system then the mapping Y additionally satisfies (SK3) for C, D and hence Y_x satisfies (SK3) for C′, D′. Thus, in this case (X − x, I_x) is also a strong k-exchange system.
We defer our discussion of algorithms for strong k-exchange systems to the next chapter,
where we give algorithms for both linear and submodular maximization.
5.4 Applications
Now we give some examples of combinatorial optimization problems that are naturally expressible as k-exchange systems. For each problem we give a cursory overview of related results, including a discussion of the best known approximation algorithms, as well as a theorem placing the problem in the hierarchy of k-exchange systems. We shall not give explicit proofs that the set systems discussed are independence systems when it is reasonably obvious or follows directly from the nature of the underlying systems. Our theory of k-exchange systems was inspired largely by connections between the independent set problem in (k + 1)-claw free graphs and the matroid k-parity problem. Thus, we spend a bit more time on the discussion of these particular applications.
5.4.1 Independent Set in (k + 1)-Claw Free Graphs
A graph G is (k + 1)-claw free if there are at most k independent vertices in the neighborhood
of any vertex v of G. The class of such graphs includes many classes of intersection graphs (e.g. intersection graphs of unit intervals and unit discs). Additionally, many problems can be
reduced to the problem of finding an independent set in a (k+1)-claw free graph, including the
unit job interval scheduling problem (k = 3), k-set packing, and k-dimensional matching. Here
we study the problem of finding an independent set of vertices in G that maximizes a given
function f : 2V → R≥0.
Hazan, Safra, and Schwartz [53] consider the special case of unweighted k-set packing (here, f(S) is simply |S|). They show that for k ≥ 3, unweighted k-set packing cannot be approximated to within a factor of Ω(ln k/k) unless P = NP. Hurkens and Schrijver [55] give a 2/(k+ε)-approximation for unweighted k-set packing as well as the general problem of finding a maximum cardinality independent set in a (k + 1)-claw free graph. Their algorithm is a straightforward oblivious local search algorithm. Halldorsson [50] shows that a simplified local search algorithm with a smaller neighborhood structure attains the same result for unweighted independent set in (k + 1)-claw free graphs, and analyzes its performance for a variety of other problems.
In the weighted case (i.e. the case in which f is linear), the greedy algorithm yields a 1/k-approximation, since the independent sets in a (k + 1)-claw free graph form a k-system. Arkin and Hassin [5] considered the weighted k-set packing problem, and show that (t, t − 1)-OblLocalSearch has a locality ratio of only 1/(k − 1 + 1/t). Chandra and Halldorsson showed that it is possible to modify the oblivious local search algorithm to approximate the problem beyond the locality ratio, attaining a (3/(2(k+1)) − ε)-approximation even for the general problem of maximum weight independent set in (k + 1)-claw free graphs. Their oblivious local search algorithm starts from a greedy solution and chooses the best available (k, k²)-exchange at each step. By using a non-oblivious local search algorithm that seeks to maximize the squared weight of the current independent set, Berman [11] attained a (2/(k+1) − ε)-approximation. Later, Berman and Krysta [12] gave a modified version of this non-oblivious local search algorithm with an approximation ratio of (3/(2k) − ε) whose runtime is independent of k.
The algorithms that we present in the next chapter extend Berman’s non-oblivious approach
to all strong k-exchange systems, and further generalize it to the case of monotone submodular
objective functions. We now show that the independent sets in a (k + 1)-claw free graph form
a strong k-exchange system.
Theorem 5.10. Let G = (V,E) be a (k + 1)-claw free graph, and I be the set of independent
sets of vertices in G. Then (V, I) is a strong k-exchange system.
Proof. Let C and D be two independent sets of vertices in G. For each vertex x ∈ C \ D, let Y(x) be the set of all vertices in D adjacent to x. Then, since C and D are independent we must in fact have Y(x) ⊆ D \ C, as required. Because G is (k + 1)-claw free and all the vertices in D \ C are independent, we must have |Y(x)| ≤ k. Thus Y satisfies (WK1). A vertex y ∈ D \ C appears in Y(x) for some x ∈ C \ D if and only if it is adjacent to x. Then, because G is (k + 1)-claw free and all the vertices of C \ D are independent, we must have |Y⁻¹(y)| ≤ k. Thus Y satisfies (WK2). Finally, for any C′ ⊆ C \ D, the set D \ Y(C′) contains no vertices adjacent to C′. Thus, (D \ Y(C′)) ∪ C′ is an independent set, and so Y satisfies (SK3).
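The construction in this proof is simple enough to run directly. The sketch below (function names and the example graph are ours) builds Y for a small claw-free graph and checks (WK1), (WK2), and (SK3) exhaustively:

```python
from itertools import combinations

def is_independent(adj, S):
    """No two vertices of S are adjacent."""
    return all(v not in adj[u] for u in S for v in S if u != v)

def exchange_mapping(adj, C, D):
    """The mapping from the proof of Theorem 5.10: Y(x) is the set of
    vertices of D adjacent to x (automatically a subset of D \\ C when
    C and D are independent)."""
    return {x: adj[x] & D for x in C - D}

# The path 0-1-2-3-4 is claw-free, hence (k+1)-claw free with k = 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
C, D = {0, 2, 4}, {1, 3}
Y = exchange_mapping(adj, C, D)

# (WK1) and (WK2): both degree bounds hold with k = 2.
assert all(len(Yx) <= 2 for Yx in Y.values())
assert all(sum(1 for x in Y if y in Y[x]) <= 2 for y in D - C)

# (SK3): every simultaneous exchange (D \ Y(C')) | C' stays independent.
for r in range(len(C) + 1):
    for Cp in combinations(sorted(C), r):
        Yp = set().union(*(Y[x] for x in Cp)) if Cp else set()
        assert is_independent(adj, (D - Yp) | set(Cp))
```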
The class of strong k-exchange systems has a close relationship to (k + 1)-claw free graphs. In fact, the strong k-exchange systems can be defined succinctly in graph-theoretic terms. The class of (k + 1)-claw free graphs has the property that, for any 2 independent sets C, D, the maximum degree of the bipartite subgraph induced by C ∪ D is at most k. Similarly, an independence system (X, I) is a strong k-exchange system if and only if for each C, D ∈ I we can construct a bipartite graph G_{C,D} on the vertex set C ∪ D such that every vertex has degree at most k and every independent set in G_{C,D} is in I. The mapping Y(x) is simply the standard neighborhood relation in G_{C,D}, and conditions (WK1) and (WK2) bound its degrees by k. In this sense, the strong k-exchange systems behave locally like (k + 1)-claw free graphs: for each individual pair of independent sets C, D, we can construct a (k + 1)-claw free graph that encodes independence on C ∪ D.
5.4.2 k-Matroid Intersection
In the k-matroid intersection problem, we are given k matroids M1, . . . ,Mk, each defined
on a common ground set X, together with a function f : 2X → R≥0. The goal is to find
a set S ⊆ X that maximizes f subject to the constraint that S is an independent set in
each of the k matroids. As discussed in Section 2.3, the intersection of k matroids forms a
k-system. Thus, the greedy algorithm gives a 1/k-approximation for linear functions f and a 1/(k + 1)-approximation for monotone submodular functions. Lee et al. [69] give an improved 1/(k + ε)-approximation for monotone submodular maximization in the specific case of partition matroids via a local search algorithm. Lee, Sviridenko, and Vondrak [71] generalize this result, giving a general oblivious local search algorithm that is a 1/(k − 1 + ε)-approximation for maximizing linear functions subject to arbitrary matroid constraints and a 1/(k + ε)-approximation for monotone submodular functions.
For k = 2, the problem of maximizing a linear function f can be solved in polynomial
time [68, 34]. Because the k-dimensional matching problem can be represented as k-matroid
intersection, the Ω(ln k/k)-inapproximability result of Hazan, Safra, and Schwartz [53] applies
here, as well, for all k ≥ 3. In the case of a monotone submodular objective function, the
problem is NP-hard even in the case of a single matroid, as we discussed in Chapter 3.
We now show that the intersection of k matroids is a weak k-exchange system for general
matroids, and a strong k-exchange system when all of the k matroids are strongly base orderable.
This, together with the algorithmic results in the next chapter, allows us to give improved
approximations for k-matroid intersection with both linear and monotone submodular objective
functions, in the special case that all the matroids are strongly base orderable.
Theorem 5.11. Let M1 = (X, I1), . . . ,Mk = (X, Ik) be k matroids on the ground set X and let I = ⋂_{i=1}^k Ii. Then, (X, I) is a weak k-exchange system. If each of M1, . . . ,Mk is strongly base orderable, then (X, I) is a strong k-exchange system.
Proof. Let C,D ∈ I be two sets that are independent in each of M1, . . . ,Mk. From Theorem 5.5, each matroid Mi is a weak 1-exchange system. Let Yi be the mapping assigning each element of C \ D a subset of D \ C for the matroid Mi, and let Y(x) = ⋃_{i=1}^k Yi(x). Then,

|Y(x)| ≤ ∑_{i=1}^k |Yi(x)| ≤ k

for all x ∈ C \ D, where the last inequality follows from the fact that each Yi satisfies (WK1) with k = 1. Similarly, we have Y⁻¹(y) = ⋃_{i=1}^k Yi⁻¹(y), and so

|Y⁻¹(y)| ≤ ∑_{i=1}^k |Yi⁻¹(y)| ≤ k

for all y ∈ D \ C, where the last inequality follows from the fact that each Yi satisfies (WK2) with k = 1. Thus, Y satisfies (WK1) and (WK2). Finally, we note that for any x ∈ C \ D,
(D \ Y(x)) + x ⊆ (D \ Yi(x)) + x ∈ Ii

for each i ∈ [k], where the final statement holds because Yi satisfies (WK3). Thus, for any x ∈ C \ D, we have (D \ Y(x)) + x ∈ Ii for each matroid Mi, and so Y satisfies (WK3).
If each of M1, . . . ,Mk is strongly base orderable, then from Theorem 5.8 each Mi is a strong 1-exchange system. Then we note that for any C′ ⊆ C \ D we have

(D \ Y(C′)) ∪ C′ ⊆ (D \ Yi(C′)) ∪ C′ ∈ Ii

for all i ∈ [k], where the final statement holds because Yi satisfies (SK3). Thus, for any C′ ⊆ C \ D, we have (D \ Y(C′)) ∪ C′ ∈ Ii for each matroid Mi, and so Y satisfies (SK3).
5.4.3 k-Uniform Hypergraph b-Matching
In the k-uniform hypergraph b-matching problem, we are given a k-uniform⁴ hypergraph H = (V, E) and a budget b ∈ N, together with a function f : 2E → R≥0. The goal is to choose a set of hyperedges S ⊆ E that maximizes f subject to the constraint that each vertex v ∈ V is contained in at most b of the hyperedges in S. We consider here a generalization in which each vertex v ∈ V has its own budget b(v) ∈ N. In the case that b(v) = 1 for all vertices, we obtain the hypergraph matching problem, which is equivalent to the k-set packing problem. Thus, while the standard b-matching problem, corresponding to k = 2, is solvable in polynomial time, for k ≥ 3 the problem is NP-hard and, moreover, the hardness of approximation results [53] for k-set packing apply. El Ouali, Fretwurst, and Srivastav generalize this result to show that, in the case that b(v) = b for all vertices v, it is NP-hard to approximate k-uniform hypergraph b-matching to within a factor of Ω(b log k/k).
Theorem 5.12. Let I be the set of all b-matchings in a k-uniform hypergraph H = (V, E).
Then, (E ,I) is a strong k-exchange system.
Proof. This proof is heavily indebted to Moran Feldman, Seffi Naor, and Roy Schwartz's proof that b-matching is a 2-exchange system. Consider two b-matchings C and D in I. For each vertex v ∈ V, let δC(v) and δD(v) be the sets of hyperedges from C \ D and D \ C, respectively, containing v. For each vertex v ∈ V, we number the hyperedges in δC(v) and δD(v) arbitrarily, and denote by νC(v,E) and νD(v,E) the label in [b(v)] given to a hyperedge E at vertex v in C and D, respectively (note that a hyperedge may receive a different label at each one of its vertices). Then, for each hyperedge E ∈ C \ D, we set

Y(E) = {E′ ∈ D \ C : ∃v ∈ (E′ ∩ E) with νC(v,E) = νD(v,E′)} .

That is, Y(E) contains those hyperedges in D \ C that share a vertex with E and have the same label at this vertex as E. For each vertex v of a hyperedge E ∈ C \ D there is at most one hyperedge E′ ∈ D \ C that has νD(v,E′) = νC(v,E). Thus, |Y(E)| ≤ k for all E ∈ C \ D and so (WK1) is satisfied. Similarly, for each vertex v of a hyperedge E′ ∈ D \ C there is at most one
⁴The constraint that each hyperedge have exactly k vertices can easily be relaxed to a constraint that each hyperedge have at most k vertices by adding "dummy" vertices to the graph.
hyperedge E ∈ C \ D that has νC(v,E) = νD(v,E′). Thus, |Y⁻¹(E′)| ≤ k for all E′ ∈ D \ C and
so (WK2) is satisfied.
Finally, let C′ ⊆ C \ D and consider the set of hyperedges D′ = (D \ Y(C′)) ∪ C′. For each vertex v ∈ V and each E ∈ D′, let νD′(v,E) = νD(v,E) if E ∈ D and νD′(v,E) = νC(v,E) if E ∈ C′. The construction of Y ensures that for any vertex v ∈ V, all hyperedges E of D′
that contain v must have distinct labels νD′(v,E). For each vertex v ∈ V there are only b(v)
distinct labels νD′(v,E) and so the number of hyperedges in D′ incident to each vertex v is at
most b(v). Thus, D′ must be a valid b-matching and so (SK3) is satisfied.
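The labeling construction in this proof can be stated very compactly in code. A sketch (our own illustration; hyperedges are frozensets, and the labelings νC and νD are dictionaries keyed by (vertex, hyperedge) pairs):

```python
def build_Y(C_only, D_only, nu_C, nu_D):
    """Y(E) = hyperedges E' in D \ C that share a vertex v with E and
    carry the same label at v (proof of Theorem 5.12)."""
    return {E: {Ep for Ep in D_only
                for v in E & Ep
                if nu_C[(v, E)] == nu_D[(v, Ep)]}
            for E in C_only}
```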
5.4.4 Matroid k-Parity
In the matroid k-parity problem, we are given a matroid M = (X, I) and a set E of pairwise
disjoint k-element subsets of X, together with a function f : 2E → R≥0. The goal is to find a set S ⊆ E that maximizes f, subject to the constraint that the set of elements covered by the sets in S is in I. A related problem is the matroid k-matching problem. In this problem, we are given M together with a hypergraph H = (X, E) and a function f : 2E → R≥0, and the goal is to find a matching S in H that maximizes f, again subject to the constraint that the set of elements covered by the edges in S is in I. The matroid k-parity problem corresponds to the special
covered by the edges in S is in I. The matroid k-parity problem corresponds to the special
case in which all the edges in H are disjoint. Matroid matching in a k-regular hypergraph is
reducible to matroid k-parity as well, and thus the two problems are equivalent [70].
The matroid k-parity problem can be viewed as a common generalization of k-matroid
intersection and the k-uniform hypergraph b-matching problem, which we have just considered.
Unlike these problems, however, matroid k-parity is NP-hard even in the case k = 2. If the
matroid M is given by an independence oracle, there are instances of the matroid 2-matching
problem (and hence also matroid 2-parity) for which obtaining an optimal solution requires
an exponential number of oracle calls, even in the unweighted case in which f is a cardinality
function [59]. These instances can be modified to show that matroid 2-parity and matroid
2-matching are NP-hard, via a reduction from the maximum clique problem [85].
In the special case of linear matroids, Lovasz [72, 73] obtains a polynomial time algorithm
for maximum cardinality matroid 2-matching. Lee, Sviridenko, and Vondrak [70] give a PTAS
for maximum cardinality matroid 2-parity in arbitrary matroids, and a (k/2 + ε)-approximation for matroid k-parity in arbitrary matroids.
In the weighted case (in which f is a linear function), the greedy algorithm provides a
k-approximation, since both the matching and parity problems are examples of k-systems.
Although this remains the best known result for general matroids, some improvement has been
made in the case of k = 2 for restricted classes of matroids. Tong, Lawler, and Vazirani give
an exact polynomial time algorithm for weighted matroid 2-parity in gammoids [90]. Soto [87]
extends this result, obtaining a PTAS for all strongly base orderable matroids. Soto further
shows that weighted matroid 2-matching remains NP-hard even in this restricted case.
For any set P of subsets of X, let ⋃P = ⋃_{P ∈ P} P denote the set of all elements contained in some set of P. Then, we have the following theorem, which relates the matroid k-parity problem to k-exchange systems.
Theorem 5.13. Let (X, IM) be a strongly base orderable matroid and let E be a set of disjoint k-element subsets of X. Further let I be the set of all subsets S ⊆ E that satisfy ⋃S ∈ IM. Then, (E, I) is a strong k-exchange system.
Proof. Consider C,D ∈ I. From Theorem 5.8, (X, IM) is a strong 1-exchange system, and so there is a mapping YM assigning each element in ⋃C \ ⋃D a set of elements in ⋃D \ ⋃C and satisfying (WK1), (WK2), and (SK3). For each k-set E ∈ C \ D we define

Y(E) = {E′ ∈ D \ C : E′ ∩ YM(E) ≠ ∅} ,

where YM(E) = ⋃_{e∈E} YM(e). Because no two sets in E share an element, there are at most |YM(E)| distinct sets E′ ∈ D \ C such that E′ ∩ YM(E) ≠ ∅. We have

|Y(E)| ≤ |YM(E)| ≤ ∑_{e∈E} |YM(e)| ≤ k ,

where the final inequality follows from the fact that YM satisfies (WK1) with k = 1. Thus, Y satisfies (WK1).
Similarly, there are at most |YM⁻¹(E′)| distinct sets E ∈ C \ D such that E′ ∩ YM(E) ≠ ∅, where YM⁻¹(E′) = ⋃_{e′∈E′} YM⁻¹(e′). We have

|Y⁻¹(E′)| ≤ |YM⁻¹(E′)| ≤ ∑_{e′∈E′} |YM⁻¹(e′)| ≤ k ,

where the final inequality follows from the fact that YM satisfies (WK2) with k = 1. Thus, Y satisfies (WK2).
Finally, for any C′ ⊆ C \ D we have

⋃((D \ Y(C′)) ∪ C′) = (⋃D \ ⋃Y(C′)) ∪ ⋃C′ ⊆ (⋃D \ YM(⋃C′)) ∪ ⋃C′ .

From property (SK3) of YM, the last set must be in IM. Thus, ⋃((D \ Y(C′)) ∪ C′) ∈ IM, and so (D \ Y(C′)) ∪ C′ ∈ I and Y satisfies (SK3).
We briefly note that even in the case that the given matroid is not strongly base orderable, matroid k-parity gives rise to a weak k-exchange system. The proof requires a stronger version of Theorem 2.13, in which the bijection π is between the sets of a partition of each of the two bases. This stronger exchange property is given by Greene and Magnanti [48, Theorem 3.3].
5.4.5 Maximum Asymmetric Traveling Salesman
In the maximum asymmetric traveling salesman problem (MaxATSP) we are given a directed
graph G = (V,E), together with a function f : 2E → R≥0. We seek a directed Hamiltonian
cycle in G that maximizes f .
In the standard variant of the problem, f is a linear function given as a weight for each edge.
There has been a great deal of recent work on both the general case and metric case, in which the
weights on E must satisfy the triangle inequality. We do not attempt to give a complete history
of the problem here, but rather state only the current best results, for the sake of comparison.
In the general setting, the best known approximation algorithm is a 2/3-approximation by Kaplan et al. [62]. In the metric case, Kowalik and Mucha [67] give a 7/8-approximation algorithm. The problem is APX-hard even in the metric case (this follows from a result by Papadimitriou and Yannakakis [80] for the minimization version of the traveling salesman problem with all weights 1 or 2), and Karpinski and Schmied show that in the general case it is NP-hard to attain any approximation beyond 206/207.
We now show that MaxATSP gives rise to a 3-exchange system. Although the algorithms we
present in the next chapter do not attain the best known approximation results for MaxATSP,
the proof of Theorem 5.14 is of some interest in its own right. The independence system
associated with MaxATSP contains all Hamiltonian cycles in G as well as their subsets (the
inclusion of the latter ensures that the system is indeed an independence system). A set of
edges S is independent in this system if and only if every vertex in the subgraph (V, S) has
indegree and outdegree at most 1 and S contains no cycle that is not a Hamiltonian cycle. We
can represent these constraints as the intersection of 3 matroids: one partition matroid enforcing
the indegree constraint, one partition matroid enforcing the outdegree constraint, and a graphic
matroid enforcing the constraint that no non-Hamiltonian cycles are present. Surprisingly, even
though this last matroid is not necessarily strongly base orderable, the intersection of all three
matroids forms a strong k-exchange system, as we now show.
Theorem 5.14. Let G = (V,E) be a directed graph and let I be the set of all Hamiltonian
cycles in G, together with all of their subsets. Then, (E, I) is a strong 3-exchange system.
Proof. We present the following proof based on a proof by Feldman, Naor, and Schwartz.
Suppose that C,D ∈ I. For every edge e = (h, t) ∈ C \ D, the set Y(e) ⊆ D \ C comprises the following three types of edges in D \ C: (1) edges of the form (h, x) for some vertex x; (2) edges of the form (x, t) for some vertex x; (3) the first edge of D \ C encountered on any path from t containing only edges of D.
For each e ∈ C \ D there can be at most one edge of each type. For edges of types (1) and (2), this follows directly from the fact that every vertex has indegree and outdegree at most 1 in D. This fact also implies that there can be at most one path from t containing only edges of D, and hence at most one edge of type (3). Thus, (WK1) holds.

Next, we show that an edge e′ ∈ D \ C can appear as an edge of each type at most once, and so
(WK2) holds. For types (1) and (2) this follows directly from the fact that every vertex has indegree and outdegree at most 1 in C. For type (3), we again note that there is at most one path P in D containing the edge e′. Consider any two edges e1 = (h1, t1) and e2 = (h2, t2) in C \ D. We show that e′ can appear as an edge of type (3) for at most one of these edges. Since the indegree of all vertices in C is at most 1, we must have t1 ≠ t2. Suppose without loss of generality that t1 comes before t2 on the path P, and consider the edge (x, t2) in P. Because e2 ∉ D, we must have x ≠ h2. Because the indegree of all vertices in C is at most 1, we cannot have (x, t2) ∈ C, and so (x, t2) ∈ D \ C. But we encounter (x, t2) before e′ when traversing the path P starting at t1. Thus, we can have e′ as an edge of type (3) only for e2.
Finally, we show that (SK3) holds. Let C′ ⊆ C \ D and D′ = (D \ Y(C′)) ∪ C′. We must show that D′ ∈ I. For every e = (h, t) ∈ C′, Y(C′) contains all edges from D of the form (h, x) or (x, t). Thus, all such vertices h and t have outdegree and indegree 0, respectively, in D \ Y(C′), and so have outdegree and indegree 1, respectively, in D′. All other vertices have the same indegree and outdegree in D′ as in D. It remains to show that D′ does not contain any non-Hamiltonian cycles. Suppose, for the sake of contradiction, that D′ does contain a non-Hamiltonian cycle. Because neither C nor D contains a non-Hamiltonian cycle, this cycle must contain at least one edge e from C \ D and one edge e′ from D \ C. Suppose that e and e′ are two such edges and that the only edges between e and e′ on the cycle are from D. Such a pair of edges must exist (note that the set of edges between e and e′ may be empty in the case that e and e′ are adjacent). Then, we must have e′ ∈ Y(e) of type (3), and so e′ cannot be in D′, a contradiction. Thus, every vertex of D′ has indegree and outdegree at most 1 and D′ contains no non-Hamiltonian cycle, so we must have D′ ∈ I.
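The three edge types used in this proof can be computed directly. The sketch below (our own illustration, not the thesis's code; edges are (tail, head) pairs) finds Y(e) for a single edge e = (h, t) by scanning for type-(1) and type-(2) conflicts and then following the unique D-path from t for the type-(3) edge:

```python
def Y_for_edge(e, C, D):
    """Edges of D \ C conflicting with e = (h, t): (1) out of h,
    (2) into t, and (3) the first D \ C edge on the D-path from t."""
    h, t = e
    D_only = set(D) - set(C)
    result = {d for d in D_only if d[0] == h or d[1] == t}  # types (1), (2)
    out_of = {u: (u, v) for (u, v) in D}  # every vertex has outdegree <= 1 in D
    seen, cur = set(), t
    while cur in out_of and out_of[cur] not in seen:
        edge = out_of[cur]
        seen.add(edge)
        if edge in D_only:
            result.add(edge)  # type (3): stop at the first D \ C edge
            break
        cur = edge[1]
    return result
```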
Chapter 6

Algorithms for Strong k-Exchange Systems
In the previous chapter, we defined the classes of weak and strong k-exchange systems, and showed how several combinatorial optimization problems fit into these classes. Our main motivation for introducing the class of strong k-exchange systems was the observation that, while the locality ratio of the simple algorithm (1, k)-OblLocalSearch is no better than the approximation ratio of the greedy algorithm for all weak k-exchange systems, more complex local search algorithms have given improved results for many of the problems discussed in Section 5.4. The result of Arkin and Hassin [5] in the specific case of k-set packing shows that even if we consider (t, t − 1)-exchanges, the oblivious local search algorithm has a locality ratio of only 1/(k − 1 + 1/t). In contrast, Berman [11] shows that a simple, non-oblivious algorithm attains a locality ratio of 2/(k + 1) by considering only (k, k² − k + 1)-exchanges. We now generalize Berman's result, giving
a non-oblivious local search algorithm for both linear and monotone submodular maximization in any strong k-exchange system. The algorithms run in deterministic polynomial time and are (2/(k + 1) − ε)- and (2/(k + 3) − ε)-approximations for linear and monotone submodular maximization, respectively. By using the partial enumeration technique described in Section 2.6, we can remove the extra factor of ε from these ratios, obtaining a clean 2/(k + 1)- and 2/(k + 3)-approximation.
The resulting algorithm gives improved approximation results for several problems. We summarize these results in Table 6.1. The 1/k-approximation for strongly base orderable matroid k-parity is not stated explicitly by Jenkyns [58] (who gives results for the related k-matchoid problem), but follows directly from the fact that matroid k-parity is a k-system. Similarly, the monotone submodular maximization results for (k + 1)-claw free graphs and strongly base orderable matroid k-parity follow directly from Fisher, Nemhauser, and Wolsey's [43] work on submodular maximization in k-systems, even though they do not explicitly show that these problems may be formulated as k-systems. Finally, the previous results for maximum asymmetric traveling salesman follow from the fact that the set of (partial) directed Hamiltonian cycles can be represented as the intersection of 3 matroids, as noted in Section 5.4.5. In this
case, while the algorithm of Lee, Sviridenko, and Vondrak [71] is a 1/(3 + ε)-approximation for any ε, its runtime is exponential in ε⁻¹. Thus, the improvement to 1/3 is more significant than it first appears.
In independent work (presented jointly with this work in [39]), Feldman, Naor, and Schwartz
give improved approximations for non-monotone submodular maximization in strong k-exchange
systems. Their approach is based on an oblivious local search algorithm similar to that employed
by Lee, Sviridenko, and Vondrak [71] in the case of k-matroid intersection. The algorithm is
a (k − 1)/(k² + ε)-approximation, and gives improved approximations for many other specific problems as well. Like the algorithm of Lee, Sviridenko, and Vondrak, their algorithm has exponential dependence on ε⁻¹.
Table 6.1: Approximation Ratios for k-Exchange Systems

Problem                                  Objective*   Previous Result       Our Result
Indep. Set in (k + 1)-Claw Free Graphs   MS           1/(k + 1) [43]        2/(k + 3)
SBO Matroid k-Intersection               L            1/(k − 1 + ε) [71]    2/(k + 1)
                                         MS           1/(k + ε) [71]        2/(k + 3)
SBO Matroid k-Parity                     L            1/k [58]              2/(k + 1)
                                         MS           1/(k + 1) [43]        2/(k + 3)
Maximum ATSP                             MS           1/(3 + ε) [71]        1/3

* L : linear, MS : monotone submodular
6.1 Linear Maximization
In this section, we consider the problem of maximizing a linear function f in a strong k-exchange
system. We show that the non-oblivious local search algorithm introduced by Berman [11] for
(k+ 1)-claw free graphs in fact applies to any strong k-exchange system. This is not surprising
given our observation (in Section 5.4.1) that strong k-exchange systems behave like (k + 1)-claw free graphs for every pair of independent sets. Nonetheless, we present here a full analysis of the linear case, as we shall need some of the ideas and intuition from it for the monotone submodular case, which we consider in the next section. Throughout our analysis, we assume that the linear function f has been given in the form of a weight function w, assigning each element x ∈ X a non-negative weight w(x). We measure the complexity of algorithms in terms of the parameter k and n = |X|.

The non-oblivious local search algorithm Linear-k-Exchange is shown in Algorithm 10. It
considers a larger neighborhood than (1, k)-OblLocalSearch, at each step searching through all possible valid (k, r(k))-exchanges, where r(k) = k² − k + 1. Our analysis will make crucial
use of the fact that, since we are in a strong k-exchange system, we can build a valid (k, r(k))-exchange by combining several overlapping valid (1, k)-exchanges. Additionally, Linear-
k-Exchange uses the non-oblivious potential function

w²(S) = ∑_{x∈S} w(x)² ,
obtained by squaring all of the weights given to the algorithm. Note that w² is a linear function, so the standard local improvement condition

w²((S \ B) ∪ A) > w²(S)

used to guide the search is equivalent to the condition

w²(A) > w²(B) .

We use this latter condition in both algorithms presented in this section.
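The equivalence is a one-line consequence of linearity: for B ⊆ S and A disjoint from S, w²((S \ B) ∪ A) − w²(S) = w²(A) − w²(B). A quick numeric check (our own illustration, with arbitrary weights):

```python
def w2(T, w):
    """The non-oblivious potential: sum of squared weights over T."""
    return sum(w[x] ** 2 for x in T)

w = {1: 3.0, 2: 1.0, 3: 2.0, 4: 5.0}
S, B, A = {1, 2, 3}, {2, 3}, {4}
# Both sides of the linearity identity agree:
lhs = w2((S - B) | A, w) - w2(S, w)
rhs = w2(A, w) - w2(B, w)
```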
We initialize Linear-k-Exchange by using a greedy solution. The algorithm alters the given
weight function by rounding the weights down to integer multiples of a sufficiently small value
ε2. This is equivalent to requiring that each improvement made by the algorithm increases the potential function w² by an additive amount of at least ε2². Although this differs from our standard, multiplicative notion of approximate local optimality, it will simplify our analysis.
Finally, we note that the fact that (X, I) is a strong k-exchange system will be used only
in the analysis of Linear-k-Exchange. This is also the case for the algorithm that we present in
the next section for the monotone submodular case. In particular, neither algorithm needs to
construct the mapping Y from Definition 5.7.
Algorithm 10: Linear-k-Exchange

Input: Approximation parameter ε; strong k-exchange system (X, I), given as an independence oracle; weight function w : X → R≥0

Let Sinit be the result of running Greedy on (X, I), w;
Let ε2 = ((k + 1)/2) · w(Sinit) · ε/n;
S ← Sinit;
foreach x ∈ X do w(x) ← ⌊w(x)/ε2⌋ · ε2;
repeat
    foreach A ⊆ X \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k² − k + 1 do
            if (S \ R) ∪ A ∈ I and w²((S \ R) ∪ A) > w²(S) then
                S ← (S \ R) ∪ A;
                break;
until no exchange is applied to S;
return S;
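A direct, deliberately naive transcription of Algorithm 10 might look as follows. This is a sketch under our own assumptions: the greedy phase simply scans elements in order of decreasing weight, the independence oracle is a Python callable, at least one weight is positive, and the exponential-in-k neighborhood search is left unpruned.

```python
from itertools import combinations
from math import floor

def linear_k_exchange(X, is_independent, w, k, eps):
    """Sketch of Linear-k-Exchange: greedy start, weights rounded down
    to multiples of eps2, then repeated (k, k^2 - k + 1)-exchanges that
    improve the potential w2 (checked via w2(A) > w2(R))."""
    S = set()
    for x in sorted(X, key=lambda y: w[y], reverse=True):  # greedy phase
        if is_independent(S | {x}):
            S.add(x)
    eps2 = (k + 1) / 2 * sum(w[x] for x in S) * eps / len(X)
    wr = {x: floor(w[x] / eps2) * eps2 for x in X}  # rounded weights

    def w2(T):
        return sum(wr[x] ** 2 for x in T)

    improved = True
    while improved:
        improved = False
        for a in range(1, k + 1):
            for A in combinations(set(X) - S, a):
                for r in range(k * k - k + 2):
                    for R in combinations(S, r):
                        if is_independent((S - set(R)) | set(A)) and w2(A) > w2(R):
                            S = (S - set(R)) | set(A)
                            improved = True
                            break
                    if improved: break
                if improved: break
            if improved: break
    return S
```

In practice one would exploit problem structure to compute R directly from A rather than enumerating all candidate sets, as discussed at the end of this section.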
We now consider the approximation ratio of Linear-k-Exchange. We consider an arbitrary
instance (X, I), w and suppose that S is a locally optimal solution returned by the algorithm
on this instance, while O is a global optimum for this instance.
A general outline of our proof is as follows: we consider only a particular set of (k, r(k))-
exchanges (A,R) where A ⊆ O and R ⊆ S. These exchanges satisfy some additional properties,
which we use in Lemma 6.1 to derive a relationship between the non-oblivious potential function w² and the weight function w. In Lemma 6.2 we then use this relationship to derive a lower
bound on the weight w(x) of each element x in a locally optimal solution. In Theorem 6.3
we combine these lower bounds to obtain a bound on the locality ratio of Linear-k-Exchange
(in terms of the rounded weight function used by the algorithm). Finally, in Theorem 6.4 we
bound the approximation ratio and runtime of Linear-k-Exchange.
We now describe the set of (k, r(k))-exchanges used in our analysis. We have S,O ∈ I for the strong k-exchange system (X, I). Thus, there must be a collection Y assigning each z of O \ S a set Y(z) ⊆ S \ O, satisfying the conditions of Definition 5.7. For each x ∈ S \ O, let Px be the set of all elements z ∈ O \ S for which: (1) x ∈ Y(z) and (2) for all y ∈ Y(z), w(y) ≤ w(x). That is, Px is the set of all elements z ∈ O \ S that share x as a largest weight member of Y(z). Finally, for all x ∈ S ∩ O, we set Px = {x}, and extend Y so that Y(Px) = {x} (note that the resulting Y still obeys properties (WK1), (WK2), and (SK3), since for x ∈ S ∩ O we have y ∈ Y(x) only when y = x, and (S \ Y(Px)) ∪ Px = S).

For each x ∈ S \ O, consider the exchange (Px, Y(Px)). From property (WK2) of Y we have |Px| ≤ k. Similarly, from property (WK1) and the fact that all elements z ∈ Px share the common element x ∈ Y(z), we have

|Y(Px)| ≤ k(k − 1) + 1 = k² − k + 1 = r(k) .

Finally, from property (SK3) we have (S \ Y(Px)) ∪ Px ∈ I. Thus, (Px, Y(Px)) is a valid (k, r(k))-exchange for all of our sets Px, where x ∈ S. For all x ∈ S ∩ O, we have (S \ Y(Px)) ∪ Px = S. Furthermore, note that {Px}_{x∈S} is a partition of O. The construction of P depends crucially on the strong k-exchange property, which allows us to consider Y(z) for each element z in isolation and then combine appropriate elements z into a single valid exchange (Px, Y(Px)).
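The partition {Px}_{x∈S} can be constructed explicitly. A sketch (our own illustration; Y is a dictionary on O \ S with Y(z) ⊆ S \ O as in the analysis, and ties among largest-weight members of Y(z) are broken arbitrarily so that each z lands in exactly one block):

```python
def build_partition(S, O, Y, w):
    """P[x] collects the elements z of O \ S assigned to x as a largest
    weight member of Y(z); P[x] = {x} for x in both S and O.  Blocks may
    be empty; the nonempty blocks partition O."""
    P = {x: set() for x in S}
    for z in set(O) - set(S):
        x = max(Y[z], key=lambda y: w[y])  # a largest-weight member of Y(z)
        P[x].add(z)
    for x in set(S) & set(O):
        P[x] = {x}
    return P
```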
The following extension of a theorem from [11] relates the non-oblivious potential function w² to the weight function w.
Lemma 6.1. Suppose that x ∈ S and z ∈ X \ S, and let w be any non-negative weight function on S + z. If w(y) ≤ w(x) for all y ∈ Y(z), then

w(z)² − w²(Y(z) − x) ≥ w(x) · (2w(z) − w(Y(z))) .
Proof. First, we note that

0 ≤ (w(x) − w(z))² = w(x)² − 2w(x) · w(z) + w(z)² . (6.1)

Additionally, since every element y in Y(z) has weight at most w(x):

w²(Y(z) − x) = ∑_{y∈Y(z)−x} w(y)² ≤ w(x) ∑_{y∈Y(z)−x} w(y) = w(x) · w(Y(z) − x) . (6.2)

Adding (6.1) and (6.2) we obtain

w²(Y(z) − x) ≤ w(x)² − 2w(x) · w(z) + w(z)² + w(x) · w(Y(z) − x) .

Since w(x) · w(Y(z) − x) + w(x)² = w(x) · w(Y(z)), this is equivalent to

w²(Y(z) − x) ≤ w(x) · (w(Y(z)) − 2w(z)) + w(z)² ,

which is equivalent to the inequality stated in the Lemma.
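Lemma 6.1 is elementary enough to verify by brute force. The following check (our own sanity test, not part of the thesis) exhaustively confirms the inequality over a small grid of non-negative integer weights satisfying the hypothesis that every member of Y(z) weighs at most w(x):

```python
from itertools import product

def lemma_holds(wx, wz, others):
    """Check w(z)^2 - w2(Y(z) - x) >= w(x) * (2 w(z) - w(Y(z))), where
    Y(z) consists of x together with the elements weighted by `others`."""
    wY = wx + sum(others)
    lhs = wz ** 2 - sum(v ** 2 for v in others)
    rhs = wx * (2 * wz - wY)
    return lhs >= rhs

# Exhaustive check over all weight grids with the hypothesis y <= wx.
ok = all(lemma_holds(wx, wz, [y1, y2])
         for wx, wz, y1, y2 in product(range(6), repeat=4)
         if y1 <= wx and y2 <= wx)
```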
The next Lemma uses Lemma 6.1 together with the properties of the partition Px to translate the local optimality of S with respect to w² into a lower bound on w(x), by considering the (k, r(k))-exchange (Px, Y(Px)).
Lemma 6.2. Suppose that x ∈ S and Px ⊆ X \ S are such that x ∈ Y(z) for all z ∈ Px. Let w be any non-negative weight function such that w²(Px) ≤ w²(Y(Px)) and w(y) ≤ w(x) for all y ∈ Y(Px). Then,

w(x) ≥ ∑_{z∈Px} [2w(z) − w(Y(z))] .
Proof. First, we consider the case w(x) = 0. The weights w are non-negative and w(x) ≥ w(y) for every y ∈ Y(Px). Thus, w(y) = 0 for all y ∈ Y(Px), and so w²(Y(Px)) = 0. Moreover, since w²(Px) ≤ w²(Y(Px)) = 0, we must have w(z) = 0 for all z ∈ Px. The claim then follows.

Now, suppose that w(x) > 0. Since x ∈ Y(z) for all z ∈ Px, we have

w²(Px) ≤ w²(Y(Px)) ≤ w(x)² + ∑_{z∈Px} w²(Y(z) − x) . (6.3)

Rearranging (6.3) using w²(Px) = ∑_{z∈Px} w(z)², we obtain:

∑_{z∈Px} [w(z)² − w²(Y(z) − x)] ≤ w(x)² . (6.4)

For each z ∈ Px, we have w(y) ≤ w(x) for every y ∈ Y(z). Thus, we can apply Lemma 6.1 to each term on the left of (6.4), giving:

∑_{z∈Px} w(x) · (2w(z) − w(Y(z))) ≤ w(x)² .

Dividing by w(x) (recall that w(x) ≠ 0) then gives the inequality stated in the Lemma.
We can now derive a bound on the locality ratio of Linear-k-Exchange. We defer a discussion
of the tightness of this bound until the next section.
Theorem 6.3. Let w be the weight function used by Linear-k-Exchange after rounding all the weights down to the nearest integer multiple of ε2, and suppose that S is a locally optimal solution with respect to the resulting potential function w², and O is a global optimum with respect to w. Then,

w(S) ≥ (2/(k + 1)) · w(O) .
Proof. As we have noted, each of the exchanges (Px, Y(Px)) is a valid (k, r(k))-exchange. It thus follows from the local optimality of S that

w²(Px) ≤ w²(Y(Px)) , (6.5)

for all x ∈ S \ O. Moreover, (6.5) holds trivially in the case that x ∈ S ∩ O, since then we have (S \ Y(Px)) ∪ Px = S. Thus (6.5) is valid for all x ∈ S. For each x ∈ S, Lemma 6.2 then gives the inequality

w(x) ≥ ∑_{z∈Px} [2w(z) − w(Y(z))] .

Adding all |S| of these inequalities gives

∑_{x∈S} w(x) ≥ ∑_{x∈S} ∑_{z∈Px} [2w(z) − w(Y(z))] .

Since P is a partition of O, this is equivalent to

∑_{x∈S} w(x) ≥ ∑_{z∈O} [2w(z) − w(Y(z))] . (6.6)

For any x ∈ S, w(x) ≥ 0, and there are at most k distinct z for which x ∈ Y(z), by property (WK2) of Y. Thus

∑_{z∈O} w(Y(z)) ≤ k ∑_{x∈S} w(x) = k · w(S) .

Thus, (6.6) implies

w(S) ≥ 2w(O) − k · w(S) ,

and the theorem follows.
Theorem 6.3 gives a bound on the locality ratio of Linear-k-Exchange in terms of the rounded weight function. Now, we consider the approximation ratio of Linear-k-Exchange with respect to the instance's given weight function, as well as the algorithm's runtime. First, we bound the number of improvements that Linear-k-Exchange can make.
Theorem 6.4. For any strong k-exchange system (X, I), weight function w, and ε > 0, Linear-k-Exchange is a (2/(k + 1) − ε)-approximation running in time O(ε⁻²k³n^(k²+3)).
Proof. First, we show that Linear-k-Exchange is a (2/(k + 1) − ε)-approximation algorithm. Let w be the weight function for the given instance and let ⌊w⌋ be the rounded weight function

⌊w⌋(x) = ⌊w(x)/ε2⌋ · ε2 ,

used by Linear-k-Exchange. Let O = arg max_{A∈I} w(A) and O′ = arg max_{A∈I} ⌊w⌋(A). Then we have

w(S) ≥ ⌊w⌋(S) ≥ (2/(k + 1)) · ⌊w⌋(O′) ≥ (2/(k + 1)) · ⌊w⌋(O)
     ≥ (2/(k + 1)) · ∑_{x∈O} (w(x) − ε2) = (2/(k + 1)) · (w(O) − |O| · ε2)
     ≥ (2/(k + 1)) · (w(O) − n · ε2) = (2/(k + 1)) · w(O) − w(Sinit) · ε ≥ (2/(k + 1) − ε) · w(O) ,

where the second inequality follows from Theorem 6.3.
Now we consider the complexity of Linear-k-Exchange. In each iteration the algorithm examines O(n^(k+r(k))) = O(n^(k²+1)) potential (k, r(k))-exchanges, and each of these exchanges can be evaluated in time O(k²). We now bound the number of improvements the algorithm can make. Let O = arg max_{B∈I} w²(B) and consider the solution Sinit produced by the initial greedy phase of Linear-k-Exchange. From Theorem 5.4, (X, I) is k-extendible and so Greedy is a k-approximation for maximizing w in (X, I). Moreover, if w(x) ≥ w(y) then w(x)² ≥ w(y)², and so Sinit is also a greedy solution for the instance (X, I), w². Thus:

k · w²(Sinit) ≥ w²(O) .

Every time Linear-k-Exchange applies an exchange (A,R) to the current solution S, we must have ⌊w⌋²((S \ R) ∪ A) > ⌊w⌋²(S). Because each weight ⌊w⌋(x) is a multiple of ε2, each improvement that Linear-k-Exchange applies must increase ⌊w⌋²(S) by at least ε2². Thus, the number of improvements made by Linear-k-Exchange is at most

⌊w⌋²(S)/ε2² ≤ w²(S)/ε2² ≤ w²(O)/ε2² ≤ k · w²(Sinit)/ε2² ≤ k · w(Sinit)²/ε2² ≤ kn²/ε² ,

and so Linear-k-Exchange runs in time O(ε⁻²k³n^(k²+3)).
Theorem 5.9 shows that the class of strong k-exchange systems is closed under contraction.
As the following theorem shows, we can apply the partial enumeration technique from Section
2.6 to remove the extra ε term from the approximation ratio, at a slight cost in the total
runtime of the algorithm. For technical reasons, we must assume that k ≥ 2. In practice, this
is no restriction, since in the case k = 1, the set system (X, I) is a matroid, and so the greedy
algorithm gives an optimal solution.
Theorem 6.5. Suppose that we set ε = 1/(3n) in Linear-k-Exchange. Then, for all k ≥ 2, PartEnum(Linear-k-Exchange) is a 2/(k + 1)-approximation algorithm running in time O(k³n^(k²+6)).
Proof. Theorem 6.4 shows that Linear-k-Exchange is a (2/(k + 1) − 1/(3n))-approximation when ε is set to 1/(3n). Furthermore, every independent set in I has size at most n. Theorem 2.22 then shows that the approximation ratio of PartEnum(Linear-k-Exchange) is

1/n + ((n − 1)/n) · (2/(k + 1) − 1/(3n)) = 1/n + 2/(k + 1) − 2/(n(k + 1)) − 1/(3n) + 1/(3n²)
  ≥ 1/n + 2/(k + 1) − 2/(3n) − 1/(3n) + 1/(3n²)
  ≥ 2/(k + 1) ,

where we have used k ≥ 2 in the first inequality.
The algorithm PartEnum(Linear-k-Exchange) makes n calls to Linear-k-Exchange. From Theorem 6.4, each of these takes time O(ε⁻²k³n^(k²+3)) = O(k³n^(k²+5)), so the total runtime is O(k³n^(k²+6)).
Finally, we note that in many specific settings the runtime of Linear-k-Exchange can be drastically improved. We have assumed here that I is given as an independence oracle, and so for any set A ⊆ X \ S the problem of finding a set R ⊆ S such that (A,R) is a valid (k, r(k))-exchange requires enumerating over all possible sets R. In some cases, however, we may be able to compute R directly, given A. For example, in the setting of (k + 1)-claw free graphs, we can easily determine the correct set R from the set A by examining the neighborhood of A. In this case, the O(n^(r(k))) time used to find R can be replaced by the time required to determine the neighborhood of a set of size at most k.
6.2 Monotone Submodular Maximization
We now consider the problem of maximizing a monotone submodular function in a strong
k-exchange system. Before presenting the general algorithm, we describe some of the difficul-
ties that arise when attempting to generalize the approach used in Linear-k-Exchange to the
submodular case.
An obvious difficulty is that in the monotone submodular case, we can no longer represent
f as a sum of weights. The non-oblivious potential function w2 made critical use of such a
representation. However, borrowing some intuition from the greedy algorithm, we might decide
to replace each weight w(x) with the marginal gain/loss associated with adding/removing x
from S. That is, at the start of each iteration of the local search algorithm, we assign each
element x ∈ X the weight w(x) = f(S + x) − f(S − x), where S is the algorithm's current solution, and then proceed as before. Note that w(x) is simply the marginal gain fS(x) in the case that x ∉ S, or the marginal loss fS−x(x) suffered by removing x from S in the case that x ∈ S.
We define the non-oblivious potential function w2 in terms of the resulting weight function w as before. The proof of Theorem 6.3 then goes through with only a slight change in the locality ratio, from 2/(k + 1) to 2/(k + 3). Unfortunately, the resulting local search algorithm may never converge to a local optimum, as we show in the following example.
We let f be a simple, unweighted coverage function on the universe U = {a, b, c, x, y, z}. Specifically, let

S1 = {a, b}   S3 = {x, y}
S2 = {a, c}   S4 = {x, z} .

Our ground set X is then {1, 2, 3, 4}, and our objective function is f(A) = |⋃_{i∈A} Si| for all A ⊆ X. We consider the 2-exchange system with only 2 bases: P = {1, 2} and Q = {3, 4}.

For the current solution S = P we have w(1) = w(2) = 1 and w(3) = w(4) = 2. Since w2({1, 2}) = 2 < 8 = w2({3, 4}), the 2-replacement ({3, 4}, {1, 2}) is applied, and the current solution becomes Q. In the next iteration, we have S = Q, and w(1) = w(2) = 2 and w(3) = w(4) = 1, so the 2-replacement ({1, 2}, {3, 4}) is applied by the algorithm. This returns us to the solution P, where the process repeats indefinitely.
Intuitively, the problem with this approach is that the weight function used in each step
of the algorithm depends on the current solution S (since all marginals are taken with respect
to S). Hence, it may be the case that an exchange (A,R) results in an improvement with
respect to the current solution’s potential function, but in fact results in a decreased potential
value in the next iteration, after the weights have been updated. Surprisingly, we can solve the
problem by introducing even more variation in the potential function. Specifically, we allow
the algorithm to use a different weight function not only for each current solution S, but also
for each (k, r(k))-exchange (A,R) that is considered.
Before we give the full algorithm, let us describe the general approach used to generate the
weights used in our potential function. Rather than calculating all marginal gains with respect
to the set S, we consider elements in some order and assign each element a weight corresponding
to its marginal gain with respect to the set of elements that precede it. By carefully updating
both the current solution and this ordering each time we apply a local improvement, we can
ensure that the algorithm converges to a local optimum and still obtain the stated bounds on
the locality ratio.
The algorithm stores the current solution S as an ordered sequence s1, s2, . . . , s|S|. At each
iteration of the local search, before searching for an improving (k, r(k))-exchange, the algorithm
assigns a weight w(si) to each si ∈ S, as follows. Let Si = {sj ∈ S : j ≤ i} be the set containing the first i elements of S. Then, the weight function w assigning weights to the elements of S is given by

w(si) = f(Si−1 + si) − f(Si−1) = f(Si) − f(Si−1)
for all si ∈ S. Note that the weights w satisfy

∑_{si∈S} w(si) = ∑_{i=1}^{|S|} (f(Si) − f(Si−1)) = f(S) − f(∅) = f(S) . (6.7)
In order to evaluate each (k, r(k))-exchange (A, R), we also need to assign weights to the elements¹ in A ⊆ X \ S. We use a different weight function for each k-replacement (A, R), obtained as follows. We order A according to some arbitrary ordering ≺ on X, and let ai be the ith element of A and Ai = {aj ∈ A : j ≤ i}. Then, the weight function w(A,R) assigning weights to the elements of A is given by

w(A,R)(ai) = f((S \ R) ∪ Ai−1 + ai) − f((S \ R) ∪ Ai−1) = f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1)

for all ai ∈ A. Note that for every (k, r(k))-exchange (A, R),

∑_{ai∈A} w(A,R)(ai) = ∑_{i=1}^{|A|} (f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1))
                    = f((S \ R) ∪ A) − f(S \ R) ≥ f(S ∪ A) − f(S) , (6.8)

where the inequality follows from the submodularity of f.
Since the function f is monotone submodular, all of the weights w and w(A,R) that we
consider will be nonnegative. Furthermore, the weights w assigned to elements in S remain
fixed for all k-replacements considered in a single phase of the algorithm. These facts play
a crucial role in our analysis. Finally, note that although both w and w(A,R) depend on the
current solution S, we omit this dependence from our notation, to avoid clutter. When there
is the possibility of confusion, we shall state explicitly which solution’s weight function we are
considering.
Our final algorithm, Submodular-k-Exchange, is shown in Algorithm 11. It starts from an initial solution Sinit = arg max_{e∈X} f(e), consisting of the singleton element of largest value. When applying a valid (k, r(k))-replacement (A, R), the algorithm updates the ordered solution S in a fashion that ensures all of the elements of S \ R precede those of A, while the elements of S \ R and, respectively, of A occur in the same relative order. As we shall see in the next section, this guarantees that the algorithm will converge to a local optimum. As in the linear case, we use the sums of squared weights w2(R) = ∑_{b∈R} w(b)² and w2(A,R)(A) = ∑_{a∈A} w(A,R)(a)² to guide the search. Also, to ensure polynomial-time convergence, we adopt an approach described by Arkin and Hassin [5] in the context of weighted k-set packing. We round all of our weights down to the nearest integer multiple of ε3, thus ensuring that at each step we make an additive
¹In order to simplify our analysis, we also consider here, implicitly, the case in which A ⊆ R, and so A ⊆ X \ (S \ R). This will allow us to easily deal with the intersection S ∩ O in our analysis, even though the algorithm never actually makes use of the resulting weights.
Algorithm 11: Submodular-k-Exchange
Input: Approximation parameter ε
       Strong k-exchange system (X, I), given as an independence oracle
       Monotone submodular function f : 2^X → R≥0, given as a value oracle
Fix an arbitrary ordering ≺ on the elements of X;
Let Sinit = arg max_{e∈X} f(e);
Let ε3 = ((k + 3)/(2n)) · f(Sinit) · ε;
S ← Sinit;
repeat
    W ← ∅;
    for i = 1 to |S| do
        w(si) ← ⌊(f(W + si) − f(W))/ε3⌋ · ε3;
        W ← W + si;
    foreach A ⊆ X \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k² − k + 1 do
            if (S \ R) ∪ A ∈ I then
                Let ai be the ith element of A according to ≺;
                W ← S \ R;
                for i = 1 to |A| do
                    w(A,R)(ai) ← ⌊(f(W + ai) − f(W))/ε3⌋ · ε3;
                    W ← W + ai;
                if w2(A,R)(A) > w2(R) then
                    Delete all elements of R from S;
                    Append the elements of A to the end of S, in the order given by ≺;
                    break;
until no exchange is applied to S;
return S;
improvement of at least ε3. Because of this rounding factor, we must actually work with the following analogs of (6.7) and (6.8):

∑_{x∈S} w(x) ≤ ∑_{i=1}^{|S|} (f(Si) − f(Si−1)) = f(S) − f(∅) = f(S) (6.9)

∑_{x∈A} w(A,R)(x) ≥ ∑_{i=1}^{|A|} (f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1) − ε3)
                  = f((S \ R) ∪ A) − f(S \ R) − |A|ε3 ≥ f(S ∪ A) − f(S) − |A|ε3 (6.10)
We now analyze the approximation and runtime performance of Submodular-k-Exchange.
The analysis closely follows that of Linear-k-Exchange from Section 6.1. As there, we fix some instance (X, I), f, we let S be a locally optimal solution for this instance produced by Submodular-k-Exchange, and we let O be a globally optimal solution. Again, we only consider a particular set of valid (k, r(k))-exchanges.
Consider the final iteration of the local search algorithm, in which no improvement for S
could be found, and let w be the weight function for S in this iteration. For each x ∈ S, we
define Px as in Section 6.1, using the weight function w. That is, for x ∈ S \ O we let Px be the set of all elements z ∈ O \ S for which x ∈ Y(z) and for all y ∈ Y(z), w(x) ≥ w(y). Then, as in Section 6.1, (Px, Y(Px)) is a valid (k, r(k))-exchange for each x ∈ S \ O. Finally, as in Section 6.1, for each x ∈ S ∩ O, we define Px = {x} and Y(x) = {x}. Note that

w(Px,Y(Px))(Px) = f(S) − f(S − x) = w(x)

for all such x (see footnote 1 on page 103).

Note that only weights for elements of S are considered in the definition of Px. Because our algorithm assigns each element x ∈ S a single weight for the entire final iteration, {Px}_{x∈S} remains a well-defined partition of O.
We now derive a bound on the locality ratio of Submodular-k-Exchange. The proof makes
use of Lemmas 6.1 and 6.2 from Section 6.1.
Theorem 6.6. Let S be the locally optimal solution produced by Submodular-k-Exchange when applied to ε and the instance (X, I), f, and let O be the optimal solution for this instance. Then,

f(S) ≥ (2/(k + 3) − ε) f(O) .
Proof. For each (k, r(k))-exchange (Px, Y(Px)), the local optimality of S implies

w2(Px,Y(Px))(Px) ≤ w2(Y(Px)) .

Furthermore, for x ∈ S ∩ O, our definition of Px and Y(x) gives

w2(Px,Y(Px))(Px) = w2(Px,Y(Px))(x) = w2(x) = w2(Y(Px)) .

Thus, for each x ∈ S, the weight function on S ∪ Px that assigns each element x ∈ S weight w(x) and each element z ∈ Px weight w(Px,Y(Px))(z) satisfies the conditions of Lemma 6.2, and so we have

w(x) ≥ ∑_{z∈Px} [2 w(Px,Y(Px))(z) − w(Y(z))] ,
for each x ∈ S. Adding all |S| of these inequalities, we obtain

∑_{x∈S} w(x) ≥ ∑_{x∈S} ∑_{z∈Px} [2 w(Px,Y(Px))(z) − w(Y(z))] . (6.11)

From (6.9), we have

∑_{x∈S} w(x) ≤ f(S) .

Additionally, from (6.10),

∑_{z∈Px} w(Px,Y(Px))(z) ≥ f(S ∪ Px) − f(S) − |Px|ε3 .

Thus, (6.11) implies

f(S) ≥ 2 ∑_{x∈S} [f(S ∪ Px) − f(S) − |Px|ε3] − ∑_{x∈S} ∑_{z∈Px} w(Y(z)) . (6.12)

Now, we note that P is a partition of O. Thus, ∑_{x∈S} |Px| = |O| and

∑_{x∈S} ∑_{z∈Px} w(Y(z)) = ∑_{z∈O} w(Y(z)) .

Furthermore, Theorem 2.6 gives

∑_{x∈S} [f(S ∪ Px) − f(S)] ≥ f(S ∪ O) − f(S) .

Thus, (6.12) implies

f(S) ≥ 2 (f(S ∪ O) − f(S) − |O|ε3) − ∑_{z∈O} w(Y(z)) . (6.13)

We have w(x) ≥ 0 for all x ∈ S, and from property (WK2) of Y, each x ∈ S appears in at most k sets Y(z). Thus,

∑_{z∈O} w(Y(z)) ≤ k ∑_{x∈S} w(x) ≤ k f(S) ,

where the last inequality follows from (6.9). This, together with (6.13), gives

f(S) ≥ 2 (f(S ∪ O) − f(S) − |O|ε3) − k f(S) .

Rearranging this, we obtain

(k + 3) f(S) ≥ 2 f(S ∪ O) − 2|O|ε3 . (6.14)
The fact that f(S ∪O) ≥ f(O), which follows from the monotonicity of f , together with (6.14)
implies

f(S) ≥ (2/(k + 3)) f(O) − (2/(k + 3)) |O| ε3 .

Finally, we have |O| ≤ n, and so

(2/(k + 3)) |O| ε3 ≤ (2n/(k + 3)) ε3 = ε f(Sinit) ≤ ε f(O) .

Thus,

f(S) ≥ (2/(k + 3) − ε) f(O) .
We now consider the tightness of our bounds on the locality ratios of both Linear-k-Exchange and Submodular-k-Exchange. Berman [11] gives a tight example of an unweighted (k + 1)-claw free graph for which his local search algorithm has a locality ratio of 2/(k + 1). However, his algorithm considers only a particular subset of valid (k, r(k))-exchanges, and so the example he gives is no longer locally optimal with respect to our algorithms' neighborhoods. Hurkens and Schrijver [55] show that the oblivious local search algorithm for unweighted independent set in (k + 1)-claw-free graphs has a locality ratio of at most 2/(k + ε), where ε depends on the size of the improvements considered by the algorithm. Specifically, they give a graph G in which there are independent sets S and O such that |O| ≥ ((k + ε)/2)|S|, but S is locally optimal under (t, t − 1)-exchanges. We now show how to use this result to obtain an almost-tight instance for our analysis.
Let N(A, S) be the set of all vertices of S adjacent to some vertex of A, and let t = k. Then, every valid (k, r(k))-exchange for S must have the form (A, N(A, S)) for some set A of at most k independent vertices. The local optimality of S implies that |A| ≤ |N(A, S)| for all such A, and so if we assign all vertices of G weight 1, then for all such A we have

w2(A) = w(A) = |A| ≤ |N(A, S)| = w(N(A, S)) = w2(N(A, S)) .

Thus, S is also a local optimum with respect to w2 in Linear-k-Exchange. We have

w(S) = |S| ≤ (2/(k + ε)) |O| = (2/(k + ε)) w(O) ,

and so the locality ratio of Linear-k-Exchange is at most 2/(k + ε).
Now, suppose that we add a set C of |S| isolated vertices to G and define an objective
function f for the resulting instance by
f(D) = ∑_{v∈D} w(D, v) ,
where

w(D, v) = 1, if v ∈ S ∪ O ;
          1, if v ∈ C and D ∩ S = ∅ ;
          0, if v ∈ C and D ∩ S ≠ ∅ .
Then, we note that for all B ⊆ A and v ∉ A we have

0 ≤ fA(v) ≤ fB(v) ,

and so f is monotone submodular. Now, we consider an arbitrary set A ⊆ O ∪ C of size at most k. First, suppose that N(A, S) = S. Then, w(A,N(A,S))(v) = 1 for all v ∈ A, and so

w2(A,N(A,S))(A) = ∑_{v∈A} 1² = |A| ≤ |N(A, S)| = ∑_{v∈N(A,S)} 1² = w2(N(A, S)) .
Now, suppose that N(A, S) ⊂ S. We have at least one element of S in S \ N(A, S), and so w(A,N(A,S))(v) = 0 for all v ∈ A ∩ C. Thus,

w2(A,N(A,S))(A) = ∑_{v∈A∩O} 1² + ∑_{v∈A∩C} 0² = |A ∩ O| ≤ |N(A ∩ O, S)| ≤ |N(A, S)| = ∑_{x∈N(A,S)} 1² = w2(N(A, S)) ,
and so S is a local optimum for Submodular-k-Exchange. But O ∪ C is an independent set in G, and

f(O ∪ C) = |O| + |C| = |O| + |S| ≥ ((k + ε)/2)|S| + |S| = ((k + 2 + ε)/2) f(S) .

Thus, the locality ratio of Submodular-k-Exchange is at most 2/(k + 2 + ε), and our bounds on the locality gap of both Linear-k-Exchange and Submodular-k-Exchange are tight up to an additive term of 1 in the denominator.
It remains to consider the runtime of Submodular-k-Exchange, which depends on the number of local improvements (A, R) that it applies. We shall show that Submodular-k-Exchange steadily improves a bounded, global quantity, and so the total number of improvements that it can make is bounded. Specifically, we show that although the weights w assigned to elements of the current solution S change after each improvement is made, the non-oblivious potential w2(S) is monotonically increasing.
Lemma 6.7. Suppose that Submodular-k-Exchange applies a (k, r(k))-exchange (A, R) to some solution S to obtain a new solution T. Let wS be the weight function w : S → R≥0 determined by solution S and wT be the weight function w : T → R≥0 determined by solution T. Then,

w2T(T) ≥ w2S(S) + ε3² .

Proof. We first show that (1) wS(si) ≤ wT(si) for each element si ∈ S \ R, and (2) w(A,R)(ai) ≤ wT(ai) for each element ai ∈ A, where w(A,R) is the weight function determined by S and the (k, r(k))-exchange (A, R).
In the first case, let Si (respectively, Ti) be the set of all elements in S (respectively, T) that come before si (recall that S and T are stored as ordered sequences). When the algorithm updates the solution S, it removes all elements of R from S, appends all of A after S \ R, and leaves all elements of S \ R in the same relative order. Thus, Ti ⊆ Si. It follows directly from the submodularity of f that

wS(si) = ⌊(f(Si + si) − f(Si))/ε3⌋ · ε3 ≤ ⌊(f(Ti + si) − f(Ti))/ε3⌋ · ε3 = wT(si) , (6.15)

for each si ∈ S \ R.
In the second case, let Ai be the set of all elements of A that come before ai (in the ordering ≺) and Ti be the set of all elements of T that come before ai. When the algorithm updates the solution S, it removes all elements of R from S, places all elements of A after all those of S \ R, and leaves all elements of A in the same relative order. Thus, Ti ⊆ (S \ R) ∪ Ai, and so from the submodularity of f,

w(A,R)(ai) = ⌊(f((S \ R) ∪ Ai + ai) − f((S \ R) ∪ Ai))/ε3⌋ · ε3 ≤ ⌊(f(Ti + ai) − f(Ti))/ε3⌋ · ε3 = wT(ai) , (6.16)

for each ai ∈ A.
Finally, since Submodular-k-Exchange applied the improvement (A, R) to S, we must have w2S(R) < w2(A,R)(A), and since all weights are integer multiples of ε3, we must in fact have

w2S(R) ≤ w2(A,R)(A) − ε3² .

From this inequality, together with (6.15) and (6.16), we have

w2S(S) = w2S(S \ R) + w2S(R)
       ≤ w2S(S \ R) + w2(A,R)(A) − ε3²
       ≤ w2T(S \ R) + w2T(A) − ε3²
       = w2T(T) − ε3² .
We can now state our main result, which shows that Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation and gives a bound on its runtime.

Theorem 6.8. For any strong k-exchange system (X, I), monotone submodular function f, and ε > 0, Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation algorithm running in deterministic time O(ε⁻²k³n^(k²+4)).

Proof. Theorem 6.6 implies that Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation algorithm, provided that it eventually reaches a local optimum. We now show that it must terminate in polynomial time.
Each iteration requires time O(n) to compute the weights for S, plus time to search for and evaluate all potential (k, r(k))-exchanges. There are O(n^(k+r(k))) = O(n^(k²+1)) possible (k, r(k))-exchanges (A, R), and each one can be evaluated in time O(k + r(k)) = O(k²), including the time to compute the weights w(A,R). Thus, the total runtime of Submodular-k-Exchange is O(I k² n^(k²+1)), where I is the number of improvements it makes. We now bound I.

Because f is submodular, for any element e and any set T ⊆ X, we have f(T + e) − f(T) ≤ f(e) ≤ f(Sinit). In particular, for any solution S ⊆ X with associated weight function w, we have

w2(S) = ∑_{e∈S} w(e)² ≤ |S| f(Sinit)² ≤ n f(Sinit)² .

Additionally, from Lemma 6.7, each improvement we apply must increase w2(S) by at least ε3², and hence the number I of improvements that Submodular-k-Exchange can make is at most

(w2(S) − f(Sinit)²)/ε3² ≤ (n f(Sinit)² − f(Sinit)²)/ε3² = (n − 1) (f(Sinit)/ε3)² = O(n³ε⁻²) .

Hence, the total runtime of Submodular-k-Exchange is O(ε⁻²k²n^(k²+4)).
As was the case with Linear-k-Exchange, we can employ partial enumeration to remove the
ε term from the approximation ratio for Submodular-k-Exchange.
Theorem 6.9. Suppose that we set ε = 1/(2n) in Submodular-k-Exchange. Then, PartEnum(Submodular-k-Exchange) is a 2/(k + 3)-approximation algorithm running in time O(k³n^(k²+7)).
Proof. Theorem 6.8 shows that Submodular-k-Exchange is a (2/(k + 3) − 1/(2n))-approximation when ε is set to 1/(2n). Furthermore, every independent set in I has size at most n. Theorem 2.22 then shows that the approximation ratio of PartEnum(Submodular-k-Exchange) is

1/n + ((n − 1)/n) · (2/(k + 3) − 1/(2n)) = 1/n + 2/(k + 3) − 2/(n(k + 3)) − 1/(2n) + 1/(2n²)
                                         ≥ 1/n + 2/(k + 3) − 1/(2n) − 1/(2n) + 1/(2n²)
                                         ≥ 2/(k + 3) ,

where we have used k ≥ 1 in the first inequality.
The algorithm PartEnum(Submodular-k-Exchange) makes n calls to Submodular-k-Exchange. From Theorem 4.12, each of these takes time O(ε⁻²k³n^(k²+4)) = O(k³n^(k²+6)), so the total runtime is O(k³n^(k²+7)).
Chapter 7

Limitations of Oblivious Local Search for CSPs
In the previous sections, we have demonstrated the power of non-oblivious local search, which
involves altering the potential function used to guide the local search algorithm. Now we turn
again to the generic local search algorithm GenLocalSearch presented in Chapter 1, and examine
the effect of changing other component functions. Specifically, in Section 7.1 we consider the
power of oblivious local search when large neighborhoods are used, and in Section 7.2 we consider
a variant of oblivious local search in which the initial solution Sinit is chosen randomly. We also
consider whether using a best improvement pivot rule together with either a random or greedy
initial solution gives any improvement.
Our results are formulated in the general setting of constraint satisfaction problems. A constraint satisfaction problem (or CSP) Π consists of a domain D and a set Γ of relations on D^≤ℓ, each of arity at most some constant ℓ. An instance I of Π = (D, Γ) is then given by a set V of n variables and a list of m constraints, each of the form (R, T) where R ∈ Γ and T is a tuple of arity(R) variables from V. The goal is to find an assignment σ, giving each variable in V a value in D, that maximizes the number of constraints (R, T) ∈ I for which R(σ(T)) is true. We say that such constraints are satisfied by σ.
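The definitions above can be made concrete with a small evaluator (a sketch, not from the thesis; representing a relation as the set of tuples it accepts is an assumption chosen for simplicity):

```python
# A tiny CSP evaluator over the Boolean domain {0, 1}. A constraint is a pair
# (R, T): a relation R, given as the set of accepted tuples, and a variable tuple T.
NEQ = {(0, 1), (1, 0)}   # the disequality relation u != v
constraints = [(NEQ, ("u", "v")), (NEQ, ("v", "w"))]

def satisfied(sigma, constraints):
    """Count the constraints (R, T) for which sigma(T) lies in R."""
    return sum(tuple(sigma[x] for x in T) in R for (R, T) in constraints)

assert satisfied({"u": 1, "v": 0, "w": 1}, constraints) == 2
assert satisfied({"u": 1, "v": 1, "w": 1}, constraints) == 0
```

The disequality relation used here is exactly the constraint type that appears in the MaxCut encoding discussed in this section.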
Here, we consider the special case in which all variables are Boolean, so D = {0, 1}. We
shall consider weighted CSPs, in which each constraint C has an associated weight w(C) in R≥0,
and the problem is to find an assignment σ that satisfies constraints of maximum total weight.
For particular CSPs, it may by possible to prove analogs of our results for the unweighted case
as well, but we consider only the general weighted case here. To simplify our analysis, we allow
an instance to contain multiple copies of a single constraint (R, T ). Such constraints can easily
be replaced by a single, appropriately weighted constraint, so this is without loss of generality.
Two examples of Boolean CSPs that we shall consider in more detail in this section are
MaxCut, and Max-k-Sat. In MaxCut, we are given an edge-weighted graph G = (V,E) and seek
a partition (S, V \S) of the vertices of V that maximizes the total weight of the edges with one
endpoint in S and one endpoint in V \ S. MaxCut can be represented as a Boolean CSP whose
constraints are all non-equalities of the form u ≠ v, each corresponding to an edge (u, v) ∈ E.
Then, the set of variables from V assigned the value 1 form one side of the partition (S, V \ S)
and the set assigned the value 0 form the other side.
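In this encoding the cut value is just the total weight of satisfied disequality constraints, which the following sketch (not from the thesis) makes explicit:

```python
# MaxCut objective: variables assigned 1 form one side of the cut, and an
# edge's disequality constraint is satisfied exactly when the edge crosses it.
edges = {("u", "v"): 2.0, ("v", "w"): 1.0, ("u", "w"): 1.0}

def cut_weight(S):
    """Total weight of edges with exactly one endpoint assigned 1 (i.e., in S)."""
    return sum(w for (a, b), w in edges.items() if (a in S) != (b in S))

assert cut_weight({"u"}) == 3.0      # edges (u, v) and (u, w) are cut
assert cut_weight(set()) == 0.0      # an empty side cuts nothing
```

Note that cut_weight(S) equals cut_weight of the complement of S, reflecting the symmetry of the partition (S, V \ S).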
In Max-k-Sat, we are given a collection of clauses over some variable set V, each containing exactly¹ k literals over V, and seek an assignment to the variables of V that satisfies a set
of clauses of maximum total weight. Max-k-Sat is easily captured by a CSP in which each
constraint is a disjunction of exactly k literals over V . Note that here we incorporate the
notion of a literal, which is either a variable or its negation, into the definition of the set of
constraints.
We focus on Boolean CSPs because they are well-studied and can be formulated naturally
as the sort of combinatorial optimization problems we have considered in previous chapters.
We let the ground set X be the set of variables V , and the set of feasible solutions contain all
subsets of V . We identify a subset S ⊆ V with the assignment σS that assigns every variable
in S the value 1, and every variable not in S the value 0. Then, we seek to find a set whose
corresponding assignment satisfies constraints of maximum total weight. That is, we consider
the (trivial) independence system (V, 2V ), and seek to maximize the objective function
f(S) = w({(R, T) ∈ I : R(σS(T))}) .
With this correspondence in mind, we shall implicitly identify a set of variables S ⊆ V
with the assignment σS for the rest of this chapter. Specifically, we shall refer to S ⊆ V
as an assignment, and use terminology for assignments and sets of variables interchangeably.
Additionally, for a CSP instance I, we denote by f(I, S) the total weight of the constraints in
I satisfied by σS . We define
f∗(I) = maxS⊆V
f(I, S) .
For any two sets A, B, we denote the symmetric difference (A \ B) ∪ (B \ A) by A △ B. If S ⊆ V is an assignment and A ⊆ V is some set of variables, then S △ A is the assignment obtained from S by flipping (the values assigned to) each variable in A and leaving the values of other variables in V unchanged. Using this notation, it is possible to formulate a natural, oblivious local search algorithm for constraint satisfaction problems that repeatedly attempts to improve the current assignment S by flipping some bounded number of variables.
For a function h : N → N, we consider the oblivious h(n)-local search algorithm, h(n)-LocalSearch, shown in Algorithm 12. At each step, h(n)-LocalSearch flips at most h(n) variables, and terminates when no such change improves the total weight of the constraints satisfied by S. That is, h(n)-LocalSearch finds an assignment S such that f(I, S) ≥ f(I, S △ A) for all A of size at most h(n). We call such an assignment S an h(n)-local optimum. As all of the results
¹Sometimes this variant is called exact Max-k-Sat to distinguish it from the case in which a clause is allowed to have at most k literals.
Algorithm 12: h(n)-LocalSearch
Input: CSP instance I
Let Sinit be some subset of V;
S ← Sinit;
repeat
    foreach C ⊆ V with |C| ≤ h(n) do
        if f(I, S △ C) > f(I, S) then
            S ← S △ C;
            break
until no change is made to S;
return S;
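Algorithm 12 can be sketched directly in Python (not from the thesis; the MaxCut objective below is only an illustrative instance, and, as in the chapter, we make no attempt to bound the runtime of the search):

```python
from itertools import combinations

def local_search(V, f, h, S_init):
    """Flip up to h variables at a time while any flip improves f."""
    S = set(S_init)
    improved = True
    while improved:
        improved = False
        for r in range(1, h + 1):
            for C in map(set, combinations(sorted(V), r)):
                if f(S ^ C) > f(S):
                    S ^= C          # apply the improving flip
                    improved = True
                    break
            if improved:
                break
    return S

# Example objective: MaxCut on the path a-b-c with unit edge weights.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0}
f = lambda S: sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
S = local_search({"a", "b", "c"}, f, h=1, S_init=set())
assert f(S) == 2.0   # on this instance, every 1-local optimum is globally optimal
```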
of this chapter are negative, we do not concern ourselves with the runtime of Algorithm 12.
Indeed, all of our results in this chapter are information-theoretic, and hold independent of any
complexity assumptions.
Note that in Algorithm 12 we do not specify how to choose Sinit. In Section 7.1, we shall
examine the worst-case locality ratio of h(n)-LocalSearch, which does not depend directly on the
choice of Sinit. Specifically, we examine the relationship between the locality ratio of h(n)-Local-
Search and the function h(n). In Section 7.2, we examine the worst-case expected approximation
performance of h(n)-LocalSearch when Sinit is chosen as a uniform random subset of V .
7.1 Large Neighborhoods
Our first set of results involves the dependence of the locality ratio of h(n)-LocalSearch on the function h(n). Specifically, we examine whether the locality gap of h(n)-LocalSearch can be significantly improved by increasing the size of the neighborhoods that it considers. In Theorem 3.2 we considered this question in the context of maximum coverage subject to a matroid constraint, and gave an instance in which any oblivious local search algorithm with a constant-sized neighborhood has a locality ratio only O(r⁻¹) larger than that of 1-local search, where r is the rank of the matroid. Now, we prove a similar result in the general context of Boolean CSPs.
We consider instances and assignments that satisfy the following property.
Definition 7.1 ((Y, Z)-robust assignment). Consider an instance I of a Boolean CSP Π = ({0, 1}, Γ) and an assignment S ⊆ V for this instance. Let Y, Z ⊆ V. We say that S is (Y, Z)-robust for the instance I if any assignment S′ with f(I, S′) > f(I, S) must differ from S by at least one element of Y and one element of Z (i.e., Y ∩ (S △ S′) ≠ ∅ and Z ∩ (S △ S′) ≠ ∅).
Clearly, any assignment S that is (Y, Z)-robust for an instance I is also a 1-local optimum
of I. Thus, the locality ratio of 1-LocalSearch is at most f(I, S)/f∗(I), where S is any (Y,Z)-
robust assignment for I. We shall show that this bound holds asymptotically for all o(n)-local
search.
Let Π be a Boolean CSP and suppose that I is an instance of Π with a (Y,Z)-robust
assignment S. We shall use I and S to construct an infinite family of related instances I(k, d)
where k and d are positive integers. Because our construction applies to any Boolean CSP it is
necessarily abstract. In order to help the reader follow the construction, we provide a concrete,
running example of its application to MaxCut. We depict the MaxCut instances as graphs, and
depict an assignment S (corresponding to the cut (S, V \S)) by using double circles for vertices
that are in S and single circles for vertices in V \ S. We label edges with their weights, leaving
edges with weight 1 unlabeled in order to avoid clutter.
The starting point for our example is shown in Figure 7.1. It depicts a small instance I of MaxCut with a (Y, Z)-robust assignment S = {p, a}, where Y = {a, b} and Z = {p, q}. Note that f(I, S) = 2, and the only assignments that satisfy more constraints are {a, b} and {p, q}, each of which satisfies all of the constraints of I. Both of these assignments differ from S by one of {a, b} and one of {p, q}, so S is indeed an ({a, b}, {p, q})-robust assignment.
Figure 7.1: I, with ({a, b}, {p, q})-robust assignment S
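Robustness of a small instance can be verified by brute force, as in the following sketch (not from the thesis). The edge set is an assumption: a plausible reading of Figure 7.1 as the unit-weight 4-cycle a–p, p–b, b–q, q–a, which is consistent with the stated values f(I, S) = 2 and the two optimal assignments {a, b} and {p, q}:

```python
from itertools import combinations

edges = [("a", "p"), ("p", "b"), ("b", "q"), ("q", "a")]   # assumed edge set
V = {"a", "b", "p", "q"}

def f(S):
    """Cut value of assignment S: number of edges crossing the partition."""
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

def robust(S, Y, Z):
    """True iff every strictly better assignment differs from S in both Y and Z."""
    for r in range(len(V) + 1):
        for T in map(set, combinations(sorted(V), r)):
            if f(T) > f(S) and not ((S ^ T) & Y and (S ^ T) & Z):
                return False
    return True

S = {"p", "a"}
assert f(S) == 2
assert robust(S, Y={"a", "b"}, Z={"p", "q"})
```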
Our results hold for CSPs Π = ({0, 1}, Γ) that are non-trivial in the following sense: there must exist some relation P ∈ Γ that is not satisfied by the all-zeroes tuple 0^arity(P) and some relation N ∈ Γ that is not satisfied by the all-ones tuple 1^arity(N).
Suppose that Π is not trivial, and consider the relations P and N described above. We
shall define two small CSP instances: IP, which contains a single constraint using the relation P, and IN, which contains a single constraint using the relation N. The purpose of IP is to
remove some variable from S). Similarly, the purpose of IN will be to penalize moves that flip
some variable from 0 to 1 (or, in our set notation, add some variable to S). We shall use several
copies of these single-constraint instances to make a particular assignment S preferable to its
neighboring assignments.
Formally, let VP be a set of arity(P ) distinct variables, and T be a tuple containing each
variable in VP once. Then, consider the instance IP comprising a single constraint (P, T ).
There must be some assignment S ⊆ VP and some variable x ∈ S so that (P, T ) is satisfied by
S but not by S − x. We denote this assignment by SP and this variable by xP .
Similarly, let VN be a set of arity(N) distinct variables, and T be a tuple containing each
variable in VN once. Then, consider the instance IN comprising the single constraint (N,T ).
There must be some assignment S ⊂ VN and some variable x ∈ VN \S so that (N,T ) is satisfied
by S but not by S + x. We denote this assignment by SN and the variable x by xN .
Examples of the two instances IP and IN for MaxCut with corresponding assignments SP
and SN and distinguished variables xP and xN are depicted in Figures 7.2 and 7.3, respectively.
Note that SP satisfies the single constraint of IP but SP − xP does not. Similarly, SN satisfies
the single constraint of IN, but SN + xN does not.
Figure 7.2: IP, with assignment SP

Figure 7.3: IN, with assignment SN
We now return to the general construction. Let δ = f∗(I) − f(I, S). For each pair of
positive integers (k, d) we construct an instance I(k, d) of Π by combining several copies of
the constraints from I, IP and IN as follows. We shall refer to Figure 7.4, which provides an
example of the construction for k = 3 and d = 4 using the instances from Figures 7.1, 7.2, and
7.3. Note that in Figure 7.1 we have δ = 2.
• For each i ∈ [d], let Ii be a copy of I, in which each variable z ∈ Z has been replaced
by a fresh variable zi, unique to Ii and the other variables are the same as in I. We add
to I(k, d) all the constraints from each copy Ii. All of these constraints have the same
weight as their corresponding constraints in I. These constraints appear in the top half
of Figure 7.4.
• For each j ∈ [k] and each variable y ∈ Y ∩S, let Iy,j be a copy of IP , in which the variable
xP has been replaced by y and each variable v ∈ VP − xP has been replaced by a fresh
variable vy,j, unique to Iy,j. We add to I(k, d) the single constraint from each copy Iy,j. Each of these constraints has weight δ. These are the 3 constraints in the lower right part
of Figure 7.4.
• For each j ∈ [k] and each variable y ∈ Y \S, let Iy,j be a copy of IN , in which the variable
xN has been replaced by y and each variable v ∈ VN − xN has been replaced by a fresh
variable vy,j, unique to Iy,j. We add to I(k, d) the single constraint from each copy Iy,j. Each of these constraints has weight δ. These are the 3 constraints in the lower left part of Figure 7.4.
We combine the assignments S, SP , and SN into a single assignment S(k, d) for I(k, d) in the
following, natural fashion. Each copy of a variable x is assigned the same value by S(k, d) as x
Chapter 7. Limitations of Oblivious Local Search for CSPs 116
Figure 7.4: Instance I(3, 4) with assignment S(3, 4)
is assigned by S, SP , or SN . Note that only the variables from Z appear in 2 different copies,
and the construction ensures that these variables are assigned the same value by all of the
relevant assignments S, SP , and SN . The assignment S(3, 4) for our example MaxCut instance
I(3, 4) is shown in Figure 7.4.
Finally, we consider the number of variables n(k, d) in the instance I(k, d). Let V be the
set of variables from I and note that both |VN | and |VP | are at most ℓ, since arity(P ) ≤ ℓ and
arity(N ) ≤ ℓ. We have

n(k, d) ≤ d|Z| + |V \ Z| + k(ℓ − 1)|Y | = |V | + (d − 1)|Z| + k(ℓ − 1)|Y | = Θ(d + k) . (7.1)
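For concreteness, the construction can be sketched in Python for the MaxCut case, where every constraint is a weighted edge, an assignment is a cut, and each copy of IP or IN contributes a single pendant edge of weight δ placed so that S(k, d) cuts it. The helper names and the toy base instance below are ours, not part of the thesis; the final assertions check the variable count from equation (7.1) and the identity f(I(k, d), S(k, d)) = d·f(I, S) + k|Y|δ, which reappears as equation (7.3) in the proof of Theorem 7.3.

```python
def cut_value(edges, cut):
    """Total weight of the edges with exactly one endpoint in `cut`."""
    return sum(w for e, w in edges.items() if len(e & cut) == 1)

def build_I_kd(edges, Y, Z, S, delta, k, d):
    """MaxCut specialization of the I(k, d) construction.

    `edges` maps frozenset({u, v}) -> weight and describes the base instance I.
    Variables in Z are replaced by fresh copies in each of the d copies of I,
    while all other variables are shared.  Each y in Y additionally receives k
    pendant edges of weight `delta` to fresh vertices placed on the opposite
    side of the cut, so that S(k, d) cuts all of them."""
    ren = lambda v, i: ("z", v, i) if v in Z else v
    new_edges, new_cut = {}, set()
    for i in range(d):                    # d copies of I with fresh z-variables
        for e, w in edges.items():
            u, v = tuple(e)
            key = frozenset({ren(u, i), ren(v, i)})
            new_edges[key] = new_edges.get(key, 0) + w
        new_cut |= {ren(v, i) for v in S}
    for y in Y:                           # the copies of I_P / I_N: one pendant
        for j in range(k):                # edge of weight delta per pair (y, j)
            g = ("gadget", y, j)
            new_edges[frozenset({y, g})] = delta
            if y not in S:
                new_cut.add(g)            # gadget vertex sits opposite y
    return new_edges, new_cut

# Toy base instance: a triangle on {y, z1, z2}, with the cut S = {y}.
I = {frozenset({"y", "z1"}): 1, frozenset({"y", "z2"}): 1,
     frozenset({"z1", "z2"}): 1}
S = {"y"}
Ikd, Skd = build_I_kd(I, Y={"y"}, Z={"z1", "z2"}, S=S, delta=1, k=3, d=4)
# f(I(k, d), S(k, d)) = d * f(I, S) + k * |Y| * delta   (equation (7.3))
assert cut_value(Ikd, Skd) == 4 * cut_value(I, S) + 3 * 1 * 1 == 11
# equation (7.1) with l = 2: |V| + (d - 1)|Z| + k(l - 1)|Y| vertices
assert len({v for e in Ikd for v in e}) == 3 + (4 - 1) * 2 + 3 * (2 - 1) * 1
```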
The following lemma shows that S(k, d) is a k-local optimum of I(k, d).
Lemma 7.2. Suppose that S is a (Y, Z)-robust assignment for the instance I. Then, for every
k, d ≥ 1, the assignment S(k, d) is a k-local optimum of I(k, d).
Proof. We consider f(I(k, d), S(k, d)) and show that it is impossible to increase f by flipping at
most k variables. The assignment S(k, d) already satisfies all of the constraints from the copies
Iy,j of IP and IN , so in order to increase f , we must increase the total weight of the satisfied
constraints in some copy Ii of I. Because I is (Y,Z)-robust, this requires flipping at least one
variable y ∈ Y and at least one variable zi from Ii. We can satisfy constraints of total weight at
most f∗(I) from each copy Ii of I. Thus, we can satisfy constraints of total additional weight
at most
f∗(I)− f(I, S) = δ ,
for each such variable zi that is flipped.
However, flipping y unsatisfies the single constraint in the copy Iy,j of IP or IN for each
j ∈ [k]: recall that each of these instances was constructed by either letting xP = y in IP if
y ∈ S or xN = y in IN if y ∉ S; in either case, flipping the variable y unsatisfies the single
constraint from IP or IN . Each of these k constraints has weight δ, so flipping y decreases f
by kδ. The only way to possibly re-satisfy the constraint from Iy,j is by flipping at least one
other variable vy,j unique to Iy,j . Thus, we can re-satisfy constraints of total weight at most δ for
each variable vy,j that is flipped.
Suppose that we flip at least one variable y ∈ Y , together with a variables zi and b variables
vy,j . Then, the total change in the weight of satisfied constraints is at most
−kδ + aδ + bδ ≤ −δ ,
since a + b ≤ k − 1. Thus, we must actually decrease f . The above argument can easily be
adapted to show that S(k, d) is in fact a (k+ 1)-local optimum for I(k, d). However, this small
distinction will be irrelevant in our remaining proofs.
Using Lemma 7.2, we can derive the following bounds on the locality ratio of h(n)-LocalSearch
on the CSP Π. Recall that in Section 1.2, we defined the locality ratio for a problem
to be the infimum of the locality ratios of all of its instances. Thus, we ignore any factors of
magnitude o(n) in our discussion of locality ratios.
Theorem 7.3. For any h = o(n), the locality ratio of h(n)-LocalSearch for a (non-trivial) CSP
Π is at most f(I, S)/f∗(I), where I is an instance of Π and S is a (Y,Z)-robust assignment
for I.
Proof. Let I be some instance of Π and S be a (Y, Z)-robust assignment for I. It suffices to
show that for every ε > 0, there is an instance I(k, d) such that S(k, d) is an h(n)-local optimum
for I(k, d) and

f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I, S)/f∗(I) + ε .
Let δ = f∗(I)− f(I, S), and let O be an optimal assignment for I. Then, for any k, d ≥ 1,
we define the assignment O(k, d) for the variables V of I(k, d) as follows: O(k, d) assigns each
variable from a copy of I the same value as O, and each variable vy,j , unique to the instance
Iy,j , an arbitrary value. Then, O(k, d) satisfies constraints of weight f(I, O) = f∗(I) from each
copy Ii of I, and hence we have
f(I(k, d), O(k, d)) ≥ df∗(I) . (7.2)
On the other hand, S(k, d) satisfies constraints of weight f(I, S) from each copy Ii of I, as
well as the single constraints from each of the k|Y | instances Iy,j , all of weight δ. Thus,
f(I(k, d), S(k, d)) = df(I, S) + k|Y |δ . (7.3)
We set

k = k(d) = ⌊εf∗(I)/(|Y |δ)⌋ d ,
and note that k = Θ(d). Then, we consider the resulting family of instances I(k, d) for d ≥ 1.
From (7.1), the number of variables n = n(d) in I(k, d) is Θ(d+k) = Θ(d). We have h(n) = o(n),
and since n = Θ(d) and k = Θ(d), we must have h(n) ≤ k for all sufficiently large d. Thus, from
Lemma 7.2, S(k, d) is an h(n)-local optimum of I(k, d) for all sufficiently large d. Furthermore,
from (7.2) and (7.3), we have
f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I(k, d), S(k, d)) / f(I(k, d), O(k, d)) ≤ (df(I, S) + k|Y |δ) / (df∗(I)) ≤ f(I, S)/f∗(I) + ε .
Theorem 7.3 shows that even neighborhoods of super-polynomial size cannot improve the
locality ratio of h(n)-LocalSearch beyond the ratio attained by some (Y,Z)-robust assignment.
We can easily adapt the proof to show that the locality ratio of h(n)-LocalSearch remains less
than 1 even when we allow mildly exponential neighborhoods. In this case, we allow h(n)-
LocalSearch to flip up to some constant fraction of the current assignment’s variables in each
iteration.
Theorem 7.4. Suppose that I is an instance of a CSP Π and that there exists a (Y, Z)-robust
assignment S for I that is not a global optimum of I. Then there exists an absolute constant
c such that the locality ratio of h(n)-LocalSearch is less than 1 whenever h(n) ≤ cn for all
sufficiently large n.
Proof. We use the same argument as in the proof of Theorem 7.3, setting ε to be any positive
constant satisfying
ε < 1 − f(I, S)/f∗(I) .
Note that since S is not a global optimum, such an ε must exist.
We set k = k(d) as in the proof of Theorem 7.3 and consider the instance I(k, d) with
assignment S(k, d). Again, we have n = n(d) = Θ(d) and k = Θ(d), so there must exist some
absolute constant c such that cn ≤ k for all sufficiently large n. Suppose also that h(n) ≤ cn
for all sufficiently large n. Then, from Lemma 7.2, S(k, d) is an h(n)-local optimum for I(k, d)
for all sufficiently large n. Furthermore, as in the proof of Theorem 7.3, we have
f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I, S)/f∗(I) + ε < 1 .
Our running example based on Figure 7.1 shows that the locality ratio of h(n)-LocalSearch
on MaxCut remains at most 1/2 for all h = o(n). We now give some additional examples of
CSP instances with (Y,Z)-robust assignments.
We begin with Max-2-Sat. Khanna et al. [64] show that the locality ratio of h(n)-LocalSearch
on this problem is at most 2/3 for any h = o(n) by giving an explicit construction. We can
derive the same result by exhibiting the Max-2-Sat instance

x ∨ a , x ∨ b , ¬a ∨ ¬b ,
and observing that the assignment S = {a, b} is ({a, b}, {x})-robust for this instance: it satisfies
2 clauses, but we must flip x and either a or b to satisfy all 3 clauses. Thus the locality ratio
of h(n)-LocalSearch on Max-2-Sat is at most 2/3 for all h = o(n).
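This robustness claim is small enough to verify by brute force. The sketch below is our own helper code (clauses are encoded as tuples of (variable, sign) literals); it checks that S = {a, b} satisfies 2 of the 3 clauses and that every assignment satisfying all 3 clauses flips x and at least one of a, b:

```python
from itertools import product

# the three clauses x v a, x v b, -a v -b, as (variable, sign) literals
clauses = [(("x", True), ("a", True)),
           (("x", True), ("b", True)),
           (("a", False), ("b", False))]

def value(assign):
    """Number of clauses satisfied by the assignment."""
    return sum(any(assign[v] == s for v, s in cl) for cl in clauses)

S = {"x": False, "a": True, "b": True}
assert value(S) == 2                       # S satisfies 2 of the 3 clauses
for bits in product([False, True], repeat=3):
    A = dict(zip(["x", "a", "b"], bits))
    if value(A) == 3:                      # A satisfies all clauses, so it must
        flipped = {v for v in A if A[v] != S[v]}
        assert "x" in flipped and flipped & {"a", "b"}   # flip x and a or b
```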
Moreover, this can be extended to Max-k-Sat for k > 2 as follows. Let Vk−2 be a set of k − 2
fresh variables and let Ck−2 be the set of all 2^(k−2) clauses on Vk−2. We consider the instance

{x ∨ a ∨ C : C ∈ Ck−2} ∪ {x ∨ b ∨ C : C ∈ Ck−2} ∪ {¬a ∨ ¬b ∨ C : C ∈ Ck−2} .
The assignment S = {a, b} is ({a, b}, {x})-robust for this instance, as well: it satisfies all but
one of the 3 · 2^(k−2) clauses, but we must flip x and either a or b to satisfy all of the clauses. By
Theorem 7.3, the locality ratio of h(n)-LocalSearch on Max-k-Sat is at most

(3 · 2^(k−2) − 1) / (3 · 2^(k−2))
for all h = o(n). Note that this is worse than the expected approximation ratio of a uniform
random assignment, which is

(2^k − 1) / 2^k = (4 · 2^(k−2) − 1) / (4 · 2^(k−2)) .
In contrast, Khanna et al. [64] give a non-oblivious local search algorithm with a locality ratio
of (2^k − 1)/2^k for all k.
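For k = 3 this instance is small enough to check exhaustively. The sketch below (helper names ours) builds the 3 · 2^(k−2) = 6 clauses, confirms that S = {a, b} satisfies all but one of them, and confirms that every fully satisfying assignment flips x and one of a, b:

```python
from itertools import product

k = 3
names = ["x", "a", "b", "v1"]                 # v1 is the single fresh variable
C_k2 = [(("v1", True),), (("v1", False),)]    # all 2^(k-2) clauses on V_{k-2}
clauses = ([(("x", True), ("a", True)) + C for C in C_k2]
           + [(("x", True), ("b", True)) + C for C in C_k2]
           + [(("a", False), ("b", False)) + C for C in C_k2])

def n_sat(assign):
    """Number of clauses satisfied by the assignment."""
    return sum(any(assign[v] == s for v, s in cl) for cl in clauses)

S = {"x": False, "a": True, "b": True, "v1": False}
assert n_sat(S) == 3 * 2 ** (k - 2) - 1       # all but one clause satisfied

full = [dict(zip(names, bits))
        for bits in product([False, True], repeat=len(names))
        if n_sat(dict(zip(names, bits))) == len(clauses)]
assert full                                    # satisfying assignments exist
for A in full:                                 # and every one of them flips x
    flipped = {v for v in names if A[v] != S[v]}
    assert "x" in flipped and flipped & {"a", "b"}
```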
Finally, we consider the Max-k-CCSP problem, in which all constraints are conjunctions
of exactly k literals. Alimonti [3, 4] shows that oblivious h(n)-local search cannot approximate
this problem to within any constant factor for any function h(n) = O(1). For k ≥ 2, we can
derive the same result for all h = o(n) by exhibiting the Max-k-CCSP instance consisting of the
single conjunction

x1 ∧ x2 ∧ · · · ∧ xk .
The assignment S = ∅ is ({x1}, {x2})-robust for this instance: it does not satisfy the single
conjunction, and to satisfy the conjunction we must flip both x1 and x2. From Theorem 7.3,
the locality ratio of h(n)-LocalSearch on Max-k-CCSP is then 0 for any h = o(n). In contrast,
Alimonti [4] shows that non-oblivious 1-local search has a locality ratio of 2/5 for Max-2-CCSP.
Furthermore, Khanna et al. [64] give a non-oblivious local search algorithm with a locality ratio
of 1/2^k even for the more general Max-k-CSP problem, in which an instance can have arbitrary
constraints of arity k.
7.2 Random and Greedy Initial Solutions and Best Improvement Pivot Rules
All of our results thus far have been stated in terms of various algorithms’ locality ratios. As
we discussed in Chapter 1, the locality gap provides a lower bound on the approximation ratio
attained by a local search algorithm. This bound is tight under the assumption that the initial
solution Sinit is chosen by an adversary. By basing our analysis on the locality gap, we avoid
the difficulties of analyzing the dynamic behavior of the algorithm, while still delivering an
absolute guarantee on its performance.
In real applications, however, the solution Sinit is not chosen by an adversary, and so it
is unclear if this guarantee is the best possible. Some clever or even random choice of Sinit,
combined with some clever way of choosing amongst possible improvements at each stage might
guarantee that a local search algorithm avoids particularly poor local optima. For example,
Arkin and Hassin [5] show that the locality ratio of any local search algorithm for weighted
k-set packing that brings in t sets and discards t − 1 sets from the current solution is at most
1/(k − 1 + 1/t), but Chandra and Halldórsson [22] give an oblivious local search algorithm that attains
an approximation ratio of 3/(2(k + 1)). Chandra and Halldórsson's algorithm beats the locality gap
by choosing an initial solution using the greedy algorithm and at each step choosing the local
improvement that increases the objective function the most.
A popular heuristic approach to improving the performance of local search algorithms in-
volves random restarts. Here, the algorithm chooses a random initial solution Sinit, and we
repeat the local search algorithm several times using different random starting solutions.
In this section we examine related variants of the algorithm h(n)-LocalSearch. We consider a
randomized variant of Algorithm 12 in which Sinit is chosen uniformly at random from among
all 2n possible solutions. We call this algorithm randomly initialized h(n)-LocalSearch. If a
problem’s instances possess a very small number of isolated, poor local optima, the expected
performance of this randomized local search algorithm could be much better than the locality
gap as the size n = |V | of an instance grows. We shall show that this is not the case
for the problem MaxCut; on the contrary, there exists an infinite family of MaxCut instances
in which the expected performance of the randomized local search algorithm remains relatively
poor.
Algorithm 12 searches through all sets C of size at most h(n) for a valid improvement, and
applies the first one that it encounters. Inspired by Chandra and Halldorsson’s algorithm, we
also consider a variant of h(n)-LocalSearch that returns the best improvement found at each step.
This algorithm is easily obtained from Algorithm 12 by removing the break statement from
the loop. The resulting algorithm will then search through all possible improvements, keeping
only the best one found. We additionally modify the loop to search through improvements C in
order of increasing size, so that if several improvements give the same increase in the objective
function, a set C that moves the fewest number of vertices will be chosen. We call the resulting
algorithm h(n)-LocalBestImpSearch.
We present analyses for the randomly initialized variants of both h(n)-LocalSearch and h(n)-
LocalBestImpSearch. Our results for h(n)-LocalSearch do not depend on which improvement
the algorithm chooses; in fact, we allow the algorithm to use an arbitrary oracle to choose
which improvement to apply. Our bounds on the expected approximation performance of
randomly initialized h(n)-LocalSearch therefore hold for any h(n)-local search algorithm and
require no further complexity assumptions. In the case of h(n)-LocalBestImpSearch, we can
make use of the fact that the algorithm must choose the best possible improvement at each
stage to obtain improved inapproximability bounds. One interesting question is whether the
general results for h(n)-LocalSearch can be improved to match these results, or, if not, whether
some particular rule for choosing improvements at each stage does, in fact, give improved
approximation performance. Finally, we show that our results also hold if Sinit is chosen by a
general greedy-like priority algorithm.2
We now turn to the proof of our results. We use the same conventions as in the previous
section, but, since we are now considering only the special case of MaxCut, we make use of
some specific terminology to simplify the presentation. We define S̄ = V \ S and shall refer to
a set S of variables as a cut, meaning the partition (S, S̄). Note that the sets S and S̄ give the
same cut (S, S̄). Flipping a variable v now corresponds to moving the vertex v across the cut to
the other side of the partition. We thus talk about moving vertices across the cut rather than
flipping variables. Finally, we say that S cuts an edge (u, v) if exactly one of u or v is in S.
Our proofs make use of an infinite family of worst-case instances Ga,r, depicted in Figure
7.5. For each a ≥ 2 and 1 ≤ r < a, the graph Ga,r on a + 2 vertices is constructed as follows.
Let A be a set of a vertices, and B be a set of 2 vertices. We form a complete bipartite graph
by inserting an edge of weight one between each vertex of A and each vertex of B, then add
one additional edge of weight r connecting the two vertices of B.
In order to analyze the performance of h(n)-local search on Ga,r, we consider the value of an
arbitrary cut S. We consider the following three classes of cuts, distinguished by the position
of the vertices of B:
Case (|S ∩B| = 0):
In this case, both vertices of B are in the set S̄ and so the edge of weight r between them
is not cut by S. Each vertex of A that is in S has both of its edges to B cut by S. The
remaining vertices of A are in S̄ and so have none of their edges cut by S. Thus,
f(S) = 2|S ∩A| .
Case (|S ∩B| = 1):
We have exactly one vertex of B in S, so the cut S separates the vertices of B, and cuts
the edge of weight r between them. Each vertex of A has one edge of weight 1 to each of
the vertices of B. One of these vertices must be on the same side of the cut as A and one
must be on the opposite side. Thus, exactly one of the weight 1 edges incident to each
vertex of A is cut, and we have
f(S) = a+ r .
2 The class of priority algorithms was first introduced by Borodin, Nielsen, and Rackoff [14]. Davis and Impagliazzo [32] and Borodin et al. [13] provide analyses of priority algorithms in the context of graph problems.
Figure 7.5: The graph Ga,r , with its set A of a vertices and its set B of 2 vertices
Case (|S ∩B| = 2):
This case is the same as the case |S ∩ B| = 0. By considering the cut S̄ in the first case
above, we obtain

f(S) = 2|S̄ ∩ A| = 2a − 2|S ∩ A| .
The unique, globally optimal cut of Ga,r is the cut A (or, equivalently, V \A = B) with value
f(A) = 2a. We have just shown that the cuts that separate the vertices of B have reasonably
poor quality. The following lemma shows that many of these cuts are also h(n)-local optima in
Ga,r. As we shall show, the parameter r governs how unbalanced a cut can be with respect to
the vertices of A and still be a local optimum.
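The three cases above are easy to confirm computationally on a small instance. In the sketch below (our own helper code), A = {0, …, a−1} and B = {b1, b2}:

```python
from itertools import combinations

def make_G(a, r):
    """The graph G_{a,r}: unit-weight complete bipartite edges between
    A = {0, ..., a-1} and B = {b1, b2}, plus an edge of weight r inside B."""
    edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
    edges[frozenset({"b1", "b2"})] = r
    return edges

def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r = 6, 3
edges = make_G(a, r)
assert f(edges, set(range(a))) == 2 * a            # the optimal cut A
assert f(edges, {0, 1, 2}) == 2 * 3                # |S n B| = 0: f(S) = 2|S n A|
assert f(edges, {0, "b1", "b2"}) == 2 * a - 2 * 1  # |S n B| = 2: 2a - 2|S n A|
for SA in combinations(range(a), 2):               # every cut separating B
    assert f(edges, set(SA) | {"b1"}) == a + r     # has value a + r
```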
Lemma 7.5. Consider a graph Ga,r, where 2h(n) < r < a, and a cut S with
||S ∩A| − a/2| ≤ r/2− h(n) .
If furthermore |S ∩B| = 1 then S is an h(n)-local optimum, with f(S) = a+ r.
Proof. First, we note that since |S ∩B| = 1 we must have f(S) = a+ r. Let C be an arbitrary
subset of V of size at most h(n), and let S′ = S △ C. We show that for each such set we must
have f(S) ≥ f(S′). We consider 2 cases, based on how many vertices of B are contained in C:
Case (|C ∩ B| ≠ 1):
After moving both or none of the vertices of B across the cut, we still have |S′ ∩B| = 1,
and so f(S′) = a+ r, regardless of the positions of the other vertices.
Case (|C ∩B| = 1):
After moving one vertex of B across the cut, we have either |S′ ∩ B| = 2, and so

f(S′) = 2a − 2|S′ ∩ A| ,

or |S′ ∩ B| = 0, and so

f(S′) = 2|S′ ∩ A| .
From the conditions of the lemma
a/2− (r/2− h(n)) ≤ |S ∩A| ≤ a/2 + (r/2− h(n)) ,
and so after moving at most h(n) − 1 vertices of A across the cut (one element of C lies in B) we have
a/2− r/2 < |S′ ∩A| < a/2 + r/2 .
Thus, both 2|S′ ∩ A| < a + r and 2a − 2|S′ ∩ A| < 2a − (a − r) = a + r and so we have
f(S′) < a+ r.
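Lemma 7.5 can likewise be sanity-checked exhaustively on a small instance. With a = 6, r = 5, and h(n) = 2 we have 2h(n) < r < a, and the balanced cut S = {0, 1, 2, b1} satisfies ||S ∩ A| − a/2| = 0 ≤ r/2 − h(n); the sketch below (helper names ours) confirms that no set C of at most h(n) vertices improves it:

```python
from itertools import combinations

def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r, h = 6, 5, 2                                   # 2h < r < a
V = list(range(a)) + ["b1", "b2"]
edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
edges[frozenset({"b1", "b2"})] = r

S = {0, 1, 2, "b1"}                                 # balanced, separates B
assert f(edges, S) == a + r
for size in range(1, h + 1):                        # every set C with |C| <= h
    for C in combinations(V, size):
        assert f(edges, S ^ set(C)) <= f(edges, S)  # fails to improve S
```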
Under a slightly strengthened assumption about the placement of the vertices of A, we
can derive the following stronger result, which shows that many cuts that do not satisfy both
conditions of Lemma 7.5 will satisfy them after at most one greedy improvement.
Lemma 7.6. Consider a graph Ga,r , where 2h(n) < r < a, and a cut S with

||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 .

After at most one greedy improvement C, the cut S′ = S △ C satisfies both

||S′ ∩ A| − a/2| ≤ r/2 − h(n)

and |S′ ∩ B| = 1.
Proof. First, we note that since r > 2h(n), we have (r/2 − h(n))/3 ≤ r/2 − h(n). Suppose |S ∩ B| = 1.
Then, S already satisfies both conditions of the lemma since

||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 ≤ r/2 − h(n) .
Then, suppose that |S ∩ B| ≠ 1, and so both vertices of B are on the same side of the cut S.
Let C be an improvement of size at most h(n). We consider 2 cases for C:
Case (|C ∩ B| ≠ 1):
We must have |S′ ∩ B| ≠ 1, since either both vertices of B are moved across the cut,
or both are left where they are. Thus, we have either
f(S′) = 2|S′ ∩A|
or
f(S′) = 2a− 2|S′ ∩A| .
Since

a/2 − (r/2 − h(n))/3 ≤ |S ∩ A| ≤ a/2 + (r/2 − h(n))/3 ,

after moving at most h(n) vertices of A, we must have

a/2 − (r/2 − h(n))/3 − h(n) ≤ |S′ ∩ A| ≤ a/2 + (r/2 − h(n))/3 + h(n) ,

and so

2|S′ ∩ A| ≤ 2(a/2 + r/6 + 2h(n)/3) = a + r/3 + 4h(n)/3 < a + r/3 + 2r/3 = a + r ,

and also

2a − 2|S′ ∩ A| ≤ 2a − 2(a/2 − r/6 − 2h(n)/3) = a + r/3 + 4h(n)/3 < a + r/3 + 2r/3 = a + r .
In either case, we have f(S′) < a+ r.
Case (|C ∩B| = 1):
We must have |S′ ∩ B| = 1, since one vertex of B is moved across the cut. Therefore,
f(S′) = a+ r, regardless of the position of the other vertices.
Thus, any improvement C with |C ∩ B| = 1 results in a strictly greater value of f(S′) than
any improvement with |C ∩ B| ≠ 1. Furthermore, all improvements C with |C ∩ B| = 1 yield
the same value for f(S′). The algorithm h(n)-LocalBestImpSearch must pick one having the
smallest size amongst all such improvements. This is an improvement C of size 1 containing
only a single vertex of B. After applying this improvement to S, we have a cut S′ satisfying
both |S′ ∩ B| = 1 and

||S′ ∩ A| − a/2| = ||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 ≤ r/2 − h(n) .
Lemmas 7.5 and 7.6 both require that close to half of the vertices of A be included in S.
We now consider the probability that this is the case when S is chosen uniformly at random.
A straightforward Chernoff bound gives:
Pr[ ||A ∩ S| − a/2| ≤ r/2 − h(n) ] ≥ 1 − 2e^(−(r−2h(n))^2/(2a)) , (7.4)
and
Pr[ ||A ∩ S| − a/2| ≤ r/6 − h(n)/3 ] ≥ 1 − 2e^(−(r−2h(n))^2/(18a)) . (7.5)
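The first bound can be checked empirically. The parameters below are arbitrary, and the simulation (our own sketch) draws |A ∩ S| for a uniformly random cut S as a Binomial(a, 1/2) variable:

```python
import math
import random

random.seed(0)
a, r, h = 400, 60, 10                    # so r - 2h(n) = 40
trials = 20000
hits = 0
for _ in range(trials):
    # |A n S| for a uniformly random cut S is Binomial(a, 1/2)
    s = sum(random.getrandbits(1) for _ in range(a))
    hits += abs(s - a / 2) <= r / 2 - h
# the Chernoff bound (7.4): empirically ~0.96 versus a bound of ~0.73
bound = 1 - 2 * math.exp(-((r - 2 * h) ** 2) / (2 * a))
assert hits / trials >= bound
```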
Using these bounds and the results of Lemmas 7.5 and 7.6 we obtain our main results. The first
holds for any oblivious h(n)-local search algorithm h(n)-LocalSearch, while the second holds for
the best-improvement variant h(n)-LocalBestImpSearch.
Theorem 7.7. For all h = o(n), the expected approximation ratio for randomly initialized
h(n)-LocalSearch on MaxCut is at most 3/4 + o(1). Moreover, if h(n) = cn + o(n) for any
0 < c < 1/2, the expected approximation ratio is at most 3/4 + c/2 + o(1).
Proof. Let Rmc denote the approximation ratio of h(n)-LocalSearch for MaxCut. Consider the
graph Ga,r where n = a+ 2 and 2h(n) < r < a, and let Sinit be a random cut in this graph. If
||A ∩ Sinit| − a/2| ≤ r/2− h(n)
and |B ∩ Sinit| = 1 then Lemma 7.5 states that Sinit must be a local optimum with f(Sinit) = a + r.
Let XA be the event ||A ∩ Sinit| − a/2| ≤ r/2− h(n) and XB be the event |B ∩ Sinit| = 1. We
have
E[Rmc] ≤ Pr[XA] Pr[XB] (a + r)/(2a) + (1 − Pr[XA] Pr[XB]) · 1
= 1 + Pr[XA] Pr[XB] (r − a)/(2a)
= 1 + Pr[XA] (r − a)/(4a) ,

where the last step uses the fact that Pr[XB] = 1/2, since a uniformly random cut separates the two vertices of B with probability exactly 1/2, independently of XA.
Suppose that h(n) = cn + o(n) for some constant c < 1/2 (the case h = o(n) follows by
setting c = 0). Let r = (2c + ε)a, where ε is some positive constant smaller than 1/2 − c. Then,
for all sufficiently large a (and hence sufficiently large n) we have 2h(n) < r < a, as required.
Furthermore, r − 2h(n) = Ω(a), and so from (7.4) we have
Pr[XA] ≥ 1 − 2e^(−(r−2h(n))^2/(2a)) = 1 − 2e^(−Ω(a)) .
Then,

E[Rmc] ≤ 1 + (1 − 2e^(−Ω(a))) (r − a)/(4a) = 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/4 .
Considering this bound for large a, we have

lim(a→∞) E[Rmc] ≤ lim(a→∞) [ 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/4 ] = (2c + ε + 3)/4 = c/2 + 3/4 + ε/4 ,

which holds for any sufficiently small ε > 0.
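The effect is easy to observe in simulation, albeit on a small instance whose constants differ from the asymptotic statement. In the sketch below (our own code: first-improvement 1-local search on G_{30,9}), runs that start with B separated and |A ∩ S| close to a/2 stall at (a + r)/2a = 0.65, and all remaining runs reach the optimum, so the average ratio stays well below 1:

```python
import random

random.seed(1)
a, r = 30, 9
V = list(range(a)) + ["b1", "b2"]
edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
edges[frozenset({"b1", "b2"})] = r

def f(S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

def local_search(S):
    """Oblivious 1-local search with a first-improvement pivot rule."""
    improved = True
    while improved:
        improved = False
        for v in V:
            if f(S ^ {v}) > f(S):
                S, improved = S ^ {v}, True
    return S

ratios = [f(local_search({v for v in V if random.random() < 0.5})) / (2 * a)
          for _ in range(300)]
avg = sum(ratios) / len(ratios)
assert all(x in (0.65, 1.0) for x in ratios)  # stuck at (a + r)/(2a) or optimal
assert 0.75 < avg < 0.95                      # bounded away from 1
```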
In the case of randomly initialized h(n)-LocalBestImpSearch, we obtain better bounds, which
in the case of h = o(n) match the worst-case locality ratio of h(n)-LocalSearch.
Theorem 7.8. For all h = o(n), the expected approximation ratio for randomly initialized
h(n)-LocalBestImpSearch on MaxCut is at most 1/2 + o(1). Moreover, if h(n) = cn + o(n) for
any 0 < c < 1/2, the expected approximation ratio is at most 1/2 + c + o(1).
Proof. Let Rmc denote the approximation ratio of h(n)-LocalBestImpSearch for MaxCut. Again,
we consider the graph Ga,r where n = a + 2 and 2h(n) < r < a, and let Sinit be a random cut
in this graph. From Lemma 7.6, applying at most 1 greedy improvement to any initial cut Sinit
of Ga,r satisfying the balance condition
||A ∩ Sinit| − a/2| ≤ (r/2− h(n))/3
results in a cut S′ satisfying
||A ∩ S′| − a/2| ≤ r/2− h(n) ,
as well as |B ∩ S′| = 1. Lemma 7.5 states that this resulting cut S′ must be an h(n)-local
optimum with f(S′) = a + r. Let XA be the event ||A ∩ Sinit| − a/2| ≤ (r/2 − h(n))/3. We
have:
E[Rmc] ≤ Pr[XA] (a + r)/(2a) + (1 − Pr[XA]) · 1 = 1 + Pr[XA] (r − a)/(2a) .
Suppose that h(n) = cn+ o(n) for some constant c < 1/2 (the case h = o(n) follows by setting
c = 0). Let r = (2c + ε)a, where ε is some positive constant smaller than 1/2 − c. Then, for
all sufficiently large a (and hence sufficiently large n) we have 2h(n) < r < a, as required.
Furthermore, r − 2h(n) = Ω(a), and so from (7.5) we have
Pr[XA] ≥ 1 − 2e^(−(r−2h(n))^2/(18a)) = 1 − 2e^(−Ω(a)) .
Then,

E[Rmc] ≤ 1 + (1 − 2e^(−Ω(a))) (r − a)/(2a) = 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/2 .
Considering this bound for large a, we obtain

lim(a→∞) E[Rmc] ≤ lim(a→∞) [ 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/2 ] = c + 1/2 + ε/2 ,

which holds for any sufficiently small ε > 0.
Now, we consider the case in which Sinit is chosen by a greedy-like priority algorithm.
Specifically, we analyze the algorithm that considers the vertices of a graph in order of decreasing
weighted degree, and places each vertex on the side of the cut that maximizes the total weight
of cut edges between it and previously considered vertices.
In this case, we can modify the instance Ga,r by letting half the edges between A and B
have weight 1 + ε instead of 1, as follows. Let b1 and b2 be the vertices of B. We divide the
vertices of A into 2 equal sets A1 and A2. For each a ∈ A1, we change the weight of the edge
(a, b2) to 1 + ε, and for each a ∈ A2, we change the weight of the edge (a, b1) to 1 + ε. All other
edges are as in Ga,r.
If a+r > 2+ε, the priority algorithm we have just described must first consider the vertices
b1 and b2, placing them on opposite sides of the cut. Every other vertex is then considered,
and all the vertices of A1 are placed on the same side of the cut as b1, while all the vertices
of A2 are placed on the same side of the cut as b2. The resulting cut is completely balanced
and so from Lemma 7.5 must be a local optimum of the sort we have just considered. Thus,
h(n)-LocalSearch must terminate without applying any improvement. It follows that we cannot
approximate MaxCut beyond the locality ratio of h(n)-LocalSearch by starting with a greedy
cut of this sort.
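A sketch of this priority algorithm on the modified instance (our own code, with ε = 0.5 and vertex names of our choosing) confirms that it outputs a balanced cut separating b1 and b2, and that no single flip improves the result:

```python
def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r, eps = 6, 3, 0.5
A1, A2, B = [0, 1, 2], [3, 4, 5], ["b1", "b2"]
V = A1 + A2 + B
edges = {}
for u in A1:                              # half of the A-B edges get weight 1+eps
    edges[frozenset({u, "b1"})] = 1
    edges[frozenset({u, "b2"})] = 1 + eps
for u in A2:
    edges[frozenset({u, "b1"})] = 1 + eps
    edges[frozenset({u, "b2"})] = 1
edges[frozenset(B)] = r

def greedy_cut(V, edges):
    """Consider vertices by decreasing weighted degree; place each on the
    side that cuts the most weight to the already-placed vertices."""
    deg = {v: sum(w for e, w in edges.items() if v in e) for v in V}
    side = {}
    for v in sorted(V, key=lambda u: -deg[u]):
        cut_to = lambda s: sum(w for e, w in edges.items() if v in e
                               and side.get(next(iter(e - {v}))) == 1 - s)
        side[v] = 0 if cut_to(0) >= cut_to(1) else 1
    return {v for v in V if side[v] == 1}

S = greedy_cut(V, edges)
assert len(S & set(B)) == 1 and len(S & set(A1 + A2)) == 3   # balanced, B split
assert all(f(edges, S ^ {v}) <= f(edges, S) for v in V)      # a 1-local optimum
```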
A similar argument can also be applied to the priority algorithm that considers the edges of
a graph in decreasing order of weight, and at each step places any previously unplaced vertices
of an edge in a way that cuts the edge. For all r > 1 + ε, this algorithm first considers the edge
of weight r and places b1 and b2 on opposite sides of the cut. Then, it proceeds in a similar
fashion as the previous algorithm, placing exactly half of the vertices of A on one side of the
cut and half on the other side.
Chapter 8
Conclusion
We have reconsidered the general paradigm of non-oblivious local search, in which a local search
algorithm for some problem is guided by an auxiliary potential function instead of the problem’s
stated objective. We have shown that this approach can lead to improved locality ratios by
giving extra weight to solutions that will be flexible in future iterations, and guiding the local
search out of certain local optima. Non-oblivious local search gives a tight (1 − 1/e)-approximation
algorithm for monotone submodular maximization subject to a matroid constraint, as well as
new, improved approximation results for several problems formulated as k-exchange systems.
In contrast, we have shown that the standard oblivious local search algorithm attains only a
(1/2 + o(1))-approximation for monotone submodular maximization. Results on k-set packing
by Hurkens and Schrijver [55] and Hazan, Safra, and Schwartz [53] imply that oblivious local
search is similarly limited in the case of k-exchange systems, even when large neighborhoods
are used. We have considered a variety of possible techniques for improving oblivious local
search for CSPs, including increasing the size of the neighborhood N , using random or greedy
initial solutions Sinit, choosing the best possible improvement at each stage, and even allowing
the pivot rule to have unlimited computational power. We have provided examples in which these
modifications lead to little improvement over the locality or approximation ratio of an oblivious
local search algorithm. In contrast, the results of Alimonti [2, 3, 4] and Khanna et al. [64] show
that non-oblivious local search does yield improvements in the locality ratio for many CSPs.
All of our results thus point to the relative power of the non-oblivious paradigm, even when
compared to more exotic or computationally unrestricted variants of oblivious local search. A
major direction for future research is the identification of other areas where this approach may
yield improved approximation results. One immediate question is whether it is possible to
obtain non-oblivious local search algorithms for more complicated combinatorial optimization
problems, such as coloring and scheduling problems, in which solutions are most naturally
represented as assignments rather than sets of elements. Even in the specific settings we have
considered, there are still open questions and directions for future research.
8.1 Monotone Submodular Maximization
As discussed in Section 2.5, the continuous greedy algorithm for monotone submodular max-
imization breaks down in the case of multiple matroid constraints because certain conditions
required by the rounding phase are no longer satisfied. Our non-oblivious local search algorithm
for monotone submodular maximization matches the approximation performance of the contin-
uous greedy algorithm in the case of a single matroid constraint and does not require a rounding
phase. Thus, it could plausibly deliver improved approximations for problems where previous
rounding-based approaches encounter difficulties. A natural area to investigate first is the case
of two or more matroid constraints. Lee, Sviridenko, and Vondrak [71] have recently improved
over long-standing approximation bounds in this case by using an oblivious local search, so it is
reasonable to expect non-oblivious techniques to be applicable as well. Finally, we ask whether
it is possible to remove or reduce the sampling used to compute our non-oblivious potential
function g for general monotone submodular functions. This would significantly decrease the
runtime required by MatroidSubmodular.
We also ask whether our results can be improved in various restricted settings. It may be
possible to obtain faster, deterministic algorithms for other particular classes of submodular
functions, such as weighted sums of matroid rank functions. In the special case of coverage
functions, the techniques of Chapters 3 and 6 might be combined to give improved approxima-
tions for k-exchange systems. Another interesting problem for future research in this direction
is submodular MaxSat. Here, we are given a submodular function f on the set of clauses,
and seek an assignment so that the value of f on satisfied clauses is maximized. Because this
problem can be formulated as an instance of submodular maximization subject to a partition
matroid constraint, we obtain a (1 − 1/e)-approximation using the results of Chapter 4. Azar,
Gamzu, and Roth [9] give a 2/3-approximation for this problem, which is better than our general
algorithm, and give an information-theoretic hardness of approximation bound1 of 3/4. Hence,
it remains possible that some non-oblivious approach tailored to this particular problem might
result in an improved approximation algorithm.
8.2 Set Systems for Local Search
Our analysis of independence systems for local search leaves several open questions as well.
First, we ask whether it is possible to obtain a natural, exact characterization for those in-
dependence systems for which (1, k)-OblLocalSearch has a locality gap of 1/k. It is possible to
obtain one such characterization by using LP duality and techniques similar to those used in
our derivation of non-oblivious potentials for maximum coverage and monotone submodular
maximization in matroids. Unfortunately, the resulting characterization is somewhat unnat-
ural, involving properties that are not appreciably simpler than a direct proof of the locality
1 Azar, Gamzu, and Roth note that this result can also be obtained by a result of Mirrokni, Schapira, and Vondrák for combinatorial auctions with submodular bidders.
ratio for (1, k)-OblLocalSearch.
Another area for future research is to improve the runtime of the algorithms given in Chapter 6.
Our algorithms have exponential dependence on k. For the weighted independent set problem
in (k+ 1)-claw free graphs, Berman and Krysta [12] give a non-oblivious local search algorithm
whose runtime is independent of k, at the expense of a lowered approximation ratio. It should
be possible to extend their results to the general class of strong k-exchange systems.
Additionally, we ask whether the analysis of the locality ratio of our non-oblivious algorithms
for linear and monotone submodular maximization in k-exchange systems can be improved
to match the upper bounds of 2/(k + ε) and 2/(k + 2 + ε) given by Hurkens and Schrijver [55] for the
unweighted case. As we discussed in Section 6.2, this is the best upper bound that we have for
general k, but our analysis only gives a lower bound of 2/(k + 1 + ε) and 2/(k + 3 + ε). Removing the extra
additive term of 1 would give improved approximations for several specific problems.
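For reference, the bounds discussed in this paragraph displayed side by side, with the linear case first and the monotone submodular case second:

```latex
\text{upper bounds (Hurkens--Schrijver, unweighted):}\quad
  \frac{2}{k+\varepsilon},\qquad \frac{2}{k+2+\varepsilon}
\\[4pt]
\text{lower bounds (our analysis):}\quad
  \frac{2}{k+1+\varepsilon},\qquad \frac{2}{k+3+\varepsilon}
```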
Finally, we ask whether the non-oblivious approach used to develop Submodular-k-Exchange
in Section 6.2 can be generalized to non-monotone submodular maximization. Feldman, Naor,
and Schwartz [39] were able to obtain improved bounds for oblivious local search in this case by
generalizing techniques applied to k-matroid intersection by Lee, Sviridenko, and Vondrak [71].
Surprisingly, almost all of the analysis from Section 6.2 can be adapted to the non-monotone
case. The proof breaks down only in the last step, where we use monotonicity to infer that
f(S ∪O) ≥ f(O). In the case of oblivious local search, Lee, Sviridenko, and Vondrak overcome
this difficulty by performing a series of iterated local searches and taking the best solution.
Unfortunately, the use of specially ordered marginals to approximate weights, which is critical
in our analysis, interferes with the analysis of the iterated approach.
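The role of monotonicity in that last step is easy to isolate numerically: for a monotone submodular f (e.g. weighted coverage) the inequality f(S ∪ O) ≥ f(O) always holds, while for a non-monotone submodular f (e.g. a graph cut function) adding the elements of S to O can strictly decrease the value. A minimal illustration, with toy data:

```python
# Coverage functions are monotone submodular, so f(S | O) >= f(O) always holds.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def coverage(S):
    """Number of ground elements covered by the chosen sets."""
    return len(set().union(*(sets[e] for e in S))) if S else 0

S, O = {"a"}, {"b", "c"}
assert coverage(S | O) >= coverage(O)     # monotonicity: never decreases

# Cut functions are submodular but NOT monotone: enlarging a set can hurt.
edges = [("u", "v"), ("v", "w")]

def cut(S):
    """Number of edges with exactly one endpoint in S."""
    return sum((x in S) != (y in S) for (x, y) in edges)

S, O = {"u"}, {"v"}
print(cut(O), cut(S | O))   # the cut value drops from 2 to 1 when u is added
```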
8.3 Negative Results for CSPs
The negative results from Section 7.2 raise several questions. Specifically, we ask whether the
condition of (Y, Z)-robustness can be removed or simplified, and whether these techniques can
be generalized to larger classes of CSPs, or even to other problems in general. Although we have
shown that many local search algorithms for MaxCut have an approximation ratio no better than their
locality ratio, the relationship between these two measures is still poorly understood. Extending
our analysis of MaxCut to other randomized variants of local search, such as simulated annealing,
is another problem for future research. Additionally, we ask whether it is possible to obtain
similar negative results for certain classes of non-oblivious potential functions.
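For reference, the single-flip oblivious local search for MaxCut discussed here moves one vertex across the cut whenever doing so increases the number of cut edges; the pivot rule selects which improving flip to perform. A minimal sketch with the pivot rule as a parameter (unweighted graph given as an edge list; names are illustrative):

```python
def cut_value(edges, side):
    """Number of edges crossing the bipartition encoded by `side`."""
    return sum(side[u] != side[v] for (u, v) in edges)

def maxcut_local_search(vertices, edges, pivot="first"):
    """Single-flip oblivious local search; `pivot` chooses among improving flips."""
    side = {v: False for v in vertices}
    while True:
        gains = []
        for v in vertices:
            flipped = dict(side)
            flipped[v] = not side[v]
            gain = cut_value(edges, flipped) - cut_value(edges, side)
            if gain > 0:
                gains.append((gain, v))
        if not gains:
            return side             # no improving flip: single-flip local optimum
        _, v = max(gains) if pivot == "best" else gains[0]
        side[v] = not side[v]

# Triangle with a pendant edge; any local optimum cuts at least half the edges.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
side = maxcut_local_search(V, E, pivot="best")
print(cut_value(E, side))   # 3 of 4 edges cut at this local optimum
```

Swapping `pivot="first"` for `pivot="best"` changes which improving flip is taken, which is exactly the kind of choice the negative results above constrain.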
Finally, we consider the gap between the expected approximation ratio of 1/2, which holds for
best improvement local search, and 3/4, which holds without any assumptions on the algorithm's
pivot. We ask whether this gap is due to our analysis or whether there is in fact some pivot rule
that can give a local search algorithm with approximation ratio better than 1/2. For some time,
the algorithm of Goemans and Williamson [45] was the only algorithm to obtain better than a
1/2-approximation in arbitrary graphs. Recently, however, algorithms using spectral techniques
[91, 86, 61] have been able to deliver approximations beyond 1/2 for general graphs. These
algorithms utilize information about the eigenvalues of a graph to guide an iterative greedy-like
algorithm. We ask whether similar techniques can be incorporated into a non-oblivious local
search algorithm as well.
Bibliography
[1] Alexander Ageev and Maxim Sviridenko. Pipage Rounding: A New Method of Construct-
ing Algorithms with Proven Performance Guarantee. Journal of Combinatorial Optimiza-
tion, 8(3):307–328, 2004.
[2] Paola Alimonti. New local search approximation techniques for maximum generalized sat-
isfiability problems. In CIAC ’94: Proceedings of the 2nd Italian Conference on Algorithms
and Complexity, pages 40–53. Springer-Verlag, 1994.
[3] Paola Alimonti. Non-Oblivious Local Search for Graph and Hypergraph Coloring Problems. In WG '95: Proceedings of the 21st International Workshop on Graph-Theoretic Concepts in Computer Science, pages 167–180. Springer-Verlag, 1995.
[4] Paola Alimonti. Non-oblivious Local Search for MAX 2-CCSP with Application to MAX
DICUT. In WG ’97: Proceedings of the 23rd International Workshop on Graph-Theoretic
Concepts in Computer Science, pages 2–14. Springer-Verlag, 1997.
[5] Esther M. Arkin and Refael Hassin. On Local Search for Weighted k-Set Packing. In ESA
’97: Proceedings of the 5th European Symposium on Algorithms, pages 13–22. Springer-
Verlag, 1997.
[6] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and
Vinayaka Pandit. Local Search Heuristics for k-Median and Facility Location Problems.
SIAM Journal on Computing, 33(3):544–562, 2004.
[7] G. Ausiello, P. Crescenzi, and M. Protasi. Approximate solution of NP optimization
problems. Theoretical Computer Science, 150(1):1–55, 1995.
[8] Giorgio Ausiello and Marco Protasi. Local search, reducibility and approximability of
NP-optimization problems. Information Processing Letters, 54(2):73–79, 1995.
[9] Yossi Azar, Iftah Gamzu, and Ran Roth. Submodular Max-SAT. In ESA '11: Proceedings of the 19th European Symposium on Algorithms, pages 323–334. Springer-Verlag, 2011.
[10] George A. Baker and Peter Graves-Morris. Padé Approximants. Number 59 in Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 1996.
[11] Piotr Berman. A d/2 approximation for maximum weight independent set in d-claw free
graphs. Nordic Journal of Computing, 7:178–184, 2000.
[12] Piotr Berman and Piotr Krysta. Optimizing misdirection. In SODA ’03: Proceedings of the
14th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 192–201, Philadelphia,
PA, USA, 2003. SIAM.
[13] Allan Borodin, Joan Boyar, Kim S. Larsen, and Nazanin Mirmohammadi. Priority algo-
rithms for graph optimization problems. Theoretical Computer Science, 411(1):239–258,
2010.
[14] Allan Borodin, Morten N. Nielsen, and Charles Rackoff. (Incremental) Priority Algorithms.
Algorithmica, 37(4):295–326, September 2003.
[15] Dietrich Braess. Nonlinear Approximation Theory, volume 7 of Springer Series in Com-
putational Mathematics. Springer-Verlag, Berlin, 1986.
[16] Richard A. Brualdi. Comments on Bases in Dependence Structures. Bulletin of the Aus-
tralian Mathematical Society, 1(02):161–167, 1969.
[17] Richard A. Brualdi. Common transversals and strong exchange systems. Journal of Com-
binatorial Theory, 8(3):307–329, 1970.
[18] Richard A. Brualdi. Induced matroids. Proceedings of the American Mathematical Society,
29:213–221, 1971.
[19] Richard A. Brualdi and Edward B. Scrimger. Exchange systems, matchings, and transver-
sals. Journal of Combinatorial Theory, 5(3):244–257, 1968.
[20] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Submod-
ular Set Function Subject to a Matroid Constraint (Extended Abstract). In IPCO ’07:
Proceedings of the 12th International Conference on Integer Programming and Combina-
torial Optimization, pages 182–196. Springer-Verlag, 2007.
[21] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Sub-
modular Set Function Subject to a Matroid Constraint. SIAM Journal on Computing,
40(6):1740–1766, 2011.
[22] Barun Chandra and Magnus Halldorsson. Greedy local improvement and weighted set
packing approximation. In SODA ’99: Proceedings of the 10th Annual ACM-SIAM Sym-
posium on Discrete Algorithms, pages 169–176. SIAM, 1999.
[23] Moses Charikar and Sudipto Guha. Improved Combinatorial Algorithms for Facility Lo-
cation Problems. SIAM Journal on Computing, 34(4):803–824, 2005.
[24] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group budget con-
straints and applications. In APPROX ’04: Proceedings of the 7th International Work-
shop on Approximation Algorithms for Combinatorial Optimization Problems, pages 72–83,
2004.
[25] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent randomized rounding
via exchange properties of combinatorial structures. In FOCS ’10: Proceedings of the 51st
IEEE Symposium on Foundations of Computer Science, pages 575–584. IEEE Computer
Society, 2010.
[26] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Multi-budgeted matchings and ma-
troid intersection via dependent rounding. In SODA ’11: Proceedings of the 22nd Annual
ACM-SIAM Symposium on Discrete Algorithms, pages 1080–1097. SIAM, 2011.
[27] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Submodular function maximization
via the multilinear relaxation and contention resolution schemes. In STOC ’11: Proceedings
of the 43rd ACM Symposium on Theory of Computing, pages 783–792. ACM, 2011.
[28] Reuven Cohen and Liran Katzir. The generalized maximum coverage problem. Information
Processing Letters, 108(1):15–22, 2008.
[29] Michele Conforti and Gerard Cornuejols. Submodular set functions, matroids and the
greedy algorithm: Tight worst-case bounds and some generalizations of the rado-edmonds
theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.
[30] Gerard Cornuejols, Marshall L. Fisher, and George L. Nemhauser. Location of Bank
Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms.
Management Science, 23(8):789–810, 1977.
[31] William H. Cunningham. Testing membership in matroid polyhedra. Journal of Combinatorial Theory, Series B, 36(2):161–188, 1984.
[32] Sashka Davis and Russell Impagliazzo. Models of Greedy Algorithms for Graph Problems.
Algorithmica, 54(3):269–317, 2009.
[33] Jack Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1(1):127–
136, 1971.
[34] Jack Edmonds. Matroid intersection. In P.L. Hammer, E.L. Johnson, and B.H. Korte, ed-
itors, Discrete Optimization I: Proceedings of the Advanced Research Institute on Discrete
Optimization and Systems Applications of the Systems Science Panel of NATO and of the
Discrete Optimization Symposium, volume 4 of Annals of Discrete Mathematics, pages 39–49. Elsevier, 1979.
[35] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45:634–
652, 1998.
[36] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. Nonmonotone submodular max-
imization via a structural continuous greedy algorithm. In ICALP ’11: Proceedings of
the 38th International Colloquium Conference on Automata, Languages and Programming,
pages 342–353. Springer-Verlag, 2011.
[37] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A unified continuous greedy
algorithm for submodular maximization. In FOCS ’11: Proceedings of the 52nd IEEE
Symposium on Foundations of Computer Science, pages 570–579. IEEE Computer Society,
2011.
[38] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-
approximation for unconstrained submodular maximization. In FOCS ’12: Proceedings
of the 53rd IEEE Symposium on Foundations of Computer Science, 2012. Forthcoming.
[39] Moran Feldman, Joseph (Seffi) Naor, Roy Schwartz, and Justin Ward. Improved approxi-
mations for k-exchange systems. In ESA ’11: Proceedings of the 19th European Symposium
on Algorithms, pages 784–798. Springer-Verlag, 2011.
[40] Yuval Filmus and Justin Ward. The Power of Local Search: Maximum Coverage over a
Matroid. In STACS ’12: 29th International Symposium on Theoretical Aspects of Computer
Science, pages 601–612. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2012.
[41] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maxi-
mization subject to a matroid constraint. CoRR, abs/1204.4526, 2012.
[42] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maxi-
mization subject to a matroid constraint. In FOCS ’12: Proceedings of the 53rd IEEE
Symposium on Foundations of Computer Science. IEEE Computer Society, 2012. Forth-
coming.
[43] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations for
maximizing submodular set functions—II. Mathematical Programming Studies, 8:73–87,
1978.
[44] Shayan Oveis Gharan and Jan Vondrak. Submodular maximization by simulated anneal-
ing. In SODA ’11: Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 1098–1117. SIAM, 2011.
[45] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for
maximum cut and satisfiability problems using semidefinite programming. Journal of the
ACM, 42(6):1115–1145, 1995.
[46] Pranava R. Goundan and Andreas S. Schulz. Revisiting the Greedy Approach to Submod-
ular Set Function Maximization. Preprint, 2007.
[47] Fabrizio Grandoni and Rico Zenklusen. Approximation schemes for multi-budgeted inde-
pendence systems. In ESA ’10: Proceedings of the 18th European Conference on Algo-
rithms: Part I, pages 536–548. Springer-Verlag, 2010.
[48] Curtis Greene and Thomas L. Magnanti. Some Abstract Pivot Algorithms. SIAM Journal
on Applied Mathematics, 29(3):530–539, 1975.
[49] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-
monotone submodular maximization: Offline and secretary algorithms. In WINE 2010:
Internet and Network Economics, pages 246–257. Springer, 2010.
[50] Magnus Halldorsson. Approximating discrete collections via local improvements. In SODA
’95: Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pages
160–169. SIAM, 1995.
[51] D. Hausmann and B. Korte. k-greedy algorithms for independence systems. Mathematical Methods of Operations Research, 22:219–228, 1978. doi:10.1007/BF01917662.
[52] D. Hausmann, B. Korte, and T. A. Jenkyns. Worst case analysis of greedy type algorithms
for independence systems. Mathematical Programming Studies, 12:120–131, 1980.
[53] Elad Hazan, Shmuel Safra, and Oded Schwartz. On the complexity of approximating k-set
packing. Computational Complexity, 15:20–39, 2006.
[54] Dorit S. Hochbaum and Anu Pathria. Analysis of the greedy approach in covering problems.
Naval Research Logistics, 45:615–627, 1998.
[55] C. A. J. Hurkens and A. Schrijver. On the size of systems of sets every t of which have an
SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM Journal on Discrete Mathematics, 2(1):68–72, 1989.
[56] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial
algorithm for minimizing submodular functions. Journal of the ACM, 48(4):761–777, July
2001.
[57] Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing
LP. Journal of the ACM, 50(6):795–824, 2003.
[58] T. A. Jenkyns. The efficacy of the "greedy" algorithm. In Proceedings of the 7th Southeastern Conference on Combinatorics, Graph Theory, and Computing, pages 341–350. Utilitas Mathematica, 1976.
[59] Per M. Jensen and Bernhard Korte. Complexity of Matroid Property Algorithms. SIAM
Journal on Computing, 11(1):184, 1982.
[60] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local
search? Journal of Computer and Systems Sciences, 37(1):79–100, 1988.
[61] Satyen Kale and C. Seshadhri. Combinatorial approximation algorithms for maxcut using
random walks. In Proceedings of the 2nd Symposium on Innovations in Computer Science
(ICS 2011), pages 367–388. Tsinghua University Press, 2011.
[62] Haim Kaplan, Moshe Lewenstein, Nira Shafrir, and Maxim Sviridenko. Approximation
algorithms for asymmetric TSP by decomposing directed regular multigraphs. Journal of
the ACM, 52(4):602–626, July 2005.
[63] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani. On Syntactic Versus Computational
Views of Approximability. In SFCS ’94: Proceedings of the 35th Symposium on Founda-
tions of Computer Science, pages 819–830. IEEE Computer Society, 1994.
[64] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani. On syntactic versus computational
views of approximability. SIAM Journal on Computing, 28(1):164–191, 1998.
[65] Samir Khuller, Anna Moss, and Joseph (Seffi) Naor. The budgeted maximum coverage
problem. Information Processing Letters, 70:39–45, April 1999.
[66] Bernhard Korte and Dirk Hausmann. An Analysis of the Greedy Heuristic for Independence Systems. In B. Alspach, P. Hell, and D. J. Miller, editors, Algorithmic Aspects of
Combinatorics, pages 65–74. Elsevier, 1978.
[67] Lukasz Kowalik and Marcin Mucha. Deterministic 7/8-approximation for the metric maximum TSP. Theoretical Computer Science, 410(47–49):5000–5009, November 2009.
[68] Eugene L. Lawler. Matroid intersection algorithms. Mathematical Programming, 9(1):31–
56, 1975.
[69] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone
submodular maximization under matroid and knapsack constraints. In STOC ’09: Pro-
ceedings of the 41st ACM Symposium on Theory of Computing, pages 323–332. ACM,
2009.
[70] Jon Lee, Maxim Sviridenko, and Jan Vondrak. Matroid matching: the power of local
search. In STOC ’10: Proceedings of the 42nd ACM Symposium on Theory of Computing,
pages 369–378. ACM, 2010.
[71] Jon Lee, Maxim Sviridenko, and Jan Vondrak. Submodular Maximization over Multi-
ple Matroids via Generalized Exchange Properties. Mathematics of Operations Research,
35(4):795–806, 2010.
[72] L. Lovász. Matroid matching and some applications. Journal of Combinatorial Theory, Series B, 28(2):208–236, April 1980.
[73] L. Lovász. The matroid matching problem. In László Lovász and Vera T. Sós, editors, Algebraic Methods in Graph Theory (Colloquium Szeged 1978), volume II, pages 495–517, Amsterdam, 1981.
[74] Julian Mestre. Greedy in approximation algorithms. In ESA ’06: Proceedings of the 14th
European Symposium on Algorithms, pages 528–539. Springer-Verlag, 2006.
[75] G. L. Nemhauser and L. A. Wolsey. Best Algorithms for Approximating the Maximum of
a Submodular Set Function. Mathematics of Operations Research, 3(3):177–188, 1978.
[76] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for max-
imizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
[77] James B. Orlin, Abraham P. Punnen, and Andreas S. Schulz. Approximate local search
in combinatorial optimization. In SODA ’04: Proceedings of the 15th Annual ACM-SIAM
Symposium on Discrete Algorithms, pages 587–596. SIAM, 2004.
[78] Henri Padé. Sur la représentation approchée d'une fonction par des fractions rationnelles. Annales Scientifiques de l'École Normale Supérieure, Ser. 3, 9:3–93 (supplement), 1892.
[79] C. H. Papadimitriou, A. A. Schäffer, and M. Yannakakis. On the complexity of local search.
In STOC ’90: Proceedings of the 22nd ACM Symposium on Theory of Computing. ACM,
1990.
[80] Christos H. Papadimitriou and Mihalis Yannakakis. The traveling salesman problem with
distances one and two. Mathematics of Operations Research, 18(1):1–11, February 1993.
[81] R. Rado. Note on Independence Functions. Proceedings of the London Mathematical
Society, s3-7(1):300–320, 1957.
[82] Sartaj Sahni. Approximate algorithms for the 0/1 knapsack problem. Journal of the ACM,
22(1):115–124, January 1975.
[83] Alejandro Schaffer and Mihalis Yannakakis. Simple Local Search Problems That Are Hard
to Solve. SIAM Journal on Computing, 20(1):56–87, 1991.
[84] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in
strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346–355,
2000.
[85] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer,
2003.
[86] Jose Soto. Improved Analysis of a Max Cut Algorithm Based on Spectral Partitioning.
arXiv.org, cs.DS, October 2009.
[87] Jose A. Soto. A simple PTAS for Weighted Matroid Matching on Strongly Base Orderable
Matroids. Electronic Notes in Discrete Mathematics, 37(0):75–80, 2011.
[88] Aravind Srinivasan. Distributions on level-sets with applications to approximation al-
gorithms. In FOCS ’01: Proceedings of the 42nd IEEE Symposium on Foundations of
Computer Science, pages 588–599. IEEE Computer Society, 2001.
[89] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack
constraint. Operations Research Letters, 32(1):41–43, January 2004.
[90] Po Tong, Eugene L. Lawler, and V. V. Vazirani. Solving the Weighted Parity Problem
for Gammoids by Reduction to Graphic Matching. Technical Report UCB/CSD-82-103,
University of California, Berkeley, April 1982.
[91] Luca Trevisan. Max cut and the smallest eigenvalue. In STOC ’09: Proceedings of the
41st ACM Symposium on Theory of Computing, pages 263–272. ACM, 2009.
[92] C. Underhill and A. Wragg. Convergence properties of Padé approximants to exp(z) and their derivatives. IMA Journal of Applied Mathematics, 11(3):361–367, 1973.
[93] Jan Vondrak. Optimal approximation for the submodular welfare problem in the value
oracle model. In STOC ’08: Proceedings of the 40th ACM Symposium on Theory of
Computing, pages 67–74. ACM, 2008.
[94] Jan Vondrak. Symmetry and Approximability of Submodular Maximization Problems. In
FOCS ’09: Proceedings of the 50th IEEE Symposium on Foundations of Computer Science,
pages 651–670. IEEE Computer Society, 2009.
[95] Jan Vondrak. Submodularity and curvature: the optimal algorithm. In RIMS Kokyuroku
Bessatsu, volume B23, pages 253–266, Kyoto, 2010.
[96] Jens Vygen. A note on schrijver’s submodular function minimization algorithm. Journal
of Combinatorial Theory, Series B, 88(2):399–402, 2003.
[97] Justin Ward. A (k+ 3)/2-approximation algorithm for monotone submodular k-set pack-
ing and general k-exchange systems. In STACS ’12: 29th International Symposium on
Theoretical Aspects of Computer Science, pages 42–53. Schloss Dagstuhl–Leibniz-Zentrum
fuer Informatik, 2012.
[98] Hassler Whitney. On the Abstract Properties of Linear Dependence. American Journal of
Mathematics, 57(3):509–533, 1935.
List of Algorithms
1 GenLocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 SubmodularGreedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 PartEnum(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 k-LocalCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 MatroidCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 MatroidSubmodular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8 CurvatureEnum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
9 (a, r)-OblLocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10 Linear-k-Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11 Submodular-k-Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12 h(n)-LocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113