Oblivious and Non-Oblivious Local Search for Combinatorial Optimization
by
Justin Ward
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
© Copyright 2012 by Justin Ward
Abstract
Oblivious and Non-Oblivious Local Search for Combinatorial Optimization
Justin Ward
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2012
Standard local search algorithms for combinatorial optimization problems repeatedly apply
small changes to a current solution to improve the problem’s given objective function. In
contrast, non-oblivious local search algorithms are guided by an auxiliary potential function,
which is distinct from the problem’s objective. In this thesis, we compare the standard and
non-oblivious approaches for a variety of problems, and derive new, improved non-oblivious
local search algorithms for several problems in the area of constrained linear and monotone
submodular maximization.
First, we give a new, randomized approximation algorithm for maximizing a monotone sub-
modular function subject to a matroid constraint. Our algorithm’s approximation ratio matches
both the known hardness of approximation bounds for the problem and the performance of the
recent “continuous greedy” algorithm. Unlike the continuous greedy algorithm, our algorithm
is straightforward and combinatorial. In the case that the monotone submodular function is a
coverage function, we can obtain a further simplified, deterministic algorithm with improved
running time.
Moving beyond the case of single matroid constraints, we then consider general classes of set
systems that capture problems that can be approximated well. While previous such classes have
focused primarily on greedy algorithms, we give a new class that captures problems amenable
to optimization by local search algorithms. We show that several combinatorial optimization
problems can be placed in this class, and give a non-oblivious local search algorithm that
delivers improved approximations for a variety of specific problems. In contrast, we show that
standard local search algorithms give no improvement over known approximation results for
these problems, even when allowed to search larger neighborhoods than their non-oblivious
counterparts.
Finally, we expand on these results by considering standard local search algorithms for con-
straint satisfaction problems. We develop conditions under which the approximation ratio of
standard local search remains limited even for super-polynomial or exponential local neighbor-
hoods. In the special case of MaxCut, we further show that a variety of techniques including
random or greedy initialization, large neighborhoods, and best-improvement pivot rules cannot
improve the approximation performance of standard local search.
Acknowledgements
I thank the members of both my supervisory and final oral exam committees—Charles Rack-
off, Toni Pitassi, Avner Magen, Alasdair Urquhart, and Derek Corneil—as well as my external
examiner Anupam Gupta. They provided many useful suggestions, insightful observations, and
supportive comments that have helped shape this thesis. Additionally, I would like to thank
Maxim Sviridenko for many useful discussions and advice on simplifying some of the proofs presented
in the thesis, and Julian Mestre for comments on an initial draft of some results presented in
Chapters 5 and 6.
I thank various colleagues and friends at the University—especially, Siavosh Benabbas,
George Dahl, Golnaz Elahi, Michalis Famelis, Yuval Filmus, Wesley George, Abe Heifets, Dai
Tri Man Le, Joel Oren, Jocelyn Simmonds, Colin Stewart, Rory Tulk, and Dustin Wehr—for
providing not only intellectual insights and technical help but also levity, camaraderie, and
empathy. In addition, I thank Yuval Filmus for his many and considerable contributions to the
proofs presented in Chapters 3 and 4. Finally, I thank Lila Fontes for aiding in the translation
of relevant portions of Pade’s thesis, and for being a close friend and confidant throughout my
studies.
I thank the Department of Computer Science and the School of Graduate Studies for pro-
viding financial support for my studies at the University of Toronto.
I thank my supervisor Allan Borodin for his sound advice and unwavering support. My
research initially involved a significant change of direction for me and so was accompanied
by occasionally daunting periods of self-doubt and uncertainty. Most of all I thank him for
maintaining faith in my abilities and prospects, even when I had none. His support—whether
intellectual, emotional, moral, or merely financial—gave me the confidence to complete this
work, and I hope above all else that he finds the end result to be worthy of his considerable
investment.
Finally, I thank my best friend and wife Amy Miller who has been and remains a steadfast
advocate, constant inspiration, and insoluble mystery to me.
Contents
1 Introduction 1
1.1 A Generic Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Theoretical Results for General Local Search . . . . . . . . . . . . . . . . . . . . 4
1.3 Our Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Preliminaries 10
2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.3 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Linear and Submodular Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Independence Systems and Matroids . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 The Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 The Continuous Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Partial Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Maximum Coverage 25
3.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 A Non-Oblivious Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Analysis of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Obtaining the α Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Monotone Submodular Maximization 43
4.1 A Non-Oblivious Local Search Algorithm . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Analysis of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.1 Properties of the Sequences γ . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2 Locality Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.3 Computing g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.4 Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Obtaining the Coefficient Sequences . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Further Properties of g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Set Systems for Local Search 78
5.1 Set Systems for Greedy Approximation Algorithms . . . . . . . . . . . . . . . . . 78
5.2 Weak k-Exchange Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3 Strong k-Exchange Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4.1 Independent Set in (k + 1)-Claw Free Graphs . . . . . . . . . . . . . . . . 86
5.4.2 k-Matroid Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4.3 k-Uniform Hypergraph b-Matching . . . . . . . . . . . . . . . . . . . . . . 89
5.4.4 Matroid k-Parity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.5 Maximum Asymmetric Traveling Salesman . . . . . . . . . . . . . . . . . 92
6 Algorithms for Strong k-Exchange Systems 94
6.1 Linear Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Monotone Submodular Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 101
7 Limitations of Oblivious Local Search for CSPs 111
7.1 Large Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 Random and Greedy Initial Solutions and Best Improvement Pivot Rules . . . . 119
8 Conclusion 128
8.1 Monotone Submodular Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.2 Set Systems for Local Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.3 Negative Results for CSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Bibliography 132
List of Algorithms 141
Chapter 1
Introduction
Local search is one of the simplest algorithmic approaches to combinatorial optimization prob-
lems. Despite its simplicity, local search has been successful in both practical and theoretical
settings. It is widely used as a heuristic for solving NP-hard problems, and appears as a key
component of such classic algorithms as Edmonds’ matching algorithms, the Ford-Fulkerson
algorithm, and Dantzig’s simplex algorithm, as well as many state-of-the-art approximation
algorithms. In the non-oblivious variant of local search, the algorithm is guided by an auxiliary
potential function instead of the problem’s given objective. This technique was first formalized
by Alimonti [2, 3, 4] and Khanna, Motwani, Sudan, and Vazirani [63, 64] but has seen limited
application since its introduction. Here, we reconsider non-oblivious local search and give sev-
eral new applications of the technique. We show that standard, oblivious local search algorithms
have limited approximation performance for these applications, and thereby demonstrate the
relative power of non-oblivious algorithms.
1.1 A Generic Local Search Algorithm
We now describe more formally what we mean by “local search.” First, we describe the general
class of optimization problems considered in this thesis. We restrict our attention to combina-
torial optimization problems of the following form:1
Definition 1.1 (Combinatorial Optimization Problem). A combinatorial optimization problem
consists of:
• a goal in {max, min} that specifies whether it is a maximization or minimization problem.
• a ground set X.
• a collection F of subsets of X called feasible solutions.
1See Ausiello, Crescenzi, and Protasi [7] for a survey on the theory of NP-Optimization problems. Note that our definition varies slightly from the standard in that we do not require f to assign integer values to solutions. Our notion of a “combinatorial optimization problem” is not intended to capture all problems in the field of combinatorial optimization, but is general enough to capture all those problems that we consider.
• a function f : 2X → R≥0 assigning a value to each subset of the ground set.
The goal of the problem is to find a set S ∈ F that either maximizes or minimizes the function
f (depending on the stated goal).
All of the problems that we consider will be maximization problems, so we shall not specify
the goal explicitly. Note that, in general, there may not be a succinct representation for either
F or f . Generally, we shall assume that F is given as a membership oracle (also called an
independence oracle, for reasons that will be made clear in Section 2.3) that answers whether
a given set is in F or not. Similarly, we generally suppose that f is given as a value oracle
that, given a subset S of X, returns its value f(S). Notable exceptions are linear functions
(described in Section 2.2) and coverage functions (described in Section 3.1). In these cases, f
has a succinct representation that we shall exploit in our algorithms.
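To make the oracle model concrete, here is a minimal Python sketch (the class and names are our own illustration, not notation from the thesis) in which F is represented by a membership oracle and f by a value oracle, both ordinary callables:

```python
from typing import Callable, FrozenSet

class Problem:
    """A combinatorial optimization problem given by oracle access
    (an illustrative sketch; the names here are our own)."""
    def __init__(self,
                 ground_set: FrozenSet[int],
                 is_feasible: Callable[[FrozenSet[int]], bool],  # membership oracle for F
                 value: Callable[[FrozenSet[int]], float]):      # value oracle for f
        self.ground_set = ground_set
        self.is_feasible = is_feasible
        self.value = value

# Example instance: a linear objective under the constraint |S| <= 2.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
prob = Problem(frozenset(weights),
               is_feasible=lambda S: len(S) <= 2,
               value=lambda S: sum(weights[x] for x in S))
```

An algorithm interacting with `prob` can only query `is_feasible` and `value`; it never sees an explicit listing of F or a formula for f, which is exactly the assumption made above for general instances.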
Because the combinatorial optimization problems we consider are NP-hard, we focus on the
problem of obtaining approximate solutions. Given an instance of a combinatorial optimization
problem, we say that an algorithm is an r-approximation algorithm for that instance, for r ∈ [0, 1], if the value of the solution S produced by the algorithm is at least r times the value of the optimal
solution O. We call the value r the approximation ratio for the algorithm on the given instance,
and define the approximation ratio of an algorithm for a problem to be the infimum of the
approximation ratios of its instances. Again, here we consider only maximization problems (a
similar definition can be obtained for minimization by reversing the role of S and O in the
definition). Note that since we use approximation ratios in the range [0, 1], larger values reflect
better approximations.
The primary concern of this thesis is the application of local search to particular combi-
natorial optimization problems. Our general notion of a local search algorithm is captured in
Algorithm 1. The generic local search algorithm GenLocalSearch is parameterized by several
component functions, which together define a particular local search algorithm for a combina-
torial optimization problem.
Definition 1.2 (Generic Local Search Algorithm). Let I be some instance of a combinatorial
optimization problem with solution space S, feasible solutions F , and objective function f . The
generic local search algorithm for I has the form shown in Algorithm 1, and is specified by the
following component functions:
• A potential function g assigning each solution S ∈ S a value g(S) ∈ R≥0.
• A neighborhood structure N associating a set of nearby solutions N(S) ⊆ S with each
solution S ∈ S.
• A pivot rule pivot selecting a solution pivot(C) from the set of improved, feasible2
solutions C = {T ∈ N(S) : T ∈ F and g(T) > g(S)} whenever this set is non-empty.
2While there are local search algorithms that consider infeasible solutions during the search process, all of the algorithms considered in this thesis only consider feasible solutions.
• An initial solution Sinit ∈ F .
Algorithm 1: GenLocalSearch
S ← Sinit
repeat
    C ← ∅
    foreach T ∈ N(S) do
        if T ∈ F and g(T) > g(S) then
            C ← C ∪ {T}
    if C ≠ ∅ then
        S ← pivot(C)
until S does not change
return S
Note that each of the functions in Definition 1.2 depends implicitly on the instance I. Intuitively, the local search algorithm proceeds by first finding an initial feasible solution Sinit, then
repeatedly searching in the neighborhood N(S) of the current solution S for a set of feasible
candidate solutions, each of which improves the potential function g. After a set C of candidate
solutions is found, the pivot rule pivot selects a new current solution from it. When no im-
proved solutions are found in the neighborhood of the current solution, the algorithm returns
S.
In most of the algorithms we present, the pivot rule simply returns the first improved feasible
solution encountered when searching N(S). In this case, it is unnecessary to build the entire set of
candidate solutions C, and so we omit this step from the algorithm. One notable exception is
in Chapter 7 where we examine the effect of the pivot rule on the approximation performance
of Algorithm 1.
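As a concrete illustration, the first-improvement variant of Algorithm 1 can be sketched in a few lines of Python (the function and variable names are ours; this mirrors the pseudocode rather than reproducing it exactly):

```python
def gen_local_search(s_init, neighborhood, is_feasible, g, pivot=None):
    """Generic local search: repeatedly move to an improving feasible
    neighbor until none exists.  pivot=None gives the first-improvement
    rule; otherwise pivot selects a solution from the candidate set C."""
    S = s_init
    while True:
        C = [T for T in neighborhood(S) if is_feasible(T) and g(T) > g(S)]
        if not C:
            return S  # S is a local optimum of g
        S = C[0] if pivot is None else pivot(C)

# Tiny example: maximize a linear f over sets of size at most 2, with
# the oblivious potential g = f and single add/remove/swap moves.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
X = frozenset(weights)
f = lambda S: sum(weights[x] for x in S)

def neighborhood(S):
    adds = [S | {x} for x in X - S]
    removes = [S - {x} for x in S]
    swaps = [(S - {y}) | {x} for x in X - S for y in S]
    return adds + removes + swaps

S = gen_local_search(frozenset(), neighborhood, lambda S: len(S) <= 2, f)
# The only local optimum here is {1, 3}, the two heaviest elements.
```

Since g strictly increases with every accepted move and the solution space is finite, the loop always terminates; the convergence-time question discussed in Section 1.2 is how many such moves may be needed.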
In general, there are no global guarantees on the quality of the solution S returned by
GenLocalSearch even in terms of the potential function g. However, we can say that any S
returned by GenLocalSearch is a local optimum of g in the following sense:
Definition 1.3 (Local Optimum). Let N be a neighborhood structure and g be a potential
function. Then, a solution S ∈ F is a local optimum with respect to g and N if we have
g(T ) ≤ g(S)
for all T ∈ N(S).
Note that the notion of local optimality depends on both the neighborhood N and the
potential function g used in GenLocalSearch. Thus, whenever we refer to “local optima” we mean
local optima with respect to some understood, previously fixed neighborhood and potential
function.
1.2 Theoretical Results for General Local Search
We now present an overview of the major theoretical approaches to local search as a general
algorithmic paradigm. The first such approach concerns the time required for Algorithm 1 to
converge to a local optimum. Because its runtime depends on the number of improvements
that it applies, Algorithm 1 could require exponential time to find a local optimum even if
each improvement can be calculated efficiently. Motivated by this general question, Johnson,
Papadimitriou, and Yannakakis [60] define the class PLS of polynomial local search problems. A
search problem is in PLS if the initial solution and each iteration of the local search algorithm can
be carried out in polynomial time. Hence, the primary complexity-theoretic questions regarding
the class PLS pertain to the number of improvements that the local search algorithm can make.
All of the problems in this class are search problems, in which the goal is to find a solution to
an NP-optimization problem that is locally optimal with respect to some given neighborhood
and objective function. Thus, when we speak about the PLS completeness of some problem, we
are referring specifically to the problem of finding any locally optimal solution with respect to
some stated neighborhood.
Johnson et al. provide an appropriate reduction for the class PLS, which they use to prove
the completeness of a variety of problems. They prove that if any PLS problem is NP-hard
then NP = coNP. In contrast, they show that the “standard algorithm problem,” in which
we must find the specific local optimum returned by GenLocalSearch for some particular initial
starting solution Sinit, is NP-hard for all PLS-complete problems. Using “tight” PLS-reductions,
Papadimitriou, Yannakakis, and Schäffer [79] give a general method for showing that a variety
of local search problems do in fact have exponential worst-case behavior. Moreover, they show
that the standard algorithm problem is in fact PSPACE-complete. Thus, the problem of finding
some local optimum appears to be easier than that of finding the particular local optimum
produced by a local search algorithm. In this and subsequent work [83], they demonstrate the
PLS-completeness of several well-known local search problems, including Lin and Kernighan’s
heuristics for the traveling salesman and graph bipartition problems, the problem of finding a
stable configuration in an undirected neural network, and various local search algorithms for
MaxCut.
In practice, we can eschew the difficulties posed by PLS-completeness by weakening our
notion of local optimality. In many situations, it is sufficient to find a solution that is only
approximately locally optimal, in the following sense.
Definition 1.4 (ε-Approximate Local Optimum). Let N be a neighborhood structure and g be
a potential function. Then, a solution S ∈ F is an ε-approximate local optimum with respect
to g and N if we have
g(T ) ≤ (1 + ε)g(S)
for all T ∈ N(S).
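The payoff of Definition 1.4 is a convergence bound: if every accepted move improves g by a factor of at least 1 + ε, then the number of moves from a solution of positive potential is at most log(g_max/g_init)/log(1 + ε), which is polynomial for fixed ε whenever this ratio of potentials is polynomially bounded. A hedged Python sketch (our own naming, not the thesis's pseudocode):

```python
def approx_local_search(s_init, neighborhood, is_feasible, g, eps):
    """First-improvement local search that only accepts moves improving
    the potential by a (1 + eps) factor.  The returned solution is an
    eps-approximate local optimum in the sense of Definition 1.4."""
    S, steps = s_init, 0
    while True:
        improved = next((T for T in neighborhood(S)
                         if is_feasible(T) and g(T) > (1 + eps) * g(S)), None)
        if improved is None:
            return S, steps
        S, steps = improved, steps + 1

# On a tiny instance (linear f, |S| <= 2, add/remove/swap moves), the
# search still reaches the optimum {1, 3} despite the coarser criterion.
weights = {1: 3.0, 2: 1.0, 3: 2.0}
X = frozenset(weights)
f = lambda S: sum(weights[x] for x in S)

def neighborhood(S):
    return ([S | {x} for x in X - S] + [S - {x} for x in S]
            + [(S - {y}) | {x} for x in X - S for y in S])

S, steps = approx_local_search(frozenset({2}), neighborhood,
                               lambda S: len(S) <= 2, f, eps=0.1)
```

Here the search starts from {2} with potential 1 and can reach potential at most 5, so the bound guarantees at most log(5)/log(1.1) ≈ 17 accepted moves regardless of the pivot order.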
This idea has been used in a variety of contexts to yield polynomial-time local search
algorithms. A variant for linear objective functions is described by Arkin and Hassin [5]. They
round all the weights used to define the objective function down to integer multiples of some
well-chosen value, thus requiring that each step of the local search algorithm must make a
constant additive improvement to the problem’s objective function. Orlin, Punnen, and Schulz
[77] consider the general difficulty of finding approximate local optima for linear combinatorial
optimization problems, and show that this can be accomplished in polynomial time for any
problem in PLS.
While the general theory of PLS-completeness gives non-trivial bounds on the convergence
time of the standard local search algorithm, it says nothing about the relative quality of the
local optima produced by local search. In order to study such questions, Ausiello and Protasi [8]
introduce the class GLO of those NP-Optimization problems that have guaranteed local optima
with respect to a given neighborhood mapping. A problem has guaranteed local optima with
respect to N if there is some constant k ∈ R≥0 such that any solution S that is locally optimal
with respect to N has objective value at least 1/k times that of a globally optimal solution.
The constant k for a problem in GLO gives a natural bound on the approximation performance
of a local search algorithm. We define the locality ratio of a local search algorithm on a given
instance of a combinatorial optimization problem to be the largest value r ∈ [0, 1] such that for
any local optimum S we have f(S) ≥ r ·f(O), where O is a global optimum. Then, the locality
ratio r corresponds to the value 1/k in the definition of GLO. By analogy with approximation
ratios, we define the locality ratio for a problem to be the infimum of the locality ratios of its
instances.
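For a single small instance, the locality ratio can be computed by brute force: enumerate all solutions, keep those with no improving neighbor, and compare the worst of them to the global optimum. The sketch below (our own example, not from the thesis) does this for MaxCut on the 4-cycle under the 1-flip neighborhood, where the split {0, 1} vs. {2, 3} is a local optimum of value 2 against an optimum of 4:

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the 4-cycle C4
n = 4

def cut_value(x):
    """Number of edges crossing the cut for a 0/1 side assignment x."""
    return sum(1 for u, v in edges if x[u] != x[v])

def flips(x):
    """1-flip neighborhood: move a single vertex to the other side."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n)]

solutions = list(product((0, 1), repeat=n))
opt = max(cut_value(x) for x in solutions)
local_opts = [x for x in solutions
              if all(cut_value(y) <= cut_value(x) for y in flips(x))]
locality_ratio = min(cut_value(x) for x in local_opts) / opt
```

On this instance the ratio comes out to 1/2: from the bad split, moving any single vertex leaves exactly two edges cut, so no 1-flip is improving even though the optimum cuts all four edges.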
There are several advantages to working with locality ratios. An algorithm’s locality ratio
is determined solely by the potential function g and the neighborhood structure N ; in particular,
it does not depend on the initial solution Sinit or the pivot rule pivot. The locality ratio hence
allows us to compute a lower bound on the approximation ratio of GenLocalSearch without con-
sidering the dynamic behavior of the algorithm, which can be extremely difficult to determine.
For this reason, virtually all analyses of particular local search algorithms are based on the
algorithms’ locality ratios. One noteworthy exception is Chandra and Halldorsson’s analysis
[22] of a greedy local search algorithm for maximum weight independent sets in (k+1)-claw free
graphs. They show that a local search algorithm in which Sinit is chosen greedily and pivot
always chooses the best improved solution attains an approximation ratio of almost 3/2 times
the locality ratio for the problem.
Thus far, we have been using the terms “potential function” and “objective function”
interchangeably in our discussion of local optimality; that is, we have been assuming that the
potential function g used by GenLocalSearch is simply the problem’s given objective function f .
In independent work, Alimonti [2, 3, 4] and Khanna et al. [63, 64] introduce the notion3 of non-
oblivious local search, in which the potential function used to guide the local search procedure
is different from the problem’s stated objective function, f (in contrast, they call variants of
3The (unfortunate) terminology “non-oblivious local search” is due to Khanna et al.
local search in which g = f oblivious local search algorithms). They show that non-oblivious
techniques yield improved locality ratios for a variety of problems. Note that we always require
local optimality with respect to g but state locality ratios in terms of f . That is, a problem
has locality ratio r for some non-oblivious local search algorithm of the form GenLocalSearch if
every local optimum S with respect to g satisfies f(S) ≥ r · f(O).
By analogy with GLO, Khanna et al. formulate the class NonObliviousGLO of problems that
have non-zero locality ratios for some non-oblivious potential function. They show that GLO
is a strict subset of NonObliviousGLO; that is, there are problems that have locality ratio 0 for
oblivious local search but some positive locality ratio for non-oblivious local search. Khanna
et al. further prove that every problem in MaxSNP can be approximated to within some con-
stant factor by some non-oblivious local search in which the neighborhood relation N satisfies
d(S, T ) = 1 for all S and all T ∈ N(S), where d is the Hamming distance between solutions S
and T .
Despite the apparent relative power of non-oblivious local search, there has been little
application or systematic study of it since these first results. Berman [11] gives a non-oblivious
local search algorithm for the weighted independent set problem in (k+1)-claw free graphs (we
discuss this algorithm further in Chapters 5 and 6). Berman and Krysta [12] further consider
a generalization of this algorithm in which the weights are raised to some power between 1
and 2. Finally, some of the local search approaches to facility location problems [6, 23] make
use of weight scaling to improve the approximation performance of the algorithm. Although it
is not presented as such, the resulting algorithm essentially employs a non-oblivious potential
function.
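To make the oblivious/non-oblivious distinction concrete, consider weighted independent set: the oblivious potential is just the total weight of the current solution, while Berman's approach [11] locally optimizes the sum of squared weights instead. The toy example below (the numbers are ours, purely illustrative) shows that the two potentials can disagree about whether the same exchange is an improvement:

```python
def oblivious_g(S, w):
    """The problem's own objective: total weight of the solution."""
    return sum(w[v] for v in S)

def nonoblivious_g(S, w):
    """Berman-style potential: sum of squared weights."""
    return sum(w[v] ** 2 for v in S)

w = {'a': 3.0, 'b': 2.0, 'c': 2.0}
# Exchanging {a} for {b, c} raises the objective (2 + 2 = 4 > 3) but
# lowers the squared potential (4 + 4 = 8 < 9 = 3^2): squaring weighs
# heavy vertices more, which changes which local moves are accepted and
# is what drives the improved locality ratio discussed in Chapters 5-6.
```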
In this thesis, we revisit non-oblivious local search. We apply the technique in several new
areas, including submodular maximization, and obtain improved approximation for a variety
of problems. Even in cases where non-oblivious techniques merely match the performance of
existing approximation algorithms, they yield combinatorial algorithms that are significantly
simpler than existing approaches. In contrast, we show that variants of oblivious local search
for these problems give diminishing returns even when they are allowed to consider much larger
neighborhoods than their non-oblivious counterparts.
1.3 Our Contributions
We now outline the main contributions presented in the thesis.
• In Chapter 3, we give a new, combinatorial algorithm for the problem of maximizing a
coverage function subject to a matroid constraint. Coverage functions, defined in Section
3.1, are a particular class of submodular functions possessing a succinct, explicit represen-
tation. Our non-oblivious algorithm makes use of a special, weighted potential function,
whose weights were derived by solving a family of linear programs. In addition to stating
the general formula for this function and proving that it yields an improved locality ratio,
we give some details of the experimental approach used to derive it.
Our non-oblivious algorithm is a (1 − 1/e)-approximation, which is optimal under the assumption P ≠ NP as well as in the value oracle setting. Our algorithm matches the approximation performance of the continuous greedy algorithm described in Section 2.5, which
applies in the more general setting of maximizing any monotone submodular function
subject to a matroid constraint. However, our algorithm is simpler and more straightfor-
ward than the continuous greedy algorithm, and is completely combinatorial. In contrast
to our non-oblivious local search algorithm, we show that oblivious local search has a
locality ratio of only 1/2 + ε, even when allowed to search much larger neighborhoods than
our algorithm.
This chapter is based on joint work with Yuval Filmus, appearing in [40].
• In Chapter 4, we turn to the general problem of maximizing any monotone submodu-
lar function subject to a matroid constraint. We expand the non-oblivious approach of
Chapter 3 to this setting, matching the general applicability of the continuous greedy
algorithm, as well as its approximation performance. Again, our algorithm is simple
to state and combinatorial. The results of Chapter 3 make crucial use of the succinct
representation available for coverage functions, and here we do not have access to such
a representation. Thus, the techniques required for the general submodular case are a
non-trivial extension of the maximum coverage case. Again, we provide a complete con-
struction and analysis for our non-oblivious potential function as well as the details of its
derivation. Additionally, we show that our construction produces the same non-oblivious
potential function as that in Chapter 3 when applied to a submodular function that is
a coverage function. Unlike the algorithm of Chapter 3, however, our general algorithm
requires randomization. Specifically, it employs a sampling procedure to compute our
potential function efficiently.
Our algorithm is a (1 − 1/e)-approximation for monotone submodular maximization subject
to a matroid constraint. Moreover, if the total curvature of the submodular function is at
most c, our algorithm is a ((1 − e^{−c})/c)-approximation. Even in this specific case, our result
matches the performance of the continuous greedy algorithm, and is the best possible in the
value oracle model.
value oracle model.
This chapter is based on joint work with Yuval Filmus, appearing in [41, 42].
• In Chapters 5 and 6, we consider the problem of linear and monotone submodular maxi-
mization in larger classes of set systems. There is a wealth of research deriving set systems
that capture problems for which the greedy algorithm attains some constant approxima-
tion ratio, but there are no such results for local search algorithms.
In Chapter 5, we introduce a new class of set systems called k-exchange systems, and
show that they capture combinatorial optimization problems for which oblivious local
search attains a 1/k-approximation. We prove several results relating k-exchange systems to
existing classes of set systems. Finally, we show that a variety of well-known combinatorial
optimization problems give rise to k-exchange systems.
In Chapter 6, we consider non-oblivious local search for k-exchange systems. We extend
a simple algorithm based on an existing approach [11] for the weighted independent set
problem in (k+ 1)-claw free graphs to all k-exchange systems. Moreover, we show how to
generalize this approach to the case of monotone submodular objective functions. Because
the approach for the weighted case makes crucial use of the objective function’s weighted
representation, our generalization is non-trivial. We obtain 2/(k+1)- and 2/(k+3)-approximations
for maximizing linear and monotone submodular objective functions, respectively, in k-
exchange systems. This provides improved approximations in both the general case and
for several specific problems.
These chapters are based on work appearing in [97] and joint work with Feldman, Naor, and
Schwartz in [39]. This latter paper was merged from separate submissions by myself
and the listed authors. Unless otherwise noted, I present only my own independent
contributions here.
• In Chapter 7, we prove a variety of negative results for oblivious local search in the
general setting of Boolean constraint satisfaction problems. Specifically, we consider the
performance of oblivious local search both when the neighborhood size is increased, and
when the initial solution is chosen randomly or via a simple, greedy algorithm.
The first set of results considers the h(n)-local search algorithm that at each step changes
the assignment to at most h(n) variables for some function h depending on the total
number n of variables. We show that if a constraint satisfaction problem possesses an
instance with a particular kind of local optimum under 1-local search, then this instance’s
locality ratio is an upper bound on the locality ratio of h(n)-local search for all h = o(n).
Moreover, even if h(n) = cn for some small value c, the locality ratio for the problem
remains strictly less than 1. Note that in this case we are allowing the local search
algorithm to examine an exponential number of solutions in each iteration.
The second set of results considers the particular CSP MaxCut. The bounds in this prob-
lem differ from our other results in that we consider the effects of the initial solution Sinit
and the pivot rule pivot used to define the oblivious local search algorithm. Thus, we
consider the actual dynamic behavior of the algorithm and directly bound its approxi-
mation ratio, rather than simply considering its locality ratio. We show that there are
instances for which a local search algorithm that chooses its initial solution Sinit uniformly
at random has expected approximation ratio at most 3/4. This bound is less than the
0.878-approximation produced by Goemans and Williamson [45] and holds even in the
case that the local search algorithm has access to an arbitrarily powerful oracle for the
function pivot that chooses an improved solution at each step. If pivot is implemented
Chapter 1. Introduction 9
by a greedy rule that always chooses the best available improvement, we can improve our
bound to 1/2, showing that the randomly initialized best-improvement local search has
an expected approximation ratio no better than the locality ratio for deterministic 1-local
search. All of our results hold generally for any h(n)-local search in which h = o(n).
Moreover, we derive non-trivial bounds even in the case that h(n) = cn for c < 1/2. Fi-
nally, we show that choosing Sinit by using the greedy algorithm can result in a worst-case
local optimum and so cannot attain an approximation ratio beyond the locality ratio for
the problem.
Chapter 2
Preliminaries
In this chapter, we review some definitions regarding linear and submodular functions, inde-
pendence systems, matroids, and related algorithms. We present relevant work in the area, and
give some general theorems that will prove useful in later sections. We begin by establishing
some standard notational conventions in Section 2.1. In Section 2.2 we consider two particular
classes of objective functions f and identify some useful extra properties of such functions. In
Section 2.3, we examine restrictions on the structure of the collection F of feasible solutions.
Finally, in Sections 2.4, 2.5, and 2.6, we review known algorithms for solving the resulting
classes of combinatorial optimization problems.
2.1 Notation
We begin by describing the notational conventions used in the thesis.
2.1.1 Sets
Throughout, we shall (with a few exceptions) use lowercase letters to denote single values
or elements, uppercase letters to denote sets of elements, and calligraphic letters to denote
collections of sets. We use the following special notations related to sets:
• R≥0 denotes the set of non-negative real numbers.
• N denotes the set of natural numbers.
• For an integer n, [n] denotes the set {1, . . . , n}.
• For a set S and element x, we use the shorthand S + x for the set S ∪ {x} and the
shorthand S − x for the set S \ {x}.
• For a set S, 2S denotes the set of all subsets of S.
• For a set S and an integer k, we denote by \binom{S}{k} the collection of all subsets of S
containing exactly k elements.
Chapter 2. Preliminaries 11
2.1.2 Probability
• For a condition (or event) C, 1(C) denotes the indicator that is 1 when C is true and 0
otherwise.
• For a random event E and some explicitly given probability distribution, Pr [E] denotes
the probability that E will occur.
• For a variable x, a function f , and a set of values S, E_{x∈S}[f(x)] denotes the expected
value of f(x) when the value x is chosen uniformly at random from S.
• When the probability distribution of a random variable X has been explicitly stated, and
there is no chance of confusion, we shall write E[X] for the expected value of X with respect
to this distribution.
2.1.3 Miscellaneous
• We use H_k to denote the kth harmonic number, given by

H_k = ∑_{i=1}^{k} 1/i .
Note that the sequence H1, H2, . . . is increasing. We shall also make use of the well-known
fact that Hk = Θ(log k).
• We use the notation f = Õ(g) to indicate that f has the same asymptotic rate of growth
as g when poly-logarithmic factors are ignored.
2.2 Linear and Submodular Functions
Perhaps the simplest useful class of objective functions f are linear functions.
Definition 2.1 (Linear Function). A function f : 2X → R≥0 is linear if
f(A) + f(B) = f(A ∪B) + f(A ∩B)
for all A,B ⊆ X.
A linear function f can always be represented in terms of a weight function w : X → R≥0
that assigns each element x ∈ X a non-negative weight w(x). The value f(S) is then given by
the total weight ∑_{x∈S} w(x) of all elements in S.
If we relax the equality in Definition 2.1 we obtain the class of submodular functions.
Definition 2.2 (Submodular Function). A function f : 2X → R≥0 is submodular if

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B)

for all A,B ⊆ X.
For a submodular function f and a set S ⊆ X, we define the marginal gain with respect to S
of an element x ∈ X \ S as

f_S(x) = f(S + x) − f(S) .
The notion of an element’s marginal gain is roughly analogous to the notion of an element’s
weight in the linear case. However, in the submodular case, an element can have different
marginal gains with respect to different sets.
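As a concrete illustration (not from the thesis), coverage functions make this tangible: each element of the ground set covers a subset of a universe, and f(S) counts the covered points. The sketch below, with hypothetical data, shows the marginal gain of one element shrinking as the base set grows:

```python
# Coverage function: f(S) = number of universe points covered by the
# chosen subsets. Coverage functions are monotone submodular, so the
# marginal gain of an element can only shrink as the base set grows.
subsets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def f(S):
    """Value of a set S of keys: size of the union of their coverage."""
    covered = set()
    for x in S:
        covered |= subsets[x]
    return len(covered)

def marginal(S, x):
    """Marginal gain f_S(x) = f(S + x) - f(S)."""
    return f(S | {x}) - f(S)

print(marginal(set(), "b"))       # 2: covers {3, 4}
print(marginal({"a"}, "b"))       # 1: point 3 is already covered
print(marginal({"a", "c"}, "b"))  # 0: points 3 and 4 are both covered
```

Unlike a weight, the gain of "b" depends on the current set: 2, 1, or 0 new points.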
Nemhauser, Wolsey, and Fisher study various aspects of combinatorial optimization prob-
lems in a pair of papers [76, 43]. Among other things, they show that the following properties
are equivalent characterizations of submodularity:
Definition 2.3 (Submodular Function (Alternative Characterizations)). Consider a function
f : 2X → R≥0. Then, the following statements are each equivalent to the statement that f is
submodular:
(i) f_A(x) ≤ f_B(x) whenever B ⊆ A ⊆ X and x ∉ A.
(ii) f(A + x) + f(A + y) ≥ f(A ∪ {x, y}) + f(A) for all A ⊆ X and x, y ∉ A.
These characterizations essentially state that submodular functions are characterized by
decreasing marginal gains. Thus, submodularity can be viewed as a discrete analogue of con-
cavity. Furthermore, the concept of decreasing marginal gains lends itself naturally to many
economic and combinatorial settings, as shown by Nemhauser, Wolsey, and Fisher.
We shall assume that all submodular objective functions f are normalized so that f(∅) = 0.
Note that this condition holds trivially for linear functions. We restrict ourselves further to the
class of monotone submodular functions.
Definition 2.4 (Monotone Function). A function f : 2X → R≥0 is monotone if f(B) ≤ f(A)
for all B ⊆ A ⊆ X.
Note that in a monotone submodular function, all marginal gains are non-negative. In fact,
this property provides an alternative characterization of the class of monotone functions.
Another natural restriction involves how much the marginals of a submodular function are
allowed to decrease. This notion is captured by the curvature of a submodular function.
Definition 2.5 (Total Curvature). A monotone submodular function f has total curvature c
if and only if

f(A ∪ B) ≥ f(A) + (1 − c)f(B)

for any two disjoint sets A,B.
For c = 1 the definition is equivalent to a statement of monotonicity. In the case that
c = 0, the statement implies that f(A) + f(B) ≤ f(A ∪ B) and from submodularity we have
f(A) + f(B) ≥ f(A∪B) since A and B are disjoint, so in fact f(A∪B) = f(A) + f(B) for all
disjoint A and B. That is, the case c = 0 corresponds to the case in which f is linear. Thus,
the parameter c ∈ [0, 1] smoothly interpolates between the class of all monotone submodular
functions and linear functions.
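To make Definition 2.5 concrete, the smallest valid c for a small instance can be found by brute force over disjoint pairs. The coverage function below is a hypothetical example invented for illustration:

```python
from itertools import combinations

# Hypothetical coverage function; "a" and "b" overlap on point 2.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
ground = set(subsets)

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def powerset(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Smallest c with f(A | B) >= f(A) + (1 - c) * f(B) for all disjoint A, B;
# rearranged: c >= 1 - (f(A | B) - f(A)) / f(B) whenever f(B) > 0.
c = 0.0
for A in powerset(ground):
    for B in powerset(ground - A):
        if f(B) > 0:
            c = max(c, 1.0 - (f(A | B) - f(A)) / f(B))
print(c)  # 0.5: adding "b" after "a" gains only 1 of its 2 points
```

Here the overlap between "a" and "b" forces c = 1 − 1/2; a pairwise-disjoint family would give c = 0 (a linear function).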
Finally, we give three useful theorems regarding monotone submodular functions. The first
two are very slight modifications of Lemmas 1.1 and 1.2 in Lee, Sviridenko, and Vondrak [71].
Theorem 2.6. Let f be a monotone submodular function on X. Let C, S ⊆ X, and let {T_i}_{i=1}^{l}
be a collection of subsets of C \ S such that each element of C \ S appears in at least k of the
subsets T_i. Then,

∑_{i=1}^{l} [f(S ∪ T_i) − f(S)] ≥ k [f(S ∪ C) − f(S)] .
Proof. Fix an arbitrary ordering ≺ on X. For any x ∈ C \ S let C^x be the set of elements in
C that precede x in the ordering. Similarly, let T_i^x be the set of elements of T_i that precede x.
Then, we have T_i^x ⊆ C^x for all x. Thus,

∑_{x∈T_i} f_{S∪C^x}(x) ≤ ∑_{x∈T_i} f_{S∪T_i^x}(x) = ∑_{x∈T_i} [f(S ∪ T_i^x + x) − f(S ∪ T_i^x)] = f(S ∪ T_i) − f(S) ,

where the first inequality follows from submodularity and the last equality from telescoping the
summation. Now, we have:

∑_{i=1}^{l} [f(S ∪ T_i) − f(S)] ≥ ∑_{i=1}^{l} ∑_{x∈T_i} f_{S∪C^x}(x) ≥ ∑_{x∈C\S} k · f_{S∪C^x}(x) = k [f(S ∪ C) − f(S)] ,

where the second inequality follows from the fact that each x ∈ C \ S occurs in at least k of the
sets T_i and f_{S∪C^x}(x) ≥ 0 since f is monotone.
Theorem 2.7. Let f be a monotone submodular function on X. Let C, S ⊆ X, and let {T_i}_{i=1}^{l}
be a collection of subsets of S \ C such that each element of S \ C appears in at most k of the
subsets. Then,

∑_{i=1}^{l} [f(S) − f(S \ T_i)] ≤ k [f(S) − f(S ∩ C)] .
Proof. Fix an arbitrary ordering ≺ on S \ C. For any x ∈ S \ C let S^x be the set containing x
and all the elements from S \ C that precede x in the ordering. Similarly, let T_i^x contain x and
all the elements from T_i preceding x. Then, we have T_i^x ⊆ S^x for all x, and so S \ S^x ⊆ S \ T_i^x
for all x. Thus,

∑_{x∈T_i} f_{S\S^x}(x) ≥ ∑_{x∈T_i} f_{S\T_i^x}(x) = ∑_{x∈T_i} [f((S \ T_i^x) + x) − f(S \ T_i^x)] = f(S) − f(S \ T_i) ,

where the first inequality follows from submodularity and the last equality from telescoping the
summation. Now, we have:

∑_{i=1}^{l} [f(S) − f(S \ T_i)] ≤ ∑_{i=1}^{l} ∑_{x∈T_i} f_{S\S^x}(x) ≤ ∑_{x∈S\C} k · f_{S\S^x}(x) = k [f(S) − f(S ∩ C)] ,

where the second inequality follows from the fact that each x ∈ S \ C occurs in at most k of the
sets T_i and f_{S\S^x}(x) ≥ 0 since f is monotone.
We primarily use Theorem 2.6 in the restricted setting in which k = 1 and {T_i}_{i=1}^{l} is a partition
of C \ S. The next theorem is another application of Theorem 2.6, involving the average value
of a submodular function on subsets of a particular size.
Theorem 2.8. Let f be a non-negative submodular function, and let S be a set of size m. For
k in the range 1 ≤ k ≤ m,

(1/\binom{m}{k}) ∑_{T ∈ \binom{S}{k}} f(T) ≥ (k/m) f(S) .
Proof. Each element x ∈ S appears in exactly \binom{m−1}{k−1} = (k/m)\binom{m}{k} of the sets in \binom{S}{k}. From
Theorem 2.6, and the assumption that f(∅) = 0, we then have:

∑_{T ∈ \binom{S}{k}} f(T) = ∑_{T ∈ \binom{S}{k}} [f(T) − f(∅)] ≥ (k/m)\binom{m}{k} [f(S) − f(∅)] = (k/m)\binom{m}{k} f(S) .
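Theorem 2.8 is easy to sanity-check numerically. The sketch below (hypothetical coverage function with m = 4) verifies the averaging inequality for every k:

```python
from itertools import combinations

# Hypothetical coverage function on a ground set of size m = 4.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

S = set(subsets)
m = len(S)                     # here f(S) = 5
for k in range(1, m + 1):
    ksubs = [set(T) for T in combinations(S, k)]
    avg = sum(f(T) for T in ksubs) / len(ksubs)
    # Theorem 2.8: the average over k-subsets is at least (k/m) f(S).
    assert avg >= (k / m) * f(S)
    print(k, round(avg, 3), (k / m) * f(S))
```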
2.3 Independence Systems and Matroids
In this section, we consider various classes of feasible solutions for combinatorial optimization
problems. Our classes are all built on the notion of an independence system. An independence
system is given by a ground set X and a non-empty, downward closed collection I of subsets
of X:
Definition 2.9 (Independence System). Let X be a set of elements and I ⊆ 2X . Then the set
system (X, I) is an independence system if and only if I 6= ∅ and for all A,B ⊆ X, A ∈ I and
B ⊆ A implies B ∈ I.
We refer to the sets in I as independent sets. For a given set A ⊆ X we call the inclusion-
wise maximal independent subsets of A bases of A, or, when A is understood to be X, simply
bases. Finally, when dealing with independence systems we assume that every element x ∈ X is
contained in at least one independent set A ∈ I. This assumption is without loss of generality
since if some element does not occur in any independent set of I, we can remove it from
the ground set X without affecting the set I of feasible solutions. Furthermore, because I is
downward closed, this assumption is equivalent to the assumption that {x} ∈ I for all x ∈ X.
While the class of all independence systems is too general to give rise to interesting algorith-
mic and combinatorial properties, it does serve as the basis for several more restricted classes of
set systems which do exhibit interesting properties. Probably the best-known such class
is the class of matroids.
Matroids were first axiomatized by Whitney [98] as a generalization of the notion of linear
independence in vector spaces.
Definition 2.10 (Matroid [98] (also [85, (39.1)])). An independence system (X, I) is a matroid
if and only if for all A,B ∈ I if |A| > |B| then there exists x ∈ A \B such that B + x ∈ I.
The following are some simple classes of matroids that we will refer to later in the thesis. In
a uniform matroid of rank k, I consists of precisely those sets of size at most k. In a partition
matroid we are given a partition of X into p sets X1, . . . , Xp, and integers k1, . . . , kp. Then, I
contains precisely those sets S for which |S ∩ Xi| ≤ ki for all 1 ≤ i ≤ p. In a graphic matroid,
we are given an undirected graph G = (V,E). The ground set is E and I contains precisely
those sets of edges that do not contain a cycle.
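Each of these classes is naturally presented as an independence oracle. A minimal sketch (hypothetical instances; the graphic-matroid check uses union-find to detect cycles):

```python
def uniform_indep(S, k):
    """Uniform matroid of rank k: S is independent iff |S| <= k."""
    return len(S) <= k

def partition_indep(S, parts, caps):
    """Partition matroid: at most caps[i] elements from block parts[i]."""
    return all(len(S & P) <= c for P, c in zip(parts, caps))

def graphic_indep(edges):
    """Graphic matroid: a set of edges is independent iff it is acyclic;
    checked with union-find over the endpoints."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # this edge would close a cycle
            return False
        parent[ru] = rv
    return True

print(uniform_indep({1, 2}, k=2))                       # True
print(partition_indep({1, 2}, [{1, 2}, {3}], [1, 1]))   # False
print(graphic_indep([(0, 1), (1, 2), (2, 0)]))          # False (triangle)
```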
There are various alternate characterizations for the class of matroids. Two that shall be
useful are the following, which are given in terms of bases.
Theorem 2.11 ([85, (39.2)]). An independence system (X, I) is a matroid if and only if for
all E ⊆ X, all bases of E have the same size.
The common size of all bases of a set E is called the rank of E, denoted rank(E). The rank
of the matroid (X, I) is then simply the rank of X.
Whitney [98] also gave the following alternate characterization of matroids in terms of bases.
Theorem 2.12 ([98] (also [85, Theorem 39.6])). Let B be a non-empty collection of subsets of
X. Then, B is the collection of bases of a matroid if and only if:
(i) For any A,B ∈ B and x ∈ A \B, there exists y ∈ B \A such that A− x+ y ∈ B.
(ii) For any A,B ∈ B and x ∈ A \B, there exists y ∈ B \A such that B − y + x ∈ B.
Brualdi [16] shows that matroids exhibit the following stronger exchange properties.
Theorem 2.13 ([16, Theorem 1] (also [85, Corollary 39.12a])). Let A,B be bases of a matroid
M. Then, there exists a bijection π : A → B such that B − π(x) + x is a base of M for all
x ∈ A. Furthermore, π(x) = x for all x ∈ A ∩B.
Theorem 2.14 ([16, Theorem 2] (also [85, Theorem 39.12])). Let A,B be bases of a matroid
M. Then, for any x ∈ A there exists some y ∈ B such that A− x+ y and B − y + x are both
bases of M.
Theorems 2.11 and 2.13 (i.e. the fact that all bases of a matroid have equal size and the
existence of the bijection π) are typically all that we need from the structure of a matroid in
order to derive our results.
Ideally, we would like for the bijection π from Theorem 2.13 to satisfy the stronger conditions
of Theorem 2.14 (i.e. to have a single bijection π : A → B such that both B − π(x) + x and A − x + π(x)
are bases). Brualdi [16] gives an example of a matroid for which this cannot be done. Later
work by Brualdi and Scrimger [19, 17, 18] generalizes the base exchange characterization of
Theorem 2.12 to consider weakly base orderable matroids, in which a bijection π satisfying this
stronger property does exist. They also define the following class of matroids, that exhibit an
even stronger sort of exchange property.
Definition 2.15 (Strongly Base Orderable Matroid). A matroid is strongly base orderable if
for any pair A,B of its bases there exists a bijection π : A → B such that for all C ⊆ A,
(B \ π(C)) ∪ C is a base (where π(C) = {π(x) : x ∈ C}).
That is, in a strongly base orderable matroid, the bijection π : A → B between any two bases
A and B can be extended to subsets of A and B. Essentially, this means that in any strongly base
orderable matroid, several swaps (each exchanging some x for π(x)) can be performed simultaneously.
The class of strongly base orderable matroids is quite large, containing gammoids, transversal
matroids, and partition matroids.1 An example of a matroid that is not strongly base orderable
is the graphic matroid on K4.
A final useful characterization of matroids is the following, which relates them to submodular
functions:
Theorem 2.16 ([98]; also [85, Theorem 39.8]). Let rank : 2X → Z≥0. Then rank is the rank
function of a matroid if and only if:
(i) rank(T ) ≤ rank(U) ≤ |U |, for all T ⊆ U ⊆ X.
(ii) rank(T ) + rank(U) ≥ rank(T ∪ U) + rank(T ∩ U), for all T,U ⊆ X.
That is, rank(·) is the rank function of a matroid if and only if rank(·) is monotone
submodular. A rank function on X implicitly specifies the independence system (X, I), with
I = {S ⊆ X : |S| ≤ rank(S)}.
2.4 The Greedy Algorithm
We now examine in more detail combinatorial optimization problems whose feasible sets are
given by independence systems. In such problems, we are given an independence system (X, I)
and a function f : 2X → R≥0. The goal is to find a set S ∈ I that maximizes the value f(S).
First, let us consider the case in which f is a linear function, given as a weight function
w : X → R≥0. Then, the related combinatorial optimization problem is equivalent to the
problem of finding an independent set in I of maximum total weight. Rado [81] showed that
the standard greedy algorithm, shown in Algorithm 2, is optimal for all linear functions f
whenever (X, I) is a matroid. Conversely, Edmonds [33] showed that if the standard greedy
algorithm provides an optimal solution in I for every linear function f on 2X , then the system
1 Each of these classes contains the next. We do not provide definitions for gammoids and transversal matroids here.
Algorithm 2: Greedy
Input: Independence system (X, I); weight function w : X → R≥0
S ← ∅; T ← X;
while T ≠ ∅ do
    x ← arg max_{t∈T} w(t);
    T ← T − x;
    if S + x ∈ I then
        S ← S + x;
return S;
(X, I) must be a matroid. In this sense, matroids are exactly those independence systems for
which the standard greedy algorithm is optimal with respect to all linear functions.
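As an illustration of this optimality, running Algorithm 2 on a graphic matroid with edge weights computes a maximum-weight spanning forest (Kruskal's algorithm). The sketch below uses a hypothetical instance and a generic independence oracle:

```python
# Hypothetical instance: graphic matroid (independent = acyclic edge sets)
# on a triangle plus a pendant edge, with a weight per edge.
edges = {(0, 1): 5, (1, 2): 4, (0, 2): 3, (2, 3): 2}

def acyclic(es):
    """Independence oracle: union-find cycle detection."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in es:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def greedy(elements, weight, indep):
    """Algorithm 2: scan elements by non-increasing weight, keeping each
    element whose addition preserves independence."""
    S = []
    for x in sorted(elements, key=weight, reverse=True):
        if indep(S + [x]):
            S.append(x)
    return S

S = greedy(edges, edges.get, acyclic)
print(S, sum(edges[e] for e in S))   # a weight-11 spanning forest
```

The edge (0, 2) is skipped because it would close the triangle; on a matroid this greedy choice is always safe.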
Now, let us examine the more general case in which f is any monotone submodular function.
The earliest reference to the problem of maximizing a submodular set function subject to
a matroid constraint seems to be Cornuejols, Fisher, and Nemhauser [30]. They consider
a constrained maximization variant of a facility location problem that is a special case of
monotone submodular maximization subject to a uniform matroid constraint. They show that
a greedy algorithm is a (1 − 1/e)-approximation algorithm for this problem, while a simple local
search algorithm is only a 1/2-approximation. Fisher, Nemhauser, and Wolsey [43] consider the
general case of maximizing an arbitrary monotone submodular function subject to an arbitrary
matroid constraint. The standard greedy algorithm that they consider for the problem is
shown in Algorithm 3. Algorithm 3 is obtained naturally by modifying Algorithm 2 to use
Algorithm 3: SubmodularGreedy
Input: Independence system (X, I); submodular function f : 2X → R≥0
S ← ∅; T ← X;
while T ≠ ∅ do
    x ← arg max_{t∈T} f_S(t);
    T ← T − x;
    if S + x ∈ I then
        S ← S + x;
return S;
the marginal gains fS with respect to the current solution in place of the weight function w.
Fisher et al. show that SubmodularGreedy is a 1/2-approximation and that this bound is tight.
They also show that a simple 1-local search algorithm is a 1/2-approximation (again, they give
an example showing that this bound is tight). Many results pertaining to the greedy algorithm
for submodular maximization are summarized in a survey of Goundan and Schulz [46].
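For illustration, the sketch below runs Algorithm 3 on a hypothetical maximum 2-coverage instance (coverage objective, uniform matroid of rank 2); the names and data are invented for the example:

```python
# Hypothetical maximum 2-coverage instance: pick at most two subsets
# (a uniform matroid of rank 2) to cover as many points as possible.
subsets = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4, 5}, "d": {3, 4}}

def cover(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def submodular_greedy(X, f, indep):
    """Algorithm 3: repeatedly take the element of largest marginal gain,
    keeping it only if the solution stays independent."""
    S, T = set(), set(X)
    while T:
        x = max(T, key=lambda t: f(S | {t}) - f(S))
        T.remove(x)
        if indep(S | {x}):
            S.add(x)
    return S

S = submodular_greedy(subsets, cover, lambda A: len(A) <= 2)
print(sorted(S), cover(S))   # ['a', 'c'] covers all 5 points
```

On this instance the greedy choice happens to be optimal; the 1/2 bound is only met by carefully constructed worst cases.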
From a hardness perspective, Feige [35] showed that unless P = NP, it is impossible to
approximate the following maximum k-coverage problem beyond a factor of 1− 1/e. In maxi-
mum k-coverage, we are given a family of subsets of some universe U and must select k subsets
that cover as much of U as possible. The coverage function is monotone submodular and the
constraint that we can take at most k sets can be formulated as a uniform matroid of rank k.
Thus, maximum k-coverage is a special case of maximizing a monotone submodular function
subject to a matroid constraint.
Nemhauser and Wolsey [75] considered the problem of monotone submodular maximization
subject to a matroid constraint in the value oracle model. In this model we are given f via
an oracle that provides its value on any given set. Nemhauser and Wolsey show that attaining
any approximation better than 1− 1/e requires an exponential number of value queries to the
oracle for f .
2.5 The Continuous Greedy Algorithm
Calinescu, Chekuri, Pal and Vondrak [20, 93, 21] improved on the long-standing 1/2-approximation,
giving an algorithm that attains the optimal approximation ratio of 1−1/e for the general prob-
lem of maximizing any monotone submodular function subject to a single matroid constraint.
Their algorithm, called the continuous greedy algorithm, consists of two phases. In the first
phase, they solve a particular relaxation of the combinatorial optimization problem to obtain
an approximate fractional solution. In the second phase, this fractional solution is rounded to
an integral solution of the original problem by using the pipage rounding framework of Ageev
and Sviridenko [1].
We now consider the continuous greedy algorithm in more detail. Let f be a monotone
submodular function on X and let M = (X, I) be a matroid on X. The continuous greedy
algorithm considers the following continuous, multilinear extension of f , where ~x ∈ [0, 1]^X is a
vector with a component x_i ∈ [0, 1] for each i ∈ X:

F(~x) = ∑_{R⊆X} f(R) ∏_{i∈R} x_i ∏_{i∉R} (1 − x_i) . (2.1)
In the general setting, in which f is given by a value oracle, F cannot be computed in polynomial
time, but it can be efficiently estimated by random sampling.
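A Monte-Carlo sketch of this estimate (hypothetical coverage function): since F(~x) is the expected value of f on a random set containing each i independently with probability x_i, averaging f over sampled sets approximates F.

```python
import random

# Hypothetical coverage function on a three-element ground set.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def estimate_F(x, samples=20000, seed=0):
    """Monte-Carlo estimate of the multilinear extension F(x): draw a
    random set R containing each i independently with probability x[i]
    and average f(R)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        R = {i for i in x if rng.random() < x[i]}
        total += f(R)
    return total / samples

x = {"a": 0.5, "b": 0.5, "c": 1.0}
print(estimate_F(x))   # close to the exact value 2.75
```

For this tiny ground set F could of course be computed exactly by expanding (2.1); sampling is what makes the approach feasible when |X| is large.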
The value F(~x) can be viewed as the expected value of f on a random subset of X in which each
element i appears independently with probability x_i. We identify an integral vector ~z ∈ {0, 1}^X with the set
Z whose indicator vector is χ_Z = ~z. Then, for any vector ~z ∈ {0, 1}^X, we have F(~z) = f(Z),
and so F is indeed a relaxation of f . Furthermore, (as shown by Calinescu et al. [20]) the
monotonicity and submodularity of f imply, respectively, that

∂F/∂x_i ≥ 0 for all i ∈ X , (2.2)
∂²F/(∂x_i ∂x_j) ≤ 0 for all i, j ∈ X with i ≠ j . (2.3)
The continuous greedy algorithm considers the relaxed problem of maximizing F (~x) subject
to the constraint that ~x is in the polytope P (M) of M:
P(M) = {~x ∈ [0, 1]^X : ~x(S) ≤ rank(S), ∀S ⊆ X} .

Here, ~x(S) = ∑_{i∈S} x_i. Alternatively (as shown in Edmonds [33]), the polytope P(M) is simply
the convex hull of the characteristic vectors of sets in I. The continuous greedy algorithm solves
the problem
max {F(~x) : ~x ∈ P(M)} .
Due to the structure of F , this is a non-linear optimization problem. However, a (1 − 1/e)-
approximation is attained by a simple continuous algorithm. The algorithm follows a trajectory
~x(t) for t ∈ [0, 1], with ~x(0) = 0 initially. The trajectory satisfies the differential equation

d~x(t)/dt = ~vmax(~x(t)) ,

where ~vmax(~x(t)) = arg max_{~v∈P(M)} (~v · ∇F(~x(t))). Intuitively, the equation simply states that at
each instant, the trajectory ~x(t) moves in the direction of the vector that maximizes the increase
in F (given by the gradient vector ∇F (~x(t))) subject to the constraint that this vector is in the
polytope P . The algorithm returns the final value ~y = ~x(1).
In practice, the continuous algorithm is implemented on a suitably discretized time scale.
That is, we split [0, 1] into integral multiples of some ε, so that t takes ε^{−1} discrete values. At
each discrete time step t, the problem of finding the direction ~vmax with which to update ~x can
be accomplished by the following procedure. First, the value of ∇F is estimated via a sampling
procedure. For each i ∈ X, we set a weight w(i) equal to the partial derivative ∂F/∂xi given
by the ith coordinate of ∇F. Then, ~vmax can be computed by finding a maximum weight
independent set in the matroid M with weights given by w. This is done via the standard
greedy algorithm. The algorithm updates ~x(t+ ε)← ~x+ ε · ~vmax.
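The discretized phase can be sketched as follows for a uniform matroid of rank k, where the maximum-weight independent set is simply the k elements of largest estimated weight. The data, sample count, and step size below are hypothetical, and the gradient estimator is a crude sketch rather than the carefully analyzed sampling procedure of [20]:

```python
import random

# Hypothetical instance: coverage function, uniform matroid of rank k.
subsets = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4, 5}, "d": {3, 4}}
rng = random.Random(0)

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def grad_estimate(x, samples=300):
    """Crude Monte-Carlo estimate of the weights w(i) ~ dF/dx_i, using
    dF/dx_i = E[f(R + i) - f(R - i)] for R drawn according to x."""
    w = dict.fromkeys(x, 0.0)
    for _ in range(samples):
        R = {i for i in x if rng.random() < x[i]}
        for i in x:
            w[i] += f(R | {i}) - f(R - {i})
    return {i: w[i] / samples for i in w}

def continuous_greedy(k, eps=0.1):
    """Discretized continuous greedy: at each of 1/eps steps, move eps
    towards the indicator of the k elements of largest estimated weight
    (a maximum-weight independent set of the rank-k uniform matroid)."""
    x = dict.fromkeys(subsets, 0.0)
    for _ in range(round(1 / eps)):
        w = grad_estimate(x)
        for i in sorted(x, key=lambda e: w[e], reverse=True)[:k]:
            x[i] = min(1.0, x[i] + eps)
    return x

x = continuous_greedy(k=2)
print(x)   # a fractional point of the rank-2 matroid polytope
```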
Next, the final fractional solution ~y produced by the first phase is rounded to obtain an
integral solution to the original problem. This is accomplished by a pipage rounding procedure.
From (2.2) we have that F must be increasing in each coordinate. Hence, we can assume that
~y is in fact in the base polytope B(M) of the matroid,

B(M) = {~x ∈ P(M) : ~x(X) = rank(X)} ,
which has as extreme points vectors corresponding to the characteristic vectors of bases of M.
The pipage rounding procedure obtains an integral solution by repeatedly altering the co-
ordinates of ~y. Let e_i denote the incidence vector χ_{{i}} (that is, e_i is 1 in the coordinate
corresponding to i and 0 in all other coordinates). Each alteration is of the form
~y + t(ei − ej) , (2.4)
where t may be either positive or negative. Thus, each rounding step consists of increasing
one coordinate and decreasing another by the same amount, while leaving all other coordinates
unchanged.
The pipage rounding approach depends on two facts, upon which we shall now elaborate.
First, while the function

F^{~y}_{~v}(t) = F(~y + t~v)

is not necessarily convex in every direction ~v, it is convex in all directions e_i − e_j corresponding
to adding an element i and removing an element j. This follows directly from (2.3), which
itself follows from submodularity of f . Second, because M is a matroid, we can move from a
fractional solution ~y to a vertex of B(M) by moving only in these directions. Combined, these
facts imply that the pipage rounding procedure never decreases the value of F , as we shall see.
Unfortunately, this means that pipage rounding does not generalize to other independence sys-
tems, such as those given by intersection of multiple matroids, in which the rounding procedure
must consider directions along which f is not concave.
The rounding maintains the invariant that ~y ∈ B(M), and never decreases F(~y). We call a
set A ⊆ X tight if its corresponding constraint in the matroid polytope P (M) is tight, so that
x(A) = rank(A). Since rank(A) is an integer, any tight set that contains some fractional variable
must contain at least 2 fractional variables. An iteration of the pipage rounding procedure
begins with any pair of fractional coordinates yi, yj, and then alters ~y by first increasing t from
0 in (2.4) until some set A+ becomes tight or either yi or yj becomes integral (we denote by t+
the value of t ≥ 0 at which this happens). Next, the procedure decreases t from 0 until some
set A− becomes tight or yi or yj becomes integral (we denote by t− the value of t ≤ 0 for which
this happens). Thus, the vector given by (2.4) is in P(M) for all t ∈ [t−, t+]. Furthermore, it
follows from (2.3) that the function

F^{~y}_{ij}(t) = F(~y + t(e_i − e_j))

is convex for any i ≠ j. Thus, F^{~y}_{ij} attains its maximum value over t ∈ [t−, t+] at either t− or
t+. One of F^{~y}_{ij}(t+) and F^{~y}_{ij}(t−) is greater than or equal to F(~y), and the rounding procedure's
choice of t can be made to guarantee that F does not decrease.2
It remains to show how to choose i and j so this procedure will eventually terminate with an
2In [21] a randomized rounding procedure is used that only ensures that the expected value of F is non-decreasing. This procedure has the advantage of not needing to compute the value of F .
integral vector. As shown by Calinescu et al. [20, 21], this depends intimately on the fact that
M is a matroid. We omit the full proof here, but give some intuition in terms of the geometry
of B(M) and a sketch of the proof in terms of the structure of tight sets.
For our geometric intuition, we note that a strengthening of the exchange characterization
of Theorem 2.12 (see Brualdi [16], and also Schrijver [85, Theorem 39.12]) implies that the
adjacent vertices A, B of the base polytope B(M) have the property that both A − i + j and
B − j + i are bases for some i ∈ A \ B and j ∈ B \ A [85, Theorem 40.6]. This implies that
it is possible to move from an initial fractional solution ~y ∈ B(M) to any vertex of B(M) by
following only vectors of the form t(ei − ej). These vectors are precisely the ones considered
by the pipage rounding algorithm.
More formally, Theorem 2.16 states that the rank function rank of M must be submodular.
It follows (see [21] for a proof) that if A and B are two tight sets then so are A∩B and A∪B.
The pipage rounding procedure begins by choosing any pair of fractional values yi, yj and a
tight set T containing i and j (initially T = X). Suppose that after one iteration neither yi
nor yj has become integral. Then, there is some new set A (equal to either A+ or A−) which is
tight with respect to the updated vector ~y + t(ei − ej) corresponding to the algorithm’s choice
of t. But A and T are both tight with A ≠ T, so the set T ∩ A is also tight. Furthermore,
T ∩A contains either i or j, and yi and yj are fractional. Thus, T ∩A must contain at least 2
fractional variables. The pipage rounding procedure updates T to be the set T ∩A and chooses
a pair i and j from T ∩A such that yi and yj are fractional, then continues. The new tight set
T ∩A is smaller in the next iteration and so eventually the algorithm must find a set containing
only i and j, in which case it can make one of yi, yj integral. The procedure then repeats until all
variables are integral.
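For a uniform matroid the tight-set bookkeeping is trivial (the only relevant constraint is ~x(X) ≤ k), so pipage rounding reduces to repeatedly picking two fractional coordinates and moving along e_i − e_j to whichever extreme point does not decrease F. A sketch under these simplifying assumptions (hypothetical coverage instance; F is computed exactly by enumeration since the ground set is tiny):

```python
from itertools import combinations

# Hypothetical coverage instance on three elements.
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

def f(S):
    covered = set()
    for e in S:
        covered |= subsets[e]
    return len(covered)

def F(x):
    """Exact multilinear extension by enumerating all 2^|X| subsets."""
    elems = list(x)
    total = 0.0
    for r in range(len(elems) + 1):
        for R in combinations(elems, r):
            p = 1.0
            for i in elems:
                p *= x[i] if i in R else 1.0 - x[i]
            total += p * f(R)
    return total

def pipage_round(x, tol=1e-9):
    """Pipage rounding under a cardinality (uniform matroid) constraint:
    F is convex along e_i - e_j, so the better of the two extreme moves
    never decreases F, and each move makes a coordinate integral."""
    x = dict(x)
    while True:
        frac = [i for i in x if tol < x[i] < 1.0 - tol]
        if len(frac) < 2:
            return x
        i, j = frac[0], frac[1]
        t_plus = min(1.0 - x[i], x[j])     # raise x_i, lower x_j
        t_minus = -min(x[i], 1.0 - x[j])   # lower x_i, raise x_j
        y1, y2 = dict(x), dict(x)
        y1[i] += t_plus; y1[j] -= t_plus
        y2[i] += t_minus; y2[j] -= t_minus
        x = y1 if F(y1) >= F(y2) else y2

x = {"a": 0.5, "b": 0.5, "c": 1.0}   # fractional point with x(X) = 2
z = pipage_round(x)
print(z, F(z))
```

Each move keeps the total mass x(X) fixed, so the result is the indicator of a size-2 set, and convexity guarantees F never drops below its starting value of 3.25.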
In practice, finding a tight set can be implemented as the submodular minimization problem:
arg min_{A∈A} (rank(A) − ~y(A)) (2.5)

where A = {S ⊆ X : i ∈ S, j ∉ S}. In contrast to submodular maximization, submodular
minimization can be accomplished in polynomial time [84, 56, 96]. Because the submodular
minimization problem (2.5) is of a particular restricted sort, a faster algorithm by Cunningham
[31] can be applied.
In further work, Vondrak [95] considers the problem of maximizing a monotone submodular
function whose total curvature is at most c. He shows that the continuous greedy algorithm is
a (1 − e^{−c})/c-approximation, and that it is impossible to attain a better approximation with a
polynomial number of queries in the value oracle model. Previously, Conforti and Cornuejols [29]
had shown that the standard greedy algorithm attains an approximation ratio of (1 − e^{−c})/c for
monotone submodular maximization in uniform matroids when the function has total curvature
at most c. In the case of an arbitrary matroid, however, they showed that the standard greedy
algorithm is only a 1/(1 + c)-approximation.
In a series of papers, Chekuri, Vondrak, and Zenklusen [25, 26, 27] develop a new rounding
technique called swap rounding, which can be used in place of pipage rounding in the continuous
greedy algorithm. By using the multilinear relaxation F together with their new rounding
procedure they derive improved approximations for a variety of problems involving matroid
constraints together with various other constraints. They also consider the case of matroid
intersection and give an improved approximation algorithm for budgeted linear variants of the
problem and some restricted types of monotone submodular functions.
The multilinear relaxation F has been used in a variety of other settings as well, including
non-monotone [69] and multi-budgeted maximization [47]. Vondrak [94] uses F to formulate the
“symmetry gap” of a problem, which he uses to unify several negative results in the value oracle
model. Oveis-Gharan and Vondrak [44] use the relaxation F to design an algorithm based on
simulated annealing. They show that the resulting algorithm gives improved approximations
for non-monotone submodular maximization both in the unconstrained and matroid settings.
Finally, Feldman, Naor, and Schwartz give an adaptation of the continuous greedy algorithm
that gives further improved approximations for non-monotone submodular maximization [36].
In follow up work [37], they unify this and a variety of other applications of the general contin-
uous greedy approach.
2.6 Partial Enumeration
In this section, we give a meta-algorithm that can be used to obtain a small improvement in
the approximation performance of any algorithm A for maximizing a submodular function in
an independence system. We use a partial enumeration technique similar to that used by Sahni
[82] for the 0/1 knapsack problem. This approach is further described by Khuller et al. [65]
and Sviridenko [89] in the context of budgeted maximum coverage and Calinescu et al. [20] for
the general problem of submodular maximization subject to a matroid constraint. Effectively,
we “guess” a single element of the optimal solution, and then run A on a modified instance in
which all solutions contain this element. We then iterate over all possible guesses.
The technique is based on the notion of contraction, which we now define. Our definition
is inspired by the standard notion of contraction in matroids [85, Section 39.3]. Note that here
we consider contraction in an arbitrary independence system (not necessarily a matroid).
Definition 2.17 (Contraction). Let (X, I) be an independence system, and x ∈ X. We define
the set system (X − x, I_x) by I_x = {A ⊆ X − x : A + x ∈ I} (i.e. a set A is in I_x if and only
if A + x is independent in the original set system).
In order to use the partial enumeration technique for a class of independence systems, we
require that the class be closed under contraction. First, we show that the contraction of an
independence system is an independence system.
Theorem 2.18. Suppose (X, I) is an independence system and x ∈ X. Then, (X − x, Ix) is
also an independence system.
Chapter 2. Preliminaries 23
Proof. Recall that for any set system (X, I), we assume that {x} ∈ I for all x ∈ X. Thus,
∅ ∈ Ix for all x ∈ X. Furthermore, for any x ∈ X, if A ∈ Ix and B ⊆ A ⊆ X − x, then
B + x ⊆ A + x ∈ I. Because I is downward-closed, B + x ∈ I and so B ∈ Ix.
We now show that the subclass of matroids is also closed under contraction.
Theorem 2.19. Suppose that (X, I) is a matroid and x ∈ X. Then (X − x, Ix) is a matroid.
Proof. Theorem 2.18 shows that (X − x, Ix) must be an independence system. Thus, it suffices
to show that (X − x, Ix) satisfies the condition of Definition 2.10. Let A, B ∈ Ix with |A| > |B|.
Then, A + x and B + x must be in I. We have |A + x| = |A| + 1 and |B + x| = |B| + 1, so
|A + x| > |B + x|. Because (X, I) is a matroid, there must be some element y ∈ (A + x) \ (B + x) = A \ B
such that (B + x) + y ∈ I. But this means that we must have B + y ∈ Ix, and so (X − x, Ix) is a matroid.
We also need a similar notion of contraction for functions.
Definition 2.20 (Function Contraction). Let f be a submodular function on ground set X
and let x ∈ X. We define (with slight abuse of notation) the contracted function fx on X − x as

fx(A) = f(A + x) − f(x) .
The next theorem shows that the class of monotone submodular functions is closed under
contraction.
Theorem 2.21. If f is (monotone) submodular, then so is fx.
Proof. Suppose f is submodular and let A,B ⊆ X − x. Then,
fx(A) + fx(B) = f(A+ x) + f(B + x)− 2f(x)
≥ f((A+ x) ∪ (B + x)) + f((A+ x) ∩ (B + x))− 2f(x)
= f((A ∪B) + x) + f((A ∩B) + x)− 2f(x)
= fx(A ∪B) + fx(A ∩B).
Suppose that f is monotone and let A ⊆ B ⊆ X − x. Then, since A+ x ⊆ B + x we have
fx(A) = f(A+ x)− f(x) ≤ f(B + x)− f(x) = fx(B).
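Both contraction operations are small enough to check by brute force on toy examples. The following sketch is our own illustration (the helper names are hypothetical, not from the text): it encodes Definitions 2.17 and 2.20 for a rank-2 uniform matroid and a small coverage-style function, and confirms the conclusions of Theorems 2.18 and 2.21 on that instance.

```python
from itertools import combinations

def is_independence_system(ground, indep):
    """Check: every singleton is independent and the family is downward-closed."""
    indep = {frozenset(s) for s in indep}
    if any(frozenset({x}) not in indep for x in ground):
        return False
    return all(s - {y} in indep for s in indep for y in s)

def contract_system(ground, indep, x):
    """(X - x, I_x): A is independent iff A + x was independent (Definition 2.17)."""
    return ground - {x}, {frozenset(a) - {x} for a in indep if x in a}

def contract_function(f, x):
    """f_x(A) = f(A + x) - f({x}) (Definition 2.20)."""
    return lambda a: f(a | {x}) - f({x})

# Toy example: the rank-2 uniform matroid on {1, 2, 3} and a small coverage function.
ground = {1, 2, 3}
indep = {frozenset(s) for k in range(3) for s in combinations(ground, k)}
sets = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}
def f(s):
    return len(set().union(*(sets[i] for i in s))) if s else 0

g2, i2 = contract_system(ground, indep, 1)
f1 = contract_function(f, 1)
print(is_independence_system(g2, i2))  # True: the contraction is again an independence system
print(f1({2}), f1({3}))                # 0 1: marginal values relative to element 1
```

Here f1({2}) = 0 because the set indexed by 2 adds nothing beyond what element 1 already covers, while f1({3}) = 1.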
Now, we are ready to formulate this section’s main algorithmic result. Suppose that A is an
algorithm for monotone submodular maximization in some class of independence systems that
is closed under contraction. We consider the meta-algorithm PartEnum(A), shown in Algorithm
4. When applied to an instance (X, I) the meta-algorithm simply runs A on the contracted
instance (X − x, Ix), fx for each x ∈ X, and returns the best resulting solution.
Algorithm 4: PartEnum(A)
Input: independence system (X, I), submodular function f
foreach x ∈ X do
    Let Sx be the result of running A on (X − x, Ix), fx;
Let y = arg max_{x∈X} f(Sx ∪ x);
return Sy ∪ y;
Theorem 2.22. Suppose that Algorithm A is a θ-approximation for submodular maximization
over some class of independence systems closed under contraction. Then PartEnum(A) is a
(1/n + ((n − 1)/n)·θ)-approximation for the same problem, where n is the maximum size of an
independent set in the given set system.
Proof. Consider an instance (X, I), f . Let O be an optimal solution for this instance, and
y = arg max_{x∈O} f(x). Submodularity implies that

f(O) ≤ ∑_{x∈O} f(x) ≤ |O| f(y) ≤ n f(y) .

Furthermore, since A is a θ-approximation algorithm,

fy(Sy) ≥ θ fy(O − y) = θ (f(O) − f(y)) .
Let S be the solution produced by Algorithm 4 on the instance (X, I), f . Then,

f(S) = f(y) + fy(Sy)
     ≥ f(y) + θ fy(O − y)
     = f(y) + θ (f(O) − f(y))
     = (1 − θ) f(y) + θ f(O)
     ≥ ((1 − θ)/n + θ) f(O) = (1/n + ((n − 1)/n)·θ) f(O) .
Chapter 3
Maximum Coverage
In this chapter, we derive a deterministic, combinatorial algorithm for maximizing a special
type of monotone submodular function subject to a single matroid constraint. We consider the
case of arbitrary monotone submodular functions in the next chapter. As this special case is
significantly simpler, we present it first in the hope that it will provide some intuition for the
general submodular case. Also, our algorithm in this case is faster and deterministic.
3.1 The Problem
First, let us define the special class of monotone submodular functions that we consider.
Definition 3.1 (Coverage Function). Suppose we are given:

• a universe U of elements

• a weight function w : U → R≥0 assigning each element of U a weight

• a collection¹ F = {F1, . . . , Fn} ⊆ 2^U of subsets of U

Then, the coverage function f : 2^[n] → R≥0 associated with U , w, and F is defined by

f(S) = w(⋃_{i∈S} Fi) ,

where, as usual, we denote by w(A) the total weight ∑_{x∈A} w(x). Then, f(S) is the total
weight of elements covered by the sets Fi with i ∈ S. We write f = (U,w,F) to state that f is
the coverage function specified by U , w, and F .
¹We allow for the case that some set may appear multiple times in F . Such sets are obviously redundant in the unconstrained setting, but not when we consider maximum coverage subject to a matroid constraint, as we shall describe shortly. Nonetheless, all of the negative results that we state hold even in the case that F contains only disjoint sets.
We consider the problem of matroid maximum coverage, in which we are given a coverage
function f of the form in Definition 3.1, and a matroid M = ([n], I) over [n]. We seek an
independent collection of indices S ∈ I such that f(S) is maximized (i.e. so that the total
weight of elements of U covered by the sets {Fi : i ∈ S} corresponding to S is maximized).
We shall use the following terms to measure the complexity of our algorithm:

n = |F| ,    u = max_{A∈F} |A| ,    r = rank(M) .

That is, n is the number of sets defining f , and also the size of the ground set of M, u is the
maximum size of a set in F (note that u ≤ |U |), and r is the rank of M (note that r ≤ n).
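As a concrete illustration (our own toy data, not an instance from the text), a matroid maximum coverage input consists of the collection F, the weights w, and an independence oracle; for a partition matroid the oracle reduces to a pair of counting checks.

```python
from itertools import chain

# A small matroid maximum coverage instance: the sets F are indexed by
# [n] = {0, 1, 2, 3}; a partition matroid allows at most one index from
# {0, 1} and at most one from {2, 3}.
U = {"u1", "u2", "u3", "u4"}
w = {"u1": 2.0, "u2": 1.0, "u3": 1.0, "u4": 3.0}
F = [{"u1", "u2"}, {"u2", "u3"}, {"u3"}, {"u4"}]

def f(S):
    """Coverage function: total weight of the elements covered by {F_i : i in S}."""
    covered = set(chain.from_iterable(F[i] for i in S))
    return sum(w[u] for u in covered)

def independent(S):
    """Independence oracle for the partition matroid."""
    return len(S & {0, 1}) <= 1 and len(S & {2, 3}) <= 1

print(f({0, 3}))  # 6.0: covers u1, u2, u4
```

In the notation above, this instance has n = 4, u = 2, and r = 2.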
Maximum coverage was first defined by Hochbaum and Pathria [54] in 1998. Maximum
coverage over a partition matroid was considered by Chekuri and Kumar [24] under the name
maximum coverage with group budget constraints. Algorithms for maximum coverage with
group budget constraints with the optimal approximation ratio 1 − 1/e were presented by
Srinivasan [88] and by Ageev and Sviridenko [1]. Both algorithms first formulate the problem
as a linear programming relaxation, and then round the solution. Other generalizations of
maximum coverage appear in the literature. In budgeted maximum coverage, each element is
provided with a cost, and the sets chosen by the algorithm are restricted by the total cost of
the elements they cover. Khuller, Moss and Naor [65] show that a greedy approach yields an
optimal (1−1/e)-approximation algorithm. Cohen and Katzir [28] generalize this even further,
obtaining the generalized maximum coverage problem. In this problem, elements have different
costs and profits depending on which set they are covered with and the sets themselves also
have costs. The goal is to maximize the total profit subject to a single budget constraint. Cohen
and Katzir provide an optimal (1 − 1/e − ε)-approximation for generalized maximum coverage via
a greedy-like algorithm. As noted in Chapter 2, the (1− 1/e) inapproximability result of Feige
[35] applies even in the special case of maximum coverage subject to a cardinality constraint.
Our algorithmic starting point is the oblivious k-local search algorithm, k-LocalCoverage,
shown in Algorithm 5. The algorithm starts from a base S obtained by the standard greedy
algorithm and repeatedly attempts to improve the total weight of all elements covered by S by
adding up to k sets to S and removing up to k sets from S, maintaining the matroid constraint.
We call a pair (A,R) a k-exchange for S if A ⊆ [n] \ S and R ⊆ S with |A| ≤ k and |R| ≤ k.
When no single k-exchange improves the weight of all elements covered by S, the algorithm
returns S.
We begin by showing that the locality ratio of k-LocalCoverage is no better than the ap-
proximation ratio attained by the greedy algorithm (as r →∞).
Theorem 3.2. For all r > k and ε > 0, there exists an instance I(r, k, ε) of matroid maximum
coverage for which the locality ratio of k-LocalCoverage is at most (1 + ε)(r − 1)/(2r − k − 1). Furthermore,
the greedy solution produced by SubmodularGreedy is a local optimum attaining this bound.
Proof. Let the coverage function f = (U,w,F) for instance I(r, k, ε) be as follows. The universe
Algorithm 5: k-LocalCoverage
Input: coverage function f = (U,w,F), where |F| = n; matroid M = ([n], I), given as an independence oracle for I
Let Sinit be the result of running SubmodularGreedy on f , M;
S ← Sinit;
repeat
    foreach A ⊆ [n] \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k do
            if (S \ R) ∪ A ∈ I and f((S \ R) ∪ A) > f(S) then
                S ← (S \ R) ∪ A;
until no exchange is applied to S;
return S;
U consists of r − 1 elements x1, . . . , xr−1 and r − k elements y1, . . . , yr−k, all of weight 1,
together with r − 1 elements ε1, . . . , εr−1 of arbitrarily small weight ε > 0. For each 1 ≤ i ≤ r,
we suppose that the ground set [2r] contains 2 distinct elements that we label ai and bi. Then,
F is made up of the sets Ai = Fai and Bi = Fbi for each 1 ≤ i ≤ r, defined as follows:

Ai = {εi} for 1 ≤ i ≤ r − 1,    Ar = {x1, . . . , xr−1},
Bi = {xi} for 1 ≤ i ≤ r − 1,    Br = {y1, . . . , yr−k}.

Then, |F| = 2r. The matroid for I(r, k, ε) is a partition matroid of rank r, whose independent
sets contain at most one of {ai, bi} for each i ∈ [r]. That is, we can choose at most one of
the sets Ai, Bi from F for each i ∈ [r]. The globally optimal solution is B = {bi : i ∈ [r]},
corresponding to choosing all of the sets Bi for i ∈ [r]. These sets cover elements of total weight
r − 1 + r − k = 2r − k − 1. On the other hand, we consider the solution A = {ai : i ∈ [r]},
corresponding to choosing all of the sets Ai for i ∈ [r]. These sets cover elements of total weight
only (r − 1)ε + (r − 1) = (r − 1)(1 + ε).
Consider an arbitrary k-exchange operation applied to A. This operation exchanges ai for
bi for at most k values i ∈ [r]. If we exchange ar for br, then we reduce the total weight of
elements covered by (r − 1) − (r − k) = k − 1. At most k − 1 exchanges remain, and each
increases the total weight of the covered elements by only 1 − ε, so the total weight covered
must decrease by at least (k − 1)ε. If, on the other hand, we do not replace ar with br, then each
replacement performed decreases the value of the solution by ε, so the total weight covered again
decreases. Thus, A is a local optimum for Algorithm k-LocalCoverage, and so the locality ratio for
k-LocalCoverage is at most

f(A)/f(B) = (r − 1)(1 + ε)/(2r − k − 1) .
Additionally, we consider the standard greedy algorithm SubmodularGreedy applied to this
instance. It begins by choosing the set Ar, since this set covers elements of maximum total
weight. Then for each 1 ≤ i ≤ r − 1, it must choose the set Ai which covers additional weight
ε, instead of the set Bi, which covers no additional weight. Thus, the greedy algorithm must
terminate with the solution A.
The second part of Theorem 3.2 shows that initializing the local search procedure with
the greedy algorithm does not give any increase in performance beyond the locality ratio of
k-LocalCoverage. That is, for any constant k, the approximation ratio of k-LocalCoverage is only

(r − 1)/(2r − k − 1) = 1/2 + O(r^{−1}) .
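The lower-bound instance I(r, k, ε) is easy to build programmatically, which makes the values of the two candidate solutions simple to confirm. The following sketch is our own (the string element labels such as "x1" are hypothetical names); it constructs the universe and the families A and B exactly as in the proof of Theorem 3.2.

```python
def bad_instance(r, k, eps):
    """The instance I(r, k, eps) from the proof of Theorem 3.2 (a sketch).
    Returns a coverage-weight helper and the two set families A and B."""
    w = {}
    for i in range(1, r):            # x_1..x_{r-1} (weight 1) and eps_1..eps_{r-1} (weight eps)
        w[f"x{i}"] = 1.0
        w[f"e{i}"] = eps
    for i in range(1, r - k + 1):    # y_1..y_{r-k} (weight 1)
        w[f"y{i}"] = 1.0
    A = [{f"e{i}"} for i in range(1, r)] + [{f"x{i}" for i in range(1, r)}]
    B = [{f"x{i}"} for i in range(1, r)] + [{f"y{i}" for i in range(1, r - k + 1)}]
    def cover(sets):
        return sum(w[u] for u in set().union(*sets))
    return cover, A, B

cover, A, B = bad_instance(r=10, k=2, eps=0.01)
print(cover(A))  # (r-1)(1+eps) = 9.09, up to floating point
print(cover(B))  # 2r-k-1 = 17.0
```

The ratio cover(A)/cover(B) ≈ 0.535 here, already close to the 1/2 + O(r⁻¹) bound above.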
3.2 A Non-Oblivious Local Search Algorithm
Let us examine in more detail the instance I(2, 1, ε) from the perspective of the algorithm
1-LocalCoverage. The sets F for this instance are given by:
Fa1 = A1 = {ε1}    Fa2 = A2 = {x1}
Fb1 = B1 = {x1}    Fb2 = B2 = {y1}
In particular, let us consider the bases corresponding to {B1, A2} and {A1, A2}. The set
{B1, A2} covers slightly fewer elements than {A1, A2}, but it is intuitively “better” than
{A1, A2}, since it is of almost equal value to {A1, A2} but additionally covers the element x1 twice.
From our perspective, this is advantageous, since it ensures that x1 will stay covered regardless
of how the next exchange is chosen.
Following this intuition, we develop a non-oblivious local search algorithm guided by a
potential function that gives preference to “robust” solutions, which cover elements multiple
times, even if these solutions have slightly lower objective value. The same approach appears
in Alimonti [2] and Khanna et al. [64], in the context of the maximum satisfiability problem
and its variants. There, the idea is to give extra weight to clauses which are satisfied by more
than one variable, because these clauses will remain satisfied after flipping the next variable in
the search procedure.
We modify Algorithm 5 by replacing the oblivious objective function f with a potential
function g of the general form:
g(S) = ∑_{u∈U} α_{#(u,S)} w(u) ,    (3.1)

where

#(u, S) = |{i ∈ S : u ∈ Fi}|

is the number of sets specified by S that contain the element u, and we define α0 = 0, so that
elements not included in any set chosen by S are not counted in the sum. Then, by setting,
for example, αi > α1 for all i > 1, we can give extra value to solutions that cover some element
more than once. Additionally, note that if we set αi = 1 for all i > 0, we recover the problem’s
given objective function f .
Now we are faced with the problem of how to set the αi. In order to gain some intuition, we
return to the instance I(2, 1, ε). We can write the values of f and g on all bases of this instance
in terms of the parameters ε, α1, and α2.
f({a1, a2}) = 1 + ε        g({a1, a2}) = (1 + ε)α1
f({a1, b2}) = 1 + ε        g({a1, b2}) = (1 + ε)α1
f({b1, a2}) = 1            g({b1, a2}) = α2
f({b1, b2}) = 2            g({b1, b2}) = 2α1 .
Now, we turn to the problem of setting α1 and α2. We shall see that we must balance two
concerns. We want to set α2 to be large enough that the algorithm prefers robust solutions in
the short term, allowing it to escape from bad local optima. However, we want to set α2 to be
small enough that the algorithm still makes progress with respect to the objective function f
in the long term.
Suppose that we set α2 to be slightly larger than α1. Then, the solution {a1, a2} will remain
a 1-local optimum of g if

ε = α2/α1 − 1 .
In this case, the locality ratio will be at most

f({a1, a2})/f({b1, b2}) = (1 + ε)/2 = α2/(2α1) .
This confirms our intuition that setting α2 to be larger than α1 will improve the locality ratio.
Now suppose that, guided by the observation that larger values of α2 improve the locality
ratio, we set α2 > 2α1. Then, the solution {b1, a2} will become a local optimum of g. However,
in this case the locality ratio will be at most

f({b1, a2})/f({b1, b2}) = 1/2 .
This confirms our intuition that it is bad to give too much extra weight to elements covered
multiple times.
This analysis has been tied closely to the specifics of the instance I(2, 1, ε). In Section 3.4 we
give a general procedure that can be used to infer the optimum value of the α coefficients for all
problems of rank r. The procedure generates a family of linear programs (one for each value of
r ≥ 1), that encode the problem of choosing the optimal values of the α sequence on instances of
rank r. Now, however, we simply state the resulting general formula for the α values obtained in
this fashion and show that these values result in an improved (1−1/e)-approximation algorithm.
We define the values

α0 = 0 ,    α1 = 1 − 1/e ,    α_{i+1} = (i + 1)αi − iα_{i−1} − 1/e .    (3.2)
We now derive various properties of the α sequence, which we shall use in our analysis. In our
proofs it will be useful to consider the sequence δ of differences between consecutive members
of the α sequence.
Lemma 3.3. Let δi = α_{i+1} − αi. Then, the δi satisfy the recurrence relation

δ0 = 1 − 1/e ,    δi = iδ_{i−1} − 1/e .    (3.3)

Furthermore, they are in fact equal to

δi = i! (1 − (1/e) ∑_{k=0}^{i} 1/k!) .    (3.4)
Proof. Clearly δ0 = α1 − α0 = 1 − 1/e. In general, we have

δi = α_{i+1} − αi = (i + 1)αi − iα_{i−1} − 1/e − αi = i(αi − α_{i−1}) − 1/e = iδ_{i−1} − 1/e .

Using this recurrence, we can prove formula (3.4) by induction. It is immediate in the case
i = 0. For i > 0, we have

δi = iδ_{i−1} − 1/e = i(i − 1)! (1 − (1/e) ∑_{k=0}^{i−1} 1/k!) − i! · 1/(i!e)
   = i! (1 − (1/e) ∑_{k=0}^{i} 1/k!) .
By examining the sequence δ, we now show that the α sequence is increasing and concave.
Lemma 3.4. For all i ≥ 0, δi > 0 and δi > δ_{i+1}.
Proof. The inequality δi > 0 follows directly from (3.4) and the fact that ∑_{k=0}^{i} 1/k! < e for all
i < ∞. For the second claim, we first define

e[i] = 1/(i! · i) + ∑_{ℓ=0}^{i} 1/ℓ!
and note that

e[i] − e = 1/(i! · i) + ∑_{ℓ=0}^{i} 1/ℓ! − ∑_{ℓ=0}^{∞} 1/ℓ! = 1/(i! · i) − ∑_{ℓ=i+1}^{∞} 1/ℓ!
         = 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=i+1}^{∞} (i + 1)!/ℓ! = 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=0}^{∞} (i + 1)!/(ℓ + i + 1)!
         > 1/(i! · i) − (1/(i + 1)!) ∑_{ℓ=0}^{∞} 1/(i + 1)^ℓ = 1/(i! · i) − (1/(i + 1)!) · (i + 1)/i = 0 .
Then, for all i ≥ 0, (3.3) and (3.4) give

δ_{i+1} − δi = iδi − 1/e = i! · i (1 − (1/e) ∑_{k=0}^{i} 1/k!) − 1/e < i! · i (1 − (1/e[i]) ∑_{k=0}^{i} 1/k!) − 1/e[i] ,

where the last inequality follows from e < e[i]. Now, set a = 1/(i · i!) and b = ∑_{k=0}^{i} 1/k!. We have
e[i] = a + b, and the final term of the inequality above is equal to:

1/a − (1/a) · b/(a + b) − 1/(a + b) = (a + b − b − a)/(a(a + b)) = 0 .
Finally, we show that the α sequence is bounded.
Lemma 3.5. For all i, αi ≤ Hi.
Proof. The recurrence (3.3) and Lemma 3.4 imply that for any k we have:

δ_{k−1} = (1/k)(δk + 1/e) < (1/k)(δ0 + 1/e) = 1/k .

We obtain the bound on αi by summing this inequality:

αi = ∑_{k=1}^{i} δ_{k−1} < ∑_{k=1}^{i} 1/k = Hi .
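The properties established in Lemmas 3.3–3.5 can be confirmed numerically from the recurrence (3.2) alone. The short check below is our own illustration; the tolerance is deliberately loose because the recurrence amplifies floating-point error by roughly a factor of i! at index i.

```python
from math import e, factorial

def alphas(n):
    """The coefficients of (3.2): a0 = 0, a1 = 1 - 1/e,
    a_{i+1} = (i+1) a_i - i a_{i-1} - 1/e."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

a = alphas(12)
d = [a[i + 1] - a[i] for i in range(12)]           # the difference sequence delta

# Lemma 3.3: closed form d_i = i! (1 - (1/e) sum_{k<=i} 1/k!)
closed = [factorial(i) * (1 - sum(1 / factorial(k) for k in range(i + 1)) / e)
          for i in range(12)]
print(all(abs(d[i] - closed[i]) < 1e-6 for i in range(12)))              # True

# Lemma 3.4: delta is positive and strictly decreasing (alpha increasing, concave)
print(all(x > 0 for x in d) and all(d[i] > d[i + 1] for i in range(11)))  # True

# Lemma 3.5: alpha_i <= H_i
H = [sum(1 / k for k in range(1, i + 1)) for i in range(13)]
print(all(a[i] <= H[i] for i in range(13)))                               # True
```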
The fact that the α sequence is increasing and concave implies that the non-oblivious po-
tential function g is also monotone submodular.
Theorem 3.6. Let f = (U,w,F) be a coverage function, and let g be defined by (3.1)
and (3.2). Then g is monotone submodular.
Proof. Suppose that T ⊆ S ⊆ [n] and i ∈ [n] \ S. Then, note that for all a ∈ Fi we have
#(a, T ) ≤ #(a, S) since T ⊆ S. Then,

gS(i) = g(S + i) − g(S) = ∑_{a∈Fi} (α_{#(a,S)+1} − α_{#(a,S)}) w(a) = ∑_{a∈Fi} δ_{#(a,S)} w(a)
      ≤ ∑_{a∈Fi} δ_{#(a,T)} w(a) = ∑_{a∈Fi} (α_{#(a,T)+1} − α_{#(a,T)}) w(a) = g(T + i) − g(T ) = gT (i) ,

where the inequality follows from #(a, T ) ≤ #(a, S) and the fact that the δ sequence is decreasing (Lemma 3.4).
For monotonicity, we note that

gS(i) = ∑_{a∈Fi} δ_{#(a,S)} w(a) ≥ 0 ,

where the last inequality follows from the fact that the δ sequence is positive (Lemma 3.4).
Using g, we formulate a local search algorithm MatroidCoverage for matroid maximum cov-
erage, shown in Algorithm 6. The algorithm is parameterized by an approximation parameter
ε > 0, which governs how large an improvement must be in order for a new solution to be
accepted. We describe this aspect of the algorithm in more detail in the next section.
MatroidCoverage first computes a greedy solution to the non-oblivious potential function g. It
then uses a simple approximate local search procedure guided by g that exchanges 1 element
in the current solution for one element not in the current solution.
Algorithm 6: MatroidCoverage
Input: approximation parameter ε > 0; coverage function f = (U,w,F), where |F| = n; matroid M = ([n], I), given as an independence oracle for I
Let g be the potential function given by (3.1) and (3.2);
S ← the result of running SubmodularGreedy on g, M;
Set r = |S| and ε0 = (ε/(rHr)) · (1 − 1/e − ε)^{−1};
repeat
    foreach a ∈ [n] \ S do
        foreach b ∈ S do
            if S − b + a ∈ I and g(S − b + a) > (1 + ε0)g(S) then
                S ← S − b + a;
                break;
until no exchange is applied to S;
return S;
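A compact rendering of the algorithm may be useful. The sketch below is our own simplification: it recomputes g from scratch rather than maintaining per-element coverage counts as the runtime analysis in Section 3.3 suggests, and it assumes the greedy phase returns a nonempty solution. Run on the I(2, 1, ε) instance of Section 3.2, the greedy phase already lands on the robust base {A2, B1}, and the local search phase then escapes to the true optimum {B1, B2}.

```python
from math import e

def make_alpha(n):
    """Coefficients from (3.2)."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

def matroid_coverage(F, w, independent, r, eps=0.1):
    """Sketch of Algorithm 6: greedy guided by the potential g, then
    1-exchanges accepted only when g improves by a (1 + eps0) factor."""
    alpha = make_alpha(2 * r + 2)

    def g(S):
        count = {}
        for i in S:
            for u in F[i]:
                count[u] = count.get(u, 0) + 1
        return sum(alpha[c] * w[u] for u, c in count.items())

    n = len(F)
    S = set()
    while True:  # greedy phase on g
        cands = [i for i in range(n) if i not in S and independent(S | {i})]
        if not cands:
            break
        best = max(cands, key=lambda i: g(S | {i}) - g(S))
        if g(S | {best}) - g(S) <= 0:
            break
        S.add(best)
    H = sum(1 / k for k in range(1, len(S) + 1))
    eps0 = eps / (len(S) * H) / (1 - 1 / e - eps)
    improved = True
    while improved:  # local search phase: single-element exchanges
        improved = False
        for i in set(range(n)) - S:
            for j in S:
                T = (S - {j}) | {i}
                if independent(T) and g(T) > (1 + eps0) * g(S):
                    S, improved = T, True
                    break
            if improved:
                break
    return S

# The instance I(2, 1, eps) of Section 3.2: indices 0..3 are a1, a2, b1, b2.
w = {"e1": 0.05, "x1": 1.0, "y1": 1.0}
F = [{"e1"}, {"x1"}, {"x1"}, {"y1"}]
indep = lambda S: len(S & {0, 2}) <= 1 and len(S & {1, 3}) <= 1
S = matroid_coverage(F, w, indep, r=2, eps=0.1)
print(sorted(S))  # [2, 3], i.e. the optimum {B1, B2}
```

Note that the oblivious algorithm (Algorithm 5 with k = 1) started from greedy can get stuck at {A1, A2} on this instance, while the potential g rewards covering x1 twice and so steers the search to the optimum.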
3.3 Analysis of the Algorithm
We now analyze the approximation performance and runtime of MatroidCoverage. We examine
an arbitrary instance f = (U,F , w), M = ([n], I) of matroid maximum coverage. The solution
S returned by MatroidCoverage for this instance is an ε0-approximate local optimum of g. We
consider some global optimum O of f . Because f is monotone, we can assume without loss of
generality that O is a base of M. As shown in Theorem 3.6, our potential function g is also
monotone. Thus, we can also assume that the initial greedy solution computed by Matroid-
Coverage is a base of M. Furthermore, each improvement step in MatroidCoverage maintains
the cardinality of the solution. Thus, the final solution S produced by MatroidCoverage
must also be a base of M.
From Theorem 2.13, there must exist a bijection π : O → S such that S − π(i) + i is a base
of M for all i ∈ O. Let O = {o1, . . . , or} and then number the elements of S as {s1, . . . , sr}
so that si = π(oi) for all i ∈ [r]. Then, we consider only the exchange operations that remove
π(oi) = si from S and add oi to the result. Each such operation is considered by the algorithm
at each iteration, and each one of them results in an independent set in M. Thus, ε0-local
optimality implies that we have

(1 + ε0)g(S) ≥ g(S − si + oi)

for each i ∈ [r]. Summing over all r such inequalities we obtain the inequality

r(1 + ε0)g(S) ≥ ∑_{i=1}^{r} g(S − si + oi) .    (3.5)
The main difficulty of the analysis lies in the fact that inequality (3.5) is given in terms of
the non-oblivious potential function g, but we wish to derive an approximation guarantee for
the original objective function f . In order to put g and f on common ground, we introduce the
following symmetric notation.
For any two subsets A, B ⊆ [r], we define EA,B to be the set of elements that belong to all
of the sets Fsi for i ∈ A and Foj for j ∈ B and to no other sets in F . The sets EA,B thus form a
partition of U . Then, for all integers a, x, b ≥ 0 such that 1 ≤ a + x + b ≤ r, we define

wa,x,b = ∑_{A,B⊆[r] : |A\B|=a, |B\A|=b, |A∩B|=x} w(EA,B) .

In words, wa,x,b is the total weight of elements that belong to exactly a + x of the sets Fsi
specified by S and exactly b + x of the sets Foj specified by O, where exactly x of the sets have
the same index in both S and O (that is, x of the sets have i = j). We call the variables wa,x,b
symmetric variables.
We can express all the quantities we are interested in using the symmetric variables wa,x,b:

f(S) = ∑_{a,x,b≥0, a+x≥1} wa,x,b    (3.6)

f(O) = ∑_{a,x,b≥0, b+x≥1} wa,x,b    (3.7)

g(S) = ∑_{a,x,b≥0, a+x≥1} α_{a+x} wa,x,b = ∑_{a,x,b≥0} α_{a+x} wa,x,b  (since α0 = 0)    (3.8)

∑_{i=1}^{r} g(S − si + oi) = ∑_{a,x,b≥0} (a·α_{a+x−1} + b·α_{a+x+1} + (r − a − b)·α_{a+x}) wa,x,b .    (3.9)
The only expression which is not immediate is the one for ∑_{i=1}^{r} g(S − si + oi). We derive it as
follows: consider an arbitrary element u and let A, B ⊆ [r] be the unique sets of indices such
that u ∈ EA,B. Then, u's weight is counted in (only) the quantity wa,x,b where x = |A ∩ B|,
a = |A \ B|, and b = |B \ A|. In S, the element u appears in |A| = a + x sets. If i ∈ A \ B, it
appears in |A| − 1 = a + x − 1 sets in (S − si + oi). If i ∈ B \ A, it appears in |A| + 1 = a + x + 1
sets in (S − si + oi). Finally, if i ∈ A ∩ B or i ∉ A ∪ B, it also appears in |A| = a + x sets in
(S − si + oi). For each such element, there are precisely a values of i leading to exchanges of the
first type, b values of i leading to exchanges of the second type, and r − a − b values of i leading to
exchanges of the third type.
We are now ready to state the main technical result of this section, which bounds the locality
ratio of MatroidCoverage.
Lemma 3.7. If S is an ε0-approximate local optimum for the function g, then:

(1 + ε0rHr) f(S) ≥ (1 − 1/e) f(O) .

Proof. Because S is an ε0-approximate local optimum, (3.5) holds. We use (3.8) and (3.9) to
reformulate inequality (3.5) in terms of our symmetric notation, obtaining

0 ≤ ∑_{a,x,b≥0} ((a + b + ε0r)·α_{a+x} − a·α_{a+x−1} − b·α_{a+x+1}) wa,x,b .    (3.10)
Similarly, we use (3.6) and (3.7) to reformulate our goal in terms of our symmetric notation,
obtaining
0 ≤ (1 + ε0rHr) ∑_{a,x,b≥0, a+x≥1} wa,x,b − (1 − 1/e) ∑_{a,x,b≥0, b+x≥1} wa,x,b .    (3.11)
To prove the lemma, we show that (3.10) implies (3.11). Since we have wa,x,b ≥ 0 for all a, x, b,
it suffices to show that the coefficient of any term wa,x,b in (3.10) is at most its coefficient
in (3.11). We consider three cases, simplifying expressions throughout by using the fact that
α0 = 0.
In the first case, suppose that b = x = 0. We must show that for all 1 ≤ a ≤ r,

(a + ε0r)αa − aα_{a−1} ≤ 1 + ε0rHr .

We have

(a + ε0r)αa − aα_{a−1} = aδ_{a−1} + ε0rαa    (by definition of δ)
                       = δa + 1/e + ε0rαa    (by recurrence (3.3))
                       ≤ δ0 + 1/e + ε0rαa    (by Lemma 3.4)
                       = 1 + ε0rαa           (by definition of δ)
                       ≤ 1 + ε0rHr           (by Lemma 3.5) .
In the next case, suppose that a = x = 0. We must show that for all 1 ≤ b ≤ r,

−bα1 ≤ −(1 − 1/e) .

This follows directly from the fact that b ≥ 1 and α1 = 1 − 1/e.
Finally, we must show for a, x, b such that a + x ≠ 0, b + x ≠ 0, and 1 ≤ a + x + b ≤ r,

(a + b + ε0r)α_{a+x} − aα_{a+x−1} − bα_{a+x+1} ≤ 1/e + ε0rHr .    (3.12)

For this inequality, we consider two subcases. In the first, suppose b = 0 and so x ≥ 1. Then,
we have

(a + ε0r)α_{a+x} − aα_{a+x−1} = aδ_{a+x−1} + ε0rα_{a+x}                  (by definition of δ)
                              = δ_{a+x} + 1/e − xδ_{a+x−1} + ε0rα_{a+x}  (by recurrence (3.3))
                              ≤ 1/e + ε0rα_{a+x}                         (by Lemma 3.4, since x ≥ 1)
                              ≤ 1/e + ε0rHr                              (by Lemma 3.5) .
In the second subcase, we have b ≥ 1. Then

(a + b + ε0r)α_{a+x} − aα_{a+x−1} − bα_{a+x+1}
    = aδ_{a+x−1} − bδ_{a+x} + ε0rα_{a+x}       (by definition of δ)
    ≤ aδ_{a+x−1} − δ_{a+x} + ε0rα_{a+x}        (because b ≥ 1)
    = 1/e − xδ_{a+x−1} + ε0rα_{a+x}            (by recurrence (3.3))
    ≤ 1/e + ε0rHr                              (by Lemma 3.5 and δ_{a+x−1} ≥ 0) .
This completes the proof of the Lemma.
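The coefficient-wise comparison at the heart of Lemma 3.7 can also be verified numerically for small ranks. The following check is our own illustration (with ε0 = 0 for simplicity): it confirms that, for every triple with 1 ≤ a + x + b ≤ r, the coefficient of wa,x,b in (3.10) is at most its coefficient in (3.11).

```python
from math import e

def alpha_seq(n):
    """Coefficients from (3.2)."""
    a = [0.0, 1 - 1 / e]
    for i in range(1, n):
        a.append((i + 1) * a[i] - i * a[i - 1] - 1 / e)
    return a

def check(r, tol=1e-6):
    """Coefficient-wise comparison of (3.10) and (3.11) with eps0 = 0:
    (a+b)*alpha_{a+x} - a*alpha_{a+x-1} - b*alpha_{a+x+1}
      <= [a+x >= 1] - (1 - 1/e)*[b+x >= 1]   for all 1 <= a+x+b <= r."""
    al = alpha_seq(r + 2)
    for a in range(r + 1):
        for x in range(r + 1 - a):
            for b in range(r + 1 - a - x):
                if a + x + b < 1:
                    continue
                lhs = ((a + b) * al[a + x]
                       - (a * al[a + x - 1] if a else 0.0)
                       - b * al[a + x + 1])
                rhs = (1 if a + x >= 1 else 0) - (1 - 1 / e) * (1 if b + x >= 1 else 0)
                if lhs > rhs + tol:
                    return False
    return True

print(check(10))  # True
```

The tolerance absorbs the floating-point error of the recurrence; the only tight cases (e.g. a = x = 0, b = 1) hold with equality, exactly as in the proof above.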
Now, we consider the run-time of MatroidCoverage. By keeping track of how many times
each element of U is covered by the current solution, we can compute the change in g due to
adding or removing a set from the solution using only time O(u); we simply increment the count
of each element in the added set and decrement the count of each element in the removed set,
and also compute the change in g due to each such increment or decrement. The initial greedy
phase takes time O(rnu). Each iteration of the local search phase examines O(rn) potential
exchanges, each of which requires a single call to the independence oracle for M and a single
evaluation of the change in g. Thus, each iteration can be completed in time O(rnu). We now
bound the total number of iterations that the algorithm can make.
Lemma 3.8. Algorithm MatroidCoverage makes at most O(ε0^{−1}) improvements.

Proof. Consider the solution Sinit produced by the initial greedy phase of the algorithm MatroidCoverage,
and let Og = arg max_{S∈I} g(S). Then, MatroidCoverage makes at most log_{1+ε0}(g(Og)/g(Sinit))
improvements. As shown in Theorem 3.6, g is monotone submodular. The classical result of Fisher,
Nemhauser, and Wolsey [43] implies that the greedy algorithm is then a 1/2-approximation
for maximizing g, and so g(Sinit) ≥ (1/2)g(Og). Algorithm MatroidCoverage can therefore make at
most

log_{1+ε0}(g(Og)/g(Sinit)) ≤ log_{1+ε0} 2 = log 2 / log(1 + ε0) = O(ε0^{−1})

improvements.
We are now ready to state our main result.
Theorem 3.9. For all ε > 0, MatroidCoverage is a (1 − 1/e − ε)-approximation algorithm for
matroid maximum coverage, running in time O(ε^{−1} r^2 nu log r).

Proof. The theorem follows directly from Lemmas 3.7 and 3.8 after expanding the definition

ε0 = ε / (rHr (1 − 1/e − ε)) = O(ε / (r log r)) .
Finally, by using the partial enumeration technique described in Section 2.6 we can remove
the small loss of ε from the approximation ratio of MatroidCoverage, obtaining a clean
(1 − 1/e)-approximation.
Theorem 3.10. Suppose that we set ε = 1/(3r) in MatroidCoverage. Then, PartEnum(MatroidCoverage)
is a (1 − 1/e)-approximation algorithm running in time O(r^3 n^2 u log r).
Proof. Theorem 3.9 shows that MatroidCoverage has an approximation ratio of 1 − 1/e − 1/(3r) when ε is set to 1/(3r).
Theorems 2.22 and 2.19 imply that the approximation ratio of PartEnum(MatroidCoverage) is
hence

1/r + (1 − 1/r)(1 − 1/e − 1/(3r)) = 1/r + 1 − 1/e − 1/(3r) − 1/r + 1/(er) + 1/(3r^2)
                                  = 1 − 1/e + 1/(er) − 1/(3r) + 1/(3r^2)
                                  > 1 − 1/e ,

where in the last line we have used the fact that e ≤ 3. The algorithm PartEnum(MatroidCoverage)
makes n calls to MatroidCoverage. From Theorem 3.9, each of these takes time
O(r^3 nu log r), so the total runtime is O(r^3 n^2 u log r).
3.4 Obtaining the α Sequence
In Section 3.2 we gave an equation for the sequence α = α0, α1, . . . of coefficients for defining a
non-oblivious potential function and showed that the resulting function g yields an algorithm
with an improved approximation ratio. Here we describe the method by which the α sequence
was obtained. Our approach uses an idea similar to the factor revealing linear program applied
by Jain et al. [57] for the analysis of greedy algorithms for a facility location problem. Specifically,
we formulate a linear program that determines, for an instance of rank r and a function of
the general form (3.1), the best attainable locality ratio, the value of the coefficients α1, . . . , αr
attaining this ratio (as usual, we shall assume α0 = 0), and a tight instance for the resulting
algorithm. We now show how to derive the program for some fixed value of r.
For any sequence α = α1, . . . , αr we denote by gα,f the function g obtained from f and α
by using (3.1). We consider the problem of choosing values for α1, . . . , αr that maximize the
locality ratio over all instances of rank r. For now, we restrict ourselves to the following simpler
problem. Suppose that we are given a sequence α = α1, . . . , αr and want to find the worst-case
locality ratio² of the algorithm MatroidCoverage over all instances M, f with rank r, when gα,f
is used as the non-oblivious potential function in the algorithm.
Let S = {s1, . . . , sr} and O = {o1, . . . , or} be two sets of r indices. We want to find a
coverage function f on the ground set S ∪ O that minimizes f(S)/f(O), subject to the constraint
²Here, for ease of presentation, we consider exact local optima in our program, setting ε to 0 in Algorithm 6. Our approach generalizes easily to the case in which ε > 0.
that S is a local optimum for the function gα,f . The resulting problem can be formulated as
min_{F={Fi}_{i∈O∪S}, f=(U,w,F)}  f(S)/f(O)
s.t.  gα,f(S) − gα,f(S − si + oi) ≥ 0 ,   1 ≤ i ≤ r    (3.13)
The r constraints in (3.13) enforce the requirement that S is a local optimum with respect to
gα,f . Note that the constraints only consider exchanges of the form S− si + oi. As we noted in
Section 3.3, Theorem 2.13 implies that we can always index the elements of S and O so that
S − si + oi is independent for all 1 ≤ i ≤ r. Thus, there is always some such set of exchanges.
Moreover, there are instances (e.g. those instances considered in Section 3.2) in which the r exchanges
of the form guaranteed by Theorem 2.13 are the only ones that lead to a valid solution, and we
are interested in the worst-case locality ratio over all possible instances. Thus, it is sufficient
to consider only exchanges of this form.
We need not worry about the size of S ∩ O in our program. That is, in formulating (3.13)
and our following programs, we can assume that S and O are disjoint, but in our analysis and
related discussion, we allow for S ∩ O ≠ ∅. This is because considering the general case in which
there is some element i ∈ S ∩ O is equivalent to considering a coverage function containing 2
copies of Fi: one with index si ∈ S and one with index oi ∈ O. Hence it is sufficient
to minimize over all coverage functions in our programs, as this will implicitly minimize over
all configurations of S and O, as well.
Program (3.13) has a non-linear objective. However, we note that the ratio f(S)/f(O) and
local optimality with respect to gα,f (regardless of the specific sequence α) are both unchanged
if we scale all of the weights w(u) by a constant. Thus, we may assume without loss of generality
that f(O) = 1, obtaining the following program
min_{F={Fi}_{i∈O∪S}, f=(U,w,F)}  f(S)
s.t.  gα,f(S) − gα,f(S − si + oi) ≥ 0 ,   1 ≤ i ≤ r
      f(O) = 1    (3.14)
We are now faced with the question of how to implement the minimization over an arbitrary
coverage function f = (U,w,F) on S ∪ O in (3.14). More concretely, we seek a succinct parameterization
that yields all possible such coverage functions. Here, we make use of the symmetric
variables wa,x,b introduced in Section 3.3. Equations (3.6) and (3.7) show that for any coverage
function f , we can express f(S) and f(O) in terms of wa,x,b. However, it remains unclear how to
represent the first set of constraints in (3.14). Equation (3.10) shows how to use the symmetric
variables to represent the weaker statement given in (3.5), here equivalent to

∑_{1≤i≤r} [gα,f(S) − gα,f(S − si + oi)] ≥ 0 .    (3.15)
If we replace the r constraints in (3.14) by the single constraint (3.15), and use (3.6) and (3.7)
to represent f(S) and f(O) we obtain the concrete linear program:
min_{wa,x,b}  ∑_{a+x≥1} wa,x,b
s.t.  ∑_{a,x,b} ((a + b)·α_{a+x} − a·α_{a+x−1} − b·α_{a+x+1}) wa,x,b ≥ 0
      ∑_{b+x≥1} wa,x,b = 1
      wa,x,b ≥ 0    (3.16)
The program (3.16) has a variable w_{a,x,b} for each triple of non-negative values
a, x, b satisfying a + x + b ≤ r. If the variables w_{a,x,b} are the symmetric variables for some
underlying coverage function f, then they must satisfy all of the constraints of (3.16), and
moreover the resulting value of (3.16) corresponds to f(S). Thus, for every feasible solution
of (3.14) there is a corresponding feasible solution to (3.16) of equal value. However, it is not
clear that the converse must hold. Since the constraint (3.15) is strictly weaker than the r
constraints in (3.14), the program (3.16) could have solutions of smaller value than (3.14). We
now show that this is not the case, and that the converse must, in fact, hold.
Theorem 3.11. Let w_{a,x,b}, where a, x, b ≥ 0 and a + x + b ≤ r, be a feasible solution of (3.16)
with value v. Then, there exists a coverage function f = (U, w, F) on S ∪ O that is a feasible
solution of (3.14) with value v.
Proof. Let S = {s(1), …, s(r)} and O = {o(1), …, o(r)} be two sets of indices.³ We define
f = (U, w, F) on S ∪ O as follows. For each a, x, b ≥ 0 satisfying a + x + b ≤ r, and each
i ∈ [r], we create an element u(a, x, b, i) ∈ U, and define w(u(a, x, b, i)) = w_{a,x,b}/r for all i. The
collection F contains r sets F_{s(i)} and r sets F_{o(i)}, constructed as follows:
\[ F_{s(i)} = \bigcup_{j=1}^{r} \{ u(a, x, b, j) : a + x \ge i + j \bmod r \}, \qquad
F_{o(i)} = \bigcup_{j=1}^{r} \{ u(a, x, b, j) : b + x \ge i + j \bmod r \}. \]
³We adopt the notation s(i) and o(i) instead of s_i and o_i here to avoid an unsightly proliferation of subscripts in the proof.
Then,
\[
\begin{aligned}
f(S) &= w\left( \bigcup_{i=1}^{r} \bigcup_{j=1}^{r} \{ u(a,x,b,j) : a + x \ge i + j \bmod r \} \right)
= w\left( \bigcup_{j=1}^{r} \{ u(a,x,b,j) : a + x \ge 1 \} \right) \\
&= \frac{1}{r} \sum_{j=1}^{r} \sum_{\substack{a,x,b \ge 0 \\ a+x \ge 1}} w_{a,x,b}
= \sum_{\substack{a,x,b \ge 0 \\ a+x \ge 1}} w_{a,x,b} = v,
\end{aligned}
\]
and similarly
\[
\begin{aligned}
f(O) &= w\left( \bigcup_{i=1}^{r} \bigcup_{j=1}^{r} \{ u(a,x,b,j) : b + x \ge i + j \bmod r \} \right)
= w\left( \bigcup_{j=1}^{r} \{ u(a,x,b,j) : b + x \ge 1 \} \right) \\
&= \frac{1}{r} \sum_{j=1}^{r} \sum_{\substack{a,x,b \ge 0 \\ b+x \ge 1}} w_{a,x,b}
= \sum_{\substack{a,x,b \ge 0 \\ b+x \ge 1}} w_{a,x,b} = 1,
\end{aligned}
\]
where the last equality follows from the fact that the w_{a,x,b} are a feasible solution for (3.16).
Finally, we show that S is a local optimum with respect to the non-oblivious function g_{α,f}.
Specifically, we show that g_{α,f}(S) − g_{α,f}(S − s(i) + o(i)) ≥ 0 for each i ∈ [r]. We fix an arbitrary
i ∈ [r], and then consider how many times each element is covered by S versus how many times it
is covered by S − s(i) + o(i). This will allow us to compute the value g_{α,f}(S) − g_{α,f}(S − s(i) + o(i)).
Specifically, for each a, x, b ≥ 0 and j ∈ [r], we consider how many times the elements of the
form u(a, x, b, j) are covered, and then compute how many such elements there are in total.
Fix a, x, b ≥ 0 and consider the r elements u(a, x, b, j) for j ∈ [r]. In S, each element
u(a, x, b, j) is covered a + x times, once for each of the a + x values of i ∈ [r] such
that i + j mod r ≤ a + x. On the other hand, each element of the form u(a, x, b, j) that appears in F_{s(i)}
but not in F_{o(i)} is covered once less in S − s(i) + o(i) than it is in S. Similarly, each element of the
form u(a, x, b, j) that appears in F_{o(i)} but not in F_{s(i)} is covered once more in S − s(i) + o(i) than it
is in S.
We now count how many elements of the latter two kinds there are. The set F_{s(i)}
contains the element u(a, x, b, j) for each of the a + x values of j ∈ [r] such that i + j mod r ≤ a + x.
Similarly, the set F_{o(i)} contains the element u(a, x, b, j) for each of the b + x values of j ∈ [r] such
that i + j mod r ≤ b + x. Finally, the element u(a, x, b, j) appears in both F_{s(i)} and F_{o(i)} for each
of the x values of j ∈ [r] such that i + j mod r ≤ x.
Thus, for each a, x, b ≥ 0 there are a elements covered once less (that is, a + x − 1 times in total)
in S − s(i) + o(i) and b elements covered once more (a + x + 1 times in total) in S − s(i) + o(i).
All other elements are covered the same number of times in both S and S − s(i) + o(i). So,
\[
\begin{aligned}
g(S) - g(S - s(i) + o(i))
&= \sum_{a,x,b \ge 0} r \alpha_{a+x} \frac{w_{a,x,b}}{r} - \sum_{a,x,b \ge 0} \left( a\alpha_{a+x-1} + b\alpha_{a+x+1} + (r-a-b)\alpha_{a+x} \right) \frac{w_{a,x,b}}{r} \\
&= \frac{1}{r} \left[ \sum_{a,x,b \ge 0} r\alpha_{a+x} w_{a,x,b} - \sum_{a,x,b \ge 0} \left( a\alpha_{a+x-1} + b\alpha_{a+x+1} + (r-a-b)\alpha_{a+x} \right) w_{a,x,b} \right] \\
&\ge 0,
\end{aligned}
\]
where the last inequality follows from the fact that the w_{a,x,b} are a feasible solution for (3.16).
Thus, f is a feasible solution for (3.14) with value f(S) = v.
The value of the optimal solution to (3.16) is therefore the same as the value of the optimal
solution to (3.14). This value corresponds to the locality ratio of MatroidCoverage when g is
given by g_{α,f}. Furthermore, the solution attaining this value gives a tight (symmetric) instance
for the objective function g_{α,f}. Now we consider the problem of obtaining the optimal values
α_1, …, α_r. We want to maximize the value of (3.16) over all choices of α_1, …, α_r. Although
this results in a max-min problem, linear programming duality implies that the value of the
optimal solution to (3.16) is the same as the value of the optimal solution to the dual:
\[
\begin{aligned}
\max_{\theta, \rho} \quad & \theta \\
\text{s.t.} \quad & \theta \cdot \mathbf{1}(b + x \ge 1) + \rho \left( (a+b)\alpha_{a+x} - a\alpha_{a+x-1} - b\alpha_{a+x+1} \right) \le \mathbf{1}(a + x \ge 1), \\
& \qquad a, x, b \ge 0,\ a + x + b \le r \\
& \rho \ge 0
\end{aligned} \tag{3.17}
\]
The dual has only two variables, ρ and θ. The variable ρ appears only in terms of the form ρα_i
for 1 ≤ i ≤ r. Hence, fixing an arbitrary value of ρ corresponds to multiplying each of the α_i
by some constant. The local optimality conditions from the primal (3.16) are invariant under
this operation, and so we can set ρ = 1 without affecting the value of the solutions of (3.17).
Doing so, we obtain the final linear program:
\[
\begin{aligned}
\max_{\alpha_1, \ldots, \alpha_r, \theta} \quad & \theta \\
\text{s.t.} \quad & \theta \cdot \mathbf{1}(b + x \ge 1) + (a+b)\alpha_{a+x} - a\alpha_{a+x-1} - b\alpha_{a+x+1} \le \mathbf{1}(a + x \ge 1), \\
& \qquad a, x, b \ge 0,\ a + x + b \le r
\end{aligned} \tag{3.18}
\]
A solution to (3.18) gives us the optimal setting for α_1, …, α_r as well as the approximation
ratio θ attained by this setting. By experimentally examining the solutions of the programs
corresponding to increasing values of r, we deduced the general pattern:
\[ \theta = 1 - \frac{1}{e_{[r]}}, \qquad \alpha_1 = \theta, \qquad \alpha_{i+1} = (i+1)\alpha_i - i\alpha_{i-1} + \frac{1}{e_{[r]}}. \]
As r → ∞ the value 1/e_{[r]} approaches 1/e, and these equations give the equations stated in (3.2). In
fact, if g is defined using the value 1/e_{[r]} in place of 1/e (as above), then the analysis of Section 3.3
can be modified to show that the resulting algorithm attains an approximation ratio of 1 − 1/e_{[r]}
on instances of rank r. However, this requires that the coefficients used to define g depend on
the rank of the given instance.
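The experimental search described above can be reproduced computationally. As a minimal sketch (not the thesis's actual code), the function below checks a candidate pair (α, θ) against every constraint of (3.18), under the assumed convention α_0 = 0 (elements covered zero times receive no weight); note that α_{−1} and α_{r+1} only ever appear with zero multipliers. The name `feasible_318` is illustrative.

```python
def feasible_318(alpha, theta, r, tol=1e-12):
    """Check a candidate (alpha, theta) against every constraint of (3.18).

    alpha[i] holds alpha_i for i = 1..r; we adopt the assumed convention
    alpha_0 = 0, and indices outside [1, r] only occur with zero multipliers.
    """
    def al(i):
        return alpha[i] if 1 <= i <= r else 0.0
    for a in range(r + 1):
        for x in range(r + 1 - a):
            for b in range(r + 1 - a - x):
                lhs = theta * (1 if b + x >= 1 else 0)
                lhs += (a + b) * al(a + x) - a * al(a + x - 1) - b * al(a + x + 1)
                rhs = 1 if a + x >= 1 else 0
                if lhs > rhs + tol:
                    return False
    return True

r = 5
alpha = {i: 0.5 for i in range(1, r + 1)}
print(feasible_318(alpha, 0.5, r))  # the simple point alpha_i = theta = 1/2 is feasible
```

Any LP solver maximizing θ over such constraints therefore returns θ ≥ 1/2; the pattern above reports the optimal value as 1 − 1/e_{[r]}.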
Chapter 4
Monotone Submodular Maximization
In this chapter, we extend the algorithmic techniques of Chapter 3 beyond the context of
coverage functions to the general problem of maximizing an arbitrary monotone submodular
function subject to a single matroid constraint. We present an optimal (1 − 1/e)-approximation
algorithm for this problem. Unlike the continuous greedy algorithm described in Section 2.5,
our algorithm is combinatorial, in the sense that it considers only integral solutions and hence
does not require any rounding procedures. Furthermore, our algorithm is extremely simple to
state: it simply runs the standard greedy algorithm followed by 1-local search. Both phases are
run not on the actual objective function, but on a related non-oblivious potential function,
which is itself monotone submodular.
Our approach generalizes to the case in which the monotone submodular function has restricted
total curvature c. For any c ∈ (0, 1], we give an algorithm that produces a (1 − e^{−c})/c-approximation.
This result matches that of Vondrák [95], who showed that the continuous greedy algorithm
produces a (1 − e^{−c})/c-approximation when the objective function has total curvature c. Vondrák
also showed that this result is tight when f is given by a value oracle, in the sense that no better
approximation can be attained with a polynomial number of queries. Unlike the continuous
greedy algorithm, our algorithm requires knowledge of the value of c. By enumerating over
a sufficiently large set of values, we can obtain a ((1 − e^{−c})/c − ε)-approximation even when c is
unknown.
Compared to the continuous greedy algorithm, our algorithm's runtime has improved dependence
on the size n of the ground set, at the expense of increased dependence on the rank
r of the given matroid. In the general case in which f has unrestricted curvature, we obtain
a (1 − 1/e − ε)-approximation in time O(ε^{−3} r^4 n) and a clean (1 − 1/e)-approximation in time
O(r^7 n^2), where n is the size of the ground set and r is the rank of the given matroid. Calinescu
et al. state that the continuous greedy algorithm runs in time O(n^8), but that this can
be improved by careful analysis. A more detailed analysis of the continuous greedy algorithm,
including a variant in which swap rounding is used instead of pipage rounding, appears in [41].
There it is shown that the continuous greedy algorithm using swap rounding can be implemented
in time O(n^6), although it may be possible to employ swap rounding at each stage of
the continuous algorithm to further decrease the dependence on n.
4.1 A Non-Oblivious Local Search Algorithm
In Chapter 3 we developed a non-oblivious local search algorithm for the special case in which the
monotone submodular function f is given as a coverage function. There, our potential function
gave extra weight to elements covered multiple times. The main challenge in extending that
approach to the general monotone submodular case is that we must find a way to define the
non-oblivious potential function without reference to a coverage representation. Specifically, we
must construct a variant of the potential function that does not refer to elements.
Our purpose for giving extra weight to elements covered several times was to cause the local
search algorithm to prefer solutions that were “robust,” in the sense that they would retain high
objective values even if some sets were removed in future iterations. Here, we accomplish this
goal in a more direct fashion. We consider the average weight of all solutions obtained from the
current solution by deleting 1, 2, . . . elements. Our potential function aggregates the information
obtained by applying the objective function to all subsets of the input, weighted according to
their size. Intuitively, the resulting potential function gives extra weight to solutions that
contain a large number of good sub-solutions, or equivalently, remain good solutions on average
even when elements are randomly removed. Here, the weight that we give to the average
value of all solutions obtained from S by deleting i elements corresponds roughly to the extra
weight given to elements covered i+ 1 times in the coverage case, in the sense that both govern
how much value we place on the robustness of the solution to the removal of i elements. We
formalize this intuition further in Section 4.3, where we show that the definition of the non-oblivious
potential function from this section coincides with that of Section 3.2 when f is a
coverage function.
With this approach in mind, we define g(S) to be a combination of the values f(T ) for all
T ⊆ S. The extra weight that a subset T contributes will depend on both its size and the size
of the solution S on which we are evaluating g. Our function g has the general form
\[
g(S) = \sum_{k=0}^{|S|} \sum_{T \in \binom{S}{k}} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} f(T)
= \sum_{k=1}^{|S|} \beta^{(|S|)}_k \mathop{\mathbb{E}}_{T \in \binom{S}{k}} [f(T)]. \tag{4.1}
\]
Here, β^{(|S|)}_k / \binom{|S|}{k} ≥ 0 is the weight given to the value f(T) of any particular subset T of size k,
so β^{(|S|)}_k is the weight given to the average value of f(T) over all subsets T of size k.
Alternatively, we can think of g in terms of the following random process. First, we choose
a value k between 1 and |S| with probability proportional to β^{(|S|)}_k. Then, we choose k items
randomly from S to obtain a subset T. The value of g is then precisely the expected value of
f on the resulting random subset T, scaled by a multiplicative constant.
Note that the coefficients β depend on the size |S|; that is, we have a separate sequence
of coefficients β^{(m)}_0, …, β^{(m)}_m for each value of m (where m corresponds to the size of the set S
on which g is being evaluated). This turns out to be necessary in order to obtain a function g
that is monotone submodular, as well as to relate g to the potential function defined in Section
3.2, which we do in Section 4.3.
Finally, we note that evaluating g(S) requires evaluating f on every subset of S. Thus,
exactly evaluating g requires an exponential number of calls to the value oracle for f. We shall
show in Section 4.2.3 that g(S) can in fact be estimated efficiently using random sampling.
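The random-process view of g suggests a straightforward Monte Carlo estimator, sketched below on a toy weighted coverage function with arbitrary placeholder weights β (their specific values are fixed later in this section; the identity being exercised holds for any non-negative weights, and all names here are illustrative).

```python
import itertools, math, random

random.seed(1)

# Toy monotone submodular f: weighted coverage on a small universe.
U = range(8)
w = {u: random.random() for u in U}
sets = [set(random.sample(list(U), 4)) for _ in range(5)]

def f(T):
    cov = set().union(*(sets[i] for i in T)) if T else set()
    return sum(w[u] for u in cov)

S = list(range(5))
beta = [0.0, 0.3, 0.9, 1.2, 0.8, 0.5]   # placeholder beta[k] for k = 0..|S|

def g_exact(S):
    # Direct evaluation of (4.1): exponentially many calls to f.
    m = len(S)
    return sum(beta[k] / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(S, k))
               for k in range(m + 1))

def g_sampled(S, n_samples=200000):
    # Random-process view: pick k with probability proportional to beta[k],
    # then a uniformly random k-subset T; g(S) = (sum_k beta[k]) * E[f(T)].
    m = len(S)
    total_beta = sum(beta[1:m + 1])
    ks = random.choices(range(1, m + 1), weights=beta[1:m + 1], k=n_samples)
    acc = sum(f(random.sample(S, k)) for k in ks)
    return total_beta * acc / n_samples

ge, gs = g_exact(S), g_sampled(S)
print(ge, gs)
```

With a polynomial number of samples the estimate concentrates around the exact value, which is the content of the sampling analysis in Section 4.2.3.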
In order to complete our definition of the non-oblivious local search algorithm, we must
"only" specify appropriate values for the coefficients γ^{(m)}_ℓ introduced below. We now turn to this task.
First, we note that because f(∅) = 0, the value of g does not depend on the coefficients
β^{(m)}_0. We adopt the convention that β^{(m)}_0 = 0 for all m. For ℓ > 0 we define β^{(m)}_ℓ in terms of
the auxiliary values γ^{(m)}_ℓ, defined as:
\[ \gamma^{(m)}_\ell = \ell \beta^{(m)}_\ell, \qquad \text{for } \ell > 0. \]
These coefficients γ^{(m)}_ℓ, and hence also the function g, depend on the value of the parameter
c giving the total curvature of f. Thus, we obtain a different potential function g_c for each
value of c. To avoid further cluttering our notation, we will assume that c has been specified
at this point, and leave the dependence implicit. For technical reasons we shall assume that
c ≠ 0, so that c ∈ (0, 1]. Note that the case c = 0 corresponds to the linear case, for which the
greedy algorithm is already optimal.
We now define the values γ^{(m)}_ℓ for 1 ≤ ℓ ≤ m. Our definition is recursive and makes use
of two auxiliary terms γ^{(m)}_0 and γ^{(m)}_{m+1} used only in the analysis. Specifically, we define the γ
terms by the initial conditions
\[ \gamma^{(m)}_0 = 1, \qquad \gamma^{(m)}_{m+1} = e^c \qquad \text{for all } m \ge 0 \tag{γ-base} \]
and the upward recurrence
\[ \gamma^{(m)}_\ell = c^{-1} m \left( \gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1} \right) \qquad \text{for } m \ge 1 \text{ and } 1 \le \ell \le m. \tag{γ-up} \]
Note that for all m we have γ^{(m)}_0 = 1 but β^{(m)}_0 = 0, while for all m and 0 < ℓ ≤ m, we use the
definition γ^{(m)}_ℓ = ℓβ^{(m)}_ℓ, which specifies β^{(m)}_ℓ.
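The definitions (γ-base) and (γ-up) translate directly into a small table-building routine. The following sketch (with illustrative names) builds the γ and β coefficients for a given curvature bound c:

```python
import math

def gamma_table(m_max, c):
    # gamma[(m, l)] = gamma^{(m)}_l, built from (gamma-base) and (gamma-up).
    g = {(0, 0): 1.0, (0, 1): math.exp(c)}
    for m in range(1, m_max + 1):
        g[(m, 0)] = 1.0                  # (gamma-base)
        g[(m, m + 1)] = math.exp(c)      # (gamma-base)
        for l in range(1, m + 1):        # (gamma-up)
            g[(m, l)] = (m / c) * (g[(m - 1, l)] - g[(m - 1, l - 1)])
    return g

def beta_coef(m, l, g):
    # beta^{(m)}_0 = 0; otherwise gamma^{(m)}_l = l * beta^{(m)}_l.
    return 0.0 if l == 0 else g[(m, l)] / l

c = 0.7
gam = gamma_table(8, c)
print([round(gam[(4, l)], 4) for l in range(6)])
```

The resulting sequences can be checked numerically against the properties proved later in this chapter (the recurrence of Lemma 4.2 and the monotonicity of Lemma 4.6).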
In Section 4.3 we give an account of how these coefficients were obtained and expand on
their relation to the potential function for the coverage case. For now we shall simply state our
non-oblivious local search algorithm, shown in Algorithm 7, and in the following section prove
that it is a (1 − 1/e − ε)-approximation. It is very similar to the algorithm MatroidCoverage
given in Algorithm 6 in Section 3.2, except that it uses the new potential function g that we have
just defined. In addition to the approximation parameter ε, the algorithm is parameterized
by c ∈ (0, 1], which is an upper bound on the total curvature of the function f. Another difference
between the algorithms is that the initial greedy phase of MatroidSubmodular is performed on
the objective function f, while MatroidCoverage uses the non-oblivious potential function g.
This change will simplify the analysis of the algorithm, increasing its runtime by a factor of
only log H_r = O(log log r). In Section 4.4, we briefly discuss how this extra cost can be removed,
by sketching the analysis in the case that the initial greedy phase is executed on g.
Algorithm 7: MatroidSubmodular
Input: Approximation parameter ε > 0; curvature bound c ∈ (0, 1]; monotone submodular function f with total curvature at most c, given as a value oracle; matroid M = (X, I), given as an independence oracle for I.

Let g be the potential function given by (4.1), (γ-base), and (γ-up) for the given value of c.
Let S_init be the result of running SubmodularGreedy on f, M.
Set r = |S_init| and ε₁ = (ε / (7rH_r)) · ((1 − e^{−c})/c − ε)^{−1}.
S ← S_init
repeat
    foreach a ∈ X \ S do
        foreach b ∈ S do
            if S − b + a ∈ I and g(S − b + a) > (1 + ε₁)g(S) then
                S ← S − b + a
                break
until no exchange is applied to S
return S
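As a minimal, self-contained sketch of this procedure, the following runs the greedy-then-local-search pipeline on a toy instance: a weighted coverage function (monotone submodular, total curvature at most c = 1) over a partition matroid. All instance data and helper names are illustrative, and g is evaluated exactly by brute force rather than by sampling.

```python
import itertools, math, random

random.seed(2)

U = range(10)
wt = {u: random.random() for u in U}
elems = list(range(8))
covers = [set(random.sample(list(U), 3)) for _ in elems]
part = {e: e % 4 for e in elems}          # partition matroid: 4 parts, capacity 1

def f(T):
    cov = set().union(*(covers[e] for e in T)) if T else set()
    return sum(wt[u] for u in cov)

def independent(T):
    parts = [part[e] for e in T]
    return len(parts) == len(set(parts))

# gamma coefficients from (gamma-base) and (gamma-up), with c = 1.
c = 1.0
gam = {(0, 0): 1.0, (0, 1): math.exp(c)}
for m in range(1, 9):
    gam[(m, 0)], gam[(m, m + 1)] = 1.0, math.exp(c)
    for l in range(1, m + 1):
        gam[(m, l)] = (m / c) * (gam[(m - 1, l)] - gam[(m - 1, l - 1)])

def g(S):
    # Exact brute-force evaluation of (4.1) with beta^{(m)}_k = gamma^{(m)}_k / k.
    S = list(S); m = len(S)
    return sum((gam[(m, k)] / k) / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(S, k))
               for k in range(1, m + 1))

# Greedy phase, run on the objective f.
S = []
while True:
    cands = [e for e in elems if e not in S and independent(S + [e])]
    if not cands:
        break
    S.append(max(cands, key=lambda e: f(S + [e]) - f(S)))

# Local-search phase, run on g with improvement threshold (1 + eps1).
eps1 = 1e-6
improved = True
while improved:
    improved = False
    for a in elems:
        if a in S:
            continue
        for b in S:
            T = [e for e in S if e != b] + [a]
            if independent(T) and g(T) > (1 + eps1) * g(S):
                S, improved = T, True
                break
        if improved:
            break

print(sorted(S), f(S))
```

On an instance this small the optimum can be brute-forced, which makes it easy to confirm that the returned base is within the guaranteed factor of optimal.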
4.2 Analysis of the Algorithm
We now analyze the runtime and approximation performance of MatroidSubmodular. While
the algorithm is simple to state, its analysis is quite involved. We give the necessary technical
proofs in the following subsections, and provide a high-level overview here.
As in the case of maximum coverage, we examine an arbitrary instance consisting of a
monotone submodular function f with total curvature at most c and a matroid M = (X, I).
Let ε > 0, and suppose that S is an ε-locally optimal solution and O is a globally optimal solution
for this instance. Because f is monotone, we can assume without loss of generality that O
is a base of M. We can further assume that the initial greedy solution computed by
MatroidSubmodular is a base of M. Furthermore, each improvement step in MatroidSubmodular
maintains the cardinality of the solution. Thus, the final solution S produced by
MatroidSubmodular must also be a base of M. By Theorem 2.13, there exists a bijection π : O → S
such that S − π(i) + i is a base of M for all i ∈ O. Let O = {o_1, …, o_r}, and number
the elements of S as s_1, …, s_r so that s_i = π(o_i) for all i. Furthermore, for any element
x ∈ S ∩ O, we have x = s_i = o_i for some i. Then, we consider only the exchange operations that
remove π(o_i) = s_i from S and add o_i to the result. Each such operation is considered by the
algorithm at each iteration, and each one results in an independent set in M. Thus,
ε-local optimality implies that we have
\[ (1 + \varepsilon) g(S) \ge g(S - s_i + o_i) \]
for each i ∈ [r]. Summing over all r such inequalities, we obtain the inequality
\[ r(1 + \varepsilon) g(S) \ge \sum_{i=1}^{r} g(S - s_i + o_i). \tag{4.2} \]
Now, we must relate (4.2), which is stated in terms of g, to the objective function f. In
order to do this, we introduce a symmetric notation, similar to that utilized in Chapter 3. For
a set of indices I ⊆ [r], we denote by S_I the set {s_i ∈ S : i ∈ I} of all elements of S with
indices in I, and similarly we define O_I = {o_i ∈ O : i ∈ I}. Let a, x, b be non-negative integers
satisfying a + x + b ≤ r. Then, we define X_{a,x,b} to be the collection of sets containing¹ S_A ⊎ O_B
for all distinct pairs (A, B) satisfying |A| = a + x, |B| = b + x, and |A ∩ B| = x. That is, X_{a,x,b} is the
collection of all sets containing a + x elements from S and b + x elements from O, where x of
the elements have the same index in both S and O. Then,
\[ |X_{a,x,b}| = \binom{r}{a} \binom{r-a}{b} \binom{r-a-b}{x}. \]
We define F_{a,x,b} to be the expected value of f on a uniformly random set in X_{a,x,b}:
\[ F_{a,x,b} = \frac{1}{|X_{a,x,b}|} \sum_{T \in X_{a,x,b}} f(T). \]
To simplify expressions, we adopt the convention that F_{a,x,b} = 0 if any of a, x, b is negative.
We can now express all of the quantities that we are interested in using the symmetric
¹Here, we use S_A ⊎ O_B to denote the disjoint, or tagged, union of S_A and O_B. That is, for any element s_i = o_i ∈ S ∩ O, the disjoint union has distinct sets in it corresponding to the case in which the element was chosen as an element of S or as an element of O. This greatly simplifies our analysis, as it allows us to effectively assume that S and O are disjoint.
variables F_{a,x,b}:
\[ f(S) = F_{r,0,0} \tag{4.3} \]
\[ f(O) = F_{0,0,r} \tag{4.4} \]
\[ g(S) = \sum_{k=0}^{r} \beta^{(r)}_k F_{k,0,0} \tag{4.5} \]
\[ \sum_{i=1}^{r} g(S - s_i + o_i) = \sum_{k=0}^{r} \left[ (r-k)\beta^{(r)}_k F_{k,0,0} + k\beta^{(r)}_k F_{k-1,0,1} \right]. \tag{4.6} \]
Equations (4.3), (4.4), and (4.5) follow immediately from the definition of the symmetric
variables. We now show that (4.6) is valid. We shall show that each set A ⊆ X contributes
f(A) with the same total weight to the left and right sides of (4.6). Expanding the definition of g on
the left, we obtain
\[ \sum_{i=1}^{r} \sum_{k=0}^{r} \frac{\beta^{(r)}_k}{\binom{r}{k}} \sum_{T \in \binom{S - s_i + o_i}{k}} f(T). \]
We note that the only sets that appear in (4.6) with non-zero weight are of the form S_I or
S_J + o_i, where I, J ⊆ [r]. We consider each of these types of sets in turn.
First, consider an arbitrary set S_I, where |I| = k for some 1 ≤ k ≤ r. Then, S_I is in the
collection X_{k,0,0} and not in any other collection in (4.6). Hence, f(S_I) appears on the right
hand side of (4.6) with total weight
\[ (r-k) \frac{\beta^{(r)}_k}{|X_{k,0,0}|} = (r-k) \frac{\beta^{(r)}_k}{\binom{r}{k}}. \]
For the left hand side, we note that S_I appears as a subset T of S − s_i + o_i for each value i ∉ I.
Thus, it appears on the left hand side of (4.6) with total weight (r − k)β^{(r)}_k / \binom{r}{k}.
Next, consider an arbitrary set S_J + o_ℓ, where |J| = k − 1 for some 1 ≤ k ≤ r and ℓ ∉ J.
Then, S_J + o_ℓ appears in the collection X_{k−1,0,1} and not in any other collection in (4.6). It
appears on the right hand side of (4.6) with total weight
\[ k \frac{\beta^{(r)}_k}{|X_{k-1,0,1}|} = k \frac{\beta^{(r)}_k}{\binom{r}{k-1}\binom{r-k+1}{1}} = \frac{\beta^{(r)}_k}{\frac{r-k+1}{k}\binom{r}{k-1}} = \frac{\beta^{(r)}_k}{\binom{r}{k}}. \]
For the left hand side, we note that S_J + o_ℓ appears as a subset of S − s_i + o_i only for the
single value i = ℓ. Thus, it appears on the left hand side of (4.6) with total weight β^{(r)}_k / \binom{r}{k}.
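Identity (4.6) is a purely combinatorial fact: the counting argument above works for any set function f and any non-negative coefficients β^{(r)}_k. The following sketch (toy coverage f, arbitrary β values, illustrative names) verifies it by brute force for r = 3, treating S and O as disjoint tagged sets as in the footnote above.

```python
import itertools, math, random

random.seed(0)

r = 3
U = list(range(6))
wt = {u: random.random() for u in U}
cov = {('s', i): set(random.sample(U, 3)) for i in range(r)}
cov.update({('o', i): set(random.sample(U, 3)) for i in range(r)})

def f(T):
    covered = set().union(*(cov[e] for e in T)) if T else set()
    return sum(wt[u] for u in covered)

S = [('s', i) for i in range(r)]
beta = [0.0, 0.4, 1.1, 0.7]          # arbitrary non-negative beta^{(r)}_k

def g(A):
    m = len(A)
    return sum(beta[k] / math.comb(m, k)
               * sum(f(T) for T in itertools.combinations(A, k))
               for k in range(m + 1))

def F(a, x, b):
    # Average of f over X_{a,x,b}: index sets A, B with |A|=a+x, |B|=b+x, |A∩B|=x.
    if min(a, x, b) < 0:
        return 0.0
    vals = [f([('s', i) for i in A] + [('o', i) for i in B])
            for A in itertools.combinations(range(r), a + x)
            for B in itertools.combinations(range(r), b + x)
            if len(set(A) & set(B)) == x]
    return sum(vals) / len(vals) if vals else 0.0

lhs = sum(g([e for e in S if e != ('s', i)] + [('o', i)]) for i in range(r))
rhs = sum((r - k) * beta[k] * F(k, 0, 0) + k * beta[k] * F(k - 1, 0, 1)
          for k in range(r + 1))
print(lhs, rhs)
```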
Reformulating (4.2), which follows from the ε-approximate local optimality of S with respect
to g, in terms of our symmetric notation, we obtain
\[ (r\varepsilon) g(S) + r \sum_{k=0}^{r} \beta^{(r)}_k F_{k,0,0} \ge \sum_{k=0}^{r} \left[ (r-k)\beta^{(r)}_k F_{k,0,0} + k\beta^{(r)}_k F_{k-1,0,1} \right]. \]
We rearrange this to obtain
\[ (r\varepsilon) g(S) + \sum_{k=0}^{r} k \beta^{(r)}_k \left( F_{k,0,0} - F_{k-1,0,1} \right) \ge 0. \tag{4.7} \]
For k ≥ 1, we have kβ^{(r)}_k = γ^{(r)}_k, and in the case k = 0 we have both F_{0,0,0} = 0 and F_{−1,0,1} = 0,
so the k = 0 terms of (4.7) and (4.8) both vanish. Thus, (4.7) is equivalent to
\[ (r\varepsilon) g(S) + \sum_{k=0}^{r} \gamma^{(r)}_k \left( F_{k,0,0} - F_{k-1,0,1} \right) \ge 0. \tag{4.8} \]
We now turn to the task of translating (4.8) into a statement about the locality ratio of
MatroidSubmodular. Our proof will use the submodularity of f only in the form of the following
general inequality, stated in our symmetric notation.

Lemma 4.1. For ℓ satisfying 0 ≤ ℓ ≤ r,
\[ (r-\ell)\left( F_{\ell,0,1} - F_{\ell,0,0} \right) + \ell \left( F_{\ell-1,0,1} - F_{\ell-1,0,0} \right) + c F_{\ell,0,0} \ge f(O). \]
Proof. In order to prove Lemma 4.1, we first prove three smaller inequalities. Consider a set
S_L, where |L| = ℓ. From Theorem 2.6, we have
\[ \sum_{i \in [r] \setminus L} \left[ f(S_L + o_i) - f(S_L) \right] \ge f(S_L \cup O_{[r] \setminus L}) - f(S_L). \tag{4.9} \]
Now, suppose that M = {i ∈ L : s_i = o_i}. Then, we have
\[ \sum_{i \in M} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right] = \sum_{i \in M} \left[ f(S_L) - f(S_L - s_i) \right]
\ge \sum_{i \in M} (1-c) f(s_i) \ge (1-c) f(S_M), \tag{4.10} \]
where the first inequality follows from the fact that f has total curvature at most c, and the
second from Theorem 2.6.
Furthermore,
\[ \sum_{i \in L \setminus M} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right]
\ge \sum_{i \in L \setminus M} \left[ f(S_L + o_i) - f(S_L) \right]
\ge f(S_L \cup O_{L \setminus M}) - f(S_L) = f(S_L \cup O_L) - f(S_L), \tag{4.11} \]
where the first inequality follows from the decreasing marginals characterization of submodularity,
the second from Theorem 2.6, and the final equality from s_i = o_i for all i ∈ M ⊆ L.
Adding (4.9), (4.10), and (4.11), we obtain:
\[
\begin{aligned}
\sum_{i \notin L} \left[ f(S_L + o_i) - f(S_L) \right] &+ \sum_{i \in L} \left[ f(S_L - s_i + o_i) - f(S_L - s_i) \right] \\
&\ge f(S_L \cup O_{[r] \setminus L}) + f(S_L \cup O_L) + (1-c) f(S_M) - 2 f(S_L) \\
&\ge f(S_L \cup O) + f(S_L) + (1-c) f(S_M) - 2 f(S_L) && \text{by submodularity of } f \\
&\ge f(O) + (1-c) f(S_{L \setminus M}) + f(S_L) + (1-c) f(S_M) - 2 f(S_L) && \text{by curvature of } f \\
&\ge f(O) + (1-c) f(S_L) + f(S_L) - 2 f(S_L) && \text{by submodularity of } f \\
&= f(O) - c f(S_L).
\end{aligned}
\]
This inequality is valid for any particular assignment of values from [r] to the indices of S and
O. Averaging over all possible such assignments, we obtain the inequality
\[ (r-\ell)\left( F_{\ell,0,1} - F_{\ell,0,0} \right) + \ell \left( F_{\ell-1,0,1} - F_{\ell-1,0,0} \right) \ge f(O) - c F_{\ell,0,0}. \]
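Lemma 4.1 can be exercised numerically. The sketch below (illustrative names, toy data) builds a small weighted coverage function over disjoint tagged sets S and O, computes its total curvature c directly from the definition, evaluates the symmetric averages F_{a,x,b} by enumeration, and checks the inequality for every ℓ:

```python
import itertools, random

random.seed(3)

r = 3
U = list(range(10))
wt = {u: random.random() for u in U}
cov = {('s', i): set(random.sample(U, 4)) for i in range(r)}
cov.update({('o', i): set(random.sample(U, 4)) for i in range(r)})
X = list(cov)

def f(T):
    covered = set().union(*(cov[e] for e in T)) if T else set()
    return sum(wt[u] for u in covered)

# Total curvature: c = 1 - min_e [f(X) - f(X - e)] / f({e}).
fX = f(X)
c = 1 - min((fX - f([e for e in X if e != x])) / f([x]) for x in X)

def F(a, x, b):
    if min(a, x, b) < 0:
        return 0.0
    vals = [f([('s', i) for i in A] + [('o', i) for i in B])
            for A in itertools.combinations(range(r), a + x)
            for B in itertools.combinations(range(r), b + x)
            if len(set(A) & set(B)) == x]
    return sum(vals) / len(vals) if vals else 0.0

fO = f([('o', i) for i in range(r)])
for l in range(r + 1):
    lhs = ((r - l) * (F(l, 0, 1) - F(l, 0, 0))
           + l * (F(l - 1, 0, 1) - F(l - 1, 0, 0)) + c * F(l, 0, 0))
    print(l, round(lhs, 4), round(fO, 4))
```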
Lemma 4.1 gives us r + 1 inequalities, one for each value of ℓ ∈ {0, …, r}. We combine these
inequalities with (4.8) to derive an inequality relating f(S) to f(O). The following subsections
give the technical details necessary for completing the proof, which we now describe at a very
high level. In Section 4.2.2 we combine the r + 1 inequalities from Lemma 4.1 to obtain a single
inequality. This is accomplished by multiplying each inequality by γ^{(r)}_{ℓ+1} − γ^{(r)}_ℓ and adding the
resulting inequalities. This approach requires that each of the values γ^{(r)}_{ℓ+1} − γ^{(r)}_ℓ is non-negative
and that the sequence γ^{(r)} = γ^{(r)}_0, …, γ^{(r)}_{r+1} satisfies a particular recurrence relation, which we
use to telescope the resulting summation. In Section 4.2.1, we prove these two properties of the
γ sequences and additionally derive an upper and lower bound on g in terms of f, which we
shall use in several of our proofs.
of MatroidSubmodular, beginning with the proof of Lemma 4.8. Our proofs assume that g can
be computed exactly. In Section 4.2.3 we remove this assumption and turn to the problem of
estimating g efficiently. We give a random sampling procedure for estimating g that requires
only a polynomial number of samples. In Lemmas 4.10 and 4.11 we bound the error introduced
into our analysis by the use of our sampled estimates for g. Section 4.2.4 completes the analysis
of MatroidSubmodular. Our main results are presented in Theorems 4.12–4.14.
4.2.1 Properties of the Sequences γ
In this section, we prove three properties of the γ sequences. Specifically, we show that each
sequence γ^{(m)} = γ^{(m)}_0, γ^{(m)}_1, …, γ^{(m)}_{m+1} satisfies a particular recurrence (Lemma 4.2) and is non-decreasing
(Lemma 4.6). These facts will be used to combine the inequalities from Lemma 4.1
into a single inequality in Lemma 4.8. We also derive an upper bound on the value Σ_{k=1}^{m} β^{(m)}_k
(Lemma 4.7) that will be used in Lemma 4.9, where we bound the locality ratio of
MatroidSubmodular, as well as in Section 4.2.3, where we give a sampling procedure used to compute g.
Lemma 4.2. For each m ≥ 1, the sequence γ^{(m)} satisfies the recurrence:
\[ \ell \gamma^{(m)}_{\ell+1} = (2\ell - m + c - 1)\gamma^{(m)}_\ell + (m - \ell + 1)\gamma^{(m)}_{\ell-1}, \qquad 1 \le \ell \le m. \tag{γ-rec} \]
Proof. We proceed by induction on m. In the case that m = 1, we must only show that (γ-rec)
is valid for ℓ = 1. In this case, (γ-rec) becomes
\[ \gamma^{(1)}_2 = c\gamma^{(1)}_1 + \gamma^{(1)}_0. \]
By definition, we have γ^{(1)}_0 = 1, γ^{(1)}_2 = e^c, and γ^{(1)}_1 = c^{−1}(e^c − 1), so this is equivalent to
\[ e^c = c \cdot c^{-1}(e^c - 1) + 1, \]
and so (γ-rec) holds.
In the general case that m > 1, we consider three subcases based on the value of ℓ. When ℓ = 1,
we have
\[
\begin{aligned}
\gamma^{(m)}_2 &= c^{-1}m\left(\gamma^{(m-1)}_2 - \gamma^{(m-1)}_1\right) && \text{by (γ-up)} \\
&= c^{-1}m\left[(2-m+c)\gamma^{(m-1)}_1 + (m-1)\gamma^{(m-1)}_0 - \gamma^{(m-1)}_1\right] && \text{by induct. hyp. with } \ell = 1 \\
&= c^{-1}m\left[(1-m+c)\gamma^{(m-1)}_1 - (1-m+c)\gamma^{(m-1)}_0 + c\gamma^{(m-1)}_0\right] && \text{algebra} \\
&= (1-m+c)\gamma^{(m)}_1 + m\gamma^{(m-1)}_0 && \text{by (γ-up)} \\
&= (1-m+c)\gamma^{(m)}_1 + m\gamma^{(m)}_0 && \text{by (γ-base).}
\end{aligned}
\]
When ℓ = m, we have
\[
\begin{aligned}
m\gamma^{(m)}_{m+1} &= m\gamma^{(m-1)}_m && \text{by (γ-base)} \\
&= c^{-1}m\left[(m-1+c)\gamma^{(m-1)}_m - (m-1)\gamma^{(m-1)}_m\right] && \text{algebra} \\
&= c^{-1}m\left[(m-1+c)\gamma^{(m-1)}_m - (m-2+c)\gamma^{(m-1)}_{m-1} - \gamma^{(m-1)}_{m-2}\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(m-1+c)\left(\gamma^{(m-1)}_m - \gamma^{(m-1)}_{m-1}\right) + \gamma^{(m-1)}_{m-1} - \gamma^{(m-1)}_{m-2}\right] && \text{algebra} \\
&= (m-1+c)\gamma^{(m)}_m + \gamma^{(m)}_{m-1} && \text{by (γ-up).}
\end{aligned}
\]
When 2 ≤ ℓ ≤ m − 1, we have
\[
\begin{aligned}
\ell\gamma^{(m)}_{\ell+1} &= c^{-1}m\left(\ell\gamma^{(m-1)}_{\ell+1} - \ell\gamma^{(m-1)}_\ell\right) && \text{by (γ-up)} \\
&= c^{-1}m\left[(2\ell-m+c)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} - \ell\gamma^{(m-1)}_\ell\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(2\ell-m+c-1)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} - (\ell-1)\gamma^{(m-1)}_\ell\right] && \text{algebra} \\
&= c^{-1}m\left[(2\ell-m+c-1)\gamma^{(m-1)}_\ell + (m-\ell)\gamma^{(m-1)}_{\ell-1} \right. \\
&\qquad\qquad \left. - (2\ell-m+c-2)\gamma^{(m-1)}_{\ell-1} - (m-\ell+1)\gamma^{(m-1)}_{\ell-2}\right] && \text{by induct. hyp.} \\
&= c^{-1}m\left[(2\ell-m+c-1)\left(\gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1}\right) + (m-\ell+1)\left(\gamma^{(m-1)}_{\ell-1} - \gamma^{(m-1)}_{\ell-2}\right)\right] && \text{algebra} \\
&= (2\ell-m+c-1)\gamma^{(m)}_\ell + (m-\ell+1)\gamma^{(m)}_{\ell-1} && \text{by (γ-up).}
\end{aligned}
\]
In order to prove our second lemma, which shows that the sequence γ^{(m)} = γ^{(m)}_0, …, γ^{(m)}_{m+1}
is non-decreasing, we make use of the theory of Padé approximants. This theory is quite extensive
(a comprehensive overview is given by Baker and Graves-Morris [10]), but we need only a few
fundamental results from the area, all of which appear in Padé's thesis [78]. We give several
more accessible references for each result as well. First, we give the basic definition of a Padé
approximant.
Definition 4.3 (Padé Approximant). Suppose that a function F is given by the power series
\[ F(x) = \sum_{i=0}^{\infty} c_i x^i. \]
Then, for μ, ν ∈ N, the [μ/ν] Padé approximant to F is the rational function, denoted by F_{[μ/ν]},
of the form
\[ F_{[\mu/\nu]}(x) = \frac{a_0 + a_1 x + \cdots + a_\mu x^\mu}{1 + b_1 x + \cdots + b_\nu x^\nu}, \]
whose Maclaurin expansion agrees with F for the first μ + ν terms.
We are primarily concerned with the Padé approximants of the function e^x. This was among
the first functions whose approximants were rigorously studied in the context of Padé
approximants. Indeed, a full treatment of the function e^x appears in Padé's thesis [78, Part
2, Section 4]. A more recent derivation of the approximants to e^x and their properties can be
found in Baker and Graves-Morris [10, Sections 1.2 and 10.3]. A concise statement of all the
properties we require appears in Underhill and Wragg [92].
The [μ/ν] Padé approximant to the function e^x is given by R_{[μ/ν]}(x) = P_{[μ/ν]}(x) / Q_{[μ/ν]}(x), whose
numerator and denominator are given by the polynomials:
\[ P_{[\mu/\nu]}(x) = \sum_{k=0}^{\mu} \frac{x^k (\mu+\nu-k)! \, \mu!}{k! (\mu+\nu)! (\mu-k)!}, \qquad
Q_{[\mu/\nu]}(x) = \sum_{k=0}^{\nu} \frac{(-x)^k (\mu+\nu-k)! \, \nu!}{k! (\mu+\nu)! (\nu-k)!}. \]
The following known² formula gives the error in the approximant R_{[μ/ν]}:
\[ e^x Q_{[\mu/\nu]}(x) - P_{[\mu/\nu]}(x) = \frac{(-1)^\nu x^{\mu+\nu+1}}{(\mu+\nu)!} \int_0^1 e^{xt} t^\nu (1-t)^\mu \, dt. \tag{4.12} \]
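The polynomials P, Q and the error formula (4.12) are easy to check numerically. A small sketch (illustrative names, with the integral evaluated by Simpson's rule):

```python
import math

def pade_P(mu, nu, x):
    return sum(x**k * math.factorial(mu + nu - k) * math.factorial(mu)
               / (math.factorial(k) * math.factorial(mu + nu) * math.factorial(mu - k))
               for k in range(mu + 1))

def pade_Q(mu, nu, x):
    return sum((-x)**k * math.factorial(mu + nu - k) * math.factorial(nu)
               / (math.factorial(k) * math.factorial(mu + nu) * math.factorial(nu - k))
               for k in range(nu + 1))

def error_integral(mu, nu, x, n=2000):
    # Right-hand side of (4.12), with the integral computed by Simpson's rule.
    h = 1.0 / n
    def integrand(t):
        return math.exp(x * t) * t**nu * (1 - t)**mu
    s = integrand(0) + integrand(1)
    s += 4 * sum(integrand((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(integrand(2 * i * h) for i in range(1, n // 2))
    integral = s * h / 3
    return (-1)**nu * x**(mu + nu + 1) / math.factorial(mu + nu) * integral

x = 0.8
for mu, nu in [(1, 1), (2, 1), (3, 2)]:
    lhs = math.exp(x) * pade_Q(mu, nu, x) - pade_P(mu, nu, x)
    print(mu, nu, lhs, error_integral(mu, nu, x))
```

For instance, the [1/1] approximant works out to (1 + x/2)/(1 − x/2), the classical diagonal approximant to e^x.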
In order to relate the γ coefficients to the theory of Padé approximants, we first give a closed
formula for them.

Lemma 4.4. For all m ≥ 1, the members of γ^{(m)} are given by the following formula:
\[ \gamma^{(m)}_\ell = (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right], \tag{γ-closed} \]
where 1 ≤ ℓ ≤ m.
Proof. We proceed by induction on m. When m = 1, the formula gives
\[ \gamma^{(1)}_1 = (-1)^0 \frac{1!}{0!} c^{-1} \left[ (-1)^0 \binom{0}{0} e^c - \binom{0}{0} \right] = c^{-1}(e^c - 1), \]
and the defining recurrence (γ-up) together with (γ-base) gives
\[ \gamma^{(1)}_1 = c^{-1}\left( \gamma^{(0)}_1 - \gamma^{(0)}_0 \right) = c^{-1}(e^c - 1). \]
If m > 1 then, by construction, the sequence γ^{(m)} is related to γ^{(m−1)} by (γ-up) and, by
the induction hypothesis, γ^{(m−1)}_ℓ is given by (γ-closed) for all 1 ≤ ℓ ≤ m − 1. Thus, γ^{(m)}_ℓ is
²Again, a derivation of this formula appears in Padé's thesis [78, (66)]. A modern derivation appears in Baker and Graves-Morris [10, Section 10.3] or (using integral representations for P_{[μ/ν]} and Q_{[μ/ν]}) Braess [15, Section V.7.A (5.2)].
given by
\[
\begin{aligned}
c^{-1}m&\left( \gamma^{(m-1)}_\ell - \gamma^{(m-1)}_{\ell-1} \right) \\
&= c^{-1}m(-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{\ell-1-k} e^c - \binom{m-2-k}{\ell-1} \right] \\
&\quad - c^{-1}m(-1)^{\ell-2} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{\ell-2-k} e^c - \binom{m-2-k}{\ell-2} \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \left[ \binom{m-2-k}{\ell-1-k} + \binom{m-2-k}{\ell-2-k} \right] e^c - \left[ \binom{m-2-k}{\ell-1} + \binom{m-2-k}{\ell-2} \right] \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right].
\end{aligned}
\]
If 2 ≤ ℓ ≤ m − 1, then we have both \binom{m-1-(m-1)}{\ell-1-(m-1)} = 0 and \binom{m-1-(m-1)}{\ell-1} = 0, and so:
\[
\begin{aligned}
\gamma^{(m)}_\ell &= (-1)^{\ell-1} \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right].
\end{aligned}
\]
It remains to show that the formula (γ-closed) holds for ℓ = 1 and for ℓ = m. In the case of
ℓ = 1, the formula (γ-closed) gives
\[ \gamma^{(m)}_1 = (-1)^0 \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{-k} e^c - \binom{m-1-k}{0} \right]
= m! \, c^{-m} e^c - \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m}, \]
while using the recurrence (γ-up), (γ-base), and the induction hypothesis, we obtain
\[
\begin{aligned}
\gamma^{(m)}_1 &= c^{-1}m\left( \gamma^{(m-1)}_1 - \gamma^{(m-1)}_0 \right) = c^{-1}m\left( \gamma^{(m-1)}_1 - 1 \right) \\
&= c^{-1}m\left[ (m-1)! \, c^{-m+1} e^c - \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \right] - c^{-1}m \\
&= m! \, c^{-m} e^c - \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} - c^{-1}m \\
&= m! \, c^{-m} e^c - \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m}.
\end{aligned}
\]
For the case ℓ = m, the formula (γ-closed) gives
\[
\begin{aligned}
\gamma^{(m)}_m &= (-1)^{m-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{m-1-k} e^c - \binom{m-1-k}{m-1} \right] \\
&= (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right],
\end{aligned}
\]
while using the recurrence (γ-up), (γ-base), and the induction hypothesis, we obtain
\[
\begin{aligned}
\gamma^{(m)}_m &= c^{-1}m\left( \gamma^{(m-1)}_m - \gamma^{(m-1)}_{m-1} \right) = c^{-1}m\left( e^c - \gamma^{(m-1)}_{m-1} \right) \\
&= c^{-1}m\left[ e^c - (-1)^{m-2} \sum_{k=0}^{m-2} \frac{(m-1)!}{k!} c^{k-m+1} \left[ (-1)^k \binom{m-2-k}{m-2-k} e^c - \binom{m-2-k}{m-2} \right] \right] \\
&= c^{-1}m \, e^c + (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-2} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right] \\
&= (-1)^{m-1} \left[ \left( \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} (-1)^k e^c \right) - m! \, c^{-m} \right].
\end{aligned}
\]
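The closed formula can be confirmed against the defining recurrence by direct computation. A small sketch (illustrative names; note the convention that out-of-range binomial coefficients are zero):

```python
import math

def gamma_up(m_max, c):
    # gamma^{(m)}_l from (gamma-base) and (gamma-up).
    g = {(0, 0): 1.0, (0, 1): math.exp(c)}
    for m in range(1, m_max + 1):
        g[(m, 0)], g[(m, m + 1)] = 1.0, math.exp(c)
        for l in range(1, m + 1):
            g[(m, l)] = (m / c) * (g[(m - 1, l)] - g[(m - 1, l - 1)])
    return g

def binom(n, k):
    # binomial coefficient, zero when k is out of range
    return math.comb(n, k) if 0 <= k <= n else 0

def gamma_closed(m, l, c):
    # The formula (gamma-closed).
    return (-1)**(l - 1) * sum(
        math.factorial(m) / math.factorial(k) * c**(k - m)
        * ((-1)**k * binom(m - 1 - k, l - 1 - k) * math.exp(c)
           - binom(m - 1 - k, l - 1))
        for k in range(m))

c = 0.7
g = gamma_up(8, c)
print(g[(4, 2)], gamma_closed(4, 2, c))
```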
The formula (γ-closed) reveals a surprising relationship between the coefficients γ and the
Padé approximants to the exponential function e^x. The following lemma shows that we can
restate the explicit formula (γ-closed) in terms of Padé numerators and denominators.
Lemma 4.5. For all m ≥ 1 and 1 ≤ ℓ ≤ m:
\[ \gamma^{(m)}_\ell = (-1)^{\ell-1} m! \, c^{-m} \binom{m-1}{\ell-1} \left[ Q_{[m-\ell/\ell-1]}(c) \, e^c - P_{[m-\ell/\ell-1]}(c) \right]. \]
Proof. From (γ-closed) we have
\[
\begin{aligned}
\gamma^{(m)}_\ell &= (-1)^{\ell-1} \sum_{k=0}^{m-1} \frac{m!}{k!} c^{k-m} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \sum_{k=0}^{m-1} \frac{c^k}{k!} \left[ (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \left[ \sum_{k=0}^{\ell-1} \frac{c^k}{k!} (-1)^k \binom{m-1-k}{\ell-1-k} e^c - \sum_{k=0}^{m-\ell} \frac{c^k}{k!} \binom{m-1-k}{\ell-1} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \left[ \sum_{k=0}^{\ell-1} \frac{(-c)^k (m-1-k)!}{k! (\ell-1-k)! (m-\ell)!} e^c - \sum_{k=0}^{m-\ell} \frac{c^k (m-1-k)!}{k! (\ell-1)! (m-\ell-k)!} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \frac{(m-1)!}{(\ell-1)! (m-\ell)!} \left[ \sum_{k=0}^{\ell-1} \frac{(-c)^k (m-1-k)! \, (\ell-1)!}{k! (m-1)! (\ell-1-k)!} e^c - \sum_{k=0}^{m-\ell} \frac{c^k (m-1-k)! \, (m-\ell)!}{k! (m-1)! (m-\ell-k)!} \right] \\
&= (-1)^{\ell-1} m! \, c^{-m} \binom{m-1}{\ell-1} \left[ Q_{[m-\ell/\ell-1]}(c) \, e^c - P_{[m-\ell/\ell-1]}(c) \right].
\end{aligned}
\]
Using Lemma 4.5, we can now prove our second main result regarding the γ sequences.
Lemma 4.6. For any $m \ge 0$, the sequence $\gamma^{(m)} = \gamma^{(m)}_0, \gamma^{(m)}_1, \ldots, \gamma^{(m)}_{m+1}$ is non-decreasing.
Proof. Consider $\gamma^{(m)}_\ell$ and $\gamma^{(m)}_{\ell-1}$ for some $m \ge 0$ and $1 \le \ell \le m+1$. By definition, we then have
$$\gamma^{(m+1)}_\ell = c^{-1}(m+1)\left(\gamma^{(m)}_\ell - \gamma^{(m)}_{\ell-1}\right),$$
and so $\gamma^{(m)}_\ell \ge \gamma^{(m)}_{\ell-1}$ if and only if $\gamma^{(m+1)}_\ell \ge 0$. We now show that this must be the case for all $m \ge 1$ and $1 \le \ell \le m+1$. From Lemma 4.5, we have:
$$\begin{aligned}
\gamma^{(m+1)}_\ell &= (-1)^{\ell-1}(m+1)!\,c^{-(m+1)}\binom{m}{\ell-1}\left[Q_{[m+1-\ell/\ell-1]}(c)\,e^c - P_{[m+1-\ell/\ell-1]}(c)\right]\\
&= (-1)^{\ell-1}(m+1)!\,c^{-(m+1)}\binom{m}{\ell-1}\left[(-1)^{\ell-1}\frac{c^{m+1}}{(m+1)!}\int_0^1 e^{ct}t^{\ell-1}(1-t)^{m+1-\ell}\,dt\right]\\
&= \binom{m}{\ell-1}\int_0^1 e^{ct}t^{\ell-1}(1-t)^{m+1-\ell}\,dt,
\end{aligned}$$
where we have used the formula (4.12) for the error $Q_{[\mu/\nu]}(x)e^x - P_{[\mu/\nu]}(x)$ in the second line. Now, we note that the integral above must be non-negative, since the integrand is non-negative on the interval $[0,1]$. Thus, $\gamma^{(m+1)}_\ell \ge 0$.
Finally, we derive an upper bound on the sum of each sequence $\beta^{(m)}$. This will be useful for bounding the value of g in later theorems.
Lemma 4.7. For all $m \ge 0$,
$$\sum_{k=0}^m \beta^{(m)}_k \le e^c H_m.$$
Furthermore, for any set S of size m,
$$f(S) \le g(S) \le e^c H_m f(S).$$
Proof. For the first claim, we note that
$$\sum_{k=0}^m \beta^{(m)}_k = \sum_{k=1}^m \frac{\gamma^{(m)}_k}{k} \le e^c\sum_{k=1}^m \frac{1}{k} = e^c H_m. \tag{4.13}$$
Here, the first equality follows from the fact that $\beta^{(m)}_0 = 0$, and the inequality from both Lemma 4.6, which shows that the sequence $\gamma^{(m)}$ is non-decreasing, and (γ-base), which shows that $\gamma^{(m)}_{m+1} = e^c$.
For the upper bound on g, we note that
$$g(S) = \sum_{k=0}^m \frac{\beta^{(m)}_k}{\binom{m}{k}}\sum_{T\in\binom{S}{k}} f(T) \le \sum_{k=0}^m \beta^{(m)}_k f(S) \le e^c H_m f(S),$$
where the first inequality follows from the monotonicity of f and the final inequality from (4.13).
For the lower bound on g, we note that
$$g(S) = \sum_{k=0}^m \frac{\beta^{(m)}_k}{\binom{m}{k}}\sum_{T\in\binom{S}{k}} f(T) \ge \sum_{k=0}^m \beta^{(m)}_k\,\frac{k}{m}\,f(S) = \sum_{k=1}^m \frac{\gamma^{(m)}_k}{m}f(S) \ge \sum_{k=1}^m \frac{1}{m}f(S) = f(S),$$
where the first inequality follows from Theorem 2.8, and the second inequality follows from Lemma 4.6 and (γ-base).
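As a sanity check on Lemma 4.7, the following sketch evaluates g by brute force on a small coverage function (coverage functions are monotone submodular with $f(\emptyset) = 0$) and verifies the sandwich $f(S) \le g(S) \le e^cH_m f(S)$. The coverage instance is made up purely for illustration:

```python
import math
import random
from itertools import combinations

def gamma_seq(m, c):
    """gamma^(m)_0 .. gamma^(m)_{m+1} via (gamma-base) and (gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return g

def g_value(S, f, c):
    """g(S) as in (4.1), with beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k."""
    m = len(S)
    gam = gamma_seq(m, c)
    beta = [0.0] + [gam[k] / k for k in range(1, m + 1)]
    return sum(beta[k] / math.comb(m, k) * sum(f(T) for T in combinations(S, k))
               for k in range(m + 1))

# a small random coverage function: each element covers a subset of a universe
random.seed(0)
cover = {e: frozenset(random.sample(range(12), 5)) for e in range(4)}

def f(T):
    return len(frozenset().union(*(cover[e] for e in T))) if T else 0
```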
4.2.2 Locality Ratio
We now have the necessary machinery to continue our proof of the locality ratio of MatroidSubmodular. Using the facts about the coefficients γ proved in Section 4.2.1, we combine the inequalities from Lemma 4.1 into a general inequality relating f(S), f(O), and the symmetric variables. Recall that r is the rank of the matroid $(X, \mathcal{I})$ for the instance we are considering, and $|O| = |S| = r$.
Lemma 4.8.
$$e^c f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right) \ge \frac{e^c-1}{c}f(O).$$
Proof. For each $\ell \in \{0,\ldots,r\}$ we multiply the corresponding inequality from Lemma 4.1 by $(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell)$. Lemma 4.6 implies that $(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell) \ge 0$, so this does not change the direction
of any of the inequalities. We then add the r + 1 resulting inequalities, to obtain:
$$\sum_{\ell=0}^r\left(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell\right)\left[(r-\ell)(F_{\ell,0,1} - F_{\ell,0,0}) + \ell(F_{\ell-1,0,1} - F_{\ell-1,0,0}) + cF_{\ell,0,0}\right] \ge \sum_{\ell=0}^r\left(\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell\right)f(O). \tag{4.14}$$
We now consider the coefficients of all the symmetric values $F_{k,0,0}$ and $F_{k-1,0,1}$ occurring in (4.14), where $-1 \le k \le r$. Note that these are the only symmetric values occurring in (4.14). Each term $F_{k,0,0}$, for $0 \le k \le r-1$, appears in the inequality for $\ell = k$ with coefficient $(\gamma^{(r)}_{k+1}-\gamma^{(r)}_k)(k-r+c)$ and in the inequality for $\ell = k+1$ with coefficient $(\gamma^{(r)}_{k+2}-\gamma^{(r)}_{k+1})(-k-1)$. Thus, its coefficient in (4.14) is
$$\begin{aligned}
&-(k+1)\gamma^{(r)}_{k+2} + (2k-r+c+1)\gamma^{(r)}_{k+1} + (r-k-c)\gamma^{(r)}_k\\
&= -(k+1)\gamma^{(r)}_{k+2} + (2(k+1)-r+c-1)\gamma^{(r)}_{k+1} + (r-(k+1)+1)\gamma^{(r)}_k - c\gamma^{(r)}_k\\
&= -(k+1)\gamma^{(r)}_{k+2} + (k+1)\gamma^{(r)}_{k+2} - c\gamma^{(r)}_k\\
&= -c\gamma^{(r)}_k,
\end{aligned}$$
where we have used the recurrence (γ-rec) from Lemma 4.2 for the second equality. It remains to consider the terms $F_{-1,0,0}$ and $F_{r,0,0}$. By definition, $F_{-1,0,0} = 0$, so its coefficient does not affect the sum. We assign it coefficient 0, and omit it from the resulting sum. The term $F_{r,0,0}$ appears only in the inequality for $\ell = r$, with coefficient $(\gamma^{(r)}_{r+1} - \gamma^{(r)}_r)c$.
Similarly, each term $F_{k-1,0,1}$ for $1 \le k \le r$ appears in the inequality for $\ell = k$ with coefficient $(\gamma^{(r)}_{k+1}-\gamma^{(r)}_k)k$ and in the inequality for $\ell = k-1$ with coefficient $(\gamma^{(r)}_k-\gamma^{(r)}_{k-1})(r-k+1)$. Thus, its coefficient in (4.14) is
$$\begin{aligned}
&k\gamma^{(r)}_{k+1} - (2k-r-1)\gamma^{(r)}_k - (r-k+1)\gamma^{(r)}_{k-1}\\
&= k\gamma^{(r)}_{k+1} - (2k-r+c-1)\gamma^{(r)}_k - (r-k+1)\gamma^{(r)}_{k-1} + c\gamma^{(r)}_k\\
&= k\gamma^{(r)}_{k+1} - k\gamma^{(r)}_{k+1} + c\gamma^{(r)}_k\\
&= c\gamma^{(r)}_k,
\end{aligned}$$
where again we have used the recurrence (γ-rec) from Lemma 4.2 for the second equality. It remains to consider the terms $F_{-1,0,1}$ and $F_{r,0,1}$. By definition, both of these terms are 0, and so their coefficients do not affect the sum. For convenience, we assign $F_{-1,0,1}$ coefficient $c\gamma^{(r)}_0$. We assign $F_{r,0,1}$ coefficient 0 and omit it from the sum.
Considering all the coefficients, we find that the left side of inequality (4.14) is
$$\begin{aligned}
c\left(\gamma^{(r)}_{r+1} - \gamma^{(r)}_r\right)F_{r,0,0} - \sum_{k=0}^{r-1}c\,\gamma^{(r)}_k F_{k,0,0} + \sum_{k=0}^r c\,\gamma^{(r)}_k F_{k-1,0,1}
&= c\,\gamma^{(r)}_{r+1}F_{r,0,0} + c\sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right)\\
&= c\,e^c f(S) + c\sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right),
\end{aligned}$$
where we have used (4.3) and (γ-base) in the final line. The right side of (4.14) is
$$\sum_{k=0}^r\left(\gamma^{(r)}_{k+1} - \gamma^{(r)}_k\right)f(O) = \left(\gamma^{(r)}_{r+1} - \gamma^{(r)}_0\right)f(O) = (e^c - 1)f(O),$$
where we have used (γ-base) in the final equality. Thus, (4.14) is equivalent to
$$e^c f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k-1,0,1} - F_{k,0,0}\right) \ge \frac{e^c-1}{c}f(O).$$
Using the inequality from Lemma 4.8, we now prove this section’s main result, bounding
the locality ratio of MatroidSubmodular.
Theorem 4.9. Suppose O is a global optimum of f and that S is an ε-approximate local optimum for g. Then,
$$(1 + \varepsilon\,rH_r)\,f(S) \ge \frac{1-e^{-c}}{c}f(O).$$
Proof. We recall the inequality (4.8), which follows from the ε-approximate local optimality of S:
$$(r\varepsilon)\,g(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0.$$
Applying the upper bound on g(S) from Lemma 4.7 to the first term, we obtain
$$r\varepsilon\,e^c H_r f(S) + \sum_{k=0}^r \gamma^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0. \tag{4.15}$$
Adding (4.15) to the inequality from Lemma 4.8, we obtain
$$r\varepsilon\,e^c H_r f(S) + e^c f(S) \ge \frac{e^c-1}{c}f(O).$$
Multiplying by $e^{-c}$ then completes the proof.
4.2.3 Computing g
In all of our analysis thus far, we have supposed that we could compute g exactly, even though
this required an exponential number of calls to the value oracle for f . Now we address the
issue of computing g(S) efficiently. When we first introduced g in Section 4.1, we gave an
interpretation of the value g(S) in terms of a random process. We now elaborate further on
this interpretation to give a randomized sampling scheme for computing g(S).
Let $m = |S|$ and define the normalization constant
$$B_m = \sum_{i=0}^m \beta^{(m)}_i.$$
From Lemma 4.7 in Section 4.2.1, we have
$$B_m \le e^c H_m. \tag{4.16}$$
We define a random set X using the following two-step experiment. First, let L be a random variable taking value k with probability $\beta^{(m)}_k/B_m$, for $1 \le k \le m$. Then let X be a uniformly random subset of S of size L. From the linearity of expectation and the definition (4.1) of g, we have $B_m\,\mathbf{E}[f(X)] = g(S)$. We now show, via standard concentration bounds, that this expectation can be estimated well by sampling from the distribution defining X.
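The two-step experiment translates directly into a sampling routine. The sketch below is illustrative only; `g_exact` recomputes g by brute force so that the estimate can be compared against it:

```python
import math
import random
from itertools import combinations

def beta_seq(m, c):
    """beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k, via (gamma-base)/(gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return [0.0] + [g[k] / k for k in range(1, m + 1)]

def g_exact(S, f, c):
    """Brute-force evaluation of g(S) as defined in (4.1)."""
    m, beta = len(S), beta_seq(len(S), c)
    return sum(beta[k] / math.comb(m, k) * sum(f(T) for T in combinations(S, k))
               for k in range(m + 1))

def g_sampled(S, f, c, N, rng):
    """B_m * (1/N) * sum f(X_i), where each X_i follows the two-step experiment."""
    m, beta = len(S), beta_seq(len(S), c)
    B = sum(beta)
    total = 0.0
    for _ in range(N):
        L = rng.choices(range(1, m + 1), weights=beta[1:])[0]  # P[L=k] = beta_k / B_m
        total += f(tuple(rng.sample(list(S), L)))              # uniform L-subset of S
    return B * total / N
```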
Lemma 4.10. Let S be a set of size m. Let N be a positive integer, and $X_1,\ldots,X_N$ be N i.i.d. random samples drawn from the distribution for X. Define $\bar g(S) = B_m\frac{1}{N}\sum_{i=1}^N f(X_i)$. Then, for any $\varepsilon > 0$,
$$\Pr\left[|\bar g(S) - g(S)| > \varepsilon g(S)\right] \le 2\exp\left(-\frac{2\varepsilon^2 N}{e^{2c}H_m^2}\right).$$
Proof. Note that $\mathbf{E}[\bar g(S)] = g(S)$. Moreover, because f is monotone and normalized, we have $0 \le B_m f(X_i) \le B_m f(S)$ for every sample $X_i$. Hoeffding's bound then gives that $\Pr[|\bar g(S) - g(S)| > \varepsilon g(S)]$ is at most:
$$2\exp\left(-\frac{2\varepsilon^2 g(S)^2 N}{B_m^2 f(S)^2}\right) \le 2\exp\left(-\frac{2\varepsilon^2 N}{B_m^2}\right) \le 2\exp\left(-\frac{2\varepsilon^2 N}{(e^c H_m)^2}\right),$$
where the first inequality follows from the bound $f(S) \le g(S)$ from Lemma 4.7 and the final inequality from (4.16).
The next lemma allows us to relate the error introduced by estimating g to our analysis of the locality ratio of MatroidSubmodular.

Lemma 4.11. Let $0 < \varepsilon \le 1/2$ and suppose that $|\bar g(A) - g(A)| \le \varepsilon g(A)$, $|\bar g(B) - g(B)| \le \varepsilon g(B)$, and $(1+\varepsilon)\bar g(A) \ge \bar g(B)$. Then, $(1+7\varepsilon)g(A) \ge g(B)$.
Proof. The premises imply that
$$(1+\varepsilon)^2 g(A) \ge (1+\varepsilon)\bar g(A) \ge \bar g(B) \ge (1-\varepsilon)g(B).$$
Therefore,
$$\frac{(1+\varepsilon)^2}{1-\varepsilon}g(A) \ge g(B).$$
The lemma follows from the fact that
$$\frac{(1+\varepsilon)^2}{1-\varepsilon} = 1 + \frac{\varepsilon(3+\varepsilon)}{1-\varepsilon} \le 1 + 7\varepsilon,$$
since $\varepsilon(3+\varepsilon)/(1-\varepsilon)$ is increasing and $\varepsilon \le 1/2$.
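The elementary bound at the end of the proof is easy to check directly; a minimal sketch:

```python
def distortion(eps):
    # (1 + eps)^2 / (1 - eps) = 1 + eps * (3 + eps) / (1 - eps)
    return (1 + eps) ** 2 / (1 - eps)
```

At $\varepsilon = 1/2$ the bound is tight, since $(3/2)^2/(1/2) = 4.5 = 1 + 7\cdot\frac{1}{2}$.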
4.2.4 Main Results
We now assemble the technical details from the previous sections to prove our main theorems,
which regard the runtime and approximation performance of MatroidSubmodular.
Theorem 4.12. Let $0 < \varepsilon \le 1$ and $c \in (0,1]$, and set
$$G = C\varepsilon_1^{-1}rn\log\left(6e^cH_r\right), \qquad N = \varepsilon_1^{-2}e^{2c}H_r^2\ln G,$$
for an appropriate universal constant C. Suppose that we compute g in MatroidSubmodular by using the procedure described in Section 4.2.3 with N samples. Then, with probability $1-o(1)$, Algorithm MatroidSubmodular is a $\left(\frac{1-e^{-c}}{c} - \varepsilon\right)$-approximation algorithm, running in time $O(\varepsilon^{-3}r^4n)$.
Proof. First, we note that³ $\varepsilon_1 < 1/2$.
Let $\bar g(A)$ denote the value obtained when estimating g(A) by using N samples, as in Section 4.2.3. We analyze the algorithm under the assumption that every value $\bar g(A)$ estimated by the algorithm satisfies
$$(1-\varepsilon_1)g(A) \le \bar g(A) \le (1+\varepsilon_1)g(A).$$
We will then show that this happens with probability $1-o(1)$.
As usual, we fix an instance $(f, \mathcal{M} = (X,\mathcal{I}))$ and let O be the optimal solution for this instance and S be the solution produced by MatroidSubmodular. Then, S must be an $\varepsilon_1$-approximate local optimum of $\bar g$. Lemma 4.11 shows that S is a $7\varepsilon_1$-approximate local optimum of g. Theorem 4.9 then implies that
$$(1 + 7\varepsilon_1 rH_r)f(S) \ge \frac{1-e^{-c}}{c}f(O).$$
Expanding the definition
$$\varepsilon_1 = \frac{\varepsilon}{7rH_r}\left(\frac{1-e^{-c}}{c} - \varepsilon\right)^{-1},$$
³This bound is intentionally loose, to simplify the proof.
we obtain
$$f(S) \ge \left(\frac{1-e^{-c}}{c} - \varepsilon\right)f(O),$$
as required.
We now bound the number of improvements that the algorithm makes. Consider the solution $S_{\text{init}}$ produced by SubmodularGreedy when applied to the instance $(f,\mathcal{M})$, and let $O_f = \arg\max_{A\in\mathcal{I}} f(A)$ and $O_g = \arg\max_{A\in\mathcal{I}} g(A)$. Then,
$$f(S_{\text{init}}) \ge \frac{1}{2}f(O_f) \ge \frac{1}{2}f(O_g).$$
The first inequality follows from the fact that SubmodularGreedy is a 1/2-approximation for maximizing a monotone submodular function subject to a matroid constraint. Applying the upper and lower bounds on g from Lemma 4.7 to this inequality, we find
$$g(S_{\text{init}}) \ge f(S_{\text{init}}) \ge \frac{1}{2}f(O_g) \ge \frac{1}{2e^cH_r}g(O_g). \tag{4.17}$$
Every time the algorithm applies an improvement, it must improve $\bar g(S)$ by at least a factor of $(1+\varepsilon_1)$. Thus, the number of improvements MatroidSubmodular can make is at most
$$\log_{1+\varepsilon_1}\frac{\bar g(O_g)}{\bar g(S_{\text{init}})} \le \log_{1+\varepsilon_1}\left(\frac{1+\varepsilon_1}{1-\varepsilon_1}\cdot 2e^cH_r\right) \le \log_{1+\varepsilon_1}\left(6e^cH_r\right) \le C\varepsilon_1^{-1}\log\left(6e^cH_r\right),$$
for some universal constant C. Here, the first inequality follows from our assumption about evaluations of g and (4.17), and the second from the fact that $\frac{1+\varepsilon_1}{1-\varepsilon_1}$ is increasing and $\varepsilon_1 < 1/2$.
Now, we show that with probability $1-o(1)$ we do indeed have $|\bar g(A) - g(A)| \le \varepsilon_1 g(A)$ for all sets considered by the algorithm. Each improvement step requires at most rn evaluations of $\bar g$, and the algorithm makes at most $C\varepsilon_1^{-1}\log(6e^cH_r)$ improvements. Thus, the algorithm requires at most
$$G = C\varepsilon_1^{-1}rn\log\left(6e^cH_r\right)$$
total evaluations of $\bar g$. We set:
$$N = \varepsilon_1^{-2}e^{2c}H_r^2\ln G.$$
Lemma 4.10 then shows that the probability that, for a given set A, $|\bar g(A) - g(A)| > \varepsilon_1 g(A)$ is at most $2/G^2$. Hence the probability that no set considered by the algorithm has
$$|\bar g(A) - g(A)| > \varepsilon_1 g(A)$$
is at least $1 - 2/G = 1 - o(1)$.
The final algorithm requires a total of
$$GN = O(\varepsilon_1^{-3}rn) = O(\varepsilon^{-3}r^4n)$$
calls to the value oracle for f and $G = O(\varepsilon^{-1}r^2n\log r)$ calls to the independence oracle for $\mathcal{M}$. Its runtime is dominated by the total number of calls to the value oracle for f.
As in the case of MatroidCoverage, we can eliminate the ε from the approximation ratio by employing the partial enumeration technique described in Section 2.6. Our next two theorems will use the definition
$$\rho(c) = \frac{1-e^{-c}}{c}.$$
Additionally, in the following theorems, we suppose that we terminate all calls to MatroidSubmodular after $O(\varepsilon^{-3}r^4n)$ steps and return the current solution. Even in this case, Theorem 4.12 still guarantees that with probability $1-o(1)$, any given call to MatroidSubmodular is a $(\rho(c)-\varepsilon)$-approximation.
Theorem 4.13. Suppose that we set $\varepsilon = \frac{1-\rho(c)}{r}$ in MatroidSubmodular. Then, with probability $1-o(1)$, PartEnum(MatroidSubmodular) is a $\rho(c)$-approximation algorithm running in time $O(r^7n^2)$.
Proof. We recall from Section 2.6 that PartEnum(MatroidSubmodular) makes n calls to MatroidSubmodular, one for each element in X. Moreover, the proof of Theorem 2.22 shows that as long as the call to MatroidSubmodular for the element $y = \arg\max_{x\in O} f(x)$ is a $(\rho(c)-\varepsilon)$-approximation, then the approximation ratio of PartEnum(MatroidSubmodular) is at least
$$\frac{1}{r} + \frac{r-1}{r}\left(\rho(c)-\varepsilon\right) = \frac{1}{r} + \frac{r-1}{r}\left(\rho(c) - \frac{1-\rho(c)}{r}\right) = \frac{1}{r} + \frac{r-1}{r}\cdot\frac{(r+1)\rho(c)-1}{r} = \frac{(r^2-1)\rho(c)+1}{r^2} = \rho(c) + \frac{1-\rho(c)}{r^2} \ge \rho(c).$$
Theorem 4.12 shows that with probability $1-o(1)$ the call to MatroidSubmodular corresponding to element y is indeed a $(\rho(c)-\varepsilon)$-approximation.
Each of the n calls to MatroidSubmodular requires time
$$O(\varepsilon^{-3}r^4n) = O((1-\rho(c))^{-3}r^7n).$$
Thus, the total runtime of PartEnum(MatroidSubmodular) is $O((1-\rho(c))^{-3}r^7n^2)$.
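The arithmetic in the proof of Theorem 4.13 can be verified numerically; a small sketch sweeping ρ and r:

```python
def part_enum_bound(rho, r):
    # 1/r + ((r-1)/r) * (rho - (1-rho)/r), the PartEnum ratio with eps = (1 - rho)/r;
    # algebraically this equals rho + (1 - rho)/r^2, which is at least rho
    return 1.0 / r + (r - 1.0) / r * (rho - (1.0 - rho) / r)
```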
One weakness of our algorithm, when compared to the continuous greedy algorithm, is that
it requires knowledge of the curvature of the function f . In our next theorem, we show how this
requirement can be eliminated by enumerating over values of c with enough granularity. Since
the function ρ(c) is continuous, a small error in an estimate of c will translate into a small error
in the approximation ratio. The resulting algorithm, CurvatureEnum, is shown in Algorithm 8.
Theorem 4.14. Consider an instance $(f,\mathcal{M})$ and suppose that f has total curvature c. With probability $1-o(1)$, CurvatureEnum is a $(\rho(c)-\varepsilon)$-approximation algorithm for $(f,\mathcal{M})$ running in time $O(\varepsilon^{-4}r^4n)$.
Algorithm 8: CurvatureEnum
Input: Approximation parameter ε > 0; monotone submodular function f, as a value oracle; matroid M = (X, I), as an independence oracle for I
Let C = {kε : 1 ≤ k ≤ ⌊1/ε⌋} ∪ {1};
foreach c ∈ C do
    Let S_c be the result of running MatroidSubmodular on (ε/2, c, X, I);
return arg max_{c∈C} f(S_c);
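Algorithm 8 can be sketched as follows. Here `matroid_submodular` is a hypothetical callable standing in for MatroidSubmodular, taking the approximation parameter, the curvature guess c, and the instance, and returning a solution:

```python
def curvature_enum(eps, f, instance, matroid_submodular):
    # C = {k*eps : 1 <= k <= floor(1/eps)} union {1}
    C = sorted({k * eps for k in range(1, int(1 / eps) + 1)} | {1.0})
    best = None
    for c in C:
        # hypothetical call: run MatroidSubmodular with parameters (eps/2, c)
        S_c = matroid_submodular(eps / 2, c, instance)
        if best is None or f(S_c) > f(best):
            best = S_c
    return best
```

A toy stub for `matroid_submodular` suffices to exercise the enumeration: the wrapper simply returns the candidate solution maximizing f over the grid of curvature guesses.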
Proof. Let S be the solution produced by CurvatureEnum on the instance $(f,\mathcal{M})$ and let O be the optimal solution for this instance. Define $c_0 = \min\{x \in C : c \le x\}$. Then, $c \le c_0 < c + \varepsilon$, and $f(S) \ge f(S_{c_0})$. Since f has curvature at most $c_0$, Theorem 4.12 shows that $f(S_{c_0}) \ge (\rho(c_0) - \varepsilon/2)f(O)$ with probability $1-o(1)$.
We now bound the rate at which ρ(c) can decrease. We have
$$\rho''(c) = \frac{2e^{-c}}{c^3}\left[e^c - \left(1 + c + \frac{c^2}{2}\right)\right].$$
Using the series expansion for $e^c$, we find that the bracketed expression must be positive, and so $\rho''(c) > 0$ on (0,1]. Hence, $\rho'(c)$ is an increasing function on this interval and so its minimum on (0,1] is bounded from below by $\lim_{x\to 0^+}\rho'(x) = -1/2$. Therefore, the ratio f(S)/f(O) is at least
$$\rho(c_0) - \varepsilon/2 \ge \rho(c+\varepsilon) - \varepsilon/2 \ge \rho(c) - \varepsilon,$$
where the first inequality follows from the fact that ρ(c) is decreasing on (0,1], and the second from the bound $\rho'(c) \ge -1/2$. It follows that CurvatureEnum is at least a $(\rho(c)-\varepsilon)$-approximation algorithm.
The algorithm CurvatureEnum makes $O(\varepsilon^{-1})$ calls to MatroidSubmodular, each taking time $O(\varepsilon^{-3}r^4n)$. The runtime of CurvatureEnum is thus $O(\varepsilon^{-4}r^4n)$.
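The analytic facts used above (ρ is decreasing on (0,1] and ρ' is bounded below by −1/2) are easy to check numerically; a small sketch using a finite-difference estimate of ρ':

```python
import math

def rho(c):
    return (1.0 - math.exp(-c)) / c

def rho_prime(c, h=1e-6):
    # central finite-difference estimate of rho'(c)
    return (rho(c + h) - rho(c - h)) / (2 * h)
```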
In order to completely match the generality of the continuous greedy algorithm, we would like to combine the results of Theorems 4.13 and 4.14 to obtain a clean $\frac{1-e^{-c}}{c}$-approximation without knowledge of c. Unfortunately, the runtime of our partial enumeration algorithm in Theorem 4.13, and hence the runtime of the combined approach, depends on the value of c. Thus, the resulting algorithm runs in pseudo-polynomial time, rather than polynomial time. Indeed, as c becomes arbitrarily close to 0, the time required by the partial enumeration algorithm diverges. Practically, this means that the combined approach would require that some nontrivial, constant lower bound on the value of c be specified.
4.3 Obtaining the Coefficient Sequences
Here, we provide an account of how the recurrences and formulas for the γ sequences were derived. The overall approach is similar to that described in Section 3.4. That is, it makes use of a series of linear programs, one for each value of r. For instances of rank r, a solution to the corresponding program gives the optimal locality ratio attainable on instances of rank r when using a function of the general form given by (4.1), as well as the values of the coefficients $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$ defining this function. We now show how the program for each value of r is formulated. For simplicity, we assume that c = 1 and ε = 0. That is, we do not consider the curvature of f and consider only exact local optima.
We follow the same general approach as in Section 3.4. For a sequence $\beta = \beta^{(r)}_0,\ldots,\beta^{(r)}_r$, we let $g_{\beta,f}$ denote the function g obtained from f and β by using (4.1). Then, the function $g_{\beta,f}$ is defined only on sets of size r, but this will suffice for our purposes. We suppose that we are given a sequence $\beta = \beta^{(r)}_0,\ldots,\beta^{(r)}_r$ and want to find the worst-case locality ratio of MatroidSubmodular over all instances $\mathcal{M}, f$ with rank r, when $g_{\beta,f}$ is used as the non-oblivious potential function in the algorithm.
Let $S = \{s_1,\ldots,s_r\}$ and $O = \{o_1,\ldots,o_r\}$ be two sets of size r. Following the same line of reasoning as in Section 3.4, we can restrict ourselves to the case in which S and O are disjoint and consider the following program.
$$\begin{aligned}
\min_{\text{submodular } f}\quad & f(S)\\
\text{s.t.}\quad & g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) \ge 0 \qquad 1 \le i \le r\\
& f(O) = 1
\end{aligned} \tag{4.18}$$
Now, we must somehow implement the minimization over all submodular functions f. A naïve, direct formulation involves introducing a variable representing the value of f(A) for each subset A of $S \cup O$, then adding constraints to the resulting set of variables corresponding to the conditions that f is monotone and submodular. The number of variables and constraints required by this approach is exponential in r. We can do better by using the symmetric variables $F_{a,x,b}$ introduced in Section 4.2. We have shown how to express the values f(S) and f(O) in equations (4.3) and (4.4). While we cannot express the r local optimality constraints of (4.18) directly, we can represent their sum
$$\sum_{i=1}^r\left[g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i)\right] \ge 0,$$
by using equations (4.5) and (4.6).
Now, we consider how to represent the condition that f is monotone submodular. We do this by adding a set of constraints to our linear program. These constraints must also be expressed in terms of the symmetric variables $F_{a,x,b}$. For monotonicity, we have the constraint
$f(A + y) \ge f(A)$ for every $A \subseteq X$ and $y \notin A$. Consider some sets of indices $I, J \subseteq [r]$, with $|I \setminus J| = a$, $|J \setminus I| = b$, and $|I \cap J| = x$. Then, we have 4 separate cases, depending on whether y is of the form $s_i$ or $o_i$ and whether $i \in I \cup J$ or not. In each case, we average over all possible assignments of values from [r] to indices and obtain the following 4 inequalities:
$$\begin{aligned}
F_{a+1,x,b} &\ge F_{a,x,b} && y = s_i \text{ and } i \notin J\\
F_{a-1,x+1,b} &\ge F_{a,x,b} && y = o_i \text{ and } i \in I\\
F_{a,x,b+1} &\ge F_{a,x,b} && y = o_i \text{ and } i \notin I\\
F_{a,x+1,b-1} &\ge F_{a,x,b} && y = s_i \text{ and } i \in J
\end{aligned}$$
As an example, let us give a derivation of the second inequality above, for $a = 1$, $x = 0$, $b = 1$. Every set in $X_{1,0,1}$ has the form $\{s_i, o_j\}$ for some pair of indices $i \ne j \in [r]$. Similarly, each set in $X_{0,1,1}$ has the form $\{s_i, o_i, o_j\}$ for some pair of indices $i \ne j \in [r]$. From monotonicity of f, for any $i \ne j \in [r]$, we have:
$$f(\{s_i, o_i, o_j\}) \ge f(\{s_i, o_j\}).$$
Averaging this inequality over all possible assignments of values from [r] to i and j satisfying $i \ne j$, we obtain
$$F_{0,1,1} \ge F_{1,0,1}.$$
This approach extends in a straightforward way to the remaining 3 inequalities, and all values of a, x, b.
For submodularity, we have the constraint $f(A+y) + f(A+z) \ge f(A+y+z) + f(A)$ for every $A \subseteq X$ and $y, z \notin A$. Again, by considering each separate case in terms of our symmetric notation we obtain the following separate inequalities:
$$\begin{aligned}
F_{a,x+1,b-1} + F_{a,x+1,b-1} &\ge F_{a,x+2,b-2} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \in J, j \in J\\
F_{a+1,x,b} + F_{a,x+1,b-1} &\ge F_{a+1,x+1,b-1} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \notin J, j \in J\\
F_{a+1,x,b} + F_{a+1,x,b} &\ge F_{a+2,x,b} + F_{a,x,b} && y = s_i, z = s_j \text{ and } i \notin J, j \notin J\\
F_{a-1,x+1,b} + F_{a-1,x+1,b} &\ge F_{a-2,x+2,b} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \in I, j \in I\\
F_{a,x,b+1} + F_{a-1,x+1,b} &\ge F_{a-1,x+1,b+1} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \notin I, j \in I\\
F_{a,x,b+1} + F_{a,x,b+1} &\ge F_{a,x,b+2} + F_{a,x,b} && y = o_i, z = o_j \text{ and } i \notin I, j \notin I\\
F_{a-1,x+1,b} + F_{a,x+1,b-1} &\ge F_{a-1,x+2,b-1} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \in I, j \in J\\
F_{a-1,x+1,b} + F_{a+1,x,b} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \in I, j \notin J\\
F_{a,x,b+1} + F_{a,x+1,b-1} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j \text{ and } i \notin I, j \in J\\
F_{a,x,b+1} + F_{a+1,x,b} &\ge F_{a,x+1,b} + F_{a,x,b} && y = o_i, z = s_j,\ i = j \text{ and } i \notin I, j \notin J\\
F_{a,x,b+1} + F_{a+1,x,b} &\ge F_{a+1,x,b+1} + F_{a,x,b} && y = o_i, z = s_j,\ i \ne j \text{ and } i \notin I, j \notin J
\end{aligned}$$
Putting everything together, we obtain the linear program:
$$\begin{aligned}
\min_{F_{a,x,b}}\quad & F_{r,0,0}\\
\text{s.t.}\quad & \sum_{k=0}^r k\beta^{(r)}_k\left(F_{k,0,0} - F_{k-1,0,1}\right) \ge 0 && \text{(local optimality constraint)}\\
& F_{0,0,r} = 1\\
& F_{a+1,x,b} \ge F_{a,x,b}\\
& F_{a-1,x+1,b} \ge F_{a,x,b}\\
& F_{a,x,b+1} \ge F_{a,x,b}\\
& F_{a,x+1,b-1} \ge F_{a,x,b} && \text{(monotonicity constraints, } a,x,b \ge 0,\ a+x+b \le r\text{)}\\
& F_{a+1,x,b} + F_{a+1,x,b} \ge F_{a+2,x,b} + F_{a,x,b}\\
& F_{a+1,x,b} + F_{a,x+1,b-1} \ge F_{a+1,x+1,b-1} + F_{a,x,b}\\
& F_{a,x+1,b-1} + F_{a,x+1,b-1} \ge F_{a,x+2,b-2} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a,x,b+1} \ge F_{a,x,b+2} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a-1,x+1,b} \ge F_{a-1,x+1,b+1} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a-1,x+1,b} \ge F_{a-2,x+2,b} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a,x+1,b-1} \ge F_{a-1,x+2,b-1} + F_{a,x,b}\\
& F_{a-1,x+1,b} + F_{a+1,x,b} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a,x+1,b-1} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a+1,x,b} \ge F_{a,x+1,b} + F_{a,x,b}\\
& F_{a,x,b+1} + F_{a+1,x,b} \ge F_{a+1,x,b+1} + F_{a,x,b} && \text{(submodularity constraints, } a,x,b \ge 0,\ a+x+b \le r\text{)}\\
& F_{a,x,b} \ge 0
\end{aligned} \tag{4.19}$$
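As a concrete check that the symmetric (averaged) constraints are valid for an actual monotone submodular function, the sketch below builds the variables $F_{a,x,b}$ for a small random coverage instance (made up for illustration) and verifies several monotonicity and submodularity families:

```python
import random
from itertools import combinations
from statistics import mean

r = 3
random.seed(1)
# ground set: elements 0..r-1 play the role of s_1..s_r, elements r..2r-1 of o_1..o_r
cover = {e: frozenset(random.sample(range(12), 4)) for e in range(2 * r)}

def f(A):
    return len(frozenset().union(*(cover[e] for e in A))) if A else 0

def F(a, x, b):
    """Average of f over all sets with |I\\J| = a, |I cap J| = x, |J\\I| = b."""
    vals = []
    for P in combinations(range(r), x):            # indices i with both s_i and o_i
        rest = [i for i in range(r) if i not in P]
        for Ai in combinations(rest, a):           # indices with s_i only
            rest2 = [i for i in rest if i not in Ai]
            for Bi in combinations(rest2, b):      # indices with o_i only
                A = list(P) + list(Ai) + [r + i for i in P] + [r + i for i in Bi]
                vals.append(f(A))
    return mean(vals)
```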
If the variables $F_{a,x,b}$ are the symmetric variables for some underlying monotone submodular function f, then they must satisfy all of the constraints of (4.19), and moreover $F_{r,0,0} = f(S)$. Thus, for every feasible solution of (4.18) there is a corresponding feasible solution to (4.19) of equal value. However, it is not clear that the converse is true, since the symmetric constraints are, in general, weaker than the individual constraints we summed to obtain them. We now show that the converse does, in fact, hold. The following theorem is the direct analogue of Theorem 3.11 from Section 3.4.
Theorem 4.15. Let F be the set of values $F_{a,x,b}$, where $a, x, b \ge 0$ and $a+x+b \le r$, comprising a feasible solution of (4.19), and suppose that $F_{r,0,0} = v$, so that F has value v. Then, there exists a monotone submodular function f on X that is a feasible solution of (4.18) with value $f(S) = v$.
Proof. The proof is similar to the proof of Theorem 3.11 but is easier, as here we need not give a particular representation for the function f. Let $S = \{s_1,\ldots,s_r\}$ and $O = \{o_1,\ldots,o_r\}$ be two subsets of X. For some $A \subseteq O \cup S$, let $I(A) = \{i \in [r] : s_i \in A\}$ and $J(A) = \{i \in [r] : o_i \in A\}$ be the indices of those elements from S and O, respectively, contained in A. Then, we define
$$f(A) = F_{a,x,b}, \quad\text{where } a = |I(A) \setminus J(A)|,\ b = |J(A) \setminus I(A)|,\ x = |I(A) \cap J(A)|.$$
It is immediate from the definition that $f(S) = F_{r,0,0} = v$ and that $f(O) = F_{0,0,r} = 1$, since F is a feasible solution of (4.19).
Now, we show that f must be monotone submodular. Let A be some subset of $S \cup O$, and y some element of $(S \cup O) \setminus A$, and suppose that $|I(A) \setminus J(A)| = a$, $|J(A) \setminus I(A)| = b$ and $|I(A) \cap J(A)| = x$, so that we have $f(A) = F_{a,x,b}$. Then, there are four possible values for $f(A+y)$. Specifically,
$$\begin{aligned}
f(A+y) &= F_{a+1,x,b} && \text{if } y = s_i \text{ and } i \notin J(A)\\
f(A+y) &= F_{a,x+1,b-1} && \text{if } y = s_i \text{ and } i \in J(A)\\
f(A+y) &= F_{a,x,b+1} && \text{if } y = o_i \text{ and } i \notin I(A)\\
f(A+y) &= F_{a-1,x+1,b} && \text{if } y = o_i \text{ and } i \in I(A)
\end{aligned}$$
These cases correspond exactly to the cases considered in the monotonicity constraints in (4.19). Thus, we have $f(A+y) \ge f(A) = F_{a,x,b}$ in each case, since F is a feasible solution of (4.19). Similarly, for submodularity we consider a subset A of $S \cup O$ and $y, z \in (S \cup O) \setminus A$. We obtain 11 possible values for $f(A+y) + f(A+z)$ corresponding to the 11 cases considered in the submodularity constraints in (4.19), and so the submodularity of f also follows directly from the feasibility of F.
Next, we consider the value $g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i)$. The value $g_{\beta,f}(S)$ is given by
$$g_{\beta,f}(S) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S}{k}} f(T).$$
For each set $T \in \binom{S}{k}$ we have $T \subseteq S$ and so $|I(T)| = k$ and $|J(T)| = 0$. Thus, $f(T) = F_{k,0,0}$ for each of the $\binom{r}{k}$ sets $T \in \binom{S}{k}$. Thus,
$$g_{\beta,f}(S) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S}{k}} F_{k,0,0} = \sum_{k=0}^r \beta^{(r)}_k F_{k,0,0}.$$
Similarly, the value $g_{\beta,f}(S - s_i + o_i)$ is given by
$$g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\sum_{T\in\binom{S-s_i+o_i}{k}} f(T) = \sum_{k=0}^r \frac{\beta^{(r)}_k}{\binom{r}{k}}\left[\sum_{T\in\binom{S-s_i}{k-1}} f(T + o_i) + \sum_{T\in\binom{S-s_i}{k}} f(T)\right]. \tag{4.20}$$
For each set $T \in \binom{S-s_i}{k-1}$ we have $|I(T)| = k-1$ and $|J(T)| = 0$. Thus, $f(T + o_i) = F_{k-1,0,1}$ for each of the $\binom{r-1}{k-1} = \frac{k}{r}\binom{r}{k}$ sets $T \in \binom{S-s_i}{k-1}$. Similarly, for each set $T \in \binom{S-s_i}{k}$, we have $|I(T)| = k$ and $|J(T)| = 0$. Thus, $f(T) = F_{k,0,0}$ for each of the $\binom{r-1}{k} = \frac{r-k}{r}\binom{r}{k}$ sets $T \in \binom{S-s_i}{k}$. Thus,
$$g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r\left[\beta^{(r)}_k\left(\frac{r-k}{r}F_{k,0,0} + \frac{k}{r}F_{k-1,0,1}\right)\right] \tag{4.21}$$
Combining the above expression for $g_{\beta,f}(S)$ with (4.21) for $g_{\beta,f}(S - s_i + o_i)$, we obtain
$$g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) = \sum_{k=0}^r \beta^{(r)}_k\left[F_{k,0,0} - \frac{r-k}{r}F_{k,0,0} - \frac{k}{r}F_{k-1,0,1}\right] = \frac{1}{r}\sum_{k=0}^r k\beta^{(r)}_k\left[F_{k,0,0} - F_{k-1,0,1}\right] \ge 0,$$
where the final inequality comes from the fact that F is a feasible solution of (4.19) and so must satisfy the local optimality constraint.
We have shown that f must be monotone submodular with $f(O) = 1$ and $g_{\beta,f}(S) - g_{\beta,f}(S - s_i + o_i) \ge 0$ for each $i \in [r]$. Hence, f is a feasible solution for (4.18). It has value $f(S) = F_{r,0,0} = v$.
As in the coverage case, the value of the optimal solution to (4.19) is therefore the same as the value of the optimal solution to (4.18). This value corresponds to the locality ratio of MatroidSubmodular when g is given by $g_{\beta,f}$. Furthermore, the solution attaining this value gives a tight instance for the potential function $g_{\beta,f}$. In order to obtain the optimal values of $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$ we follow the same approach as in Section 3.4. That is, we take the dual of the linear program (4.19), which is a maximization program, and then maximize over the resulting dual variables and the values $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$.
We do not show the dual program here, but we note that, as in the coverage case, it is linear except for non-linear terms of the form $\rho F_{a,x,b}$ corresponding to the local optimality constraint. As in the coverage case, the local optimality condition in the primal program is invariant under scaling by a constant, so we can set the dual variable $\rho = 1$ arbitrarily to obtain a linear program.
The final dual program has a variable for each of the coefficients $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$, a variable θ corresponding to the constraint $F_{0,0,r} = 1$, and a variable corresponding to each monotonicity and submodularity constraint. The variable θ is also the objective maximized by the dual program, and hence the optimal locality ratio of MatroidSubmodular. We have seen that the primal variables define a tight instance. Here, the dual variables are useful as well: the dual variables give a proof that the locality ratio of MatroidSubmodular is at least θ for the given values of $\beta^{(r)}_0,\ldots,\beta^{(r)}_r$. Specifically, we obtain a proof of the inequality $f(S) \ge \theta = \theta f(O)$ (recall that we have set $f(O) = 1$) by combining (i.e. summing) the symmetric monotonicity and submodularity inequalities for instances of size r. The weight with which each inequality is included in this combination is given by the value of its corresponding dual variable.
We now give a brief account of how the programs led us to the coefficient sequences β and γ. We solved the (dual) linear programs for several values of r and then, by examining the dual variables for values of r = 2 through r = 10, we deduced a general proof technique for deriving the desired inequality. The technique required only the general inequality
$$(r-\ell)(F_{\ell,0,1} - F_{\ell,0,0}) + \ell(F_{\ell-1,0,1} - F_{\ell-1,0,0}) + cF_{\ell,0,0} \ge f(O),$$
given in Lemma 4.1. We then observed that the sum of these inequalities telescoped into an inequality relating $f(S) = F_{r,0,0}$ and $f(O) = F_{0,0,r}$ precisely when the sequence $\gamma^{(r)}$ satisfied the recurrence (γ-rec) (as we proved in (4.14)). Using this recurrence, we extended the sequence $\gamma^{(r)}$ to obtain one extra value $\gamma^{(r)}_{r+1}$, and noted that the locality ratio of MatroidSubmodular was then given by:
$$\frac{\gamma^{(r)}_{r+1} - \gamma^{(r)}_0}{\gamma^{(r)}_{r+1}}.$$
The general technique required multiplying the inequality from Lemma 4.1 by $\gamma^{(r)}_{\ell+1} - \gamma^{(r)}_\ell$ for each value of $\ell$. This required that the sequence $\gamma^{(r)}$ be non-decreasing, to prevent the direction of the inequalities from being reversed. The main difficulty was then determining a set of initial conditions for (γ-rec) that ensured this, while giving the best possible locality ratio.
After determining the proper value for the sequence $\gamma^{(r)}$, we then derived the sequences $\gamma^{(m)}$ for $m \ne r$ in a fashion consistent with the requirement that g be monotone submodular, as we prove in the next section. Finally, the effect of the curvature c of f was deduced by using an augmented version of the linear program.
4.4 Further Properties of g
We now turn to proving various properties of the potential function g defined by (4.1), (γ-base),
and (γ-up). In this section we show that g is monotone submodular, and also show that if f is
a coverage function, then the g given by (4.1) in fact corresponds to the non-oblivious potential
function g used in Chapter 3.
Our proofs require two small identities and an ancillary lemma about g. The identities
follow from the recurrence (γ-rec), stated in Lemma 4.2.
Lemma 4.16. For all $1 \le \ell \le m$ we have the identities:
$$\frac{\beta^{(m+1)}_\ell}{\binom{m+1}{\ell}} + \frac{\beta^{(m+1)}_{\ell+1}}{\binom{m+1}{\ell+1}} = \frac{\beta^{(m)}_\ell}{\binom{m}{\ell}} \tag{4.22}$$
$$\frac{\beta^{(m)}_k}{\binom{m}{k}} + \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} = 2\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}. \tag{4.23}$$
Proof. The sequences $\gamma^{(m)}$ and $\gamma^{(m+1)}$ are related by (γ-up), and satisfy (γ-rec) as shown in Lemma 4.2. Multiplying (4.22) by $\binom{m}{\ell}(m+1)\ell$ we obtain the equivalent identity:
$$(m+1-\ell)\ell\,\beta^{(m+1)}_\ell + \ell(\ell+1)\beta^{(m+1)}_{\ell+1} = (m+1)\ell\,\beta^{(m)}_\ell.$$
Using the definition $\gamma^{(i)}_k = k\beta^{(i)}_k$ for $k > 0$, this is equivalent to
$$(m+1-\ell)\gamma^{(m+1)}_\ell + \ell\gamma^{(m+1)}_{\ell+1} = (m+1)\gamma^{(m)}_\ell.$$
Now, we have
$$\begin{aligned}
(m+1-\ell)\gamma^{(m+1)}_\ell + \ell\gamma^{(m+1)}_{\ell+1}
&= c^{-1}(m+1)\left[(m+1-\ell)\left(\gamma^{(m)}_\ell - \gamma^{(m)}_{\ell-1}\right) + \ell\left(\gamma^{(m)}_{\ell+1} - \gamma^{(m)}_\ell\right)\right] && \text{by (γ-up)}\\
&= c^{-1}(m+1)\left[\ell\gamma^{(m)}_{\ell+1} + (m-2\ell+1)\gamma^{(m)}_\ell - (m-\ell+1)\gamma^{(m)}_{\ell-1}\right] && \text{algebra}\\
&= (m+1)\gamma^{(m)}_\ell, && \text{by (γ-rec)}
\end{aligned}$$
completing the proof of identity (4.22).
We use identity (4.22) to prove (4.23). We have
$$\begin{aligned}
2\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}
&= \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} + \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} && \text{by (4.22) applied to 1st term}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}} + \frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} && \text{by (4.22) applied to 2nd and 4th terms}\\
&= \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} + \frac{\beta^{(m)}_k}{\binom{m}{k}} && \text{by (4.22) applied to 2nd and 3rd terms}
\end{aligned}$$
completing the proof of identity (4.23).
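Both identities can be checked numerically against the recurrence defining γ; a small illustrative sketch:

```python
import math

def gamma_seq(m, c):
    """gamma^(m)_0 .. gamma^(m)_{m+1} via (gamma-base) and (gamma-up)."""
    g = [1.0, math.exp(c)]
    for j in range(1, m + 1):
        g = [1.0] + [(j / c) * (g[l] - g[l - 1]) for l in range(1, j + 1)] + [math.exp(c)]
    return g

def bb(m, k, c):
    """beta^(m)_k / C(m, k), with beta^(m)_0 = 0 and beta^(m)_k = gamma^(m)_k / k."""
    beta = 0.0 if k == 0 else gamma_seq(m, c)[k] / k
    return beta / math.comb(m, k)
```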
Lemma 4.17. For any $S \subseteq X$ with $|S| = m$ and any $x \in X \setminus S$,
$$g(S+x) = \sum_{k=0}^m\sum_{T\in\binom{S}{k}}\left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}f(T+x)\right]$$
Proof.
$$\begin{aligned}
g(S+x) &= \sum_{k=0}^{m+1}\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S+x}{k}}f(T)\\
&= \sum_{k=0}^m\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k}}f(T) + \sum_{k=1}^{m+1}\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k-1}}f(T+x) && \text{splitting the sum}\\
&= \sum_{k=0}^m\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\sum_{T\in\binom{S}{k}}f(T) + \sum_{k=0}^m\frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\sum_{T\in\binom{S}{k}}f(T+x) && \text{shifting indices}\\
&= \sum_{k=0}^m\sum_{T\in\binom{S}{k}}\left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}f(T+x)\right].
\end{aligned}$$
Now, we are ready to prove that g is monotone and submodular.

Theorem 4.18. For any set S of size $m \ge 0$ and $x \notin S$, $g(S) \le g(S+x)$. Moreover, if $f(T) = f(T+x)$ for each $T \subseteq S$ then $g(S) = g(S+x)$.
Proof.
$$\begin{aligned}
g(S+x) &= \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\,f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\,f(T+x)\right] && \text{by Lemma 4.17}\\
&\ge \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \left[\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}}\,f(T) + \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}}\,f(T)\right] && \text{by monotonicity of } f\\
&= \sum_{k=0}^{m} \sum_{T\in\binom{S}{k}} \frac{\beta^{(m)}_k}{\binom{m}{k}}\,f(T) && \text{by identity (4.22) (for } k>0\text{) and } f(\emptyset)=0 \text{ (for } k=0\text{)}\\
&= g(S).
\end{aligned}$$
Note that if f(T ) = f(T + x) for all T ⊆ S then the inequality is tight.
Theorem 4.19. For any set S of size m − 1 ≥ 0 and distinct x, y ∉ S, we have
$$g(S+x) + g(S+y) \ge g(S+x+y) + g(S).$$
Proof.
$$\begin{aligned}
&g(S+x+y) + g(S)\\
&= \sum_{k=0}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S+x+y}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m)}_k}{\binom{m}{k}} \sum_{T\in\binom{S}{k}} f(T)\\
&= \sum_{k=0}^{m} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=1}^{m+1} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-1}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=2}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-2}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m)}_k}{\binom{m}{k}} \sum_{T\in\binom{S}{k}} f(T) && \text{splitting the sum}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=1}^{m+1} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-1}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=2}^{m+2} \frac{\beta^{(m+2)}_k}{\binom{m+2}{k}} \sum_{T\in\binom{S}{k-2}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T) && \text{by identity (4.23) (for } k>0\text{) and } f(\emptyset)=0 \text{ (for } k=0\text{)}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T+x+y) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} f(T) && \text{shifting indices}\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} \big(f(T+x+y)+f(T)\big) && \text{algebra}\\
&\le \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+1}}{\binom{m+2}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big)\\
&\quad+ \sum_{k=0}^{m} \frac{\beta^{(m+2)}_{k+2}}{\binom{m+2}{k+2}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big) && \text{by submodularity of } f\\
&= \sum_{k=0}^{m} 2\,\frac{\beta^{(m+1)}_k}{\binom{m+1}{k}} \sum_{T\in\binom{S}{k}} f(T) + \sum_{k=0}^{m} \frac{\beta^{(m+1)}_{k+1}}{\binom{m+1}{k+1}} \sum_{T\in\binom{S}{k}} \big(f(T+x)+f(T+y)\big) && \text{by identity (4.22)}\\
&= g(S+x) + g(S+y) && \text{by Lemma 4.17}
\end{aligned}$$
Having shown that g is monotone submodular, we now give a brief sketch of how MatroidSubmodular can be modified to remove the extra factor of log log r from its runtime. As in MatroidCoverage, we run the greedy initial phase of MatroidSubmodular on g instead of f. Consider some instance (M = (X, I), f) of monotone submodular maximization subject to a matroid constraint. Let Sinit be the solution returned by SubmodularGreedy on (M, g), and let Og = arg max_{A∈I} g(A). Then, because g is monotone submodular, we have the bound
$$g(S_{\mathrm{init}}) \ge \tfrac{1}{2}\,g(O_g).$$
Unfortunately, we cannot compute g exactly, but can only estimate it. Calinescu et al. [21] and Goundan and Schulz [46] both show that the greedy algorithm remains an α/(α+1)-approximation for monotone submodular maximization when we have only an α-approximate incremental oracle for g. Lemma 4.10 shows how to obtain an approximate oracle for g by sampling. We need to turn this into an approximate oracle for the marginal value g_S(x), or modify the proof of Goundan and Schulz to use an approximate oracle for g instead of an approximate incremental oracle for g_S. Either approach reveals that it is sufficient to take n times as many samples as are required for an approximate oracle for g in Lemma 4.10. Using this result, we can obtain the bound
$$\frac{g(O_g)}{g(S_{\mathrm{init}})} = O(1)$$
in the proof of Theorem 4.12, improving on the bound given in (4.17). Then, the remainder of the analysis in Theorem 4.12 shows that the resulting algorithm can make at most O(ε⁻¹) improvements.
Finally, we show that if f is a coverage function, then the function g given by (4.1) corresponds (up to a scaling factor) to the non-oblivious potential function defined in (3.1) in Chapter 3. In this sense, MatroidSubmodular is a generalization of MatroidCoverage.
Theorem 4.20. Let f = (U,w,F) be a coverage function, and let G be the function obtained
from f by using (3.1). Let g be the function obtained from f by using (4.1) with c = 1. Then,
g(S) = e ·G(S) for all S.
Proof. First, we note that since f is a coverage function,
$$\begin{aligned}
g(S) &= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{T\in\binom{S}{k}} f(T)\\
&= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{T\in\binom{S}{k}} \sum_{\substack{x\in U\\ \text{s.t. } x\in F(T)}} w(x)\\
&= \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \sum_{x\in U} \sum_{\substack{T\in\binom{S}{k}\\ \text{s.t. } x\in F(T)}} w(x)\\
&= \sum_{x\in U} w(x) \sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \left|\left\{T\in\tbinom{S}{k} : x\in F(T)\right\}\right|.
\end{aligned}$$
The value
$$\sum_{k=0}^{|S|} \frac{\beta^{(|S|)}_k}{\binom{|S|}{k}} \left|\left\{T\in\tbinom{S}{k} : x\in F(T)\right\}\right|$$
depends only on |S| and #(x, S) (the number of sets in S containing x). Hence, there must exist coefficients ζ^(|S|)_{#(x,S)} such that
$$g(S) = \sum_{x\in U} \zeta^{(|S|)}_{\#(x,S)}\,w(x).$$
We now show that the coefficients ζ do not in fact depend on |S|. We consider the following thought experiment. We add to F the single set F_{u+1} = ∅. Then, for every A ⊆ S we have f(A ∪ {u+1}) = f(A), since F_{u+1} covers no elements. From Theorem 4.18, we must then have g(S ∪ {u+1}) = g(S). Hence,
$$\sum_{x\in U} \zeta^{(|S|)}_{\#(x,S)}\,w(x) = g(S) = g(S \cup \{u+1\}) = \sum_{x\in U} \zeta^{(|S|+1)}_{\#(x,S)}\,w(x),$$
for every S, and so ζ^(|S|+1)_{#(x,S)} = ζ^(|S|)_{#(x,S)} for every S. We let ζ_i denote the common value of the terms ζ^(m)_i for all values of m. Then,
$$g(S) = \sum_{x\in U} \zeta_{\#(x,S)}\,w(x).$$
In order to derive a recurrence for the ζ terms, we consider another thought experiment. Suppose that the universe contains a single element x with w(x) = 1, and that F contains some number p of sets F_i = {x} for 1 ≤ i ≤ p and one set F_{p+1} = ∅. Suppose that S = [p]. Then,
$$g(S) = \zeta_{\#(x,S)}\,w(x) = \zeta_p.$$
Furthermore, from the definition of g, we have
$$g(S) = \sum_{k=0}^{p} \frac{\beta^{(p)}_k}{\binom{p}{k}} \sum_{T\in\binom{S}{k}} f(T) = \sum_{k=1}^{p} \beta^{(p)}_k,$$
since f(T) = 1 for all T ≠ ∅ and f(∅) = 0. Thus, we have:
$$\zeta_p = \sum_{k=1}^{p} \beta^{(p)}_k. \qquad (4.24)$$
Now, consider the solution S′ = S ∪ {p+1}, which, in addition to S, contains the index of the set ∅. We have
$$g(S') = \zeta_{\#(x,S')}\,w(x) = \zeta_p,$$
and, from the definition of g,
$$g(S') = \sum_{k=0}^{p+1} \frac{\beta^{(p+1)}_k}{\binom{p+1}{k}} \sum_{T\in\binom{S'}{k}} f(T) = \frac{p}{p+1}\,\beta^{(p+1)}_1 + \sum_{k=2}^{p+1} \beta^{(p+1)}_k,$$
since f(∅) = 0, f(T) = 0 for the set T = {p+1} and 1 for the other p sets T in $\binom{S'}{1}$, and finally f(T) = 1 for every set T containing at least 2 members of S′. Thus, we have:
$$\zeta_p = \frac{p}{p+1}\,\beta^{(p+1)}_1 + \sum_{k=2}^{p+1} \beta^{(p+1)}_k. \qquad (4.25)$$
Equations (4.24) and (4.25) are each valid for any value of p ≥ 1. Letting p = i + 1 in equation (4.24) and p = i in equation (4.25) we obtain the recurrence:
$$\zeta_{i+1} = \sum_{k=1}^{i+1} \beta^{(i+1)}_k = \frac{1}{i+1}\,\beta^{(i+1)}_1 + \frac{i}{i+1}\,\beta^{(i+1)}_1 + \sum_{k=2}^{i+1} \beta^{(i+1)}_k = \frac{1}{i+1}\,\beta^{(i+1)}_1 + \zeta_i. \qquad (4.26)$$
Now, we have:
$$\begin{aligned}
\zeta_{i+1} &= \frac{1}{i+1}\,\beta^{(i+1)}_1 + \zeta_i && \text{by (4.26)}\\
&= \frac{1}{i+1}\,\gamma^{(i+1)}_1 + \zeta_i && \text{since } \gamma^{(i+1)}_1 = 1\cdot\beta^{(i+1)}_1\\
&= \gamma^{(i)}_1 - \gamma^{(i)}_0 + \zeta_i && \text{by applying (γ-up) to } \gamma^{(i+1)}_1\\
&= \gamma^{(i)}_1 - 1 + \zeta_i && \text{by applying (γ-base) to } \gamma^{(i)}_0\\
&= i(\zeta_i - \zeta_{i-1}) - 1 + \zeta_i && \text{by applying (4.26) to } \gamma^{(i)}_1\\
&= (i+1)\zeta_i - i\zeta_{i-1} - 1.
\end{aligned}$$
For the base cases, (4.24) gives ζ_0 = 0 and
$$\zeta_1 = \beta^{(1)}_1 = \gamma^{(1)}_1 = \gamma^{(0)}_1 - \gamma^{(0)}_0 = e - 1,$$
using (γ-up) and (γ-base). Putting everything together, we have:
$$\zeta_0 = 0, \qquad \zeta_1 = e - 1, \qquad \zeta_{i+1} = (i+1)\zeta_i - i\zeta_{i-1} - 1.$$
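The recurrence just derived is easy to evaluate numerically; the following sketch (the function name is ours, and the closed forms in the comment come from unrolling the recurrence by hand) computes the first few ζ values:

```python
import math

def zeta(n):
    """Evaluate zeta_0, ..., zeta_n from the recurrence
    zeta_0 = 0, zeta_1 = e - 1, zeta_{i+1} = (i+1)*zeta_i - i*zeta_{i-1} - 1."""
    z = [0.0, math.e - 1.0]
    for i in range(1, n):
        z.append((i + 1) * z[i] - i * z[i - 1] - 1.0)
    return z[:n + 1]

# Unrolling by hand: zeta_2 = 2e - 3, zeta_3 = 4e - 8, zeta_4 = 10e - 24.
vals = zeta(4)
```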
Examining the recurrence (3.2) for α, we find that ζ_i = e · α_i for all i ≥ 0 and so
$$g(S) = \sum_{x\in U} \zeta_{\#(x,S)}\,w(x) = \sum_{x\in U} e\cdot\alpha_{\#(x,S)}\,w(x) = e\cdot G(S).$$
Chapter 5
Set Systems for Local Search
In the previous chapters, we have presented local search algorithms for the problem of maxi-
mizing coverage and general monotone submodular functions subject to a matroid constraint.
In this section, we formulate a general class of set systems amenable to approximation by local
search algorithms. We begin by reviewing some existing generalizations of matroids, in Section
5.1. In Section 5.2, we present a new class of independence systems called weak k-exchange
systems and show that they capture problems in which a simple local search algorithm is a 1k -
approximation for linear maximization. In Section 5.3, we strengthen this definition to obtain
the class of strong k-exchange systems, which admit more sophisticated local search techniques.
We describe these techniques in detail in Chapter 6. We relate both of these systems to the
existing hierarchy of set systems presented in Section 5.1. Finally, in Section 5.4, we show that
several combinatorial optimization problems are k-exchange systems.
5.1 Set Systems for Greedy Approximation Algorithms
Before defining our classes of independence systems for local search algorithms, we review some
of the more general independence systems related to the greedy algorithm. Matroids capture
those independence systems in which the greedy algorithm always returns an optimal solution
for any linear function on the ground set [81, 33]. A natural question is whether there is a
similar characterization of systems in which the greedy algorithm yields a 1/k-approximation for
some k > 1.
In order to address this question, Jenkyns [58] and Korte and Hausmann [66, 52] independently introduced the class of k-systems. For any independence system (X, I) and any set E ⊆ X, we let u(E) denote the size of the maximum base of I contained in E and l(E) denote the size of the minimum base of I contained in E (recall that a set A ⊆ E is a base of E if A + x ∉ I for all x ∈ E \ A). Then, k-systems are defined as follows.
Definition 5.1 (k-system [58, 66, 52]). An independence system (X, I) is a k-system, for some
(not necessarily integral) k ≥ 1 if:
$$\max_{E\subseteq X} \frac{u(E)}{l(E)} \le k.$$
If further
$$\max_{E\subseteq X} \frac{u(E)}{l(E)} = k,$$
then (X, I) is called an exact k-system.
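As a concrete illustration of Definition 5.1, the ratio max_E u(E)/l(E) can be computed by brute force on tiny ground sets. The sketch below is ours (helper names and the small example system are hypothetical) and is exponential in |X|:

```python
from itertools import chain, combinations

def subsets(E):
    """All subsets of E, as tuples."""
    E = list(E)
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

def k_of_system(X, indep):
    """Brute-force max_E u(E)/l(E) from Definition 5.1.  `indep` is an
    independence oracle on frozensets.  A base of E is an independent
    A <= E with A + x dependent for all x in E \\ A; sets E whose only
    base is empty are skipped."""
    best = 1.0
    for E in map(frozenset, subsets(X)):
        bases = [A for A in map(frozenset, subsets(E))
                 if indep(A) and all(not indep(A | {x}) for x in E - A)]
        sizes = [len(B) for B in bases]
        if sizes and min(sizes) > 0:
            best = max(best, max(sizes) / min(sizes))
    return best

# Hypothetical example: independent sets {}, {0}, {1}, {2}, {0,2} on {0,1,2}.
# For E = {0,1,2} the bases are {1} and {0,2}, so the system is a 2-system.
I_SETS = {frozenset(s) for s in [(), (0,), (1,), (2,), (0, 2)]}
k = k_of_system({0, 1, 2}, lambda A: A in I_SETS)
```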
Definition 5.1 can be viewed as a natural generalization of the matroid characterization
presented in Theorem 2.11. Specifically, in the case that k = 1, Definition 5.1 states that
l(E) = u(E) for all sets E, and so all bases of E have the same size. This is precisely the
condition given in Theorem 2.11, and so the 1-systems are precisely those independence systems
that are matroids. In fact, the relationship between matroids and k-systems extends even
further: the intersection of k matroids is a k-system [58, 52]. This allows many optimization
problems that can be formulated as matroid intersection problems to be placed in the hierarchy
of k-systems.
Jenkyns, Korte, and Hausmann [58, 66, 52] show that the standard greedy algorithm Greedy is a 1/k-approximation for the problem of maximizing a linear function in a k-system. Moreover, for every exact k-system, there exists a weight function for which the greedy algorithm returns a solution of value only 1/k times that of the optimal. Thus, k-systems are exactly those independence systems for which the greedy algorithm is a 1/k-approximation for linear maximization. In further work, Hausmann and Korte [51] consider the k-greedy algorithm, which at each step chooses a set of at most k elements that give the largest improvement in the objective function. They show that there are k-systems for which this algorithm is still only a 1/k-approximation.
Fisher, Nemhauser and Wolsey [43] examine the specific problem of maximizing a monotone submodular function subject to the intersection of k matroid constraints. They show that the greedy algorithm (which we presented as SubmodularGreedy in Algorithm 3) is a 1/(k+1)-approximation for this problem, and that this bound is tight (i.e. there is a monotone submodular function and k matroid constraints for which the greedy solution is only 1/(k+1) times the optimal solution). They remark that this proof can be extended to the general class of k-systems. A full proof of this claim appears in Calinescu et al. [21]. Finally, Gupta et al. [49] show that it is possible to attain a k/(3(k+1)²)-approximation for non-monotone submodular maximization in a k-independence system by combining multiple runs of the standard greedy algorithm with an algorithm for unconstrained non-monotone submodular maximization.¹
Unfortunately, k-independence can be difficult to establish for a given system, and may not
correspond to any useful algorithmic intuition. The class of k-extendible systems, introduced
by Mestre [74], accomplishes many of the same goals as k-systems but are defined by a more
direct, algorithmic property.
¹Here, we have simplified the approximation ratio by assuming that the algorithm of Gupta et al. uses a recent, tight 1/2-approximation algorithm of Feldman, Naor, and Schwartz [38] for unconstrained non-monotone submodular maximization.
Definition 5.2 (k-extendible system [74]). An independence system (X, I) is k-extendible if for any C ⊆ D ∈ I and all x ∉ C such that C + x ∈ I, there exists a Y ⊆ D \ C with |Y| ≤ k such that (D \ Y) + x ∈ I.
Intuitively, the definition states that if we can add an element x to some independent set C
in a k-extendible system, then we can add x to any superset D of C, after removing at most k
elements from D \ C.
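Definition 5.2 can likewise be checked by exhaustive search on small ground sets. The sketch below is ours (exponential; only for tiny examples), and it tests only x ∉ D, since for x ∈ D \ C the choice Y = ∅ always works:

```python
from itertools import chain, combinations

def subsets(E):
    """All subsets of E, as tuples."""
    E = list(E)
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

def is_k_extendible(X, indep, k):
    """Brute-force check of Definition 5.2: for every C <= D in I and every
    x not in D with C + x in I, some Y <= D \\ C with |Y| <= k must give
    (D \\ Y) + x in I.  (For x in D \\ C, Y = {} works trivially.)"""
    X = set(X)
    for D in map(frozenset, subsets(X)):
        if not indep(D):
            continue
        for C in map(frozenset, subsets(D)):
            for x in X - D:
                if not indep(C | {x}):
                    continue
                if not any(indep((D - frozenset(Y)) | {x})
                           for r in range(k + 1)
                           for Y in combinations(D - C, r)):
                    return False
    return True

# The uniform matroid of rank 2 on {0,1,2} is 1-extendible (it is a matroid);
# the hypothetical system with independent sets {}, {0}, {1}, {2}, {0,2} is
# 2-extendible but not 1-extendible (take D = {0,2}, C = {}, x = 1).
I_SETS = {frozenset(s) for s in [(), (0,), (1,), (2,), (0, 2)]}
```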
Mestre shows that the greedy algorithm is a 1/k-approximation for linear maximization in any k-extendible system. Thus, the k-extendible systems are a subset of the k-systems. Furthermore, as shown by Calinescu et al. [21], the greedy algorithm is a 1/(k+1)-approximation for monotone submodular maximization in a k-extendible system. There are k-systems that are
not k-extendible [21], so k-extendible systems do not provide an exact characterization of those
systems for which these positive results hold. However, in contrast to k-systems, k-extendible
systems are defined by a local combinatorial property, which is often easier to establish and
can provide additional algorithmic insight into a problem. Mestre shows that a variety of
natural combinatorial optimization problems—including maximum asymmetric traveling sales-
man, b-matching, and job interval scheduling—are k-extendible for appropriate values of k (the
definitions of many of these problems can be found in Section 5.4).
The case k = 1 deserves special attention. Mestre shows that the 1-extendible systems are
exactly the matroids, and so in the case that k = 1 Definitions 2.10, 5.1, and 5.2 all define the
same class of independence systems.
5.2 Weak k-Exchange Systems
All of the results from Section 5.1 pertained to the standard greedy algorithms Greedy and SubmodularGreedy. We now ask the same questions regarding a simple, “standard” local search procedure
(a, r)-OblLocalSearch, shown in Algorithm 9. The algorithm is an oblivious local search procedure. It takes an independence system (X, I) and a function f : 2^X → R≥0 and searches for a
set in I maximizing f .
More formally, suppose that S ∈ I is some feasible solution, A ⊆ X \ S is some set of
elements to add to S, and R ⊆ S is some set of elements to remove from S. Then, if |R| ≤ r,
|A| ≤ a and (S \R)∪A ∈ I, we say that (A,R) is a valid (a, r)-exchange for S.2 At each step,
(a, r)-OblLocalSearch searches for a valid (a, r)-exchange that improves the objective function
f , and terminates when no such exchange can be found.
For the rest of this section we focus on the special case in which a = 1, and r = k. Even
in this case, there are no known performance guarantees for Algorithm 9 for k-systems or even
²In the typical terminology related to matroids, this operation is actually a replacement, since it replaces elements in one set with elements from another. We use the terminology “exchange” in keeping with the terminology “exchange systems,” which were named after systems studied by Brualdi and Scrimger [19, 17, 18] which gave rise to strongly base orderable matroids, discussed in Section 2.3. We shall see that k-exchange systems have a close relationship with this class of matroids.
Algorithm 9: (a, r)-OblLocalSearch
Input: Independence system (X, I), given as an independence oracle;
       function f : 2^X → R≥0, given as a value oracle
S ← ∅;
repeat
    foreach A ⊆ X \ S with |A| ≤ a do
        foreach R ⊆ S with |R| ≤ r do
            if (S \ R) ∪ A ∈ I and f((S \ R) ∪ A) > f(S) then
                S ← (S \ R) ∪ A;
until no exchange is applied to S;
return S;
k-extendible systems.3
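For concreteness, the procedure can be sketched in Python as follows. This is an oracle-based sketch, exponential per step, with names and the toy instance of our own choosing (not from the text):

```python
from itertools import combinations

def improving_exchange(X, indep, f, S, a, r):
    """Return a valid (a, r)-exchange for S that improves f, or None."""
    for na in range(a + 1):
        for A in combinations(sorted(set(X) - S), na):
            for nr in range(r + 1):
                for R in combinations(sorted(S), nr):
                    T = (S - frozenset(R)) | frozenset(A)
                    if indep(T) and f(T) > f(S):
                        return T
    return None

def obl_local_search(X, indep, f, a, r):
    """(a, r)-OblLocalSearch: start from the empty set and apply improving
    valid (a, r)-exchanges until none exists (independence and value
    oracles, as in Algorithm 9)."""
    S = frozenset()
    while True:
        T = improving_exchange(X, indep, f, S, a, r)
        if T is None:
            return S
        S = T

# Toy run: uniform matroid of rank 2 with linear weights (example data ours).
w = {0: 1, 1: 2, 2: 3}
S = obl_local_search({0, 1, 2}, lambda A: len(A) <= 2,
                     lambda A: sum(w[x] for x in A), 1, 1)
```

On this instance the (1, 1)-search ends at the maximum-weight independent set {1, 2}.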
In order to study the behavior of (1, k)-OblLocalSearch, we introduce the following class of
set systems, which we call weak k-exchange systems.
Definition 5.3 (Weak k-Exchange System). An independence system (X, I) is a weak k-exchange system if, for all C and D in I, there exists a mapping Y assigning each element x ∈ C \ D a subset Y(x) of D \ C such that:

(WK1) |Y(x)| ≤ k for each x ∈ C \ D.
(WK2) |Y⁻¹(y)| ≤ k for each y ∈ D \ C.
(WK3) (D \ Y(x)) + x ∈ I for all x ∈ C \ D.

(where Y⁻¹(y) = {x ∈ C \ D : y ∈ Y(x)}).
Intuitively, the definition mandates the existence of a collection of valid (1, k)-exchanges
between C and D, one for each element of C \D. This is already implicit in the definition of
k-extendible systems, but here we further require that no element of D \C appear in too many
of the resulting exchanges. Indeed, we can show that every weak k-exchange system is k-extendible.
Theorem 5.4. Let (X, I) be a weak k-exchange system. Then, (X, I) is k-extendible.
Proof. Suppose that C ⊆ D ∈ I, and C + x ∈ I for some x ∉ C. Then, if x ∈ D, Definition 5.2 is trivially satisfied with Y = ∅. Thus, suppose that x ∉ D, so x ∈ (C + x) \ D. Then, let Y be the set Y(x) from Definition 5.3, applied to the sets C + x and D. From properties (WK1) and (WK3) we have Y ⊆ D \ C, with |Y| ≤ k and (D \ Y) + x ∈ I.
As in the case of 1-extendible systems, the class of weak 1-exchange systems behaves specially.
³Fisher, Nemhauser and Wolsey [43] consider the more restricted algorithm (1, 1)-OblLocalSearch, and show that it attains an approximation ratio of 1/2 for submodular maximization in a single matroid, but no constant factor for the intersection of 2 or more matroids.
Theorem 5.5. Let (X, I) be an independence system. Then, (X, I) is a matroid if and only
if it is a weak 1-exchange system.
Proof. Suppose that (X, I) is a weak 1-exchange system. We shall show that the condition of
Definition 2.10 must hold. Let C,D ∈ I, with |C| > |D|. If D ⊂ C, then for any element
x ∈ C \ D we have D + x ∈ I and Definition 2.10 is satisfied. Thus, suppose that D ⊄ C and consider the mapping Y from C \ D to subsets of D \ C from Definition 5.3. From (WK2), we have |Y⁻¹(y)| ≤ 1 for all y ∈ D \ C, and so the number of elements x ∈ C \ D for which Y(x) ≠ ∅ is at most
|D \ C| = |D| − |D ∩ C| = |D| − (|C| − |C \D|) = |D| − |C|+ |C \D| < |C \D| ,
where the last inequality follows from |C| > |D|. Hence, there must be at least one element
x ∈ C \D with Y (x) = ∅, and from (WK3) we have (D \ Y (x)) + x = D + x ∈ I.
Conversely, suppose that (X, I) is a matroid and let C, D ∈ I. We extend both C and D to bases C̄ and D̄ by adding arbitrary elements from X to each of them. Then, from Theorem 2.13 there must be a bijection π from C̄ to D̄ such that D̄ − π(x) + x ∈ I for all x ∈ C̄. Define Y(x) = {π(x)} ∩ D. Then, since π(x) = x for any x ∈ C̄ ∩ D̄, we have π(x) ∈ D̄ \ C̄ for all x ∈ C̄ \ D̄ and so Y(x) ⊆ D \ C̄ ⊆ D \ C for all x ∈ C \ D̄, as required. Clearly |Y(x)| ≤ 1 for all x ∈ C. Moreover, because π is a bijection, every element of D \ C appears in Y(x) for at most one x ∈ C \ D. Thus, properties (WK1) and (WK2) of Definition 5.3 are satisfied. Finally, consider the element π(x) for x ∈ C \ D. If π(x) ∈ D̄ \ D then we have Y(x) = ∅ and
$$(D \setminus Y(x)) + x = D + x \subseteq \bar{D} - \pi(x) + x \in \mathcal{I}.$$
If π(x) ∈ D then we have
$$(D \setminus Y(x)) + x \subseteq (\bar{D} \setminus Y(x)) + x = \bar{D} - \pi(x) + x \in \mathcal{I}.$$
In either case, Y satisfies property (WK3) of Definition 5.3.
We now state our main result, which relates the locality ratio of (1, k)-OblLocalSearch to
the notion of weak k-exchange systems.
Theorem 5.6. Let (X, I) be a weak k-exchange system and f : 2^X → R≥0 be a function. Then the locality ratio of (1, k)-OblLocalSearch on the instance ((X, I), f) is at least 1/(k+1) if f is monotone submodular and at least 1/k if f is linear.
Proof. Suppose that f is monotone submodular. Let S ∈ I be the locally optimal solution
produced by (1, k)-OblLocalSearch on the given instance and let O ∈ I be an optimal solution
for this instance. Consider the mapping Y assigning each element x ∈ O \ S a subset Y (x) of
S \O. From (WK1) and (WK3) we have |Y (x)| ≤ k and (S \ Y (x)) + x ∈ I for all x ∈ O \ S.
Chapter 5. Set Systems for Local Search 83
Thus, for all x ∈ O \ S, the pair ({x}, Y(x)) is a valid (1, k)-exchange for S. Because S is locally
optimal for all such exchanges, we have:
f((S \ Y (x)) + x) ≤ f(S) ,
for all x ∈ O \ S. Subtracting f(S \ Y (x)) from each side, we obtain
f((S \ Y (x)) + x)− f(S \ Y (x)) ≤ f(S)− f(S \ Y (x)) .
Applying submodularity on the left, we have
f(S + x)− f(S) ≤ f((S \ Y (x)) + x)− f(S \ Y (x)) ≤ f(S)− f(S \ Y (x)) , (5.1)
for each x ∈ O \ S. Summing (5.1) over all such x gives
$$\sum_{x\in O\setminus S} \big[f(S+x) - f(S)\big] \le \sum_{x\in O\setminus S} \big[f(S) - f(S\setminus Y(x))\big]. \qquad (5.2)$$
Each element x ∈ O \ S occurs in exactly one of the sets on the left of (5.2). Thus, Theorem 2.6 gives
$$\sum_{x\in O\setminus S} \big[f(S+x) - f(S)\big] \ge f(S \cup (O\setminus S)) - f(S) = f(S\cup O) - f(S). \qquad (5.3)$$
By property (WK2) of Y, each element of S \ O appears in at most k of the sets Y(x) on the right. Thus, Theorem 2.7 gives
$$\sum_{x\in O\setminus S} \big[f(S) - f(S\setminus Y(x))\big] \le k\big[f(S) - f(S\cap O)\big]. \qquad (5.4)$$
Combining (5.2), (5.3), and (5.4), we obtain
f(S ∪O)− f(S) ≤ k [f(S)− f(S ∩O)] ,
which is equivalent to
$$f(S\cup O) + k f(S\cap O) \le (k+1) f(S).$$
From the monotonicity of f , we have f(O) ≤ f(S ∪ O). Furthermore, we have f(S ∩ O) ≥ 0,
from non-negativity of f . Thus, for any monotone submodular function f , we have
f(O) ≤ f(S ∪O) + kf(S ∩O) ≤ (k + 1)f(S) .
If in addition to being monotone submodular, f is in fact linear, then we have
f(S) + f(O) = f(S ∪O) + f(S ∩O) ≤ f(S ∪O) + kf(S ∩O) ≤ (k + 1)f(S) ,
since k ≥ 1. Hence,
$$f(O) \le k f(S).$$
5.3 Strong k-Exchange Systems
Theorem 5.6 shows that the oblivious local search algorithm (1, k)-OblLocalSearch attains a 1/k-approximation for linear maximization on any weak k-exchange system. However, Theorem 5.4 shows that every such system must also be k-extendible, and so the standard greedy algorithm Greedy also provides a 1/k-approximation. This seems to suggest that local search algorithms for k-extendible set systems perform no better than their greedy counterparts, which are often much faster. However, this is not the case.
In fact, the best known approximation algorithms for several k-extendible problems are based on local search. Examples of k-extendible problems for which local search algorithms attain approximation ratios better than 1/k in the linear case or 1/(k+1) in the monotone submodular case include: matroid matching [70, 87], maximum independent set in (k + 1)-claw free graphs [55, 50, 22, 11, 12], and intersection of k matroids [71]. All of the local search algorithms used in these cases consider exchanges larger than the (1, k)-exchanges used by (1, k)-OblLocalSearch. With this in mind, we introduce the following strengthening of weak
k-exchange systems.
Definition 5.7 (strong k-exchange system). Let (X, I) be a weak k-exchange system. Then,
(X, I) is a strong k-exchange system if for all C,D ∈ I the mapping Y from Definition 5.3
additionally satisfies:
(SK3) (D \ Y(C′)) ∪ C′ ∈ I for all C′ ⊆ C \ D.

(where Y(C′) = ⋃_{x∈C′} Y(x)).
The main difference between weak and strong k-exchange systems is that in strong k-
exchange systems we can perform any number of (1, k)-exchanges (x, Y (x)) simultaneously.
This allows us to easily develop algorithms that consider significantly larger neighborhoods
than (1, k)-OblLocalSearch.
As in the case of weak 1-exchange systems, we can show that the strong 1-exchange systems
are equivalent to a known class of set systems: the class of strongly base orderable matroids,
discussed in Section 2.3.
Theorem 5.8. Let (X, I) be an independence system. Then, (X, I) is a strongly base orderable
matroid if and only if it is a strong 1-exchange system.
Proof. Let (X, I) be a strong 1-exchange system. Then, by Theorem 5.5, (X, I) is a matroid.
Let C,D be two bases of I, and let Y be the mapping from C \ D to subsets of D \ C from
Definition 5.7. Each element of D \ C appears in at most 1 of the sets Y (x) for x ∈ C \ D.
Furthermore, since (X, I) is a matroid we must have |C| = |D| and so
|C \D| = |C| − |C ∩D| = |D| − |C ∩D| = |D \ C| .
Thus, we must in fact have each element of D \ C in exactly one set Y (x) for x ∈ C \ D.
We construct a bijection π : C → D by setting π(x) to be the single element in Y (x) for all
x ∈ C \ D and letting π(x) = x for all x ∈ C ∩ D. Then, for any C′ ⊆ C we have
$$(D \setminus \pi(C')) \cup C' = \big(D \setminus Y(C'\setminus D)\big) \cup (C'\setminus D) \cup (C'\cap D) \in \mathcal{I},$$
from (SK3), since C′ \ D ⊆ C \ D. Thus, (X, I) is strongly base orderable.
Conversely, suppose that (X, I) is a strongly base orderable matroid. Let C and D be sets in I and extend them to bases C̄ and D̄ by adding some arbitrary elements to them. Because (X, I) is strongly base orderable there exists a bijection π : C̄ → D̄ such that (D̄ \ π(C′)) ∪ C′ ∈ I for all C′ ⊆ C̄. As in Theorem 5.5, set Y(x) = {π(x)} ∩ (D \ C) for all x ∈ C \ D. Then, as shown in Theorem 5.5, Y satisfies (WK1) and (WK2). Additionally, for any C′ ⊆ C \ D we have
$$\begin{aligned}
D \setminus Y(C') &= (\bar{D} \setminus Y(C')) \cap (C \cup D)\\
&= \big[\bar{D} \setminus (\pi(C') \cap D)\big] \cap (C \cup D)\\
&= \big[(\bar{D} \setminus \pi(C')) \cup (\bar{D} \setminus D)\big] \cap (C \cup D)\\
&= (\bar{D} \setminus \pi(C')) \cap (C \cup D) \subseteq \bar{D} \setminus \pi(C').
\end{aligned}$$
So, we have (D \ Y(C′)) ∪ C′ ⊆ (D̄ \ π(C′)) ∪ C′ ∈ I. Thus (SK3) is satisfied and so (X, I) is a strong 1-exchange system.
Finally, we note that the classes of strong and weak k-exchange systems are closed under
contraction.
Theorem 5.9. Let (X, I) be a weak k-exchange system and define I_x = {Y ⊆ X − x : Y + x ∈ I}. Then, (X − x, I_x) is a weak k-exchange system. Furthermore, if (X, I) is a strong k-exchange system, then so is (X − x, I_x).

Proof. Consider two sets C′, D′ ∈ I_x. Then, we must have C′ + x and D′ + x in I. Let Y be the mapping for the sets C = C′ + x and D = D′ + x in Definition 5.3. Then, Y assigns each element of C \ D a subset of D \ C and satisfies (WK1), (WK2), and (WK3). But C \ D = C′ \ D′ and D \ C = D′ \ C′, so Y in fact assigns each element of C′ \ D′ a subset of D′ \ C′. Set Y_x = Y. Then, since Y satisfies (WK1), (WK2), and (WK3) with respect to C and D, Y_x satisfies (WK1), (WK2), and (WK3) with respect to C′ and D′. Thus, (X − x, I_x) is a weak k-exchange system. Moreover, if (X, I) is a strong k-exchange system then the mapping Y additionally satisfies (SK3) for C, D and hence Y_x satisfies (SK3) for C′, D′. Thus, in this case (X − x, I_x) is also a strong k-exchange system.
We defer our discussion of algorithms for strong k-exchange systems to the next chapter,
where we give algorithms for both linear and submodular maximization.
5.4 Applications
Now we give some examples of combinatorial optimization problems that are naturally expressible as k-exchange systems. For each problem we give a cursory overview of related results, including a discussion of the best known approximation algorithms, as well as a theorem placing the problem in the hierarchy of k-exchange systems. We shall not give explicit proofs that the set systems discussed are independence systems when it is reasonably obvious or follows directly from the nature of the underlying systems. Our theory of k-exchange systems was inspired largely by connections between the independent set problem in (k + 1)-claw free graphs and the matroid k-parity problem. Thus, we spend a bit more time on the discussion of these particular applications.
5.4.1 Independent Set in (k + 1)-Claw Free Graphs
A graph G is (k + 1)-claw free if there are at most k independent vertices in the neighborhood
of any vertex v of G. The class of such graphs includes many classes of intersection graphs (e.g. intersection graphs of unit intervals and unit discs). Additionally, many problems can be
reduced to the problem of finding an independent set in a (k+1)-claw free graph, including the
unit job interval scheduling problem (k = 3), k-set packing, and k-dimensional matching. Here
we study the problem of finding an independent set of vertices in G that maximizes a given
function f : 2V → R≥0.
Hazan, Safra, and Schwartz [53] consider the special case of unweighted k-set packing (here, f(S) is simply |S|). They show that for k ≥ 3, unweighted k-set packing cannot be approximated to within a factor of Ω(ln k/k) unless P = NP. Hurkens and Schrijver [55] give a 2/(k+ε)-approximation for unweighted k-set packing as well as the general problem of finding a maximum cardinality independent set in a (k + 1)-claw free graph. Their algorithm is a straightforward oblivious local search algorithm. Halldorsson [50] shows that a simplified local search algorithm with a smaller neighborhood structure attains the same result for unweighted independent set in (k + 1)-claw free graphs, and analyzes its performance for a variety of other problems.
In the weighted case (i.e. the case in which f is linear), the greedy algorithm yields a 1/k-approximation, since the independent sets in a (k + 1)-claw free graph form a k-system. Arkin and Hassin [5] considered the weighted k-set packing problem, and show that (t, t − 1)-OblLocalSearch has a locality ratio of only 1/(k − 1 + 1/t). Chandra and Halldorsson showed that it is possible to modify the oblivious local search algorithm to approximate the problem beyond the locality ratio, attaining a (3/(2(k+1)) − ε)-approximation even for the general problem of maximum weight independent set in (k + 1)-claw free graphs. Their oblivious local search algorithm starts from a greedy solution and chooses the best available (k, k²)-exchange at each step. By using a non-oblivious local search algorithm that seeks to maximize the squared weight of the current independent set, Berman [11] attained a (2/(k+1) − ε)-approximation. Later, Berman and Krysta [12] gave a modified version of this non-oblivious local search algorithm with an approximation ratio of (3/(2k) − ε) whose runtime is independent of k.
The algorithms that we present in the next chapter extend Berman’s non-oblivious approach
to all strong k-exchange systems, and further generalize it to the case of monotone submodular
objective functions. We now show that the independent sets in a (k + 1)-claw free graph form
a strong k-exchange system.
Theorem 5.10. Let G = (V,E) be a (k + 1)-claw free graph, and I be the set of independent
sets of vertices in G. Then (V, I) is a strong k-exchange system.
Proof. Let C and D be two independent sets of vertices in G. For each vertex x ∈ C \ D, let Y(x) be the set of all vertices in D adjacent to x. Then, since C and D are independent we must in fact have Y(x) ⊆ D \ C, as required. Because G is (k + 1)-claw free and all the vertices in D \ C are independent, we must have |Y(x)| ≤ k. Thus Y satisfies (WK1). A vertex y ∈ D \ C appears in Y(x) for some x ∈ C \ D if and only if it is adjacent to x. Then, because G is (k + 1)-claw free and all the vertices of C \ D are independent, we must have |Y⁻¹(y)| ≤ k. Thus Y satisfies (WK2). Finally, for any C′ ⊆ C \ D, the set D \ Y(C′) contains no vertices adjacent to C′. Thus, (D \ Y(C′)) ∪ C′ is an independent set, and so Y satisfies (SK3).
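The construction in this proof is simple enough to run directly. The sketch below (function names and the example graph are ours) builds Y for a small claw-free graph and checks (WK1), (WK2), and (SK3) exhaustively:

```python
from itertools import combinations

def is_independent(adj, S):
    """No two vertices of S are adjacent."""
    return all(v not in adj[u] for u in S for v in S if u != v)

def exchange_mapping(adj, C, D):
    """The mapping from the proof of Theorem 5.10: Y(x) is the set of
    vertices of D adjacent to x (automatically a subset of D \\ C when
    C and D are independent)."""
    return {x: adj[x] & D for x in C - D}

# The path 0-1-2-3-4 is claw-free, hence (k+1)-claw free with k = 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
C, D = {0, 2, 4}, {1, 3}
Y = exchange_mapping(adj, C, D)

# (WK1) and (WK2): both degree bounds hold with k = 2.
assert all(len(Yx) <= 2 for Yx in Y.values())
assert all(sum(1 for x in Y if y in Y[x]) <= 2 for y in D - C)

# (SK3): every simultaneous exchange (D \ Y(C')) | C' stays independent.
for r in range(len(C) + 1):
    for Cp in combinations(sorted(C), r):
        Yp = set().union(*(Y[x] for x in Cp)) if Cp else set()
        assert is_independent(adj, (D - Yp) | set(Cp))
```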
The class of strong k-exchange systems has a close relationship to (k + 1)-claw free graphs. In fact, the strong k-exchange systems can be defined succinctly in graph-theoretic terms. The class of (k + 1)-claw free graphs has the property that, for any 2 independent sets C, D, the maximum degree of the bipartite subgraph induced by C ∪ D is at most k. Similarly, an independence system (X, I) is a strong k-exchange system if and only if for each C, D ∈ I we can construct a bipartite graph G_{C,D} on the vertex set C ∪ D such that every vertex has degree at most k and every independent set in G_{C,D} is in I. The mapping Y(x) is simply the standard neighborhood relation in G_{C,D}, and conditions (WK1) and (WK2) bound its degrees by k. In this sense, the strong k-exchange systems behave locally like (k + 1)-claw free graphs: for each individual pair of independent sets C, D, we can construct a (k + 1)-claw free graph that encodes independence on C ∪ D.
5.4.2 k-Matroid Intersection
In the k-matroid intersection problem, we are given k matroids M1, . . . ,Mk, each defined
on a common ground set X, together with a function f : 2X → R≥0. The goal is to find
a set S ⊆ X that maximizes f subject to the constraint that S is an independent set in
each of the k matroids. As discussed in Section 2.3, the intersection of k matroids forms a
k-system. Thus, the greedy algorithm gives a 1/k-approximation for linear functions f and a 1/(k + 1)-approximation for monotone submodular functions. Lee et al. [69] give an improved 1/(k + ε)-approximation for monotone submodular maximization in the specific case of partition matroids via a local search algorithm. Lee, Sviridenko, and Vondrak [71] generalize this result, giving a general oblivious local search algorithm that is a 1/(k − 1 + ε)-approximation for maximizing linear functions subject to arbitrary matroid constraints and a 1/(k + ε)-approximation for monotone submodular functions.
For k = 2, the problem of maximizing a linear function f can be solved in polynomial
time [68, 34]. Because the k-dimensional matching problem can be represented as k-matroid
intersection, the Ω(ln k/k)-inapproximability result of Hazan, Safra, and Schwartz [53] applies
here, as well, for all k ≥ 3. In the case of a monotone submodular objective function, the
problem is NP-hard even in the case of a single matroid, as we discussed in Chapter 3.
We now show that the intersection of k matroids is a weak k-exchange system for general
matroids, and a strong k-exchange system when all of the k matroids are strongly base orderable.
This, together with the algorithmic results in the next chapter, allows us to give improved
approximations for k-matroid intersection with both linear and monotone submodular objective
functions, in the special case that all the matroids are strongly base orderable.
Theorem 5.11. Let M1 = (X, I1), . . . ,Mk = (X, Ik) be k matroids on the ground set X and let I = ⋂_{i=1}^k Ii. Then, (X, I) is a weak k-exchange system. If each of M1, . . . ,Mk is strongly base orderable, then (X, I) is a strong k-exchange system.
Proof. Let C,D ∈ I be two sets that are independent in each of M1, . . . ,Mk. From Theorem 5.5, each matroid Mi is a weak 1-exchange system. Let Yi be the mapping assigning each element of C \ D a subset of D \ C for the matroid Mi, and let Y(x) = ⋃_{i=1}^k Yi(x). Then,

|Y(x)| ≤ ∑_{i=1}^k |Yi(x)| ≤ k

for all x ∈ C \ D, where the last inequality follows from the fact that each Yi satisfies (WK1) with k = 1. Similarly, we have Y⁻¹(y) = ⋃_{i=1}^k Yi⁻¹(y), and so

|Y⁻¹(y)| ≤ ∑_{i=1}^k |Yi⁻¹(y)| ≤ k

for all y ∈ D \ C, where the last inequality follows from the fact that each Yi satisfies (WK2) with k = 1. Thus, Y satisfies (WK1) and (WK2). Finally, we note that for any x ∈ C \ D,
(D \ Y(x)) + x ⊆ (D \ Yi(x)) + x ∈ Ii

for each i ∈ [k], where the final statement holds because Yi satisfies (WK3). Thus, for any x ∈ C \ D, we have (D \ Y(x)) + x ∈ Ii for each matroid Mi, and so Y satisfies (WK3).
If each of M1, . . . ,Mk is strongly base orderable, then from Theorem 5.8 each Mi is a strong 1-exchange system. Then we note that for any C′ ⊆ C \ D we have

(D \ Y(C′)) ∪ C′ ⊆ (D \ Yi(C′)) ∪ C′ ∈ Ii

for all i ∈ [k], where the final statement holds because Yi satisfies (SK3). Thus, for any C′ ⊆ C \ D, we have (D \ Y(C′)) ∪ C′ ∈ Ii for each matroid Mi, and so Y satisfies (SK3).
5.4.3 k-Uniform Hypergraph b-Matching
In the k-uniform hypergraph b-matching problem, we are given a k-uniform⁴ hypergraph H = (V, E) and a budget b ∈ N, together with a function f : 2E → R≥0. The goal is to choose a set of hyperedges S ⊆ E that maximizes f subject to the constraint that each vertex v ∈ V is contained in at most b of the hyperedges in S. We consider here a generalization in which each vertex v ∈ V has its own budget b(v) ∈ N. In the case that b(v) = 1 for all vertices, we obtain the hypergraph matching problem, which is equivalent to the k-set packing problem. Thus, while the standard b-matching problem, corresponding to k = 2, is solvable in polynomial time, for k ≥ 3 the problem is NP-hard and, moreover, the hardness of approximation results [53] for k-set packing apply. El Ouali, Fretwurst, and Srivastav generalize this result to show that, in the case that b(v) = b for all vertices v, it is NP-hard to approximate k-uniform hypergraph b-matching to within a factor of Ω(b log k/k).
Theorem 5.12. Let I be the set of all b-matchings in a k-uniform hypergraph H = (V, E).
Then, (E ,I) is a strong k-exchange system.
Proof. This proof is heavily indebted to Moran Feldman, Seffi Naor, and Roy Schwartz's proof that b-matching is a 2-exchange system. Consider two b-matchings C and D in I. For each vertex v ∈ V, let δC(v) and δD(v) be the sets of hyperedges from C \ D and D \ C, respectively, containing v. For each vertex v ∈ V, we number the hyperedges in δC(v) and δD(v) arbitrarily, and denote by νC(v,E) and νD(v,E) the label in [b(v)] given to a hyperedge E at vertex v in C and D, respectively (note that a hyperedge may receive a different label at each one of its vertices). Then, for each hyperedge E ∈ C \ D, we set

Y(E) = {E′ ∈ D \ C : ∃v ∈ (E′ ∩ E) with νC(v,E) = νD(v,E′)} .

That is, Y(E) contains those hyperedges in D \ C that share a vertex with E and have the same label at this vertex as E. For each vertex v of a hyperedge E ∈ C \ D there is at most one hyperedge E′ ∈ D \ C that has νD(v,E′) = νC(v,E). Thus, |Y(E)| ≤ k for all E ∈ C \ D and so (WK1) is satisfied. Similarly, for each vertex v of a hyperedge E′ ∈ D \ C there is at most one
⁴The constraint that each hyperedge have exactly k vertices can easily be relaxed to a constraint that each hyperedge have at most k vertices by adding "dummy" vertices to the graph.
hyperedge E ∈ C \ D that has νC(v,E) = νD(v,E′). Thus, |Y⁻¹(E′)| ≤ k for all E′ ∈ D \ C and
so (WK2) is satisfied.
Finally, let C′ ⊆ C \ D and consider the set of hyperedges D′ = (D \ Y(C′)) ∪ C′. For each vertex v ∈ V and each E ∈ D′, let νD′(v,E) = νD(v,E) if E ∈ D and νD′(v,E) = νC(v,E) if E ∈ C′. The construction of Y ensures that for any vertex v ∈ V, all hyperedges E of D′
that contain v must have distinct labels νD′(v,E). For each vertex v ∈ V there are only b(v)
distinct labels νD′(v,E) and so the number of hyperedges in D′ incident to each vertex v is at
most b(v). Thus, D′ must be a valid b-matching and so (SK3) is satisfied.
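The labeling construction in this proof can be stated very compactly in code. A sketch (our own illustration; hyperedges are frozensets, and the labelings νC and νD are dictionaries keyed by (vertex, hyperedge) pairs):

```python
def build_Y(C_only, D_only, nu_C, nu_D):
    """Y(E) = hyperedges E' in D \ C that share a vertex v with E and
    carry the same label at v (proof of Theorem 5.12)."""
    return {E: {Ep for Ep in D_only
                for v in E & Ep
                if nu_C[(v, E)] == nu_D[(v, Ep)]}
            for E in C_only}
```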
5.4.4 Matroid k-Parity
In the matroid k-parity problem, we are given a matroid M = (X, I) and a set E of pairwise
disjoint k-element subsets of X, together with a function f : 2E → R≥0. The goal is to find a set S ⊆ E that maximizes f, subject to the constraint that the set of elements covered by the sets in S is in I. A related problem is the matroid k-matching problem. In this problem, we are given M together with a hypergraph H = (X, E) and a function f : 2E → R≥0, and the goal is to find a matching S in H that maximizes f, again subject to the constraint that the set of elements covered by the edges in S is in I. The matroid k-parity problem corresponds to the special
covered by the edges in S is in I. The matroid k-parity problem corresponds to the special
case in which all the edges in H are disjoint. Matroid matching in a k-regular hypergraph is
reducible to matroid k-parity as well, and thus the two problems are equivalent [70].
The matroid k-parity problem can be viewed as a common generalization of k-matroid
intersection and the k-uniform hypergraph b-matching problem, which we have just considered.
Unlike these problems, however, matroid k-parity is NP-hard even in the case k = 2. If the
matroid M is given by an independence oracle, there are instances of the matroid 2-matching
problem (and hence also matroid 2-parity) for which obtaining an optimal solution requires
an exponential number of oracle calls, even in the unweighted case in which f is a cardinality
function [59]. These instances can be modified to show that matroid 2-parity and matroid
2-matching are NP-hard, via a reduction from the maximum clique problem [85].
In the special case of linear matroids, Lovasz [72, 73] obtains a polynomial time algorithm
for maximum cardinality matroid 2-matching. Lee, Sviridenko, and Vondrak [70] give a PTAS
for maximum cardinality matroid 2-parity in arbitrary matroids, and a (k/2 + ε)-approximation for matroid k-parity in arbitrary matroids.
In the weighted case (in which f is a linear function), the greedy algorithm provides a
k-approximation, since both the matching and parity problems are examples of k-systems.
Although this remains the best known result for general matroids, some improvement has been
made in the case of k = 2 for restricted classes of matroids. Tong, Lawler, and Vazirani give
an exact polynomial time algorithm for weighted matroid 2-parity in gammoids [90]. Soto [87]
extends this result, obtaining a PTAS for all strongly base orderable matroids. Soto further
shows that weighted matroid 2-matching remains NP-hard even in this restricted case.
For any set P of subsets of X, let ⋃P = ⋃_{P ∈ P} P denote the set of all elements contained in some set of P. Then, we have the following theorem, which relates the matroid k-parity problem to k-exchange systems.
Theorem 5.13. Let (X, IM) be a strongly base orderable matroid and let E be a set of disjoint k-element subsets of X. Further let I be the set of all subsets S ⊆ E that satisfy ⋃S ∈ IM. Then, (E, I) is a strong k-exchange system.
Proof. Consider C,D ∈ I. From Theorem 5.8, (X, IM) is a strong 1-exchange system, and so there is a mapping YM assigning each element in ⋃C \ ⋃D a set of elements in ⋃D \ ⋃C and satisfying (WK1), (WK2), and (SK3). For each k-set E ∈ C \ D we define

Y(E) = {E′ ∈ D \ C : E′ ∩ YM(E) ≠ ∅} ,

where YM(E) = ⋃_{e∈E} YM(e). Because no two sets in E share an element, there are at most |YM(E)| distinct sets E′ ∈ D \ C such that E′ ∩ YM(E) ≠ ∅. We have

|Y(E)| ≤ |YM(E)| ≤ ∑_{e∈E} |YM(e)| ≤ k ,

where the final inequality follows from the fact that YM satisfies (WK1) with k = 1. Thus, Y satisfies (WK1).
Similarly, there are at most |YM⁻¹(E′)| distinct sets E ∈ C \ D such that E′ ∩ YM(E) ≠ ∅, where YM⁻¹(E′) = ⋃_{e′∈E′} YM⁻¹(e′). We have

|Y⁻¹(E′)| ≤ |YM⁻¹(E′)| ≤ ∑_{e′∈E′} |YM⁻¹(e′)| ≤ k ,

where the final inequality follows from the fact that YM satisfies (WK2) with k = 1. Thus, Y satisfies (WK2).
Finally, for any C′ ⊆ C \ D we have

⋃((D \ Y(C′)) ∪ C′) = (⋃D \ ⋃Y(C′)) ∪ ⋃C′ ⊆ (⋃D \ YM(⋃C′)) ∪ ⋃C′ .

From property (SK3) of YM, the last set must be in IM. Thus, ⋃((D \ Y(C′)) ∪ C′) ∈ IM, and so (D \ Y(C′)) ∪ C′ ∈ I and Y satisfies (SK3).
We briefly note that even in the case that the given matroid is not strongly base orderable, matroid k-parity gives rise to a weak k-exchange system. The proof requires a stronger version of Theorem 2.13, in which the bijection π is between the sets of a partition of each of the two bases. This stronger exchange property is given by Greene and Magnanti [48, Theorem 3.3].
5.4.5 Maximum Asymmetric Traveling Salesman
In the maximum asymmetric traveling salesman problem (MaxATSP) we are given a directed
graph G = (V,E), together with a function f : 2E → R≥0. We seek a directed Hamiltonian
cycle in G that maximizes f .
In the standard variant of the problem, f is a linear function given as a weight for each edge.
There has been a great deal of recent work on both the general case and metric case, in which the
weights on E must satisfy the triangle inequality. We do not attempt to give a complete history
of the problem here, but rather state only the current best results, for the sake of comparison.
In the general setting, the best known approximation algorithm is a 2/3-approximation by Kaplan et al. [62]. In the metric case, Kowalik and Mucha [67] give a 7/8-approximation algorithm. The problem is APX-hard even in the metric case (this follows from a result by Papadimitriou and Yannakakis [80] for the minimization version of the traveling salesman problem with all weights 1 or 2), and Karpinski and Schmied show that in the general case it is NP-hard to attain any approximation beyond 206/207.
We now show that MaxATSP gives rise to a 3-exchange system. Although the algorithms we
present in the next chapter do not attain the best known approximation results for MaxATSP,
the proof of Theorem 5.14 is of some interest in its own right. The independence system
associated with MaxATSP contains all Hamiltonian cycles in G as well as their subsets (the
inclusion of the latter ensures that the system is indeed an independence system). A set of
edges S is independent in this system if and only if every vertex in the subgraph (V, S) has
indegree and outdegree at most 1 and S contains no cycle that is not a Hamiltonian cycle. We
can represent these constraints as the intersection of 3 matroids: one partition matroid enforcing
the indegree constraint, one partition matroid enforcing the outdegree constraint, and a graphic
matroid enforcing the constraint that no non-Hamiltonian cycles are present. Surprisingly, even
though this last matroid is not necessarily strongly base orderable, the intersection of all three
matroids forms a strong k-exchange system, as we now show.
Theorem 5.14. Let G = (V,E) be a directed graph and let I be the set of all Hamiltonian
cycles in G, together with all of their subsets. Then, (E, I) is a strong 3-exchange system.
Proof. We present the following proof based on a proof by Feldman, Naor, and Schwartz.
Suppose that C,D ∈ I. For every edge e = (h, t) ∈ C \ D, the set Y(e) ⊆ D \ C comprises the following three types of edges in D \ C: (1) edges of the form (h, x) for some vertex x; (2) edges of the form (x, t) for some vertex x; (3) the first edge of D \ C encountered on any path from t containing only edges of D.
For each e ∈ C \ D there can be at most one edge of each type. For edges of types (1) and (2), this follows directly from the fact that every vertex has indegree and outdegree at most 1 in D. This fact also implies that there can be at most one path from t containing only edges of D, and hence at most one edge of type (3). Thus, (WK1) holds.

Next, we show that an edge e′ ∈ D \ C can appear as an edge of each type at most once, and so
(WK2) holds. For types (1) and (2) this follows directly from the fact that every vertex has indegree and outdegree at most 1 in C. For type (3), we again note that there is at most one path P in D containing the edge e′. Consider any two edges e1 = (h1, t1) and e2 = (h2, t2) in C \ D. We show that e′ can appear as an edge of type (3) for at most one of these edges. Since the indegree of all vertices in C is at most 1, we must have t1 ≠ t2. Suppose without loss of generality that t1 comes before t2 on the path P, and consider the edge (x, t2) in P. Because e2 ∉ D, we must have x ≠ h2. Because the indegree of all vertices in C is at most 1, we cannot have (x, t2) ∈ C, and so (x, t2) ∈ D \ C. But we encounter (x, t2) before e′ when traversing the path P starting at t1. Thus, we can have e′ as an edge of type (3) only for e2.
Finally, we show that (SK3) holds. Let C′ ⊆ C \ D and D′ = (D \ Y(C′)) ∪ C′. We must show that D′ ∈ I. For every e = (h, t) ∈ C′, Y(C′) contains all edges from D of the form (h, x) or (x, t). Thus, all such vertices h and t have outdegree and indegree 0, respectively, in D \ Y(C′), and so have outdegree and indegree 1, respectively, in D′. All other vertices have the same indegree and outdegree in D′ as in D. It remains to show that D′ does not contain any non-Hamiltonian cycles. Suppose, for the sake of contradiction, that D′ does contain a non-Hamiltonian cycle. Because neither C nor D contains a non-Hamiltonian cycle, this cycle must contain at least one edge e from C \ D and one edge e′ from D \ C. Suppose that e and e′ are two such edges and that the only edges between e and e′ on the cycle are from D. Such a pair of edges must exist (note that the set of edges between e and e′ may be empty in the case that e and e′ are adjacent). Then, we must have e′ ∈ Y(e) of type (3), and so e′ cannot be in D′, a contradiction. Thus, every vertex of D′ has indegree and outdegree at most 1 and D′ contains no non-Hamiltonian cycle, so we must have D′ ∈ I.
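The three edge types used in this proof can be computed directly. The sketch below (our own illustration, not the thesis's code; edges are (tail, head) pairs) finds Y(e) for a single edge e = (h, t) by scanning for type-(1) and type-(2) conflicts and then following the unique D-path from t for the type-(3) edge:

```python
def Y_for_edge(e, C, D):
    """Edges of D \ C conflicting with e = (h, t): (1) out of h,
    (2) into t, and (3) the first D \ C edge on the D-path from t."""
    h, t = e
    D_only = set(D) - set(C)
    result = {d for d in D_only if d[0] == h or d[1] == t}  # types (1), (2)
    out_of = {u: (u, v) for (u, v) in D}  # every vertex has outdegree <= 1 in D
    seen, cur = set(), t
    while cur in out_of and out_of[cur] not in seen:
        edge = out_of[cur]
        seen.add(edge)
        if edge in D_only:
            result.add(edge)  # type (3): stop at the first D \ C edge
            break
        cur = edge[1]
    return result
```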
Chapter 6

Algorithms for Strong k-Exchange Systems
In the previous chapter, we defined the classes of weak and strong k-exchange systems, and showed how several combinatorial optimization problems fit into these classes. Our main motivation for introducing the class of strong k-exchange systems was the observation that, while the locality ratio of the simple algorithm (1, k)-OblLocalSearch is no better than the approximation ratio of the greedy algorithm for all weak k-exchange systems, more complex local search algorithms have given improved results for many of the problems discussed in Section 5.4. The result of Arkin and Hassin [5] in the specific case of k-set packing shows that even if we consider (t, t − 1)-exchanges, the oblivious local search algorithm has a locality ratio of only 1/(k − 1 + 1/t). In contrast, Berman [11] shows that a simple, non-oblivious algorithm attains a locality ratio of 2/(k + 1) by considering only (k, k² − k + 1)-exchanges. We now generalize Berman's result, giving
a non-oblivious local search algorithm for both linear and monotone submodular maximization in any strong k-exchange system. The algorithms run in deterministic polynomial time and are (2/(k + 1) − ε)- and (2/(k + 3) − ε)-approximations for linear and monotone submodular maximization, respectively. By using the partial enumeration technique described in Section 2.6, we can remove the extra factor of ε from these ratios, obtaining a clean 2/(k + 1)- and 2/(k + 3)-approximation.
The resulting algorithm gives improved approximation results for several problems. We summarize these results in Table 6.1. The 1/k-approximation for strongly base orderable matroid k-parity is not stated explicitly by Jenkyns [58] (who gives results for the related k-matchoid problem), but follows directly from the fact that matroid k-parity is a k-system. Similarly, the monotone submodular maximization results for (k + 1)-claw free graphs and strongly base orderable matroid k-parity follow directly from Fisher, Nemhauser, and Wolsey's [43] work on submodular maximization in k-systems, even though they do not explicitly show that these problems may be formulated as k-systems. Finally, the previous results for maximum asymmetric traveling salesman follow from the fact that the set of (partial) directed Hamiltonian cycles can be represented as the intersection of 3 matroids, as noted in Section 5.4.5. In this
case, while the algorithm of Lee, Sviridenko, and Vondrak [71] is a 1/(3 + ε)-approximation for any ε, its runtime is exponential in ε⁻¹. Thus, the improvement to 1/3 is more significant than it first appears.
In independent work (presented jointly with this work in [39]), Feldman, Naor, and Schwartz
give improved approximations for non-monotone submodular maximization in strong k-exchange
systems. Their approach is based on an oblivious local search algorithm similar to that employed
by Lee, Sviridenko, and Vondrak [71] in the case of k-matroid intersection. The algorithm is
a (k − 1)/(k² + ε)-approximation, and gives improved approximations for many other specific problems as well. Like the algorithm of Lee, Sviridenko, and Vondrak, their algorithm has exponential dependence on ε⁻¹.
Table 6.1: Approximation Ratios for k-Exchange Systems

Problem                                  Objective*   Previous Result       Our Result
Indep. Set in (k + 1)-Claw Free Graphs   MS           1/(k + 1) [43]        2/(k + 3)
SBO Matroid k-Intersection               L            1/(k − 1 + ε) [71]    2/(k + 1)
                                         MS           1/(k + ε) [71]        2/(k + 3)
SBO Matroid k-Parity                     L            1/k [58]              2/(k + 1)
                                         MS           1/(k + 1) [43]        2/(k + 3)
Maximum ATSP                             MS           1/(3 + ε) [71]        1/3

* L : linear, MS : monotone submodular
6.1 Linear Maximization
In this section, we consider the problem of maximizing a linear function f in a strong k-exchange
system. We show that the non-oblivious local search algorithm introduced by Berman [11] for
(k+ 1)-claw free graphs in fact applies to any strong k-exchange system. This is not surprising
given our observation (in Section 5.4.1) that strong k-exchange systems behave like (k + 1)-claw free graphs for every pair of independent sets. Nonetheless, we present here a full analysis of the linear case, as we shall need some of the ideas and intuition from it for the monotone submodular case, which we consider in the next section. Throughout our analysis, we assume that the linear function f has been given in the form of a weight function w, assigning each element x ∈ X a non-negative weight w(x). We measure the complexity of algorithms in terms of the parameter k and n = |X|.

The non-oblivious local search algorithm Linear-k-Exchange is shown in Algorithm 10. It
considers a larger neighborhood than (1, k)-OblLocalSearch, at each step searching through all possible valid (k, r(k))-exchanges, where r(k) = k² − k + 1. Our analysis will make crucial
use of the fact that, since we are in a strong k-exchange system, we can build a valid (k, r(k))-exchange by combining several overlapping valid (1, k)-exchanges. Additionally, Linear-
k-Exchange uses the non-oblivious potential function

w²(S) = ∑_{x∈S} w(x)² ,
obtained by squaring all of the weights given to the algorithm. Note that w² is a linear function, so the standard local improvement condition

w²((S \ B) ∪ A) > w²(S)

used to guide the search is equivalent to the condition

w²(A) > w²(B) .

We use this latter condition in both algorithms presented in this section.
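The equivalence is a one-line consequence of linearity: for B ⊆ S and A disjoint from S, w²((S \ B) ∪ A) − w²(S) = w²(A) − w²(B). A quick numeric check (our own illustration, with arbitrary weights):

```python
def w2(T, w):
    """The non-oblivious potential: sum of squared weights over T."""
    return sum(w[x] ** 2 for x in T)

w = {1: 3.0, 2: 1.0, 3: 2.0, 4: 5.0}
S, B, A = {1, 2, 3}, {2, 3}, {4}
# Both sides of the linearity identity agree:
lhs = w2((S - B) | A, w) - w2(S, w)
rhs = w2(A, w) - w2(B, w)
```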
We initialize Linear-k-Exchange by using a greedy solution. The algorithm alters the given
weight function by rounding the weights down to integer multiples of a sufficiently small value
ε2. This is equivalent to requiring that each improvement made by the algorithm increases the potential function w² by an additive amount of at least ε2². Although this differs from our standard, multiplicative notion of approximate local optimality, it will simplify our analysis.
Finally, we note that the fact that (X, I) is a strong k-exchange system will be used only
in the analysis of Linear-k-Exchange. This is also the case for the algorithm that we present in
the next section for the monotone submodular case. In particular, neither algorithm needs to
construct the mapping Y from Definition 5.7.
Algorithm 10: Linear-k-Exchange

Input: Approximation parameter ε; strong k-exchange system (X, I), given as an independence oracle; weight function w : X → R≥0

Let Sinit be the result of running Greedy on (X, I), w;
Let ε2 = ((k + 1)/2) · w(Sinit) · ε/n;
S ← Sinit;
foreach x ∈ X do w(x) ← ⌊w(x)/ε2⌋ · ε2;
repeat
    foreach A ⊆ X \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k² − k + 1 do
            if (S \ R) ∪ A ∈ I and w²((S \ R) ∪ A) > w²(S) then
                S ← (S \ R) ∪ A;
                break;
until no exchange is applied to S;
return S;
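A direct, deliberately naive transcription of Algorithm 10 might look as follows. This is a sketch under our own assumptions: the greedy phase simply scans elements in order of decreasing weight, the independence oracle is a Python callable, at least one weight is positive, and the exponential-in-k neighborhood search is left unpruned.

```python
from itertools import combinations
from math import floor

def linear_k_exchange(X, is_independent, w, k, eps):
    """Sketch of Linear-k-Exchange: greedy start, weights rounded down
    to multiples of eps2, then repeated (k, k^2 - k + 1)-exchanges that
    improve the potential w2 (checked via w2(A) > w2(R))."""
    S = set()
    for x in sorted(X, key=lambda y: w[y], reverse=True):  # greedy phase
        if is_independent(S | {x}):
            S.add(x)
    eps2 = (k + 1) / 2 * sum(w[x] for x in S) * eps / len(X)
    wr = {x: floor(w[x] / eps2) * eps2 for x in X}  # rounded weights

    def w2(T):
        return sum(wr[x] ** 2 for x in T)

    improved = True
    while improved:
        improved = False
        for a in range(1, k + 1):
            for A in combinations(set(X) - S, a):
                for r in range(k * k - k + 2):
                    for R in combinations(S, r):
                        if is_independent((S - set(R)) | set(A)) and w2(A) > w2(R):
                            S = (S - set(R)) | set(A)
                            improved = True
                            break
                    if improved: break
                if improved: break
            if improved: break
    return S
```

In practice one would exploit problem structure to compute R directly from A rather than enumerating all candidate sets, as discussed at the end of this section.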
We now consider the approximation ratio of Linear-k-Exchange. We consider an arbitrary
instance (X, I), w and suppose that S is a locally optimal solution returned by the algorithm
on this instance, while O is a global optimum for this instance.
A general outline of our proof is as follows: we consider only a particular set of (k, r(k))-
exchanges (A,R) where A ⊆ O and R ⊆ S. These exchanges satisfy some additional properties,
which we use in Lemma 6.1 to derive a relationship between the non-oblivious potential function w² and the weight function w. In Lemma 6.2 we then use this relationship to derive a lower
bound on the weight w(x) of each element x in a locally optimal solution. In Theorem 6.3
we combine these lower bounds to obtain a bound on the locality ratio of Linear-k-Exchange
(in terms of the rounded weight function used by the algorithm). Finally, in Theorem 6.4 we
bound the approximation ratio and runtime of Linear-k-Exchange.
We now describe the set of (k, r(k))-exchanges used in our analysis. We have S,O ∈ I for the strong k-exchange system (X, I). Thus, there must be a collection Y assigning each z of O \ S a set Y(z) ⊆ S \ O, satisfying the conditions of Definition 5.7. For each x ∈ S \ O, let Px be the set of all elements z ∈ O \ S for which: (1) x ∈ Y(z) and (2) for all y ∈ Y(z), w(y) ≤ w(x). That is, Px is the set of all elements z ∈ O \ S that share x as a largest weight member of Y(z). Finally, for all x ∈ S ∩ O, we set Px = {x}, and extend Y so that Y(Px) = {x} (note that the resulting Y still obeys properties (WK1), (WK2), and (SK3), since for x ∈ S ∩ O we have y ∈ Y(x) only when y = x, and (S \ Y(Px)) ∪ Px = S).

For each x ∈ S \ O, consider the exchange (Px, Y(Px)). From property (WK2) of Y we have |Px| ≤ k. Similarly, from property (WK1) and the fact that all elements z ∈ Px share the common element x ∈ Y(z), we have

|Y(Px)| ≤ k(k − 1) + 1 = k² − k + 1 = r(k) .

Finally, from property (SK3) we have (S \ Y(Px)) ∪ Px ∈ I. Thus, (Px, Y(Px)) is a valid (k, r(k))-exchange for all of our sets Px, where x ∈ S. For all x ∈ S ∩ O, we have (S \ Y(Px)) ∪ Px = S. Furthermore, note that {Px}_{x∈S} is a partition of O. The construction of P depends crucially on the strong k-exchange property, which allows us to consider Y(z) for each element z in isolation and then combine appropriate elements z into a single valid exchange (Px, Y(Px)).
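The partition {Px}_{x∈S} can be constructed explicitly. A sketch (our own illustration; Y is a dictionary on O \ S with Y(z) ⊆ S \ O as in the analysis, and ties among largest-weight members of Y(z) are broken arbitrarily so that each z lands in exactly one block):

```python
def build_partition(S, O, Y, w):
    """P[x] collects the elements z of O \ S assigned to x as a largest
    weight member of Y(z); P[x] = {x} for x in both S and O.  Blocks may
    be empty; the nonempty blocks partition O."""
    P = {x: set() for x in S}
    for z in set(O) - set(S):
        x = max(Y[z], key=lambda y: w[y])  # a largest-weight member of Y(z)
        P[x].add(z)
    for x in set(S) & set(O):
        P[x] = {x}
    return P
```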
The following extension of a theorem from [11] relates the non-oblivious potential function w² to the weight function w.
Lemma 6.1. Suppose that x ∈ S and z ∈ X \ S, and let w be any non-negative weight function on S + z. If w(y) ≤ w(x) for all y ∈ Y(z), then

w(z)² − w²(Y(z) − x) ≥ w(x) · (2w(z) − w(Y(z))) .
Proof. First, we note that

0 ≤ (w(x) − w(z))² = w(x)² − 2w(x) · w(z) + w(z)² . (6.1)

Additionally, since every element y in Y(z) has weight at most w(x):

w²(Y(z) − x) = ∑_{y∈Y(z)−x} w(y)² ≤ w(x) ∑_{y∈Y(z)−x} w(y) = w(x) · w(Y(z) − x) . (6.2)

Adding (6.1) and (6.2) we obtain

w²(Y(z) − x) ≤ w(x)² − 2w(x) · w(z) + w(z)² + w(x) · w(Y(z) − x) .

Since w(x) · w(Y(z) − x) + w(x)² = w(x) · w(Y(z)), this is equivalent to

w²(Y(z) − x) ≤ w(x) · (w(Y(z)) − 2w(z)) + w(z)² ,

which is equivalent to the inequality stated in the Lemma.
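Lemma 6.1 is elementary enough to verify by brute force. The following check (our own sanity test, not part of the thesis) exhaustively confirms the inequality over a small grid of non-negative integer weights satisfying the hypothesis that every member of Y(z) weighs at most w(x):

```python
from itertools import product

def lemma_holds(wx, wz, others):
    """Check w(z)^2 - w2(Y(z) - x) >= w(x) * (2 w(z) - w(Y(z))), where
    Y(z) consists of x together with the elements weighted by `others`."""
    wY = wx + sum(others)
    lhs = wz ** 2 - sum(v ** 2 for v in others)
    rhs = wx * (2 * wz - wY)
    return lhs >= rhs

# Exhaustive check over all weight grids with the hypothesis y <= wx.
ok = all(lemma_holds(wx, wz, [y1, y2])
         for wx, wz, y1, y2 in product(range(6), repeat=4)
         if y1 <= wx and y2 <= wx)
```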
The next Lemma uses Lemma 6.1 together with the properties of the partition Px to translate the local optimality of S with respect to w² into a lower bound on w(x), by considering the (k, r(k))-exchange (Px, Y(Px)).
Lemma 6.2. Suppose that x ∈ S and Px ⊆ X \ S are such that x ∈ Y(z) for all z ∈ Px. Let w be any non-negative weight function such that w²(Px) ≤ w²(Y(Px)) and w(y) ≤ w(x) for all y ∈ Y(Px). Then,

w(x) ≥ ∑_{z∈Px} [2w(z) − w(Y(z))] .
Proof. First, we consider the case w(x) = 0. The weights w are non-negative and w(x) ≥ w(y) for every y ∈ Y(Px). Thus, w(y) = 0 for all y ∈ Y(Px), and so w²(Y(Px)) = 0. Moreover, since w²(Px) ≤ w²(Y(Px)) = 0, we must have w(z) = 0 for all z ∈ Px. The claim then follows.

Now, suppose that w(x) > 0. Since x ∈ Y(z) for all z ∈ Px, we have

w²(Px) ≤ w²(Y(Px)) ≤ w(x)² + ∑_{z∈Px} w²(Y(z) − x) . (6.3)

Rearranging (6.3) using w²(Px) = ∑_{z∈Px} w(z)², we obtain:

∑_{z∈Px} [w(z)² − w²(Y(z) − x)] ≤ w(x)² . (6.4)

For each z ∈ Px, we have w(y) ≤ w(x) for every y ∈ Y(z). Thus, we can apply Lemma 6.1 to each term on the left of (6.4), giving:

∑_{z∈Px} w(x) · (2w(z) − w(Y(z))) ≤ w(x)² .

Dividing by w(x) (recall that w(x) ≠ 0) then gives the inequality stated in the Lemma.
We can now derive a bound on the locality ratio of Linear-k-Exchange. We defer a discussion
of the tightness of this bound until the next section.
Theorem 6.3. Let w be the weight function used by Linear-k-Exchange after rounding all the weights down to the nearest integer multiple of ε2, and suppose that S is a locally optimal solution with respect to the resulting potential function w², and O is a global optimum with respect to w. Then,

w(S) ≥ (2/(k + 1)) · w(O) .
Proof. As we have noted, each of the exchanges (Px, Y(Px)) is a valid (k, r(k))-exchange. It thus follows from the local optimality of S that

w²(Px) ≤ w²(Y(Px)) , (6.5)

for all x ∈ S \ O. Moreover, (6.5) holds trivially in the case that x ∈ S ∩ O, since then we have (S \ Y(Px)) ∪ Px = S. Thus (6.5) is valid for all x ∈ S. For each x ∈ S, Lemma 6.2 then gives the inequality

w(x) ≥ ∑_{z∈Px} [2w(z) − w(Y(z))] .

Adding all |S| of these inequalities gives

∑_{x∈S} w(x) ≥ ∑_{x∈S} ∑_{z∈Px} [2w(z) − w(Y(z))] .

Since P is a partition of O, this is equivalent to

∑_{x∈S} w(x) ≥ ∑_{z∈O} [2w(z) − w(Y(z))] . (6.6)

For any x ∈ S, w(x) ≥ 0, and there are at most k distinct z for which x ∈ Y(z), by property (WK2) of Y. Thus

∑_{z∈O} w(Y(z)) ≤ k ∑_{x∈S} w(x) = k · w(S) .

Thus, (6.6) implies

w(S) ≥ 2w(O) − k · w(S) ,

and the theorem follows.
Theorem 6.3 gives a bound on the locality ratio of Linear-k-Exchange in terms of the rounded weight function. Now, we consider the approximation ratio of Linear-k-Exchange with respect to the instance's given weight function, as well as the algorithm's runtime. First, we bound the number of improvements that Linear-k-Exchange can make.
Theorem 6.4. For any strong k-exchange system (X, I), weight function w, and ε > 0, Linear-k-Exchange is a (2/(k + 1) − ε)-approximation running in time O(ε⁻²k³n^(k²+3)).
Proof. First, we show that Linear-k-Exchange is a (2/(k + 1) − ε)-approximation algorithm. Let w be the weight function for the given instance and let ⌊w⌋ be the rounded weight function

⌊w⌋(x) = ⌊w(x)/ε2⌋ · ε2 ,

used by Linear-k-Exchange. Let O = arg max_{A∈I} w(A) and O′ = arg max_{A∈I} ⌊w⌋(A). Then we have

w(S) ≥ ⌊w⌋(S) ≥ (2/(k + 1)) · ⌊w⌋(O′) ≥ (2/(k + 1)) · ⌊w⌋(O)
     ≥ (2/(k + 1)) · ∑_{x∈O} (w(x) − ε2) = (2/(k + 1)) · (w(O) − |O| · ε2)
     ≥ (2/(k + 1)) · (w(O) − n · ε2) = (2/(k + 1)) · w(O) − w(Sinit) · ε ≥ (2/(k + 1) − ε) · w(O) ,

where the second inequality follows from Theorem 6.3.
Now we consider the complexity of Linear-k-Exchange. In each iteration the algorithm examines O(n^(k+r(k))) = O(n^(k²+1)) potential (k, r(k))-exchanges, and each of these exchanges can be evaluated in time O(k²). We now bound the number of improvements the algorithm can make. Let O = arg max_{B∈I} w²(B) and consider the solution Sinit produced by the initial greedy phase of Linear-k-Exchange. From Theorem 5.4, (X, I) is k-extendible and so Greedy is a k-approximation for maximizing w in (X, I). Moreover, if w(x) ≥ w(y) then w(x)² ≥ w(y)², and so Sinit is also a greedy solution for the instance (X, I), w². Thus:

k · w²(Sinit) ≥ w²(O) .

Every time Linear-k-Exchange applies an exchange (A,R) to the current solution S, we must have ⌊w⌋²((S \ R) ∪ A) > ⌊w⌋²(S). Because each weight ⌊w⌋(x) is a multiple of ε2, each improvement that Linear-k-Exchange applies must increase ⌊w⌋²(S) by at least ε2². Thus, the number of improvements made by Linear-k-Exchange is at most

⌊w⌋²(S)/ε2² ≤ w²(S)/ε2² ≤ w²(O)/ε2² ≤ k · w²(Sinit)/ε2² ≤ k · w(Sinit)²/ε2² ≤ kn²/ε² ,

and so Linear-k-Exchange runs in time O(ε⁻²k³n^(k²+3)).
Theorem 5.9 shows that the class of strong k-exchange systems is closed under contraction.
As the following theorem shows, we can apply the partial enumeration technique from Section
2.6 to remove the extra ε term from the approximation ratio, at a slight cost in the total
runtime of the algorithm. For technical reasons, we must assume that k ≥ 2. In practice, this
is no restriction, since in the case k = 1, the set system (X, I) is a matroid, and so the greedy
algorithm gives an optimal solution.
Theorem 6.5. Suppose that we set ε = 1/(3n) in Linear-k-Exchange. Then, for all k ≥ 2, PartEnum(Linear-k-Exchange) is a 2/(k + 1)-approximation algorithm running in time O(k³n^(k²+6)).
Proof. Theorem 6.4 shows that Linear-k-Exchange is a (2/(k + 1) − 1/(3n))-approximation when ε is set to 1/(3n). Furthermore, every independent set in I has size at most n. Theorem 2.22 then shows that the approximation ratio of PartEnum(Linear-k-Exchange) is

1/n + ((n − 1)/n) · (2/(k + 1) − 1/(3n)) = 1/n + 2/(k + 1) − 2/(n(k + 1)) − 1/(3n) + 1/(3n²)
  ≥ 1/n + 2/(k + 1) − 2/(3n) − 1/(3n) + 1/(3n²)
  ≥ 2/(k + 1) ,

where we have used k ≥ 2 in the first inequality.
The algorithm PartEnum(Linear-k-Exchange) makes n calls to Linear-k-Exchange. From Theorem 6.4, each of these takes time O(ε⁻²k³n^(k²+3)) = O(k³n^(k²+5)), so the total runtime is O(k³n^(k²+6)).
Finally, we note that in many specific settings the runtime of Linear-k-Exchange can be drastically improved. We have assumed here that I is given as an independence oracle, and so for any set A ⊆ X \ S the problem of finding a set R ⊆ S such that (A,R) is a valid (k, r(k))-exchange requires enumerating over all possible sets R. In some cases, however, we may be able to compute R directly, given A. For example, in the setting of (k + 1)-claw free graphs, we can easily determine the correct set R from the set A by examining the neighborhood of A. In this case, the O(n^(r(k))) time used to find R can be replaced by the time required to determine the neighborhood of a set of size at most k.
6.2 Monotone Submodular Maximization
We now consider the problem of maximizing a monotone submodular function in a strong
k-exchange system. Before presenting the general algorithm, we describe some of the difficul-
ties that arise when attempting to generalize the approach used in Linear-k-Exchange to the
submodular case.
An obvious difficulty is that in the monotone submodular case, we can no longer represent
f as a sum of weights. The non-oblivious potential function w2 made critical use of such a
representation. However, borrowing some intuition from the greedy algorithm, we might decide
to replace each weight w(x) with the marginal gain/loss associated with adding/removing x
from S. That is, at the start of each iteration of the local search algorithm, we assign each
element x ∈ X the weight w(x) = f(S + x) − f(S − x), where S is the algorithm's current solution, and then proceed as before. Note that w(x) is simply the marginal gain fS(x) in the case that x ∉ S, or the marginal loss fS−x(x) suffered by removing x from S in the case that x ∈ S.
We define the non-oblivious potential function w2 in terms of the resulting weight function w as before. The proof of Theorem 6.3 then goes through with only a slight change in the locality ratio, from 2/(k + 1) to 2/(k + 3). Unfortunately, the resulting local search algorithm may never converge to a local optimum, as we show in the following example.
We let f be a simple, unweighted coverage function on the universe U = {a, b, c, x, y, z}. Specifically, let

S1 = {a, b}   S3 = {x, y}
S2 = {a, c}   S4 = {x, z} .

Our ground set X is then {1, 2, 3, 4}, and our objective function is f(A) = |⋃_{i∈A} Si| for all A ⊆ X. We consider the 2-exchange system with only 2 bases: P = {1, 2} and Q = {3, 4}.

For the current solution S = P we have w(1) = w(2) = 1 and w(3) = w(4) = 2. Since w2({1, 2}) = 2 < 8 = w2({3, 4}), the 2-replacement ({3, 4}, {1, 2}) is applied, and the current solution becomes Q. In the next iteration, we have S = Q, and w(1) = w(2) = 2 and w(3) = w(4) = 1, so the 2-replacement ({1, 2}, {3, 4}) is applied by the algorithm. This returns us to the solution P, where the process repeats indefinitely.
Intuitively, the problem with this approach is that the weight function used in each step
of the algorithm depends on the current solution S (since all marginals are taken with respect
to S). Hence, it may be the case that an exchange (A,R) results in an improvement with
respect to the current solution’s potential function, but in fact results in a decreased potential
value in the next iteration, after the weights have been updated. Surprisingly, we can solve the
problem by introducing even more variation in the potential function. Specifically, we allow
the algorithm to use a different weight function not only for each current solution S, but also
for each (k, r(k))-exchange (A,R) that is considered.
Before we give the full algorithm, let us describe the general approach used to generate the
weights used in our potential function. Rather than calculating all marginal gains with respect
to the set S, we consider elements in some order and assign each element a weight corresponding
to its marginal gain with respect to the set of elements that precede it. By carefully updating
both the current solution and this ordering each time we apply a local improvement, we can
ensure that the algorithm converges to a local optimum and still obtain the stated bounds on
the locality ratio.
The algorithm stores the current solution S as an ordered sequence s1, s2, . . . , s|S|. At each
iteration of the local search, before searching for an improving (k, r(k))-exchange, the algorithm
assigns a weight w(si) to each si ∈ S, as follows. Let Si = {sj ∈ S : j ≤ i} be the set containing the first i elements of S. Then, the weight function w assigning weights to the elements of S is given by

w(si) = f(Si−1 + si) − f(Si−1) = f(Si) − f(Si−1)
for all si ∈ S. Note that the weights w satisfy

∑_{si∈S} w(si) = ∑_{i=1}^{|S|} (f(Si) − f(Si−1)) = f(S) − f(∅) = f(S) . (6.7)
In order to evaluate each (k, r(k))-exchange (A, R), we also need to assign weights to the elements¹ in A ⊆ X \ S. We use a different weight function for each k-replacement (A, R), obtained as follows. We order A according to some arbitrary ordering ≺ on X, and let ai be the ith element of A and Ai = {aj ∈ A : j ≤ i}. Then, the weight function w(A,R) assigning weights to the elements of A is given by

w(A,R)(ai) = f((S \ R) ∪ Ai−1 + ai) − f((S \ R) ∪ Ai−1) = f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1)

for all ai ∈ A. Note that for every (k, r(k))-exchange (A, R),

∑_{ai∈A} w(A,R)(ai) = ∑_{i=1}^{|A|} (f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1))
                    = f((S \ R) ∪ A) − f(S \ R) ≥ f(S ∪ A) − f(S) , (6.8)

where the inequality follows from the submodularity of f.
Since the function f is monotone submodular, all of the weights w and w(A,R) that we
consider will be nonnegative. Furthermore, the weights w assigned to elements in S remain
fixed for all k-replacements considered in a single phase of the algorithm. These facts play
a crucial role in our analysis. Finally, note that although both w and w(A,R) depend on the
current solution S, we omit this dependence from our notation, to avoid clutter. When there
is the possibility of confusion, we shall state explicitly which solution’s weight function we are
considering.
Our final algorithm, Submodular-k-Exchange, is shown in Algorithm 11. It starts from an initial solution Sinit = arg max_{e∈X} f(e), consisting of the singleton element of largest value. When applying a valid (k, r(k))-replacement (A, R), the algorithm updates the ordered solution S in a fashion that ensures all of the elements of S \ R precede those of A, while the elements of S \ R and, respectively, of A occur in the same relative order. As we shall see in the next section, this guarantees that the algorithm will converge to a local optimum. As in the linear case, we use the sums of squared weights w2(R) = ∑_{b∈R} w(b)² and w2(A,R)(A) = ∑_{a∈A} w(A,R)(a)² to guide the search. Also, to ensure polynomial-time convergence, we adopt an approach described by Arkin and Hassin [5] in the context of weighted k-set packing. We round all of our weights down to the nearest integer multiple of ε3, thus ensuring that at each step we make an additive
¹In order to simplify our analysis, we also consider here, implicitly, the case in which A ⊆ R, and so A ⊆ X \ (S \ R). This will allow us to easily deal with the intersection S ∩ O in our analysis, even though the algorithm never actually makes use of the resulting weights.
Algorithm 11: Submodular-k-Exchange
Input: Approximation parameter ε
       Strong k-exchange system (X, I), given as an independence oracle
       Monotone submodular function f : 2^X → R≥0, given as a value oracle
Fix an arbitrary ordering ≺ on the elements of X;
Let Sinit = arg max_{e∈X} f(e);
Let ε3 = ((k + 3)/(2n)) · f(Sinit) · ε;
S ← Sinit;
repeat
    W ← ∅;
    for i = 1 to |S| do
        w(si) ← ⌊(f(W + si) − f(W))/ε3⌋ · ε3;
        W ← W + si;
    foreach A ⊆ X \ S with |A| ≤ k do
        foreach R ⊆ S with |R| ≤ k² − k + 1 do
            if (S \ R) ∪ A ∈ I then
                Let ai be the ith element of A according to ≺;
                W ← S \ R;
                for i = 1 to |A| do
                    w(A,R)(ai) ← ⌊(f(W + ai) − f(W))/ε3⌋ · ε3;
                    W ← W + ai;
                if w2(A,R)(A) > w2(R) then
                    Delete all elements of R from S;
                    Append the elements of A to the end of S, in the order given by ≺;
                    break;
until no exchange is applied to S;
return S;
improvement of at least ε3. Because of this rounding factor, we must actually work with the following analogs of (6.7) and (6.8):

∑_{x∈S} w(x) ≤ ∑_{i=1}^{|S|} (f(Si) − f(Si−1)) = f(S) − f(∅) = f(S) (6.9)

∑_{x∈A} w(A,R)(x) ≥ ∑_{i=1}^{|A|} (f((S \ R) ∪ Ai) − f((S \ R) ∪ Ai−1) − ε3)
                  = f((S \ R) ∪ A) − f(S \ R) − |A|ε3 ≥ f(S ∪ A) − f(S) − |A|ε3 (6.10)
We now analyze the approximation and runtime performance of Submodular-k-Exchange.
The analysis closely follows that of Linear-k-Exchange from Section 6.1. As there, we fix some instance (X, I), f, we let S be a locally optimal solution for this instance produced by Submodular-k-Exchange, and we let O be a globally optimal solution. Again, we only consider a particular set of valid (k, r(k))-exchanges.
Consider the final iteration of the local search algorithm, in which no improvement for S
could be found, and let w be the weight function for S in this iteration. For each x ∈ S, we
define Px as in Section 6.1, using the weight function w. That is, for x ∈ S \ O we let Px be the set of all elements z ∈ O \ S for which x ∈ Y(z) and for all y ∈ Y(z), w(x) ≥ w(y). Then, as in Section 6.1, (Px, Y(Px)) is a valid (k, r(k))-exchange for each x ∈ S \ O. Finally, as in Section 6.1, for each x ∈ S ∩ O, we define Px = {x} and Y(x) = {x}. Note that

w(Px,Y(Px))(Px) = f(S) − f(S − x) = w(x)

for all such x (see footnote 1 on page 103).

Note that only weights for elements of S are considered in the definition of Px. Because our algorithm assigns each element x ∈ S a single weight for the entire final iteration, {Px}_{x∈S} remains a well-defined partition of O.
We now derive a bound on the locality ratio of Submodular-k-Exchange. The proof makes
use of Lemmas 6.1 and 6.2 from Section 6.1.
Theorem 6.6. Let S be the locally optimal solution produced by Submodular-k-Exchange when applied to ε and the instance (X, I), f, and let O be the optimal solution for this instance. Then,

f(S) ≥ (2/(k + 3) − ε) f(O) .
Proof. For each (k, r(k))-exchange (Px, Y(Px)), the local optimality of S implies

w2(Px,Y(Px))(Px) ≤ w2(Y(Px)) .

Furthermore, for x ∈ S ∩ O, our definition of Px and Y(x) gives

w2(Px,Y(Px))(Px) = w2(Px,Y(Px))(x) = w2(x) = w2(Y(Px)) .

Thus, for each x ∈ S, the weight function on S ∪ Px that assigns each element x ∈ S weight w(x) and each element z ∈ Px weight w(Px,Y(Px))(z) satisfies the conditions of Lemma 6.2, and so we have

w(x) ≥ ∑_{z∈Px} [2 w(Px,Y(Px))(z) − w(Y(z))] ,
for each x ∈ S. Adding all |S| of these inequalities, we obtain

∑_{x∈S} w(x) ≥ ∑_{x∈S} ∑_{z∈Px} [2 w(Px,Y(Px))(z) − w(Y(z))] . (6.11)

From (6.9), we have

∑_{x∈S} w(x) ≤ f(S) .

Additionally, from (6.10),

∑_{z∈Px} w(Px,Y(Px))(z) ≥ f(S ∪ Px) − f(S) − |Px|ε3 .

Thus, (6.11) implies

f(S) ≥ 2 ∑_{x∈S} [f(S ∪ Px) − f(S) − |Px|ε3] − ∑_{x∈S} ∑_{z∈Px} w(Y(z)) . (6.12)

Now, we note that P is a partition of O. Thus, ∑_{x∈S} |Px| = |O| and

∑_{x∈S} ∑_{z∈Px} w(Y(z)) = ∑_{z∈O} w(Y(z)) .

Furthermore, Theorem 2.6 gives

∑_{x∈S} [f(S ∪ Px) − f(S)] ≥ f(S ∪ O) − f(S) .

Thus, (6.12) implies

f(S) ≥ 2 (f(S ∪ O) − f(S) − |O|ε3) − ∑_{z∈O} w(Y(z)) . (6.13)

We have w(x) ≥ 0 for all x ∈ S, and from property (WK2) of Y, each x ∈ S appears in at most k sets Y(z). Thus,

∑_{z∈O} w(Y(z)) ≤ k ∑_{x∈S} w(x) ≤ k f(S) ,

where the last inequality follows from (6.9). This, together with (6.13), gives

f(S) ≥ 2 (f(S ∪ O) − f(S) − |O|ε3) − k f(S) .

Rearranging this, we obtain

(k + 3) f(S) ≥ 2 f(S ∪ O) − 2|O|ε3 . (6.14)
The fact that f(S ∪O) ≥ f(O), which follows from the monotonicity of f , together with (6.14)
implies

f(S) ≥ (2/(k + 3)) f(O) − (2/(k + 3)) |O| ε3 .

Finally, we have |O| ≤ n, and so

(2/(k + 3)) |O| ε3 ≤ (2n/(k + 3)) ε3 = ε f(Sinit) ≤ ε f(O) .

Thus,

f(S) ≥ (2/(k + 3) − ε) f(O) .
We now consider the tightness of our bounds on the locality ratios of both Linear-k-Exchange and Submodular-k-Exchange. Berman [11] gives a tight example of an unweighted (k + 1)-claw free graph for which his local search algorithm has a locality ratio of 2/(k + 1). However, his algorithm considers only a particular subset of valid (k, r(k))-exchanges, and so the example he gives is no longer locally optimal with respect to our algorithms' neighborhoods. Hurkens and Schrijver [55] show that the oblivious local search algorithm for unweighted independent set in (k + 1)-claw-free graphs has a locality ratio of at most 2/(k + ε), where ε depends on the size of the improvements considered by the algorithm. Specifically, they give a graph G in which there are independent sets S and O such that |O| ≥ ((k + ε)/2)|S|, but S is locally optimal under (t, t − 1)-exchanges. We now show how to use this result to obtain an almost-tight instance for our analysis.
Let N(A, S) be the set of all vertices of S adjacent to some vertex of A, and let t = k. Then, every valid (k, r(k))-exchange for S must have the form (A, N(A, S)) for some set A of at most k independent vertices. The local optimality of S implies that |A| ≤ |N(A, S)| for all such A, and so if we assign all vertices of G weight 1, then for all such A we have

w2(A) = w(A) = |A| ≤ |N(A, S)| = w(N(A, S)) = w2(N(A, S)) .

Thus, S is also a local optimum with respect to w2 in Linear-k-Exchange. We have

w(S) = |S| ≤ (2/(k + ε)) |O| = (2/(k + ε)) w(O) ,

and so the locality ratio of Linear-k-Exchange is at most 2/(k + ε).
Now, suppose that we add a set C of |S| isolated vertices to G and define an objective
function f for the resulting instance by
f(D) = ∑_{v∈D} w(D, v) ,
where

w(D, v) = 1, if v ∈ S ∪ O ;
          1, if v ∈ C and D ∩ S = ∅ ;
          0, if v ∈ C and D ∩ S ≠ ∅ .
Then, we note that for all B ⊆ A and v ∉ A we have

0 ≤ fA(v) ≤ fB(v) ,

and so f is monotone submodular. Now, we consider an arbitrary set A ⊆ O ∪ C of size at most k. First, suppose that N(A, S) = S. Then, w(A,N(A,S))(v) = 1 for all v ∈ A, and so

w2(A,N(A,S))(A) = ∑_{v∈A} 1² = |A| ≤ |N(A, S)| = ∑_{v∈N(A,S)} 1² = w2(N(A, S)) .
Now, suppose that N(A, S) ⊂ S. We have at least one element of S in S \ N(A, S), and so w(A,N(A,S))(v) = 0 for all v ∈ A ∩ C. Thus,

w2(A,N(A,S))(A) = ∑_{v∈A∩O} 1² + ∑_{v∈A∩C} 0² = |A ∩ O| ≤ |N(A ∩ O, S)| ≤ |N(A, S)| = ∑_{x∈N(A,S)} 1² = w2(N(A, S)) ,
and so S is a local optimum for Submodular-k-Exchange. But O ∪ C is an independent set in G, and

f(O ∪ C) = |O| + |C| = |O| + |S| ≥ ((k + ε)/2)|S| + |S| = ((k + 2 + ε)/2) f(S) .

Thus, the locality ratio of Submodular-k-Exchange is at most 2/(k + 2 + ε), and our bounds on the locality gap of both Linear-k-Exchange and Submodular-k-Exchange are tight up to an additive term of 1 in the denominator.
It remains to consider the runtime of Submodular-k-Exchange, which depends on the number of local improvements (A, R) that it applies. We shall show that Submodular-k-Exchange steadily improves a bounded, global quantity, and so the total number of improvements that it can make is bounded. Specifically, we show that although the weights w assigned to elements of the current solution S change after each improvement is made, the non-oblivious potential w2(S) is monotonically increasing.
Lemma 6.7. Suppose that Submodular-k-Exchange applies a (k, r(k))-exchange (A, R) to some solution S to obtain a new solution T. Let wS be the weight function w : S → R≥0 determined by solution S and wT be the weight function w : T → R≥0 determined by solution T. Then,

w2T(T) ≥ w2S(S) + ε3² .

Proof. We first show that (1) wS(si) ≤ wT(si) for each element si ∈ S \ R, and (2) w(A,R)(ai) ≤ wT(ai) for each element ai ∈ A, where w(A,R) is the weight function determined by S and the (k, r(k))-exchange (A, R).
In the first case, let Si (respectively, Ti) be the set of all elements in S (respectively, T) that come before si (recall that S and T are stored as ordered sequences). When the algorithm updates the solution S, it removes all elements of R from S, appends all of A after S \ R, and leaves all elements of S \ R in the same relative order. Thus, Ti ⊆ Si. It follows directly from the submodularity of f that

wS(si) = ⌊(f(Si + si) − f(Si))/ε3⌋ · ε3 ≤ ⌊(f(Ti + si) − f(Ti))/ε3⌋ · ε3 = wT(si) , (6.15)

for each si ∈ S \ R.
In the second case, let Ai be the set of all elements of A that come before ai (in the ordering ≺) and Ti be the set of all elements of T that come before ai. When the algorithm updates the solution S, it removes all elements of R from S, places all elements of A after all those of S \ R, and leaves all elements of A in the same relative order. Thus, Ti ⊆ (S \ R) ∪ Ai, and so from the submodularity of f,

w(A,R)(ai) = ⌊(f((S \ R) ∪ Ai + ai) − f((S \ R) ∪ Ai))/ε3⌋ · ε3 ≤ ⌊(f(Ti + ai) − f(Ti))/ε3⌋ · ε3 = wT(ai) , (6.16)

for each ai ∈ A.
Finally, since Submodular-k-Exchange applied the improvement (A, R) to S, we must have w2S(R) < w2(A,R)(A), and since all weights are integer multiples of ε3, we must in fact have

w2S(R) ≤ w2(A,R)(A) − ε3² .

From this inequality, together with (6.15) and (6.16), we have

w2S(S) = w2S(S \ R) + w2S(R)
       ≤ w2S(S \ R) + w2(A,R)(A) − ε3²
       ≤ w2T(S \ R) + w2T(A) − ε3²
       = w2T(T) − ε3² .
We can now state our main result, which shows that Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation and gives a bound on its runtime.

Theorem 6.8. For any strong k-exchange system (X, I), monotone submodular function f, and ε > 0, Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation algorithm running in deterministic time O(ε⁻²k³n^(k²+4)).

Proof. Theorem 6.6 implies that Submodular-k-Exchange is a (2/(k + 3) − ε)-approximation algorithm, provided that it eventually reaches a local optimum. We now show that it must terminate in polynomial time.
Each iteration requires time O(n) to compute the weights for S, plus time to search for and evaluate all potential (k, r(k))-exchanges. There are O(n^(k+r(k))) = O(n^(k²+1)) possible (k, r(k))-exchanges (A, R), and each one can be evaluated in time O(k + r(k)) = O(k²), including the time to compute the weights w(A,R). Thus, the total runtime of Submodular-k-Exchange is O(I k² n^(k²+1)), where I is the number of improvements it makes. We now bound I.

Because f is submodular, for any element e and any set T ⊆ X, we have f(T + e) − f(T) ≤ f(e) ≤ f(Sinit). In particular, for any solution S ⊆ X with associated weight function w, we have

w2(S) = ∑_{e∈S} w(e)² ≤ |S| f(Sinit)² ≤ n f(Sinit)² .

Additionally, from Lemma 6.7, each improvement we apply must increase w2(S) by at least ε3², and hence the number I of improvements that Submodular-k-Exchange can make is at most

(w2(S) − f(Sinit)²)/ε3² ≤ (n f(Sinit)² − f(Sinit)²)/ε3² = (n − 1) (f(Sinit)/ε3)² = O(n³ε⁻²) .

Hence, the total runtime of Submodular-k-Exchange is O(ε⁻²k²n^(k²+4)).
As was the case with Linear-k-Exchange, we can employ partial enumeration to remove the
ε term from the approximation ratio for Submodular-k-Exchange.
Theorem 6.9. Suppose that we set ε = 1/(2n) in Submodular-k-Exchange. Then, PartEnum(Submodular-k-Exchange) is a 2/(k + 3)-approximation algorithm running in time O(k³n^(k²+7)).
Proof. Theorem 6.8 shows that Submodular-k-Exchange is a (2/(k + 3) − 1/(2n))-approximation when ε is set to 1/(2n). Furthermore, every independent set in I has size at most n. Theorem 2.22 then shows that the approximation ratio of PartEnum(Submodular-k-Exchange) is

1/n + ((n − 1)/n) · (2/(k + 3) − 1/(2n)) = 1/n + 2/(k + 3) − 2/(n(k + 3)) − 1/(2n) + 1/(2n²)
                                         ≥ 1/n + 2/(k + 3) − 1/(2n) − 1/(2n) + 1/(2n²)
                                         ≥ 2/(k + 3) ,

where we have used k ≥ 1 in the first inequality.
The algorithm PartEnum(Submodular-k-Exchange) makes n calls to Submodular-k-Exchange. From Theorem 4.12, each of these takes time O(ε⁻²k³n^(k²+4)) = O(k³n^(k²+6)), so the total runtime is O(k³n^(k²+7)).
Chapter 7

Limitations of Oblivious Local Search for CSPs
In the previous sections, we have demonstrated the power of non-oblivious local search, which
involves altering the potential function used to guide the local search algorithm. Now we turn
again to the generic local search algorithm GenLocalSearch presented in Chapter 1, and examine
the effect of changing other component functions. Specifically, in Section 7.1 we consider the
power of oblivious local search when large neighborhoods are used, and in Section 7.2 we consider
a variant of oblivious local search in which the initial solution Sinit is chosen randomly. We also
consider whether using a best improvement pivot rule together with either a random or greedy
initial solution gives any improvement.
Our results are formulated in the general setting of constraint satisfaction problems. A constraint satisfaction problem (or CSP) Π consists of a domain D and a set Γ of relations on D^≤ℓ, each of arity at most some constant ℓ. An instance I of Π = (D, Γ) is then given by a set V of n variables and a list of m constraints, each of the form (R, T) where R ∈ Γ and T is a tuple of arity(R) variables from V. The goal is to find an assignment σ, giving each variable in V a value in D, that maximizes the number of constraints (R, T) ∈ I for which R(σ(T)) is true. We say that such constraints are satisfied by σ.
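The definitions above can be made concrete with a small evaluator (a sketch, not from the thesis; representing a relation as the set of tuples it accepts is an assumption chosen for simplicity):

```python
# A tiny CSP evaluator over the Boolean domain {0, 1}. A constraint is a pair
# (R, T): a relation R, given as the set of accepted tuples, and a variable tuple T.
NEQ = {(0, 1), (1, 0)}   # the disequality relation u != v
constraints = [(NEQ, ("u", "v")), (NEQ, ("v", "w"))]

def satisfied(sigma, constraints):
    """Count the constraints (R, T) for which sigma(T) lies in R."""
    return sum(tuple(sigma[x] for x in T) in R for (R, T) in constraints)

assert satisfied({"u": 1, "v": 0, "w": 1}, constraints) == 2
assert satisfied({"u": 1, "v": 1, "w": 1}, constraints) == 0
```

The disequality relation used here is exactly the constraint type that appears in the MaxCut encoding discussed in this section.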
Here, we consider the special case in which all variables are Boolean, so D = {0, 1}. We
shall consider weighted CSPs, in which each constraint C has an associated weight w(C) in R≥0,
and the problem is to find an assignment σ that satisfies constraints of maximum total weight.
For particular CSPs, it may by possible to prove analogs of our results for the unweighted case
as well, but we consider only the general weighted case here. To simplify our analysis, we allow
an instance to contain multiple copies of a single constraint (R, T ). Such constraints can easily
be replaced by a single, appropriately weighted constraint, so this is without loss of generality.
Two examples of Boolean CSPs that we shall consider in more detail in this section are
MaxCut, and Max-k-Sat. In MaxCut, we are given an edge-weighted graph G = (V,E) and seek
a partition (S, V \S) of the vertices of V that maximizes the total weight of the edges with one
endpoint in S and one endpoint in V \ S. MaxCut can be represented as a Boolean CSP whose
constraints are all non-equalities of the form u ≠ v, each corresponding to an edge (u, v) ∈ E.
Then, the set of variables from V assigned the value 1 form one side of the partition (S, V \ S)
and the set assigned the value 0 form the other side.
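In this encoding the cut value is just the total weight of satisfied disequality constraints, which the following sketch (not from the thesis) makes explicit:

```python
# MaxCut objective: variables assigned 1 form one side of the cut, and an
# edge's disequality constraint is satisfied exactly when the edge crosses it.
edges = {("u", "v"): 2.0, ("v", "w"): 1.0, ("u", "w"): 1.0}

def cut_weight(S):
    """Total weight of edges with exactly one endpoint assigned 1 (i.e., in S)."""
    return sum(w for (a, b), w in edges.items() if (a in S) != (b in S))

assert cut_weight({"u"}) == 3.0      # edges (u, v) and (u, w) are cut
assert cut_weight(set()) == 0.0      # an empty side cuts nothing
```

Note that cut_weight(S) equals cut_weight of the complement of S, reflecting the symmetry of the partition (S, V \ S).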
In Max-k-Sat, we are given a collection of clauses over some variable set V, each containing exactly¹ k literals over V, and seek an assignment to the variables of V that satisfies a set
of clauses of maximum total weight. Max-k-Sat is easily captured by a CSP in which each
constraint is a disjunction of exactly k literals over V . Note that here we incorporate the
notion of a literal, which is either a variable or its negation, into the definition of the set of
constraints.
We focus on Boolean CSPs because they are well-studied and can be formulated naturally
as the sort of combinatorial optimization problems we have considered in previous chapters.
We let the ground set X be the set of variables V , and the set of feasible solutions contain all
subsets of V . We identify a subset S ⊆ V with the assignment σS that assigns every variable
in S the value 1, and every variable not in S the value 0. Then, we seek to find a set whose
corresponding assignment satisfies constraints of maximum total weight. That is, we consider
the (trivial) independence system (V, 2V ), and seek to maximize the objective function
f(S) = w({(R, T) ∈ I : R(σS(T))}) .
With this correspondence in mind, we shall implicitly identify a set of variables S ⊆ V
with the assignment σS for the rest of this chapter. Specifically, we shall refer to S ⊆ V
as an assignment, and use terminology for assignments and sets of variables interchangeably.
Additionally, for a CSP instance I, we denote by f(I, S) the total weight of the constraints in
I satisfied by σS . We define
f∗(I) = maxS⊆V
f(I, S) .
For any two sets A, B, we denote the symmetric difference (A \ B) ∪ (B \ A) by A △ B. If S ⊆ V is an assignment and A ⊆ V is some set of variables, then S △ A is the assignment obtained from S by flipping (the values assigned to) each variable in A and leaving the values of other variables in V unchanged. Using this notation, it is possible to formulate a natural, oblivious local search algorithm for constraint satisfaction problems that repeatedly attempts to improve the current assignment S by flipping some bounded number of variables.
For a function h : N → N, we consider the oblivious h(n)-local search algorithm, h(n)-LocalSearch, shown in Algorithm 12. At each step, h(n)-LocalSearch flips at most h(n) variables, and terminates when no such change improves the total weight of the constraints satisfied by S. That is, h(n)-LocalSearch finds an assignment S such that f(I, S) ≥ f(I, S △ A) for all A of size at most h(n). We call such an assignment S an h(n)-local optimum. As all of the results
¹Sometimes this variant is called exact Max-k-Sat to distinguish it from the case in which a clause is allowed to have at most k literals.
Algorithm 12: h(n)-LocalSearch
Input: CSP instance I
Let Sinit be some subset of V;
S ← Sinit;
repeat
    foreach C ⊆ V with |C| ≤ h(n) do
        if f(I, S △ C) > f(I, S) then
            S ← S △ C;
            break
until no change is made to S;
return S;
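Algorithm 12 can be sketched directly in Python (not from the thesis; the MaxCut objective below is only an illustrative instance, and, as in the chapter, we make no attempt to bound the runtime of the search):

```python
from itertools import combinations

def local_search(V, f, h, S_init):
    """Flip up to h variables at a time while any flip improves f."""
    S = set(S_init)
    improved = True
    while improved:
        improved = False
        for r in range(1, h + 1):
            for C in map(set, combinations(sorted(V), r)):
                if f(S ^ C) > f(S):
                    S ^= C          # apply the improving flip
                    improved = True
                    break
            if improved:
                break
    return S

# Example objective: MaxCut on the path a-b-c with unit edge weights.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0}
f = lambda S: sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
S = local_search({"a", "b", "c"}, f, h=1, S_init=set())
assert f(S) == 2.0   # on this instance, every 1-local optimum is globally optimal
```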
of this chapter are negative, we do not concern ourselves with the runtime of Algorithm 12.
Indeed, all of our results in this chapter are information-theoretic, and hold independent of any
complexity assumptions.
Note that in Algorithm 12 we do not specify how to choose Sinit. In Section 7.1, we shall
examine the worst-case locality ratio of h(n)-LocalSearch, which does not depend directly on the
choice of Sinit. Specifically, we examine the relationship between the locality ratio of h(n)-Local-
Search and the function h(n). In Section 7.2, we examine the worst-case expected approximation
performance of h(n)-LocalSearch when Sinit is chosen as a uniform random subset of V .
7.1 Large Neighborhoods
Our first set of results involves the dependence of the locality ratio of h(n)-LocalSearch on the function h(n). Specifically, we examine whether the locality gap of h(n)-LocalSearch can be significantly improved by increasing the size of the neighborhoods that it considers. In Theorem 3.2 we considered this question in the context of maximum coverage subject to a matroid constraint, and gave an instance in which any oblivious local search algorithm with a constant-sized neighborhood has a locality ratio only O(r⁻¹) larger than that of 1-local search, where r is the rank of the matroid. Now, we prove a similar result in the general context of Boolean CSPs.
We consider instances and assignments that satisfy the following property.
Definition 7.1 ((Y, Z)-robust assignment). Consider an instance I of a Boolean CSP Π = ({0, 1}, Γ) and an assignment S ⊆ V for this instance. Let Y, Z ⊆ V. We say that S is (Y, Z)-robust for the instance I if any assignment S′ with f(I, S′) > f(I, S) must differ from S by at least one element of Y and one element of Z (i.e., Y ∩ (S △ S′) ≠ ∅ and Z ∩ (S △ S′) ≠ ∅).
Clearly, any assignment S that is (Y, Z)-robust for an instance I is also a 1-local optimum
of I. Thus, the locality ratio of 1-LocalSearch is at most f(I, S)/f∗(I), where S is any (Y,Z)-
robust assignment for I. We shall show that this bound holds asymptotically for all o(n)-local
search.
Let Π be a Boolean CSP and suppose that I is an instance of Π with a (Y,Z)-robust
assignment S. We shall use I and S to construct an infinite family of related instances I(k, d)
where k and d are positive integers. Because our construction applies to any Boolean CSP it is
necessarily abstract. In order to help the reader follow the construction, we provide a concrete,
running example of its application to MaxCut. We depict the MaxCut instances as graphs, and
depict an assignment S (corresponding to the cut (S, V \S)) by using double circles for vertices
that are in S and single circles for vertices in V \ S. We label edges with their weights, leaving
edges with weight 1 unlabeled in order to avoid clutter.
The starting point for our example is shown in Figure 7.1. It depicts a small instance I of MaxCut with a (Y, Z)-robust assignment S = {p, a}, where Y = {a, b} and Z = {p, q}. Note that f(I, S) = 2, and the only assignments that satisfy more constraints are {a, b} and {p, q}, each of which satisfies all of the constraints of I. Both of these assignments differ from S by one of {a, b} and one of {p, q}, so S is indeed an ({a, b}, {p, q})-robust assignment.
Figure 7.1: I, with ({a, b}, {p, q})-robust assignment S
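Robustness of a small instance can be verified by brute force, as in the following sketch (not from the thesis). The edge set is an assumption: a plausible reading of Figure 7.1 as the unit-weight 4-cycle a–p, p–b, b–q, q–a, which is consistent with the stated values f(I, S) = 2 and the two optimal assignments {a, b} and {p, q}:

```python
from itertools import combinations

edges = [("a", "p"), ("p", "b"), ("b", "q"), ("q", "a")]   # assumed edge set
V = {"a", "b", "p", "q"}

def f(S):
    """Cut value of assignment S: number of edges crossing the partition."""
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

def robust(S, Y, Z):
    """True iff every strictly better assignment differs from S in both Y and Z."""
    for r in range(len(V) + 1):
        for T in map(set, combinations(sorted(V), r)):
            if f(T) > f(S) and not ((S ^ T) & Y and (S ^ T) & Z):
                return False
    return True

S = {"p", "a"}
assert f(S) == 2
assert robust(S, Y={"a", "b"}, Z={"p", "q"})
```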
Our results hold for CSPs Π = ({0, 1}, Γ) that are non-trivial in the following sense: there must exist some relation P ∈ Γ that is not satisfied by the all-zeroes tuple 0^arity(P) and some relation N ∈ Γ that is not satisfied by the all-ones tuple 1^arity(N).
Suppose that Π is not trivial, and consider the relations P and N described above. We
shall define two small CSP instances: IP, which contains a single constraint using the relation P, and IN, which contains a single constraint using the relation N. The purpose of IP is to
remove some variable from S). Similarly, the purpose of IN will be to penalize moves that flip
some variable from 0 to 1 (or, in our set notation, add some variable to S). We shall use several
copies of these single-constraint instances to make a particular assignment S preferable to its
neighboring assignments.
Formally, let VP be a set of arity(P ) distinct variables, and T be a tuple containing each
variable in VP once. Then, consider the instance IP comprising a single constraint (P, T ).
There must be some assignment S ⊆ VP and some variable x ∈ S so that (P, T ) is satisfied by
S but not by S − x. We denote this assignment by SP and this variable by xP .
Similarly, let VN be a set of arity(N) distinct variables, and T be a tuple containing each
variable in VN once. Then, consider the instance IN comprising the single constraint (N,T ).
There must be some assignment S ⊂ VN and some variable x ∈ VN \S so that (N,T ) is satisfied
by S but not by S + x. We denote this assignment by SN and the variable x by xN .
Examples of the two instances IP and IN for MaxCut with corresponding assignments SP
and SN and distinguished variables xP and xN are depicted in Figures 7.2 and 7.3, respectively.
Note that SP satisfies the single constraint of IP but SP − xP does not. Similarly, SN satisfies
the single constraint of IN, but SN + xN does not.
Figure 7.2: IP, with assignment SP

Figure 7.3: IN, with assignment SN
We now return to the general construction. Let δ = f∗(I) − f(I, S). For each pair of
positive integers (k, d) we construct an instance I(k, d) of Π by combining several copies of
the constraints from I, IP and IN as follows. We shall refer to Figure 7.4, which provides an
example of the construction for k = 3 and d = 4 using the instances from Figures 7.1, 7.2, and
7.3. Note that in Figure 7.1 we have δ = 2.
• For each i ∈ [d], let Ii be a copy of I, in which each variable z ∈ Z has been replaced
by a fresh variable zi, unique to Ii and the other variables are the same as in I. We add
to I(k, d) all the constraints from each copy Ii. All of these constraints have the same
weight as their corresponding constraints in I. These constraints appear in the top half
of Figure 7.4.
• For each j ∈ [k] and each variable y ∈ Y ∩S, let Iy,j be a copy of IP , in which the variable
xP has been replaced by y and each variable v ∈ VP − xP has been replaced by a fresh
variable vy,j, unique to Iy,j. We add to I(k, d) the single constraint from each copy Iy,j. Each of these constraints has weight δ. These are the 3 constraints in the lower right part
of Figure 7.4.
• For each j ∈ [k] and each variable y ∈ Y \S, let Iy,j be a copy of IN , in which the variable
xN has been replaced by y and each variable v ∈ VN − xN has been replaced by a fresh
variable vy,j, unique to Iy,j. We add to I(k, d) the single constraint from each copy Iy,j. Each of these constraints has weight δ. These are the 3 constraints in the lower left part of Figure 7.4.
We combine the assignments S, SP , and SN into a single assignment S(k, d) for I(k, d) in the
following, natural fashion. Each copy of a variable x is assigned the same value by S(k, d) as x
Chapter 7. Limitations of Oblivious Local Search for CSPs 116
Figure 7.4: Instance I(3, 4) with assignment S(3, 4)
is assigned by S, SP , or SN . Note that only the variables from Z appear in 2 different copies,
and the construction ensures that these variables are assigned the same value by all of the
relevant assignments S, SP , and SN . The assignment S(3, 4) for our example MaxCut instance
I(3, 4) is shown in Figure 7.4.
Finally, we consider the number of variables n(k, d) in the instance I(k, d). Let V be the
set of variables from I and note that both |VN | and |VP | are at most ℓ, since arity(P ) ≤ ℓ and
arity(N ) ≤ ℓ. We have

n(k, d) ≤ d|Z| + |V \ Z| + k(ℓ − 1)|Y | = |V | + (d − 1)|Z| + k(ℓ − 1)|Y | = Θ(d + k) . (7.1)
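For concreteness, the construction can be sketched in Python for the MaxCut case, where every constraint is a weighted edge, an assignment is a cut, and each copy of IP or IN contributes a single pendant edge of weight δ placed so that S(k, d) cuts it. The helper names and the toy base instance below are ours, not part of the thesis; the final assertions check the variable count from equation (7.1) and the identity f(I(k, d), S(k, d)) = d·f(I, S) + k|Y|δ, which reappears as equation (7.3) in the proof of Theorem 7.3.

```python
def cut_value(edges, cut):
    """Total weight of the edges with exactly one endpoint in `cut`."""
    return sum(w for e, w in edges.items() if len(e & cut) == 1)

def build_I_kd(edges, Y, Z, S, delta, k, d):
    """MaxCut specialization of the I(k, d) construction.

    `edges` maps frozenset({u, v}) -> weight and describes the base instance I.
    Variables in Z are replaced by fresh copies in each of the d copies of I,
    while all other variables are shared.  Each y in Y additionally receives k
    pendant edges of weight `delta` to fresh vertices placed on the opposite
    side of the cut, so that S(k, d) cuts all of them."""
    ren = lambda v, i: ("z", v, i) if v in Z else v
    new_edges, new_cut = {}, set()
    for i in range(d):                    # d copies of I with fresh z-variables
        for e, w in edges.items():
            u, v = tuple(e)
            key = frozenset({ren(u, i), ren(v, i)})
            new_edges[key] = new_edges.get(key, 0) + w
        new_cut |= {ren(v, i) for v in S}
    for y in Y:                           # the copies of I_P / I_N: one pendant
        for j in range(k):                # edge of weight delta per pair (y, j)
            g = ("gadget", y, j)
            new_edges[frozenset({y, g})] = delta
            if y not in S:
                new_cut.add(g)            # gadget vertex sits opposite y
    return new_edges, new_cut

# Toy base instance: a triangle on {y, z1, z2}, with the cut S = {y}.
I = {frozenset({"y", "z1"}): 1, frozenset({"y", "z2"}): 1,
     frozenset({"z1", "z2"}): 1}
S = {"y"}
Ikd, Skd = build_I_kd(I, Y={"y"}, Z={"z1", "z2"}, S=S, delta=1, k=3, d=4)
# f(I(k, d), S(k, d)) = d * f(I, S) + k * |Y| * delta   (equation (7.3))
assert cut_value(Ikd, Skd) == 4 * cut_value(I, S) + 3 * 1 * 1 == 11
# equation (7.1) with l = 2: |V| + (d - 1)|Z| + k(l - 1)|Y| vertices
assert len({v for e in Ikd for v in e}) == 3 + (4 - 1) * 2 + 3 * (2 - 1) * 1
```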
The following lemma shows that S(k, d) is a k-local optimum of I(k, d).
Lemma 7.2. Suppose that S is a (Y, Z)-robust assignment for the instance I. Then, for every
k, d ≥ 1, the assignment S(k, d) is a k-local optimum of I(k, d).
Proof. We consider f(I(k, d), S(k, d)) and show that it is impossible to increase f by flipping at
most k variables. The assignment S(k, d) already satisfies all of the constraints from the copies
Iy,j of IP and IN , so in order to increase f , we must increase the total weight of the satisfied
constraints in some copy Ii of I. Because I is (Y,Z)-robust, this requires flipping at least one
variable y ∈ Y and at least one variable zi from Ii. We can satisfy constraints of total weight at
most f∗(I) from each copy Ii of I. Thus, we can satisfy constraints of total additional weight
at most
f∗(I)− f(I, S) = δ ,
for each such variable zi that is flipped.
However, flipping y unsatisfies the single constraint in the copy Iy,j of IP or IN for each
j ∈ [k]: recall that each of these instances was constructed by either letting xP = y in IP if
y ∈ S or xN = y in IN if y ∉ S; in either case, flipping the variable y unsatisfies the single
constraint from IP or IN . Each of these k constraints has weight δ, so flipping y decreases f
by kδ. The only way to possibly re-satisfy the constraint from Iy,j is by flipping at least one
other variable vy,j unique to Iy,j . Thus, we can re-satisfy constraints of total weight at most δ for
each variable vy,j that is flipped.
Suppose that we flip at least one variable y ∈ Y , together with a variables zi and b variables
vy,j . Then, the total change in the weight of satisfied constraints is at most
−kδ + aδ + bδ ≤ −δ ,
since a + b ≤ k − 1. Thus, we must actually decrease f . The above argument can easily be
adapted to show that S(k, d) is in fact a (k+ 1)-local optimum for I(k, d). However, this small
distinction will be irrelevant in our remaining proofs.
Using Lemma 7.2, we can derive the following bounds on the locality ratio of h(n)-LocalSearch
on the CSP Π. Recall that in Section 1.2, we defined the locality ratio for a problem
to be the infimum of the locality ratios of all of its instances. Thus, we ignore any factors of
magnitude o(n) in our discussion of locality ratios.
Theorem 7.3. For any h = o(n), the locality ratio of h(n)-LocalSearch for a (non-trivial) CSP
Π is at most f(I, S)/f∗(I), where I is an instance of Π and S is a (Y,Z)-robust assignment
for I.
Proof. Let I be some instance of Π and S be a (Y, Z)-robust assignment for I. It suffices to
show that for every ε > 0, there is an instance I(k, d) such that S(k, d) is an h(n)-local optimum
for I(k, d) and

f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I, S)/f∗(I) + ε .
Let δ = f∗(I)− f(I, S), and let O be an optimal assignment for I. Then, for any k, d ≥ 1,
we define the assignment O(k, d) for the variables V of I(k, d) as follows: O(k, d) assigns each
variable from a copy of I the same value as O, and each variable vy,j , unique to the instance
Iy,j , an arbitrary value. Then, O(k, d) satisfies constraints of weight f(I, O) = f∗(I) from each
copy Ii of I, and hence we have
f(I(k, d), O(k, d)) ≥ df∗(I) . (7.2)
On the other hand, S(k, d) satisfies constraints of weight f(I, S) from each copy Ii of I, as
well as the single constraints from each of the k|Y | instances Iy,j , all of weight δ. Thus,
f(I(k, d), S(k, d)) = df(I, S) + k|Y |δ . (7.3)
We set

k = k(d) = ⌊εf∗(I)/(|Y |δ)⌋ d ,
and note that k = Θ(d). Then, we consider the resulting family of instances I(k, d) for d ≥ 1.
From (7.1), the number of variables n = n(d) in I(k, d) is Θ(d+k) = Θ(d). We have h(n) = o(n),
and since n = Θ(d) and k = Θ(d), we must have h(n) ≤ k for all sufficiently large d. Thus, from
Lemma 7.2, S(k, d) is an h(n)-local optimum of I(k, d) for all sufficiently large d. Furthermore,
from (7.2) and (7.3), we have
f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I(k, d), S(k, d)) / f(I(k, d), O(k, d)) ≤ (df(I, S) + k|Y |δ) / (df∗(I)) ≤ f(I, S)/f∗(I) + ε .
Theorem 7.3 shows that even neighborhoods of super-polynomial size cannot improve the
locality ratio of h(n)-LocalSearch beyond the ratio attained by some (Y,Z)-robust assignment.
We can easily adapt the proof to show that the locality ratio of h(n)-LocalSearch remains less
than 1 even when we allow mildly exponential neighborhoods. In this case, we allow h(n)-
LocalSearch to flip up to some constant fraction of the current assignment’s variables in each
iteration.
Theorem 7.4. Suppose that I is an instance of a CSP Π and that there exists a (Y, Z)-robust
assignment S for I that is not a global optimum of I. Then there exists an absolute constant
c such that the locality ratio of h(n)-LocalSearch is less than 1 whenever h(n) ≤ cn for all
sufficiently large n.
Proof. We use the same argument as in the proof of Theorem 7.3, setting ε to be any positive
constant satisfying
ε < 1 − f(I, S)/f∗(I) .
Note that since S is not a global optimum, such an ε must exist.
We set k = k(d) as in the proof of Theorem 7.3 and consider the instance I(k, d) with
assignment S(k, d). Again, we have n = n(d) = Θ(d) and k = Θ(d), so there must exist some
absolute constant c such that cn ≤ k for all sufficiently large n. Suppose also that h(n) ≤ cn
for all sufficiently large n. Then, from Lemma 7.2, S(k, d) is an h(n)-local optimum for I(k, d)
for all sufficiently large n. Furthermore, as in the proof of Theorem 7.3, we have
f(I(k, d), S(k, d)) / f∗(I(k, d)) ≤ f(I, S)/f∗(I) + ε < 1 .
Our running example based on Figure 7.1 shows that the locality ratio of h(n)-LocalSearch
on MaxCut remains at most 1/2 for all h = o(n). We now give some additional examples of
CSP instances with (Y,Z)-robust assignments.
We begin with Max-2-Sat. Khanna et al. [64] show that the locality ratio of h(n)-LocalSearch
on this problem is at most 2/3 for any h = o(n) by giving an explicit construction. We can
derive the same result by exhibiting the Max-2-Sat instance

x ∨ a , x ∨ b , ¬a ∨ ¬b ,
and observing that the assignment S = {a, b} is ({a, b}, {x})-robust for this instance: it satisfies
2 clauses, but we must flip x and either a or b to satisfy all 3 clauses. Thus the locality ratio
of h(n)-LocalSearch on Max-2-Sat is at most 2/3 for all h = o(n).
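This robustness claim is small enough to verify by brute force. The sketch below is our own helper code (clauses are encoded as tuples of (variable, sign) literals); it checks that S = {a, b} satisfies 2 of the 3 clauses and that every assignment satisfying all 3 clauses flips x and at least one of a, b:

```python
from itertools import product

# the three clauses x v a, x v b, -a v -b, as (variable, sign) literals
clauses = [(("x", True), ("a", True)),
           (("x", True), ("b", True)),
           (("a", False), ("b", False))]

def value(assign):
    """Number of clauses satisfied by the assignment."""
    return sum(any(assign[v] == s for v, s in cl) for cl in clauses)

S = {"x": False, "a": True, "b": True}
assert value(S) == 2                       # S satisfies 2 of the 3 clauses
for bits in product([False, True], repeat=3):
    A = dict(zip(["x", "a", "b"], bits))
    if value(A) == 3:                      # A satisfies all clauses, so it must
        flipped = {v for v in A if A[v] != S[v]}
        assert "x" in flipped and flipped & {"a", "b"}   # flip x and a or b
```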
Moreover, this can be extended to Max-k-Sat for k > 2 as follows. Let Vk−2 be a set of k − 2
fresh variables and let Ck−2 be the set of all 2^(k−2) clauses on Vk−2. We consider the instance

{x ∨ a ∨ C : C ∈ Ck−2} ∪ {x ∨ b ∨ C : C ∈ Ck−2} ∪ {¬a ∨ ¬b ∨ C : C ∈ Ck−2} .
The assignment S = {a, b} is ({a, b}, {x})-robust for this instance, as well: it satisfies all but
one of the 3 · 2^(k−2) clauses, but we must flip x and either a or b to satisfy all of the clauses. By
Theorem 7.3, the locality ratio of h(n)-LocalSearch on Max-k-Sat is at most

(3 · 2^(k−2) − 1) / (3 · 2^(k−2))
for all h = o(n). Note that this is worse than the expected approximation ratio of a uniform
random assignment, which is

(2^k − 1) / 2^k = (4 · 2^(k−2) − 1) / (4 · 2^(k−2)) .
In contrast, Khanna et al. [64] give a non-oblivious local search algorithm with a locality ratio
of (2^k − 1)/2^k for all k.
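For k = 3 this instance is small enough to check exhaustively. The sketch below (helper names ours) builds the 3 · 2^(k−2) = 6 clauses, confirms that S = {a, b} satisfies all but one of them, and confirms that every fully satisfying assignment flips x and one of a, b:

```python
from itertools import product

k = 3
names = ["x", "a", "b", "v1"]                 # v1 is the single fresh variable
C_k2 = [(("v1", True),), (("v1", False),)]    # all 2^(k-2) clauses on V_{k-2}
clauses = ([(("x", True), ("a", True)) + C for C in C_k2]
           + [(("x", True), ("b", True)) + C for C in C_k2]
           + [(("a", False), ("b", False)) + C for C in C_k2])

def n_sat(assign):
    """Number of clauses satisfied by the assignment."""
    return sum(any(assign[v] == s for v, s in cl) for cl in clauses)

S = {"x": False, "a": True, "b": True, "v1": False}
assert n_sat(S) == 3 * 2 ** (k - 2) - 1       # all but one clause satisfied

full = [dict(zip(names, bits))
        for bits in product([False, True], repeat=len(names))
        if n_sat(dict(zip(names, bits))) == len(clauses)]
assert full                                    # satisfying assignments exist
for A in full:                                 # and every one of them flips x
    flipped = {v for v in names if A[v] != S[v]}
    assert "x" in flipped and flipped & {"a", "b"}
```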
Finally, we consider the Max-k-CCSP problem, in which all constraints are conjunctions
of exactly k literals. Alimonti [3, 4] shows that oblivious h(n)-local search cannot approximate
this problem to within any constant factor for any function h(n) = O(1). For k ≥ 2, we can
derive the same result for all h = o(n) by exhibiting the Max-k-CCSP instance consisting of the
single conjunction

x1 ∧ x2 ∧ · · · ∧ xk .
The assignment S = ∅ is ({x1}, {x2})-robust for this instance: it does not satisfy the single
conjunction, and to satisfy the conjunction we must flip both x1 and x2. From Theorem 7.3,
the locality ratio of h(n)-LocalSearch on Max-k-CCSP is then 0 for any h = o(n). In contrast,
Alimonti [4] shows that non-oblivious 1-local search has a locality ratio of 2/5 for Max-2-CCSP.
Furthermore, Khanna et al. [64] give a non-oblivious local search algorithm with a locality ratio
of 1/2^k even for the more general Max-k-CSP problem, in which an instance can have arbitrary
constraints of arity k.
7.2 Random and Greedy Initial Solutions and Best Improvement Pivot Rules
All of our results thus far have been stated in terms of various algorithms’ locality ratios. As
we discussed in Chapter 1, the locality gap provides a lower bound on the approximation ratio
attained by a local search algorithm. This bound is tight under the assumption that the initial
solution Sinit is chosen by an adversary. By basing our analysis on the locality gap, we avoid
the difficulties of analyzing the dynamic behavior of the algorithm, while still delivering an
absolute guarantee on its performance.
In real applications, however, the solution Sinit is not chosen by an adversary, and so it
is unclear if this guarantee is the best possible. Some clever or even random choice of Sinit,
combined with some clever way of choosing amongst possible improvements at each stage might
guarantee that a local search algorithm avoids particularly poor local optima. For example,
Arkin and Hassin [5] show that the locality ratio of any local search algorithm for weighted
k-set packing that brings in t sets and discards t − 1 sets from the current solution is at most
1/(k − 1 + 1/t), but Chandra and Halldórsson [22] give an oblivious local search algorithm that attains
an approximation ratio of 3/(2(k + 1)). Chandra and Halldórsson's algorithm beats the locality gap
by choosing an initial solution using the greedy algorithm and at each step choosing the local
improvement that increases the objective function the most.
A popular heuristic approach to improving the performance of local search algorithms in-
volves random restarts. Here, the algorithm chooses a random initial solution Sinit, and we
repeat the local search algorithm several times using different random starting solutions.
In this section we examine related variants of the algorithm h(n)-LocalSearch. We consider a
randomized variant of Algorithm 12 in which Sinit is chosen uniformly at random from among
all 2n possible solutions. We call this algorithm randomly initialized h(n)-LocalSearch. If a
problem’s instances possess a very small number of isolated, poor local optima, the expected
performance of this randomized local search algorithm could be much better than the locality
gap as the size n = |V | of an instance grows. We shall show that this is not the case
for the problem MaxCut; on the contrary, there exists an infinite family of MaxCut instances
in which the expected performance of the randomized local search algorithm remains relatively
poor.
Algorithm 12 searches through all sets C of size at most h(n) for a valid improvement, and
applies the first one that it encounters. Inspired by Chandra and Halldorsson’s algorithm, we
also consider a variant of h(n)-LocalSearch that returns the best improvement found at each step.
This algorithm is easily obtained from Algorithm 12 by removing the break statement from
the loop. The resulting algorithm will then search through all possible improvements, keeping
only the best one found. We additionally modify the loop to search through improvements C in
order of increasing size, so that if several improvements give the same increase in the objective
function, a set C that moves the fewest number of vertices will be chosen. We call the resulting
algorithm h(n)-LocalBestImpSearch.
We present analyses for the randomly initialized variants of both h(n)-LocalSearch and h(n)-
LocalBestImpSearch. Our results for h(n)-LocalSearch do not depend on which improvement
the algorithm chooses; in fact, we allow the algorithm to use an arbitrary oracle to choose
which improvement to apply. Our bounds on the expected approximation performance of
randomly initialized h(n)-LocalSearch therefore hold for any h(n)-local search algorithm and
require no further complexity assumptions. In the case of h(n)-LocalBestImpSearch, we can
make use of the fact that the algorithm must choose the best possible improvement at each
stage to obtain improved inapproximability bounds. One interesting question is whether the
general results for h(n)-LocalSearch can be improved to match these results, or, if not, whether
some particular rule for choosing improvements at each stage does, in fact, give improved
approximation performance. Finally, we show that our results also hold if Sinit is chosen by a
general greedy-like priority algorithm.2
We now turn to the proof of our results. We use the same conventions as in the previous
section, but, since we are now considering only the special case of MaxCut, we make use of
some specific terminology to simplify the presentation. We define S̄ = V \ S and shall refer to
a set S of variables as a cut, meaning the partition (S, S̄). Note that the sets S and S̄ give the
same cut (S, S̄). Flipping a variable v now corresponds to moving the vertex v across the cut to
the other side of the partition. We thus talk about moving vertices across the cut rather than
flipping variables. Finally, we say that S cuts an edge (u, v) if exactly one of u or v is in S.
Our proofs make use of an infinite family of worst-case instances Ga,r, depicted in Figure
7.5. For each a ≥ 2 and 1 ≤ r < a, the graph Ga,r on a + 2 vertices is constructed as follows.
Let A be a set of a vertices, and B be a set of 2 vertices. We form a complete bipartite graph
by inserting an edge of weight one between each vertex of A and each vertex of B, then add
one additional edge of weight r connecting the two vertices of B.
In order to analyze the performance of h(n)-local search on Ga,r, we consider the value of an
arbitrary cut S. We consider the following three classes of cuts, distinguished by the position
of the vertices of B:
Case (|S ∩B| = 0):
In this case, both vertices of B are in the set S̄ and so the edge of weight r between them
is not cut by S. Each vertex of A that is in S has both of its edges to B cut by S. The
remaining vertices of A are in S̄ and so have none of their edges cut by S. Thus,
f(S) = 2|S ∩A| .
Case (|S ∩B| = 1):
We have exactly one vertex of B in S, so the cut S separates the vertices of B, and cuts
the edge of weight r between them. Each vertex of A has one edge of weight 1 to each of
the vertices of B. One of these vertices must be on the same side of the cut as A and one
must be on the opposite side. Thus, exactly one of the weight 1 edges incident to each
vertex of A is cut, and we have
f(S) = a+ r .
2 The class of priority algorithms was first introduced by Borodin, Nielsen, and Rackoff [14]. Davis and Impagliazzo [32] and Borodin et al. [13] provide analyses of priority algorithms in the context of graph problems.
Figure 7.5: The graph Ga,r , with its set A of a vertices and its set B of 2 vertices
Case (|S ∩B| = 2):
This case is the same as the case |S ∩ B| = 0. By considering the cut S̄ in the first case
above, we obtain

f(S) = 2|S̄ ∩ A| = 2a − 2|S ∩ A| .
The unique, globally optimal cut of Ga,r is the cut A (or, equivalently, V \A = B) with value
f(A) = 2a. We have just shown that the cuts that separate the vertices of B have reasonably
poor quality. The following lemma shows that many of these cuts are also h(n)-local optima in
Ga,r. As we shall show, the parameter r governs how unbalanced a cut can be with respect to
the vertices of A and still be a local optimum.
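The three cases above are easy to confirm computationally on a small instance. In the sketch below (our own helper code), A = {0, …, a−1} and B = {b1, b2}:

```python
from itertools import combinations

def make_G(a, r):
    """The graph G_{a,r}: unit-weight complete bipartite edges between
    A = {0, ..., a-1} and B = {b1, b2}, plus an edge of weight r inside B."""
    edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
    edges[frozenset({"b1", "b2"})] = r
    return edges

def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r = 6, 3
edges = make_G(a, r)
assert f(edges, set(range(a))) == 2 * a            # the optimal cut A
assert f(edges, {0, 1, 2}) == 2 * 3                # |S n B| = 0: f(S) = 2|S n A|
assert f(edges, {0, "b1", "b2"}) == 2 * a - 2 * 1  # |S n B| = 2: 2a - 2|S n A|
for SA in combinations(range(a), 2):               # every cut separating B
    assert f(edges, set(SA) | {"b1"}) == a + r     # has value a + r
```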
Lemma 7.5. Consider a graph Ga,r, where 2h(n) < r < a, and a cut S with
||S ∩A| − a/2| ≤ r/2− h(n) .
If furthermore |S ∩B| = 1 then S is an h(n)-local optimum, with f(S) = a+ r.
Proof. First, we note that since |S ∩B| = 1 we must have f(S) = a+ r. Let C be an arbitrary
subset of V of size at most h(n), and let S′ = S △ C. We show that for each such set we must
have f(S) ≥ f(S′). We consider 2 cases, based on how many vertices of B are contained in C:
Case (|C ∩ B| ≠ 1):
After moving both or none of the vertices of B across the cut, we still have |S′ ∩B| = 1,
and so f(S′) = a+ r, regardless of the positions of the other vertices.
Case (|C ∩B| = 1):
After moving one vertex of B across the cut, we have either |S′ ∩ B| = 2, and so

f(S′) = 2a − 2|S′ ∩ A| ,

or |S′ ∩ B| = 0, and so

f(S′) = 2|S′ ∩ A| .
From the conditions of the lemma
a/2− (r/2− h(n)) ≤ |S ∩A| ≤ a/2 + (r/2− h(n)) ,
and so after moving at most h(n) − 1 vertices of A across the cut (one element of C lies in B) we have
a/2− r/2 < |S′ ∩A| < a/2 + r/2 .
Thus, both 2|S′ ∩ A| < a + r and 2a − 2|S′ ∩ A| < 2a − (a − r) = a + r and so we have
f(S′) < a+ r.
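Lemma 7.5 can likewise be sanity-checked exhaustively on a small instance. With a = 6, r = 5, and h(n) = 2 we have 2h(n) < r < a, and the balanced cut S = {0, 1, 2, b1} satisfies ||S ∩ A| − a/2| = 0 ≤ r/2 − h(n); the sketch below (helper names ours) confirms that no set C of at most h(n) vertices improves it:

```python
from itertools import combinations

def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r, h = 6, 5, 2                                   # 2h < r < a
V = list(range(a)) + ["b1", "b2"]
edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
edges[frozenset({"b1", "b2"})] = r

S = {0, 1, 2, "b1"}                                 # balanced, separates B
assert f(edges, S) == a + r
for size in range(1, h + 1):                        # every set C with |C| <= h
    for C in combinations(V, size):
        assert f(edges, S ^ set(C)) <= f(edges, S)  # fails to improve S
```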
Under a slightly strengthened assumption about the placement of the vertices of A, we
can derive the following stronger result, which shows that many cuts that do not satisfy both
conditions of Lemma 7.5 will satisfy them after at most one greedy improvement.
Lemma 7.6. Consider a graph Ga,r , where 2h(n) < r < a, and a cut S with

||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 .

After at most one greedy improvement C, the cut S′ = S △ C satisfies both

||S′ ∩ A| − a/2| ≤ r/2 − h(n)

and |S′ ∩ B| = 1.
Proof. First, we note that since r > 2h(n), we have (r/2 − h(n))/3 ≤ r/2 − h(n). Suppose |S ∩ B| = 1.
Then, S already satisfies both conditions of the lemma since

||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 ≤ r/2 − h(n) .
Then, suppose that |S ∩ B| ≠ 1, and so both vertices of B are on the same side of the cut S.
Let C be an improvement of size at most h(n). We consider 2 cases for C:
Case (|C ∩ B| ≠ 1):
We must have |S′ ∩ B| ≠ 1, since either both vertices of B are moved across the cut,
or both are left where they are. Thus, we have either
f(S′) = 2|S′ ∩A|
or
f(S′) = 2a− 2|S′ ∩A| .
Since

a/2 − (r/2 − h(n))/3 ≤ |S ∩ A| ≤ a/2 + (r/2 − h(n))/3 ,

after moving at most h(n) vertices of A, we must have

a/2 − (r/2 − h(n))/3 − h(n) ≤ |S′ ∩ A| ≤ a/2 + (r/2 − h(n))/3 + h(n) ,

and so

2|S′ ∩ A| ≤ 2(a/2 + r/6 + 2h(n)/3) = a + r/3 + 4h(n)/3 < a + r/3 + 2r/3 = a + r ,

and also

2a − 2|S′ ∩ A| ≤ 2a − 2(a/2 − r/6 − 2h(n)/3) = a + r/3 + 4h(n)/3 < a + r/3 + 2r/3 = a + r .
In either case, we have f(S′) < a+ r.
Case (|C ∩B| = 1):
We must have |S′ ∩ B| = 1, since one vertex of B is moved across the cut. Therefore,
f(S′) = a+ r, regardless of the position of the other vertices.
Thus, any improvement C with |C ∩ B| = 1 results in a strictly greater value of f(S′) than
any improvement with |C ∩ B| ≠ 1. Furthermore, all improvements C with |C ∩ B| = 1 yield
the same value for f(S′). The algorithm h(n)-LocalBestImpSearch must pick one having the
smallest size amongst all such improvements. This is an improvement C of size 1 containing
only a single vertex of B. After applying this improvement to S, we have a cut S′ satisfying
both |S′ ∩ B| = 1 and

||S′ ∩ A| − a/2| = ||S ∩ A| − a/2| ≤ (r/2 − h(n))/3 ≤ r/2 − h(n) .
Lemmas 7.5 and 7.6 both require that close to half of the vertices of A be included in S.
We now consider the probability that this is the case when S is chosen uniformly at random.
A straightforward Chernoff bound gives:
Pr[ ||A ∩ S| − a/2| ≤ r/2 − h(n) ] ≥ 1 − 2e^(−(r−2h(n))^2/(2a)) , (7.4)
and
Pr[ ||A ∩ S| − a/2| ≤ r/6 − h(n)/3 ] ≥ 1 − 2e^(−(r−2h(n))^2/(18a)) . (7.5)
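The first bound can be checked empirically. The parameters below are arbitrary, and the simulation (our own sketch) draws |A ∩ S| for a uniformly random cut S as a Binomial(a, 1/2) variable:

```python
import math
import random

random.seed(0)
a, r, h = 400, 60, 10                    # so r - 2h(n) = 40
trials = 20000
hits = 0
for _ in range(trials):
    # |A n S| for a uniformly random cut S is Binomial(a, 1/2)
    s = sum(random.getrandbits(1) for _ in range(a))
    hits += abs(s - a / 2) <= r / 2 - h
# the Chernoff bound (7.4): empirically ~0.96 versus a bound of ~0.73
bound = 1 - 2 * math.exp(-((r - 2 * h) ** 2) / (2 * a))
assert hits / trials >= bound
```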
Using these bounds and the results of Lemmas 7.5 and 7.6 we obtain our main results. The first
holds for any oblivious h(n)-local search algorithm h(n)-LocalSearch, while the second holds for
the best-improvement variant h(n)-LocalBestImpSearch.
Theorem 7.7. For all h = o(n), the expected approximation ratio for randomly initialized
h(n)-LocalSearch on MaxCut is at most 3/4 + o(1). Moreover, if h(n) = cn + o(n) for any
0 < c < 1/2, the expected approximation ratio is at most 3/4 + c/2 + o(1).
Proof. Let Rmc denote the approximation ratio of h(n)-LocalSearch for MaxCut. Consider the
graph Ga,r where n = a+ 2 and 2h(n) < r < a, and let Sinit be a random cut in this graph. If
||A ∩ Sinit| − a/2| ≤ r/2− h(n)
and |B ∩ Sinit| = 1 then Lemma 7.5 states that Sinit must be a local optimum with f(Sinit) = a + r.
Let XA be the event ||A ∩ Sinit| − a/2| ≤ r/2− h(n) and XB be the event |B ∩ Sinit| = 1. We
have
E[Rmc] ≤ Pr[XA] Pr[XB] (a + r)/(2a) + (1 − Pr[XA] Pr[XB]) · 1
= 1 + Pr[XA] Pr[XB] (r − a)/(2a)
= 1 + Pr[XA] (r − a)/(4a) ,

where the last step uses the fact that Pr[XB] = 1/2, since a uniformly random cut separates the two vertices of B with probability exactly 1/2, independently of XA.
Suppose that h(n) = cn + o(n) for some constant c < 1/2 (the case h = o(n) follows by
setting c = 0). Let r = (2c + ε)a, where ε is some positive constant smaller than 1/2 − c. Then,
for all sufficiently large a (and hence sufficiently large n) we have 2h(n) < r < a, as required.
Furthermore, r − 2h(n) = Ω(a), and so from (7.4) we have
Pr[XA] ≥ 1 − 2e^(−(r−2h(n))^2/(2a)) = 1 − 2e^(−Ω(a)) .
Then,

E[Rmc] ≤ 1 + (1 − 2e^(−Ω(a))) (r − a)/(4a) = 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/4 .
Considering this bound for large a, we have

lim(a→∞) E[Rmc] ≤ lim(a→∞) [ 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/4 ] = (2c + ε + 3)/4 = c/2 + 3/4 + ε/4 ,

which holds for any sufficiently small ε > 0.
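The effect is easy to observe in simulation, albeit on a small instance whose constants differ from the asymptotic statement. In the sketch below (our own code: first-improvement 1-local search on G_{30,9}), runs that start with B separated and |A ∩ S| close to a/2 stall at (a + r)/2a = 0.65, and all remaining runs reach the optimum, so the average ratio stays well below 1:

```python
import random

random.seed(1)
a, r = 30, 9
V = list(range(a)) + ["b1", "b2"]
edges = {frozenset({u, b}): 1 for u in range(a) for b in ("b1", "b2")}
edges[frozenset({"b1", "b2"})] = r

def f(S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

def local_search(S):
    """Oblivious 1-local search with a first-improvement pivot rule."""
    improved = True
    while improved:
        improved = False
        for v in V:
            if f(S ^ {v}) > f(S):
                S, improved = S ^ {v}, True
    return S

ratios = [f(local_search({v for v in V if random.random() < 0.5})) / (2 * a)
          for _ in range(300)]
avg = sum(ratios) / len(ratios)
assert all(x in (0.65, 1.0) for x in ratios)  # stuck at (a + r)/(2a) or optimal
assert 0.75 < avg < 0.95                      # bounded away from 1
```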
In the case of randomly initialized h(n)-LocalBestImpSearch, we obtain better bounds, which
in the case of h = o(n) match the worst-case locality ratio of h(n)-LocalSearch.
Theorem 7.8. For all h = o(n), the expected approximation ratio for randomly initialized
h(n)-LocalBestImpSearch on MaxCut is at most 1/2 + o(1). Moreover, if h(n) = cn + o(n) for
any 0 < c < 1/2, the expected approximation ratio is at most 1/2 + c + o(1).
Proof. Let Rmc denote the approximation ratio of h(n)-LocalBestImpSearch for MaxCut. Again,
we consider the graph Ga,r where n = a + 2 and 2h(n) < r < a, and let Sinit be a random cut
in this graph. From Lemma 7.6, applying at most 1 greedy improvement to any initial cut Sinit
of Ga,r satisfying the balance condition
||A ∩ Sinit| − a/2| ≤ (r/2− h(n))/3
results in a cut S′ satisfying
||A ∩ S′| − a/2| ≤ r/2− h(n) ,
as well as |B ∩ S′| = 1. Lemma 7.5 states that this resulting cut S′ must be an h(n)-local
optimum with f(S′) = a + r. Let XA be the event ||A ∩ Sinit| − a/2| ≤ (r/2 − h(n))/3. We
have:
E[Rmc] ≤ Pr[XA] (a + r)/(2a) + (1 − Pr[XA]) · 1 = 1 + Pr[XA] (r − a)/(2a) .
Suppose that h(n) = cn+ o(n) for some constant c < 1/2 (the case h = o(n) follows by setting
c = 0). Let r = (2c + ε)a, where ε is some positive constant smaller than 1/2 − c. Then, for
all sufficiently large a (and hence sufficiently large n) we have 2h(n) < r < a, as required.
Furthermore, r − 2h(n) = Ω(a), and so from (7.5) we have
Pr[XA] ≥ 1 − 2e^(−(r−2h(n))^2/(18a)) = 1 − 2e^(−Ω(a)) .
Then,

E[Rmc] ≤ 1 + (1 − 2e^(−Ω(a))) (r − a)/(2a) = 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/2 .
Considering this bound for large a, we obtain

lim(a→∞) E[Rmc] ≤ lim(a→∞) [ 1 + (1 − 2e^(−Ω(a))) (2c + ε − 1)/2 ] = c + 1/2 + ε/2 ,

which holds for any sufficiently small ε > 0.
Now, we consider the case in which Sinit is chosen by a greedy-like priority algorithm.
Specifically, we analyze the algorithm that considers the vertices of a graph in order of decreasing
weighted degree, and places each vertex on the side of the cut that maximizes the total weight
of cut edges between it and previously considered vertices.
In this case, we can modify the instance Ga,r by letting half the edges between A and B
have weight 1 + ε instead of 1, as follows. Let b1 and b2 be the vertices of B. We divide the
vertices of A into 2 equal sets A1 and A2. For each a ∈ A1, we change the weight of the edge
(a, b2) to 1 + ε, and for each a ∈ A2, we change the weight of the edge (a, b1) to 1 + ε. All other
edges are as in Ga,r.
If a+r > 2+ε, the priority algorithm we have just described must first consider the vertices
b1 and b2, placing them on opposite sides of the cut. Every other vertex is then considered,
and all the vertices of A1 are placed on the same side of the cut as b1, while all the vertices
of A2 are placed on the same side of the cut as b2. The resulting cut is completely balanced
and so from Lemma 7.5 must be a local optimum of the sort we have just considered. Thus,
h(n)-LocalSearch must terminate without applying any improvement. It follows that we cannot
approximate MaxCut beyond the locality ratio of h(n)-LocalSearch by starting with a greedy
cut of this sort.
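A sketch of this priority algorithm on the modified instance (our own code, with ε = 0.5 and vertex names of our choosing) confirms that it outputs a balanced cut separating b1 and b2, and that no single flip improves the result:

```python
def f(edges, S):
    """Total weight of the edges cut by S."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

a, r, eps = 6, 3, 0.5
A1, A2, B = [0, 1, 2], [3, 4, 5], ["b1", "b2"]
V = A1 + A2 + B
edges = {}
for u in A1:                              # half of the A-B edges get weight 1+eps
    edges[frozenset({u, "b1"})] = 1
    edges[frozenset({u, "b2"})] = 1 + eps
for u in A2:
    edges[frozenset({u, "b1"})] = 1 + eps
    edges[frozenset({u, "b2"})] = 1
edges[frozenset(B)] = r

def greedy_cut(V, edges):
    """Consider vertices by decreasing weighted degree; place each on the
    side that cuts the most weight to the already-placed vertices."""
    deg = {v: sum(w for e, w in edges.items() if v in e) for v in V}
    side = {}
    for v in sorted(V, key=lambda u: -deg[u]):
        cut_to = lambda s: sum(w for e, w in edges.items() if v in e
                               and side.get(next(iter(e - {v}))) == 1 - s)
        side[v] = 0 if cut_to(0) >= cut_to(1) else 1
    return {v for v in V if side[v] == 1}

S = greedy_cut(V, edges)
assert len(S & set(B)) == 1 and len(S & set(A1 + A2)) == 3   # balanced, B split
assert all(f(edges, S ^ {v}) <= f(edges, S) for v in V)      # a 1-local optimum
```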
A similar argument can also be applied to the priority algorithm that considers the edges of
a graph in decreasing order of weight, and at each step places any previously unplaced vertices
of an edge in a way that cuts the edge. For all r > 1 + ε, this algorithm first considers the edge
of weight r and places b1 and b2 on opposite sides of the cut. Then, it proceeds in a similar
fashion as the previous algorithm, placing exactly half of the vertices of A on one side of the
cut and half on the other side.
Chapter 8
Conclusion
We have reconsidered the general paradigm of non-oblivious local search, in which a local search
algorithm for some problem is guided by an auxiliary potential function instead of the problem’s
stated objective. We have shown that this approach can lead to improved locality ratios by
giving extra weight to solutions that will be flexible in future iterations, and guiding the local
search out of certain local optima. Non-oblivious local search gives a tight (1 − 1/e)-approximation
algorithm for monotone submodular maximization subject to a matroid constraint, as well as
new, improved approximation results for several problems formulated as k-exchange systems.
In contrast, we have shown that the standard oblivious local search algorithm attains only a
(1/2 + o(1))-approximation for monotone submodular maximization. Results on k-set packing
by Hurkens and Schrijver [55] and Hazan, Safra, and Schwartz [53] imply that oblivious local
search is similarly limited in the case of k-exchange systems, even when large neighborhoods
are used. We have considered a variety of possible techniques for improving oblivious local
search for CSPs, including increasing the size of the neighborhood N , using random or greedy
initial solutions Sinit, choosing the best possible improvement at each stage, and even allowing
the pivot rule to have unlimited computational power. We have provided examples in which these
modifications lead to little improvement over the locality or approximation ratio of an oblivious
local search algorithm. In contrast, the results of Alimonti [2, 3, 4] and Khanna et al. [64] show
that non-oblivious local search does yield improvements in the locality ratio for many CSPs.
All of our results thus point to the relative power of the non-oblivious paradigm, even when
compared to more exotic or computationally unrestricted variants of oblivious local search. A
major direction for future research is the identification of other areas where this approach may
yield improved approximation results. One immediate question is whether it is possible to
obtain non-oblivious local search algorithms for more complicated combinatorial optimization
problems, such as coloring and scheduling problems, in which solutions are most naturally
represented as assignments rather than sets of elements. Even in the specific settings we have
considered, there are still open questions and directions for future research.
8.1 Monotone Submodular Maximization
As discussed in Section 2.5, the continuous greedy algorithm for monotone submodular max-
imization breaks down in the case of multiple matroid constraints because certain conditions
required by the rounding phase are no longer satisfied. Our non-oblivious local search algorithm
for monotone submodular maximization matches the approximation performance of the contin-
uous greedy algorithm in the case of a single matroid constraint and does not require a rounding
phase. Thus, it could plausibly deliver improved approximations for problems where previous
rounding-based approaches encounter difficulties. A natural area to investigate first is the case
of two or more matroid constraints. Lee, Sviridenko, and Vondrak [71] have recently improved
over long-standing approximation bounds in this case by using an oblivious local search, so it is
reasonable to expect non-oblivious techniques to be applicable as well. Finally, we ask whether
it is possible to remove or reduce the sampling used to compute our non-oblivious potential
function g for general monotone submodular functions. This would significantly decrease the
runtime required by MatroidSubmodular.
We also ask whether our results can be improved in various restricted settings. It may be
possible to obtain faster, deterministic algorithms for other particular classes of submodular
functions, such as weighted sums of matroid rank functions. In the special case of coverage
functions, the techniques of Chapters 3 and 6 might be combined to give improved approxima-
tions for k-exchange systems. Another interesting problem for future research in this direction
is submodular MaxSat. Here, we are given a submodular function f on the set of clauses,
and seek an assignment so that the value of f on satisfied clauses is maximized. Because this
problem can be formulated as an instance of submodular maximization subject to a partition
matroid constraint, we obtain a (1 − 1/e)-approximation using the results of Chapter 4. Azar,
Gamzu, and Roth [9] give a 2/3-approximation for this problem, which is better than our general
algorithm, and give an information-theoretic hardness of approximation bound1 of 3/4. Hence,
it remains possible that some non-oblivious approach tailored to this particular problem might
result in an improved approximation algorithm.
8.2 Set Systems for Local Search
Our analysis of independence systems for local search leaves several open questions as well.
First, we ask whether it is possible to obtain a natural, exact characterization for those in-
dependence systems for which (1, k)-OblLocalSearch has a locality gap of 1/k. It is possible to
obtain one such characterization by using LP duality and techniques similar to those used in
our derivation of non-oblivious potentials for maximum coverage and monotone submodular
maximization in matroids. Unfortunately, the resulting characterization is somewhat unnat-
ural, involving properties that are not appreciably simpler than a direct proof of the locality
1 Azar, Gamzu, and Roth note that this result can also be obtained by a result of Mirrokni, Schapira, and Vondrák for combinatorial auctions with submodular bidders.
ratio for (1, k)-OblLocalSearch.
Another area for future research is to improve the runtime of the algorithms given in Chapter 6.
Our algorithms have exponential dependence on k. For the weighted independent set problem
in (k+ 1)-claw free graphs, Berman and Krysta [12] give a non-oblivious local search algorithm
whose runtime is independent of k, at the expense of a lowered approximation ratio. It should
be possible to extend their results to the general class of strong k-exchange systems.
Additionally, we ask whether the analysis of the locality ratio of our non-oblivious algorithms
for linear and monotone submodular maximization in k-exchange systems can be improved
to match the upper bounds of 2/(k + ε) and 2/(k + 2 + ε) given by Hurkens and Schrijver [55] for the
unweighted case. As we discussed in Section 6.2, this is the best upper bound that we have for
general k, but our analysis only gives a lower bound of 2/(k + 1 + ε) and 2/(k + 3 + ε). Removing the extra
additive term of 1 would give improved approximations for several specific problems.
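For reference, the bounds discussed in this paragraph displayed side by side, with the linear case first and the monotone submodular case second:

```latex
\text{upper bounds (Hurkens--Schrijver, unweighted):}\quad
  \frac{2}{k+\varepsilon},\qquad \frac{2}{k+2+\varepsilon}
\\[4pt]
\text{lower bounds (our analysis):}\quad
  \frac{2}{k+1+\varepsilon},\qquad \frac{2}{k+3+\varepsilon}
```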
Finally, we ask whether the non-oblivious approach used to develop Submodular-k-Exchange
in Section 6.2 can be generalized to non-monotone submodular maximization. Feldman, Naor,
and Schwartz [39] were able to obtain improved bounds for oblivious local search in this case by
generalizing techniques applied to k-matroid intersection by Lee, Sviridenko, and Vondrak [71].
Surprisingly, almost all of the analysis from Section 6.2 can be adapted to the non-monotone
case. The proof breaks down only in the last step, where we use monotonicity to infer that
f(S ∪O) ≥ f(O). In the case of oblivious local search, Lee, Sviridenko, and Vondrak overcome
this difficulty by performing a series of iterated local searches and taking the best solution.
Unfortunately, the use of specially ordered marginals to approximate weights, which is critical
in our analysis, interferes with the analysis of the iterated approach.
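The role of monotonicity in that last step is easy to isolate numerically: for a monotone submodular f (e.g. weighted coverage) the inequality f(S ∪ O) ≥ f(O) always holds, while for a non-monotone submodular f (e.g. a graph cut function) adding the elements of S to O can strictly decrease the value. A minimal illustration, with toy data:

```python
# Coverage functions are monotone submodular, so f(S | O) >= f(O) always holds.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def coverage(S):
    """Number of ground elements covered by the chosen sets."""
    return len(set().union(*(sets[e] for e in S))) if S else 0

S, O = {"a"}, {"b", "c"}
assert coverage(S | O) >= coverage(O)     # monotonicity: never decreases

# Cut functions are submodular but NOT monotone: enlarging a set can hurt.
edges = [("u", "v"), ("v", "w")]

def cut(S):
    """Number of edges with exactly one endpoint in S."""
    return sum((x in S) != (y in S) for (x, y) in edges)

S, O = {"u"}, {"v"}
print(cut(O), cut(S | O))   # the cut value drops from 2 to 1 when u is added
```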
8.3 Negative Results for CSPs
The negative results from Section 7.2 raise several questions. Specifically, we ask whether the
condition of (Y, Z)-robustness can be removed or simplified, and whether these techniques can
be generalized to larger classes of CSPs, or even to other problems in general. Although we have
shown that many local search algorithms for MaxCut have an approximation ratio no better than their
locality ratio, the relationship between these two measures is still poorly understood. Extending
our analysis of MaxCut to other randomized variants of local search, such as simulated annealing,
is another problem for future research. Additionally, we ask whether it is possible to obtain
similar negative results for certain classes of non-oblivious potential functions.
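For reference, the single-flip oblivious local search for MaxCut discussed here moves one vertex across the cut whenever doing so increases the number of cut edges; the pivot rule selects which improving flip to perform. A minimal sketch with the pivot rule as a parameter (unweighted graph given as an edge list; names are illustrative):

```python
def cut_value(edges, side):
    """Number of edges crossing the bipartition encoded by `side`."""
    return sum(side[u] != side[v] for (u, v) in edges)

def maxcut_local_search(vertices, edges, pivot="first"):
    """Single-flip oblivious local search; `pivot` chooses among improving flips."""
    side = {v: False for v in vertices}
    while True:
        gains = []
        for v in vertices:
            flipped = dict(side)
            flipped[v] = not side[v]
            gain = cut_value(edges, flipped) - cut_value(edges, side)
            if gain > 0:
                gains.append((gain, v))
        if not gains:
            return side             # no improving flip: single-flip local optimum
        _, v = max(gains) if pivot == "best" else gains[0]
        side[v] = not side[v]

# Triangle with a pendant edge; any local optimum cuts at least half the edges.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
side = maxcut_local_search(V, E, pivot="best")
print(cut_value(E, side))   # 3 of 4 edges cut at this local optimum
```

Swapping `pivot="first"` for `pivot="best"` changes which improving flip is taken, which is exactly the kind of choice the negative results above constrain.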
Finally, we consider the gap between the expected approximation ratio of 1/2, which holds for
best improvement local search, and 3/4, which holds without any assumptions on the algorithm's
pivot. We ask whether this gap is due to our analysis or whether there is in fact some pivot rule
that can give a local search algorithm with approximation ratio better than 1/2. For some time,
the algorithm of Goemans and Williamson [45] was the only algorithm to obtain better than a
1/2-approximation in arbitrary graphs. Recently, however, algorithms using spectral techniques
[91, 86, 61] have been able to deliver approximations beyond 1/2 for general graphs. These
algorithms utilize information about the eigenvalues of a graph to guide an iterative greedy-like
algorithm. We ask whether similar techniques can be incorporated into a non-oblivious local
search algorithm as well.
Bibliography
[1] Alexander Ageev and Maxim Sviridenko. Pipage Rounding: A New Method of Construct-
ing Algorithms with Proven Performance Guarantee. Journal of Combinatorial Optimiza-
tion, 8(3):307–328, 2004.
[2] Paola Alimonti. New local search approximation techniques for maximum generalized sat-
isfiability problems. In CIAC ’94: Proceedings of the 2nd Italian Conference on Algorithms
and Complexity, pages 40–53. Springer-Verlag, 1994.
[3] Paola Alimonti. Non-Oblivious Local Search for Graph and Hypergraph Coloring Problems. In WG '95: Proceedings of the 21st International Workshop on Graph-Theoretic Concepts in Computer Science, pages 167–180. Springer-Verlag, 1995.
[4] Paola Alimonti. Non-oblivious Local Search for MAX 2-CCSP with Application to MAX
DICUT. In WG ’97: Proceedings of the 23rd International Workshop on Graph-Theoretic
Concepts in Computer Science, pages 2–14. Springer-Verlag, 1997.
[5] Esther M. Arkin and Refael Hassin. On Local Search for Weighted k-Set Packing. In ESA
’97: Proceedings of the 5th European Symposium on Algorithms, pages 13–22. Springer-
Verlag, 1997.
[6] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and
Vinayaka Pandit. Local Search Heuristics for k-Median and Facility Location Problems.
SIAM Journal on Computing, 33(3):544–562, 2004.
[7] G. Ausiello, P. Crescenzi, and M. Protasi. Approximate solution of NP optimization
problems. Theoretical Computer Science, 150(1):1–55, 1995.
[8] Giorgio Ausiello and Marco Protasi. Local search, reducibility and approximability of
NP-optimization problems. Information Processing Letters, 54(2):73–79, 1995.
[9] Yossi Azar, Iftah Gamzu, and Ran Roth. Submodular Max-SAT. In ESA '11: Proceedings of the 19th European Symposium on Algorithms, pages 323–334. Springer-Verlag, 2011.
[10] George A. Baker and Peter Graves-Morris. Padé Approximants. Number 59 in Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 1996.
[11] Piotr Berman. A d/2 approximation for maximum weight independent set in d-claw free
graphs. Nordic Journal of Computing, 7:178–184, 2000.
[12] Piotr Berman and Piotr Krysta. Optimizing misdirection. In SODA ’03: Proceedings of the
14th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 192–201, Philadelphia,
PA, USA, 2003. SIAM.
[13] Allan Borodin, Joan Boyar, Kim S. Larsen, and Nazanin Mirmohammadi. Priority algo-
rithms for graph optimization problems. Theoretical Computer Science, 411(1):239–258,
2010.
[14] Allan Borodin, Morten N. Nielsen, and Charles Rackoff. (Incremental) Priority Algorithms.
Algorithmica, 37(4):295–326, September 2003.
[15] Dietrich Braess. Nonlinear Approximation Theory, volume 7 of Springer Series in Com-
putational Mathematics. Springer-Verlag, Berlin, 1986.
[16] Richard A. Brualdi. Comments on Bases in Dependence Structures. Bulletin of the Aus-
tralian Mathematical Society, 1(02):161–167, 1969.
[17] Richard A. Brualdi. Common transversals and strong exchange systems. Journal of Com-
binatorial Theory, 8(3):307–329, 1970.
[18] Richard A. Brualdi. Induced matroids. Proceedings of the American Mathematical Society,
29:213–221, 1971.
[19] Richard A. Brualdi and Edward B. Scrimger. Exchange systems, matchings, and transver-
sals. Journal of Combinatorial Theory, 5(3):244–257, 1968.
[20] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Submod-
ular Set Function Subject to a Matroid Constraint (Extended Abstract). In IPCO ’07:
Proceedings of the 12th International Conference on Integer Programming and Combina-
torial Optimization, pages 182–196. Springer-Verlag, 2007.
[21] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a Sub-
modular Set Function Subject to a Matroid Constraint. SIAM Journal on Computing,
40(6):1740–1766, 2011.
[22] Barun Chandra and Magnus Halldorsson. Greedy local improvement and weighted set
packing approximation. In SODA ’99: Proceedings of the 10th Annual ACM-SIAM Sym-
posium on Discrete Algorithms, pages 169–176. SIAM, 1999.
[23] Moses Charikar and Sudipto Guha. Improved Combinatorial Algorithms for Facility Lo-
cation Problems. SIAM Journal on Computing, 34(4):803–824, 2005.
[24] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group budget con-
straints and applications. In APPROX ’04: Proceedings of the 7th International Work-
shop on Approximation Algorithms for Combinatorial Optimization Problems, pages 72–83,
2004.
[25] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent randomized rounding
via exchange properties of combinatorial structures. In FOCS ’10: Proceedings of the 51st
IEEE Symposium on Foundations of Computer Science, pages 575–584. IEEE Computer
Society, 2010.
[26] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Multi-budgeted matchings and ma-
troid intersection via dependent rounding. In SODA ’11: Proceedings of the 22nd Annual
ACM-SIAM Symposium on Discrete Algorithms, pages 1080–1097. SIAM, 2011.
[27] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Submodular function maximization
via the multilinear relaxation and contention resolution schemes. In STOC ’11: Proceedings
of the 43rd ACM Symposium on Theory of Computing, pages 783–792. ACM, 2011.
[28] Reuven Cohen and Liran Katzir. The generalized maximum coverage problem. Information
Processing Letters, 108(1):15–22, 2008.
[29] Michele Conforti and Gerard Cornuejols. Submodular set functions, matroids and the
greedy algorithm: Tight worst-case bounds and some generalizations of the rado-edmonds
theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.
[30] Gerard Cornuejols, Marshall L. Fisher, and George L. Nemhauser. Location of Bank
Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms.
Management Science, 23(8):789–810, 1977.
[31] William H. Cunningham. Testing membership in matroid polyhedra. Journal of Combinatorial Theory, Series B, 36(2):161–188, 1984.
[32] Sashka Davis and Russell Impagliazzo. Models of Greedy Algorithms for Graph Problems.
Algorithmica, 54(3):269–317, 2009.
[33] Jack Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1(1):127–
136, 1971.
[34] Jack Edmonds. Matroid intersection. In P.L. Hammer, E.L. Johnson, and B.H. Korte, ed-
itors, Discrete Optimization I: Proceedings of the Advanced Research Institute on Discrete
Optimization and Systems Applications of the Systems Science Panel of NATO and of the
Discrete Optimization Symposium, volume 4 of Annals of Discrete Mathematics, pages 39–49. Elsevier, 1979.
[35] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45:634–
652, 1998.
[36] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. Nonmonotone submodular max-
imization via a structural continuous greedy algorithm. In ICALP ’11: Proceedings of
the 38th International Colloquium Conference on Automata, Languages and Programming,
pages 342–353. Springer-Verlag, 2011.
[37] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A unified continuous greedy
algorithm for submodular maximization. In FOCS ’11: Proceedings of the 52nd IEEE
Symposium on Foundations of Computer Science, pages 570–579. IEEE Computer Society,
2011.
[38] Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-
approximation for unconstrained submodular maximization. In FOCS ’12: Proceedings
of the 53rd IEEE Symposium on Foundations of Computer Science, 2012. Forthcoming.
[39] Moran Feldman, Joseph (Seffi) Naor, Roy Schwartz, and Justin Ward. Improved approxi-
mations for k-exchange systems. In ESA ’11: Proceedings of the 19th European Symposium
on Algorithms, pages 784–798. Springer-Verlag, 2011.
[40] Yuval Filmus and Justin Ward. The Power of Local Search: Maximum Coverage over a
Matroid. In STACS ’12: 29th International Symposium on Theoretical Aspects of Computer
Science, pages 601–612. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2012.
[41] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maxi-
mization subject to a matroid constraint. CoRR, abs/1204.4526, 2012.
[42] Yuval Filmus and Justin Ward. A tight combinatorial algorithm for submodular maxi-
mization subject to a matroid constraint. In FOCS ’12: Proceedings of the 53rd IEEE
Symposium on Foundations of Computer Science. IEEE Computer Society, 2012. Forth-
coming.
[43] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations for
maximizing submodular set functions—II. Mathematical Programming Studies, 8:73–87,
1978.
[44] Shayan Oveis Gharan and Jan Vondrak. Submodular maximization by simulated anneal-
ing. In SODA ’11: Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 1098–1117. SIAM, 2011.
[45] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for
maximum cut and satisfiability problems using semidefinite programming. Journal of the
ACM, 42(6):1115–1145, 1995.
[46] Pranava R. Goundan and Andreas S. Schulz. Revisiting the Greedy Approach to Submod-
ular Set Function Maximization. Preprint, 2007.
[47] Fabrizio Grandoni and Rico Zenklusen. Approximation schemes for multi-budgeted inde-
pendence systems. In ESA ’10: Proceedings of the 18th European Conference on Algo-
rithms: Part I, pages 536–548. Springer-Verlag, 2010.
[48] Curtis Greene and Thomas L. Magnanti. Some Abstract Pivot Algorithms. SIAM Journal
on Applied Mathematics, 29(3):530–539, 1975.
[49] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-
monotone submodular maximization: Offline and secretary algorithms. In WINE 2010:
Internet and Network Economics, pages 246–257. Springer, 2010.
[50] Magnus Halldorsson. Approximating discrete collections via local improvements. In SODA
’95: Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pages
160–169. SIAM, 1995.
[51] D. Hausmann and B. Korte. k-greedy algorithms for independence systems. Mathematical Methods of Operations Research, 22:219–228, 1978. doi:10.1007/BF01917662.
[52] D. Hausmann, B. Korte, and T. A. Jenkyns. Worst case analysis of greedy type algorithms
for independence systems. Mathematical Programming Studies, 12:120–131, 1980.
[53] Elad Hazan, Shmuel Safra, and Oded Schwartz. On the complexity of approximating k-set
packing. Computational Complexity, 15:20–39, 2006.
[54] Dorit S. Hochbaum and Anu Pathria. Analysis of the greedy approach in covering problems.
Naval Research Logistics, 45:615–627, 1998.
[55] C. A. J. Hurkens and A. Schrijver. On the size of systems of sets every t of which have an
SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM Journal on Discrete Mathematics, 2(1):68–72, 1989.
[56] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial
algorithm for minimizing submodular functions. Journal of the ACM, 48(4):761–777, July
2001.
[57] Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing
LP. Journal of the ACM, 50(6):795–824, 2003.
[58] T. A. Jenkyns. The efficacy of the "greedy" algorithm. In Proceedings of the 7th Southeastern Conference on Combinatorics, Graph Theory, and Computing, pages 341–350. Utilitas Mathematica, 1976.
[59] Per M. Jensen and Bernhard Korte. Complexity of Matroid Property Algorithms. SIAM
Journal on Computing, 11(1):184, 1982.
[60] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local
search? Journal of Computer and Systems Sciences, 37(1):79–100, 1988.
[61] Satyen Kale and C. Seshadhri. Combinatorial approximation algorithms for maxcut using
random walks. In Proceedings of the 2nd Symposium on Innovations in Computer Science
(ICS 2011), pages 367–388. Tsinghua University Press, 2011.
[62] Haim Kaplan, Moshe Lewenstein, Nira Shafrir, and Maxim Sviridenko. Approximation
algorithms for asymmetric TSP by decomposing directed regular multigraphs. Journal of
the ACM, 52(4):602–626, July 2005.
[63] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani. On Syntactic Versus Computational
Views of Approximability. In SFCS ’94: Proceedings of the 35th Symposium on Founda-
tions of Computer Science, pages 819–830. IEEE Computer Society, 1994.
[64] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani. On syntactic versus computational
views of approximability. SIAM Journal on Computing, 28(1):164–191, 1998.
[65] Samir Khuller, Anna Moss, and Joseph (Seffi) Naor. The budgeted maximum coverage
problem. Information Processing Letters, 70:39–45, April 1999.
[66] Bernhard Korte and Dirk Hausmann. An Analysis of the Greedy Heuristic for Independence Systems. In B. Alspach, P. Hell, and D. J. Miller, editors, Algorithmic Aspects of
Combinatorics, pages 65–74. Elsevier, 1978.
[67] Lukasz Kowalik and Marcin Mucha. Deterministic 7/8-approximation for the metric maximum TSP. Theoretical Computer Science, 410(47–49):5000–5009, November 2009.
[68] Eugene L. Lawler. Matroid intersection algorithms. Mathematical Programming, 9(1):31–
56, 1975.
[69] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone
submodular maximization under matroid and knapsack constraints. In STOC ’09: Pro-
ceedings of the 41st ACM Symposium on Theory of Computing, pages 323–332. ACM,
2009.
[70] Jon Lee, Maxim Sviridenko, and Jan Vondrak. Matroid matching: the power of local
search. In STOC ’10: Proceedings of the 42nd ACM Symposium on Theory of Computing,
pages 369–378. ACM, 2010.
[71] Jon Lee, Maxim Sviridenko, and Jan Vondrak. Submodular Maximization over Multi-
ple Matroids via Generalized Exchange Properties. Mathematics of Operations Research,
35(4):795–806, 2010.
[72] L. Lovász. Matroid matching and some applications. Journal of Combinatorial Theory, Series B, 28(2):208–236, April 1980.
[73] L. Lovász. The matroid matching problem. In László Lovász and Vera T. Sós, editors, Algebraic Methods in Graph Theory (Colloquium Szeged 1978), volume II, pages 495–517, Amsterdam, 1981.
[74] Julian Mestre. Greedy in approximation algorithms. In ESA ’06: Proceedings of the 14th
European Symposium on Algorithms, pages 528–539. Springer-Verlag, 2006.
[75] G. L. Nemhauser and L. A. Wolsey. Best Algorithms for Approximating the Maximum of
a Submodular Set Function. Mathematics of Operations Research, 3(3):177–188, 1978.
[76] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for max-
imizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
[77] James B. Orlin, Abraham P. Punnen, and Andreas S. Schulz. Approximate local search
in combinatorial optimization. In SODA ’04: Proceedings of the 15th Annual ACM-SIAM
Symposium on Discrete Algorithms, pages 587–596. SIAM, 2004.
[78] Henri Padé. Sur la représentation approchée d'une fonction par des fractions rationnelles. Annales Scientifiques de l'École Normale Supérieure, Ser. 3, 9:3–93 (supplement), 1892.
[79] C. H. Papadimitriou, A. A. Schäffer, and M. Yannakakis. On the complexity of local search.
In STOC ’90: Proceedings of the 22nd ACM Symposium on Theory of Computing. ACM,
1990.
[80] Christos H. Papadimitriou and Mihalis Yannakakis. The traveling salesman problem with
distances one and two. Mathematics of Operations Research, 18(1):1–11, February 1993.
[81] R. Rado. Note on Independence Functions. Proceedings of the London Mathematical
Society, s3-7(1):300–320, 1957.
[82] Sartaj Sahni. Approximate algorithms for the 0/1 knapsack problem. Journal of the ACM,
22(1):115–124, January 1975.
[83] Alejandro Schaffer and Mihalis Yannakakis. Simple Local Search Problems That Are Hard
to Solve. SIAM Journal on Computing, 20(1):56–87, 1991.
[84] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in
strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346–355,
2000.
[85] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer,
2003.
[86] Jose Soto. Improved Analysis of a Max Cut Algorithm Based on Spectral Partitioning.
arXiv.org, cs.DS, October 2009.
[87] Jose A. Soto. A simple PTAS for Weighted Matroid Matching on Strongly Base Orderable
Matroids. Electronic Notes in Discrete Mathematics, 37(0):75–80, 2011.
[88] Aravind Srinivasan. Distributions on level-sets with applications to approximation al-
gorithms. In FOCS ’01: Proceedings of the 42nd IEEE Symposium on Foundations of
Computer Science, pages 588–599. IEEE Computer Society, 2001.
[89] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack
constraint. Operations Research Letters, 32(1):41–43, January 2004.
[90] Po Tong, Eugene L. Lawler, and V. V. Vazirani. Solving the Weighted Parity Problem
for Gammoids by Reduction to Graphic Matching. Technical Report UCB/CSD-82-103,
University of California, Berkeley, April 1982.
[91] Luca Trevisan. Max cut and the smallest eigenvalue. In STOC ’09: Proceedings of the
41st ACM Symposium on Theory of Computing, pages 263–272. ACM, 2009.
[92] C. Underhill and A. Wragg. Convergence properties of Padé approximants to exp(z) and their derivatives. IMA Journal of Applied Mathematics, 11(3):361–367, 1973.
[93] Jan Vondrak. Optimal approximation for the submodular welfare problem in the value
oracle model. In STOC ’08: Proceedings of the 40th ACM Symposium on Theory of
Computing, pages 67–74. ACM, 2008.
[94] Jan Vondrak. Symmetry and Approximability of Submodular Maximization Problems. In
FOCS ’09: Proceedings of the 50th IEEE Symposium on Foundations of Computer Science,
pages 651–670. IEEE Computer Society, 2009.
[95] Jan Vondrak. Submodularity and curvature: the optimal algorithm. In RIMS Kokyuroku
Bessatsu, volume B23, pages 253–266, Kyoto, 2010.
[96] Jens Vygen. A note on schrijver’s submodular function minimization algorithm. Journal
of Combinatorial Theory, Series B, 88(2):399–402, 2003.
[97] Justin Ward. A (k+ 3)/2-approximation algorithm for monotone submodular k-set pack-
ing and general k-exchange systems. In STACS ’12: 29th International Symposium on
Theoretical Aspects of Computer Science, pages 42–53. Schloss Dagstuhl–Leibniz-Zentrum
fuer Informatik, 2012.
[98] Hassler Whitney. On the Abstract Properties of Linear Dependence. American Journal of
Mathematics, 57(3):509–533, 1935.
List of Algorithms
1 GenLocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 SubmodularGreedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 PartEnum(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 k-LocalCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 MatroidCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 MatroidSubmodular . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8 CurvatureEnum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
9 (a, r)-OblLocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10 Linear-k-Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11 Submodular-k-Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12 h(n)-LocalSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113