MA4254: Discrete Optimization
Defeng Sun
Department of Mathematics
National University of Singapore
Office: S14-04-25
Telephone: 6516 3343
Aims/Objectives: Discrete optimization deals with problems of maximizing or minimizing a function over a feasible region of discrete structure. These problems come from many fields, such as operations research, management science, and computer science. The primary objective of this course is twofold: a) to study key techniques for separating easy problems from difficult ones, and b) to use typical methods to deal with difficult problems.
Mode of Evaluation: Tutorial class performance (10%); Mid-Term
test (20%) and Final examination (70%)
This course is taught at the Department of Mathematics, National University of Singapore, Semester I, 2009/2010. E-mail: [email protected]
References:
1) D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization. Athena Scientific, 1997.
2) G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. John Wiley and Sons, 1999.
3) C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982. Second edition by Dover, 1998.
PARTIAL lecture notes will be made available on my webpage
http://www.math.nus.edu.sg/~matsundf/
1 Introduction
In this chapter we will briefly discuss the problems we are going to study, give a short review of the simplex method for solving linear programming problems, and introduce some basic concepts in graphs and digraphs.
1.1 Linear Programming (LP): a short review
Consider the following linear programming problem

(P)  min c^T x
     s.t. Ax ≥ b, x ≥ 0

and its dual

(D)  max b^T y
     s.t. A^T y ≤ c, y ≥ 0.
Simplex method: Dantzig (1947). Very efficient in practice, but not a polynomial-time algorithm; Klee and Minty (1972) gave a counterexample.
Average-case analysis versus worst-case analysis.
Ellipsoid method: a polynomial-time algorithm (Khachiyan, 1979), but less efficient in practice.
Interior-point algorithms: Karmarkar (1984). Polynomial-time algorithms, efficient for some large-scale sparse LPs.
Others.
1.2 Discrete Optimization (DO)
Also called Combinatorial Optimization (CO).
The general mathematical formulation:

min f(x)
s.t. x ∈ F,

where x is a decision policy, F is the collection of feasible decision policies, and f(x) measures the value of members of F. A typical DO (CO) problem:

(IP)  min c^T x
      s.t. Ax ≥ b, x ≥ 0,
           xj integer for j ∈ I ⊆ N := {1, . . . , n},

where c, A and b are given problem data.
2. The Assignment Problem
There are n people and m jobs, where n ≥ m. Each job must be assigned to exactly one person, and each person can do at most one job.
The cost of person j doing job i is cij. Then the Assignment Problem can be formulated as

min ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
s.t. ∑_{j=1}^{n} xij = 1, i = 1, . . . , m,
     ∑_{i=1}^{m} xij ≤ 1, j = 1, . . . , n,
     x ∈ B^{m×n}.
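For very small instances, the model above can be checked by brute force: every injective map of jobs to people is a feasible 0/1 solution. A minimal pure-Python sketch (the cost data below are made up):

```python
from itertools import permutations

def solve_assignment(c):
    """Brute-force the assignment model above: c[i][j] is the cost of
    person j doing job i, with m jobs and n >= m people.  Each job goes
    to exactly one person; each person takes at most one job."""
    m, n = len(c), len(c[0])
    best_cost, best = float("inf"), None
    # every injective map job -> person is a feasible 0/1 solution
    for people in permutations(range(n), m):
        cost = sum(c[i][people[i]] for i in range(m))
        if cost < best_cost:
            best_cost, best = cost, people
    return best_cost, best

# a hypothetical 3-job, 3-person instance; best[i] is the person for job i
c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
print(solve_assignment(c))
```

The enumeration is only a correctness reference; in practice one exploits total unimodularity (Section 2) or a combinatorial method such as the Hungarian algorithm.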
Extensions: the Three-Index Assignment Problem.
3. Set-Covering, Set-Packing, and Set-Partitioning Problems
The Set-Covering Problem is

min c^T x
s.t. Ax ≥ 1, x ∈ B^n.

The Set-Packing Problem is

max c^T x
s.t. Ax ≤ 1, x ∈ B^n.
4. Traveling Salesman Problem (TSP)
We are given a set of nodes V = {1, . . . , n} and a set of arcs A. The nodes represent cities, and the arcs represent ordered pairs of cities between which direct travel is possible.
For (i, j) ∈ A, cij is the direct travel time from city i to city j. The TSP is to find a tour, starting at city 1, that
(a) visits each other city exactly once and then returns to city 1, and
(b) takes the least total travel time.
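For a handful of cities the tour can be found by trying every ordering of the remaining cities. A sketch (cities are indexed from 0, and the travel times are a made-up symmetric example):

```python
from itertools import permutations

def tsp_bruteforce(c):
    """Brute-force the TSP above: c[i][j] is the direct travel time from
    city i to city j.  City 0 is fixed as the start; every order of the
    remaining cities is tried, and the tour returns to city 0."""
    n = len(c)
    best_time, best_tour = float("inf"), None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        time = sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if time < best_time:
            best_time, best_tour = time, tour
    return best_time, best_tour

c = [[0, 1, 2, 9],
     [1, 0, 6, 4],
     [2, 6, 0, 3],
     [9, 4, 3, 0]]
print(tsp_bruteforce(c))
```

The (n − 1)! enumeration illustrates the exponential growth discussed in Section 1.4 and is usable only for tiny n.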
5. Facility Location Problem, Network Flow Problem, and many more
1.4 Why is DO (CO) difficult?
The superficial reason is that the number of arrangements grows exponentially.
Topics: Total Unimodularity (TU) Theory; Shortest Path; Matroids and the Greedy Algorithm; Complexity (the P ≠ NP conjecture); Interior-Point Algorithms; Cutting Planes; Branch and Bound; Decomposition; Flowshop Scheduling, etc.
1.5 Convex sets
In linear programming and nonlinear programming, we have already met many convex sets, for example, the line segment between two points in R^n.
1.6 Hyperplanes and half spaces
Definition 1.2 Let a be a nonzero vector in R^n and let b be a scalar. Then the set {x ∈ R^n | a^T x = b} is called a hyperplane, and the set {x ∈ R^n | a^T x ≥ b} is called a halfspace.
Let x0 be any point on the hyperplane {x ∈ R^n | a^T x = b}. Then for any x on the hyperplane, a^T (x − x0) = 0, so a is normal to the hyperplane.
It is noted that these halfspaces are finite in number. The intersection of two polyhedra is again a polyhedron. So {x ∈ R^n | Ax ≥ b}, being an intersection of finitely many halfspaces, is a polyhedron.
over which we are optimizing. There are quite a number of different but equivalent ways to define the concept of a corner. Here we introduce two of them: extreme points and basic feasible solutions.
Our first definition defines an extreme point of a polyhedron as a point that cannot be expressed as a convex combination of two other points of the polyhedron.
Definition 1.6 Let P be a polyhedron. A vector x ∈ P is an extreme point of P if we cannot find two vectors y, z ∈ P, both different from x, and a scalar λ ∈ [0, 1], such that x = λy + (1 − λ)z.
where a1 = (0, 0, 2)^T, a2 = (4, 0, 0)^T and a3 = (1, 1, 1)^T. Let a4 = e1, a5 = e2 and a6 = e3. Then
M1 = {1, 4, 5, 6}, M2 = {2}, M3 = {3}.
Definition 1.7 If a vector x satisfies a_i^T x = bi for some i in M1, M2 or M3, we say that the corresponding constraint is active or binding at x. The active set of P at x is defined as
I(x) = {i ∈ M1 ∪ M2 ∪ M3 | a_i^T x = bi},
i.e., I(x) is the set of indices of constraints that are active at x.
For example, suppose that P is defined by (1.1). Let x = (0.5, 0, 0.5)^T. The constraints active at x are
a_1^T x = 1,  a_3^T x = 1,  a_5^T x (= x2) = 0,
and
I(x) = {1, 3, 5}.
Recall that vectors x1, . . . , xk ∈ R^n are linearly independent if λ1 x1 + · · · + λk xk = 0 implies λ1 = · · · = λk = 0.
(a) There exist n vectors in the set {ai | i ∈ I(x)} which are linearly independent.
(b) The span of the vectors ai, i ∈ I(x), is all of R^n.
which is orthogonal to the subspace spanned by these vectors. If x satisfies a_i^T x = bi for all i ∈ I(x), we also have a_i^T (x + d) = bi for all i ∈ I(x), thus obtaining multiple solutions. We have therefore established that (b) and (c) are equivalent. Q.E.D.
With a slight abuse of language, we will often say that certain constraints are linearly independent, meaning that the corresponding vectors ai are linearly independent. We are now ready to provide an algebraic definition of a corner point of the polyhedron P.
Definition 1.8 Let x* ∈ R^n.
(a) The vector x* is a basic solution if all equality constraints are active at x* and, among the constraints active at x*, there are n of them that are linearly independent.
(b) If x* is a basic solution that satisfies all of the constraints, it is a basic feasible solution.
Note that if the number m of constraints used to define a polyhedron P is less than n, then fewer than n constraints can be active at any given point, and P has no basic (or basic feasible) solutions.
The set P = {x
Definition 1.11
(a) A nonzero element d of a polyhedral cone C ⊆ R^n is called an extreme ray if there are n − 1 linearly independent constraints that are active at d.
1.10 Simplex Method Revisited
Consider the standard linear programming problem
(P)  min c^T x
     s.t. Ax = b,
          x ≥ 0,    (1.2)

where A is an m × n matrix with full row rank, b ∈ R^m and c ∈ R^n.
Finding an initial basic feasible solution: the artificial variables method and the big-M method.
For the dual simplex method, we have
    0    |  c1  . . .  cn
   b1    |   |          |
   ...   |  A1  . . .  An
   bm    |   |          |

and

 −c_B^T x_B |    c̄1    . . .    c̄n
   x_B(1)   |     |              |
    ...     | B^{-1}A1 . . . B^{-1}An
   x_B(m)   |     |              |

We do not require B^{-1}b to be nonnegative, which means that we have a basic, but not necessarily feasible, solution to the primal problem. However, we assume that c̄ ≥ 0; equivalently, the vector y^T = c_B^T B^{-1} satisfies y^T A ≤ c^T, and we have a feasible solution to the dual problem. The cost of this dual feasible solution is y^T b = c_B^T B^{-1} b = c_B^T x_B, which is the negative of the entry at the upper left corner of the tableau.
1.11 Graphs and Digraphs
1.11.1 Graphs
Definition 1.12 A graph G is a pair (V, E), where V is a finite set and E is a set of unordered pairs of elements of V. Elements of V are called vertices and elements of E edges. We say that a pair of distinct vertices are adjacent if they define an edge, and the edge is said to be incident to its defining vertices. The degree of a vertex v (denoted deg(v)) is the number of edges incident to that vertex.
An example is shown in Figure 1.3.
Figure 1.3: A graph with edges e1, e2, e3, e4
Definition 1.13 A v1vk-path (or path connecting v1 and vk) is a sequence of edges
v1v2, . . . , v_{i−1}v_i, . . . , v_{k−1}v_k.
A cycle is a sequence of edges
v1v2, . . . , v_{i−1}v_i, . . . , v_{k−1}v_k, v_k v1.
In both cases the vertices are all distinct. A graph is acyclic if it has no cycle.
Proposition 1.2 If every vertex of G has degree of at least two then G has a cycle.
Proof. Let P = v1v2, . . . , v_{k−1}v_k be a path of G with a maximum number of edges. Since deg(vk) ≥ 2, there is an edge vk w where w ≠ v_{k−1}. It follows from the choice of P that w is a vertex of P, i.e., w = vi for some i ∈ {1, . . . , k − 2}. Then vi v_{i+1}, . . . , v_{k−1}vk, vk vi is a cycle. Q.E.D.
Definition 1.14 G is connected if each pair of vertices is connected by a path.
Proposition 1.3 Let G be a connected graph with a cycle C and let e be an edge of C. Then G − e is connected.
Proof. Let v1, v2 be vertices of G − e. We need to show that there exists a v1v2-path of G − e. Since G is connected there exists a v1v2-path P of G. If P does not use e then we are done. Otherwise P implies there exist a v1w1-path P1 and a w2v2-path P2, where w1, w2 are the endpoints of e. Moreover, C − w1w2 is a w1w2-path. The result now follows. Q.E.D.
Definition 1.15 H is a subgraph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G). It is a spanning subgraph if in addition V(H) = V(G).
Definition 1.16 A tree is a connected acyclic graph.
Theorem 1.2 If T = (V, E) is a tree, then |E| = |V| − 1.
Proof. Let us proceed by induction on the number of vertices of V. The base case |V| = 1 is trivial since then |E| = 0. Assume now |V| ≥ 2 and suppose the theorem holds for all trees with |V| − 1 vertices. Since T is acyclic, it follows from Proposition 1.2 that there is a vertex v with deg(v) ≤ 1. Since T is connected and |V| ≥ 2, deg(v) ≠ 0. Thus, there is a unique edge uv incident to v. Let T′ be defined as follows: V(T′) = V − {v} and E(T′) = E − {uv}. Observe that T′ is a tree. Hence by induction |E(T′)| = |V(T′)| − 1 and it follows that |E| = |V| − 1. Q.E.D.
Proposition 1.4 Let G = (V, E) be a connected graph. Then |E| ≥ |V| − 1. Moreover, if equality holds then G is a tree.
Proof. If G has a cycle then remove from G any edge on the cycle. Repeat until
the resulting graph T is acyclic. It follows from Proposition 1.3 that T is connected.
Hence T is a tree and by Theorem 1.2,
|E(G)| ≥ |E(T)| = |V(G)| − 1.
Q.E.D.
1.11.2 Bipartite Graph
G = (S, T, E): every edge in E has one vertex in S and the other in T.
1.11.3 Vertex-Edge Incidence Matrix
Definition 1.17 The vertex-edge incidence matrix of a graph G = (V, E) is a matrix A with |V| rows and |E| columns whose entries are either 0 or 1, such that the rows correspond to the vertices of G, the columns correspond to the edges of G, and the entry A_{v,ij} for vertex v and edge ij is given by
A_{v,ij} = 0 if v ≠ i and v ≠ j;  1 if v = i or v = j.
1.11.4 Digraphs (Directed Graphs)
Definition 1.18 A directed graph (or digraph) D is a pair (N,A) where N is a finite
set and A is a set of ordered pairs of elements of N . Elements of N are called nodes
and elements of A arcs. Node i (resp. j) is the tail (resp. head) of arc ij. The in-degree (resp. out-degree) of node v, denoted deg⁺(v) (resp. deg⁻(v)), is the number of arcs with head (resp. tail) v.
1.11.5 Bipartite Digraph
D = (S, T,A)
1.11.6 Node-Arc Incidence Matrix
Definition 1.19 The node-arc incidence matrix of a digraph D = (N, A) is a matrix M with |N| rows and |A| columns whose entries are either 0, +1, or −1, such that the rows correspond to the nodes of D, the columns correspond to the arcs of D, and the entry M_{v,ij} for node v and arc
ij is given by
M_{v,ij} = 0 if v ≠ i and v ≠ j;  +1 if v = j;  −1 if v = i.
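Both matrices can be built directly from Definitions 1.17 and 1.19; the helper names below are ours:

```python
def vertex_edge_incidence(vertices, edges):
    """Definition 1.17: A[v][k] = 1 iff vertex v is an endpoint of the
    k-th edge, 0 otherwise; edges are unordered pairs (i, j)."""
    return [[1 if v in e else 0 for e in edges] for v in vertices]

def node_arc_incidence(nodes, arcs):
    """Definition 1.19: M[v][k] = +1 if v is the head of the k-th arc,
    -1 if v is its tail, 0 otherwise; arcs are ordered pairs (i, j)."""
    return [[+1 if v == j else (-1 if v == i else 0) for (i, j) in arcs]
            for v in nodes]
```

Each column of the node-arc matrix has exactly one +1 and one −1, which is what makes Theorem 2.3 below applicable.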
2 Total Unimodularity (TU) and Its Applications
In this section we will discuss the total unimodularity theory and its applications to
flows in networks.
2.1 Total Unimodularity: Definition and Properties
Consider the following integer linear programming problem
(P)  max c^T x
     s.t. Ax = b,
          x ≥ 0,    (2.1)

where A ∈ Z^{m×n}, b ∈ Z^m and c ∈ Z^n are all integral.
Definition 2.1 A square, integer matrix B is called unimodular if |Det(B)| = 1. An integer matrix A is called totally unimodular if every square, nonsingular submatrix of A is unimodular.
The above definition means that a TU matrix is a {1, 0, −1}-matrix. But a {1, 0, −1}-matrix is not necessarily a TU matrix, e.g.,

A = (  1  1
      −1  1 ).
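Definition 2.1 can be tested by brute force on small matrices, since it is equivalent to requiring that every square submatrix has determinant 0, +1 or −1. A sketch (exponential in the matrix size, for illustration only):

```python
from itertools import combinations

def det(M):
    """Integer determinant by cofactor expansion along the first row
    (exact, and fast enough for the tiny submatrices used here)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_totally_unimodular(A):
    """Check Definition 2.1 directly on an integer matrix A: every
    square submatrix must have determinant 0, +1 or -1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(det([[A[r][c] for c in cols] for r in rows])) > 1:
                    return False
    return True

print(is_totally_unimodular([[1, 1], [-1, 1]]))  # the counterexample above
```

The 2 × 2 counterexample fails because its own determinant is 2.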
Lemma 2.1 Suppose that A ∈ Z^{n×n} is a unimodular matrix and that b ∈ Z^n is an integer vector. If A is nonsingular, then Ax = b has the unique integer solution x = A^{-1}b.
Proof. Let aij be the ij-th entry of A, i, j = 1, . . . , n. For any aij, define the cofactor of aij as
Cof(aij) = (−1)^{i+j} Det(A(i|j)),
where A(i|j) is the matrix obtained by removing the i-th row and the j-th column of A. Then
Det(A) = ∑_{i=1}^{n} ai1 Cof(ai1).
The adjoint of A is
Adj(A) = Adj({aij}) = {Cof(aij)}^T
and the inverse of A is
A^{-1} = (1 / Det(A)) Adj(A).
Since A ∈ Z^{n×n} is a unimodular nonsingular integer matrix, every Cof(aij) is an integer and |Det(A)| = 1. Hence A^{-1} is an integer matrix and x = A^{-1}b is integer whenever b is. Q.E.D.
Theorem 2.1 If A is TU, every basic solution to P is integer.
Proof. Suppose that x is a basic solution to P. Let N be the set of indices of x such that xj = 0. Since x is a basic solution to P, there exist two nonnegative integers p and q with p + q = n and indices B(1), . . . , B(p) ∈ {1, . . . , m} and N(1), . . . , N(q) ∈ N such that
{A^T_{B(i)}}_{i=1}^{p} ∪ {e^T_{N(j)}}_{j=1}^{q}
are linearly independent, where e_{N(j)} is the N(j)-th unit vector in R^n.
Proposition 2.3 A ∈ Z^{m×n} is TU ⟺ (A, I) is TU, where I is the m × m identity matrix.
Obviously, |Det(B)| = |Det(B′)| and
|Det(B′)| = |Det(A1)| |Det(I′)| = |Det(A1)|.
Now A being totally unimodular implies |Det(A1)| = 0 or 1, and since B is assumed to be nonsingular, |Det(B)| = 1. Again, from Lemma 2.1, yB is integer. Hence y is integer because yj = 0, j ∉ B. This implies that x is integer. [One may also make use of Theorem 2.1 and Proposition 2.3 to get the proof immediately.]
(2 ⟹ 3). Let B ∈ Z^{p×p} be any square nonsingular submatrix of A. It is sufficient to prove that b̄j is an integer vector, where b̄j is the jth column of B^{-1}, j = 1, . . . , p.
Let t be an integer vector such that t + b̄j > 0 and bB(t) = Bt + ej, where ej is the jth unit vector. Then
xB = B^{-1} bB(t) = B^{-1}(Bt + ej) = t + B^{-1} ej = t + b̄j > 0.
Choose bN (N = {1, . . . , n}\B) sufficiently large such that (Ax)j < bj, j ∈ N, where xj = 0, j ∈ N. Hence x is an extreme point of S(b(t)). As xB and t are integer vectors, b̄j is an integer vector too for j = 1, . . . , p, and B^{-1} is integer.
(3 ⟹ 1). Let B be an arbitrary square, nonsingular submatrix of A. Then
1 = |Det(I)| = |Det(BB^{-1})| = |Det(B)| |Det(B^{-1})|.
By the assumption, B and B^{-1} are integer matrices. Thus
|Det(B)| = |Det(B^{-1})| = 1,
and A is TU. Q.E.D.
Theorem 2.3 (A sufficient condition for TU) An integer matrix A with all aij = 0, +1, or −1 is TU if
1. no more than two nonzero elements appear in each column,
2. the rows of A can be partitioned into two subsets M1 and M2 such that
(a) if a column contains two nonzero elements with the same sign, one element
is in each of the subsets,
(b) if a column contains two nonzero elements of opposite signs, both elements
are in the same subset.
Proof. The proof is by induction. A one-element submatrix of A has determinant equal to 0, +1, or −1.
Assume that the theorem is true for all submatrices of A of order k − 1 or less, and let B be a square submatrix of A of order k.
If B contains a column with only one nonzero element, we expand Det(B) by that
column and apply the induction hypothesis.
Finally, consider the case in which every column of B contains two nonzero elements. Then from 2(a) and 2(b), for every column j,
∑_{i∈M1} bij = ∑_{i∈M2} bij,  j = 1, . . . , k.
Let b^i be the ith row of B. Then the above equality gives
∑_{i∈M1} b^i − ∑_{i∈M2} b^i = 0,
which implies that the rows {b^i}, i ∈ M1 ∪ M2, are linearly dependent and thus B is singular, i.e., Det(B) = 0. Q.E.D.
Corollary 2.1 The vertex-edge incidence matrix of a bipartite graph is TU.
Corollary 2.2 The node-arc incidence matrix of a digraph is TU.
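The hypotheses of Theorem 2.3 can also be verified mechanically: finding the row partition (M1, M2) is a 2-colouring problem, since a same-sign pair in a column forces its two rows into different subsets and an opposite-sign pair forces them into the same subset. A sketch (function name is ours):

```python
def satisfies_sufficient_condition(A):
    """Check the hypotheses of Theorem 2.3 for a {0, +1, -1} matrix A:
    at most two nonzeros per column, and a row partition (M1, M2) with
    same-sign pairs split between the subsets and opposite-sign pairs
    kept together.  Finding the partition is a 2-colouring problem."""
    m = len(A)
    # constraints[i] holds (k, same): rows i and k must be in the same
    # subset (same=True) or in different subsets (same=False)
    constraints = [[] for _ in range(m)]
    for col in zip(*A):
        nz = [i for i, a in enumerate(col) if a != 0]
        if len(nz) > 2:
            return False
        if len(nz) == 2:
            i, k = nz
            same = col[i] != col[k]   # opposite signs -> same subset
            constraints[i].append((k, same))
            constraints[k].append((i, same))
    colour = [None] * m
    for s in range(m):                # colour each component by search
        if colour[s] is not None:
            continue
        colour[s], stack = 0, [s]
        while stack:
            i = stack.pop()
            for k, same in constraints[i]:
                want = colour[i] if same else 1 - colour[i]
                if colour[k] is None:
                    colour[k] = want
                    stack.append(k)
                elif colour[k] != want:
                    return False      # no valid partition exists
    return True
```

For a node-arc incidence matrix every column has one +1 and one −1, so putting all rows in M1 works, consistent with Corollary 2.2; the vertex-edge incidence matrix of an odd cycle fails the test, consistent with Corollary 2.1 applying only to bipartite graphs.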
2.2 Applications
In this section we show that the assumptions of the theorems in Section 2.1 are fulfilled for integer programming problems connected with optimization of flows in networks. This means that these problems can be solved by the SIMPLEX METHOD.
However, it is not necessary to use the simplex method, because more efficient methods have been developed by taking into consideration the specific structure of these problems.
Many commodities, such as gas, oil, etc., are transported through networks in which
we distinguish sources, intermediate transportation or distribution points and desti-
nation points.
We will represent a network as a directed graph G = (V, E) and associate with each arc (i, j) ∈ E the flow xij of the commodity and the capacity dij (possibly infinite) that bounds the flow through the arc. The set V is partitioned into three sets:
V1, the set of sources or origins; V2, the set of intermediate points; V3, the set of destinations or sinks.
Figure 2.1: A network
For each i ∈ V1, let ai be a supply of the commodity, and for each i ∈ V3, let bi be a demand for the commodity.
We assume that there is no loss of the flow at intermediate points. Additionally, denote
V⁺(i) = {j | (i, j) ∈ E}  and  V⁻(i) = {j | (j, i) ∈ E},
the sets of out-neighbors and in-neighbors of i, respectively.
Then the minimum cost capacitated problem may be formulated as
(P) v(P ) = min
(i,j)Ecijxij
subject to
jV (i)
xij
jV (i)xji
ai, i V1,= 0, i V2, bi, i V3,
(2.2)
0 xij dij, (i, j) E. (2.3)
Constraint (2.2) requires the conservation of flow at intermediate points, a net flow
into sinks at least as great as demanded, and a net flow out of sources equal or less
than the supply. In some applications, demand must be satisfied exactly and all of
the supply must be used. If all of the constraints of (2.2) are equalities, the problem
has no feasible solutions unless
∑_{i∈V1} ai = ∑_{i∈V3} bi.
To avoid pathological cases, we assume for each cycle in the network G = (V,E)
either that the sum of costs of arcs in the cycle is positive or that the minimal
capacity of an arc in the cycle is bounded.
Theorem 2.4 The constraint matrix corresponding to (2.2) and (2.3) is totally uni-
modular.
Proof. The constraint matrix has the form
A = ( A1
       I ),
where A1 is the matrix for (2.2) and I is an identity matrix for (2.3). In the last section, we showed that A1 being totally unimodular implies that A is totally unimodular.
Each variable xij appears in exactly two constraints of (2.2), with coefficients +1 or −1. Thus A1 is an incidence matrix for a digraph and therefore it is totally unimodular. Q.E.D.
The most popular case of P is the so-called (capacitated) transportation problem. We obtain it if we put in P: V2 = ∅, V⁻(i) = ∅ for all i ∈ V1 and V⁺(i) = ∅ for all i ∈ V3. So we get

(TP)  v(TP) = min ∑_{(i,j)∈E} cij xij,
      s.t. ∑_{j∈V⁺(i)} xij ≤ ai, i ∈ V1,
           ∑_{j∈V⁻(i)} xji ≥ bi, i ∈ V3,
           0 ≤ xij ≤ dij, (i, j) ∈ E.

If dij = ∞ for all (i, j) ∈ E, the uncapacitated version of P is sometimes called the transshipment problem.
If all ai = 1 and all bi = 1, and additionally |V1| = |V3|, the transshipment problem reduces to the so-called assignment problem of the form

(AP)  v(AP) = min ∑_{i∈V1} ∑_{j∈V⁺(i)} cij xij,
      s.t. ∑_{j∈V⁺(i)} xij = 1, i ∈ V1,
           ∑_{j∈V⁻(i)} xji = 1, i ∈ V3,
           xij ≥ 0.

Note that |V1| = |V3| implies that all constraints in (AP) must be satisfied as equalities.
Let V = {1, . . . , m}. Still another important practical problem obtained from P is called the maximum flow problem. In this problem, V1 = {1}, V3 = {m}, V⁻(1) = ∅, V⁺(m) = ∅, a1 = ∞, bm = ∞.
The problem is to maximize the total flow into the vertex m under the capacity
constraints
(MF)  v(MF) = max ∑_{i∈V⁻(m)} xim,
      s.t. ∑_{j∈V⁺(i)} xij − ∑_{j∈V⁻(i)} xji = 0, i ∈ V2 = {2, . . . , m − 1},
           0 ≤ xij ≤ dij, (i, j) ∈ E.
Finally, consider the shortest path problem. Let cij be interpreted as the
length of edge (i, j). Define the length of a path in G to be the sum of the edge
lengths over all edges in the path. The objective is to find a path of minimum length
from a vertex 1 to vertex m. It is assumed that all cycles have nonnegative length.
This problem is a special case of the transshipment problem in which V1 = {1},V3 = {m}, a1 = 1 and bm = 1.
Let A be the incidence matrix of the digraph G = (V, E), where V = {1, . . . , m} and E = {e1, . . . , en}. With each arc ej we associate its length cj ≥ 0 and its flow xj ≥ 0. The shortest path problem may be formulated as:
(SP)  v(SP) = min ∑_{j=1}^{n} cj xj,
      s.t. Ax = (−1, 0, . . . , 0, +1)^T,  x ≥ 0.
The first constraint corresponds to the source vertex, the mth constraint corresponds
to the demand vertex, while the remaining constraints correspond to the intermediate
vertices, i.e., the points of distribution of the unit flow.
The dual problem to SP is
(DSP)  v(DSP) = max (−u1 + um),
       s.t. A^T u ≤ c.    (2.4)
3 The Shortest Path
3.1 The Primal-Dual Method
Consider the standard linear programming problem

(P)  min c^T x
     s.t. Ax = b ≥ 0, x ≥ 0

and its dual

(D)  max π^T b
     s.t. π^T A ≤ c^T.
Suppose that we have a current π which is feasible to the dual problem (D). Define the index set J by
J = {j : π^T A_j = cj},
where A_j is the jth column of A. Then for any j ∉ J, we have π^T A_j < cj. We call J the set of admissible columns. In order to search for an x which is not only feasible to the primal problem (P) but also, together with π, satisfies the complementarity conditions of (P) and (D), we invent a new LP, called the restricted primal (RP), as follows
(RP)  ξ = min ∑_{i=1}^{m} x_i^a
      s.t. Ax + x^a = b,
           xj ≥ 0 for all j ∈ J,
           xj = 0, j ∉ J,
           x_i^a ≥ 0, i = 1, . . . , m,
i.e.,
(RP)  ξ = min 0^T x_J + ∑_{i=1}^{m} x_i^a
      s.t. A_J x_J + x^a = b,
           x_J ≥ 0, x^a ≥ 0.
The dual of (RP) is
(DRP)  w = max π̄^T b
       s.t. π̄^T A_j ≤ 0, j ∈ J,
            π̄_i ≤ 1, i = 1, . . . , m.
Let (x_J, x^a) be an optimal basic feasible solution to (RP) and π̄ be an optimal basic feasible solution to (DRP) obtained from (x_J, x^a). If w = 0, then ξ = 0. Such an x is found. Otherwise, w > 0 and we can update π to
π_new = π + θπ̄.
The new cost to (D) is
(π_new)^T b = π^T b + θπ̄^T b = π^T b + θw,
which means that we shall get a better π if we can take θ > 0. On the other hand, π_new should be feasible to (D), i.e.,
(π_new)^T A_j = π^T A_j + θπ̄^T A_j ≤ cj.
Since for every j ∈ J, π̄^T A_j ≤ 0, we only need to consider those π̄^T A_j > 0, j ∉ J. Therefore, we can take
θ = min { (cj − π^T A_j) / (π̄^T A_j) : j ∉ J such that π̄^T A_j > 0 }.
Figure 3.1: An illustration of the primal-dual method
3.2 The Primal-Dual Method for the Shortest Path Problem
Let A be the incidence matrix of the digraph G = (V, E), where V = {1, . . . , m} and E = {e1, . . . , en}. With each arc ej we associate its length cj ≥ 0 and its flow xj ≥ 0. The shortest path problem, as we have already seen, may be formulated as:
min ∑_{j=1}^{n} cj xj,
s.t. Ax = (−1, 0, . . . , 0, +1)^T,
     x ≥ 0.    (3.1)
Let Ā be the submatrix of A obtained by removing the last row of A (it is redundant because the sum of all rows of A is zero). Then (3.1) turns into
min ∑_{j=1}^{n} cj xj,
s.t. Āx = (−1, 0, . . . , 0)^T,
     x ≥ 0.    (3.2)
The dual problem to (3.2) is
max −π1
s.t. πj − πi ≤ cij for all (i, j) ∈ E,
     πm = 0,    (3.3)

where we must fix πm = 0 because the last row of A is omitted in Ā.
The idea of the primal-dual algorithm is derived from the idea of searching for a feasible point x such that
xij = 0 whenever πi − πj < cij,
for the given feasible π (remark: think about the complementary slackness conditions). We search for such an x by solving an auxiliary problem, called the restricted primal (RP), determined by the π we are working with. If our search for the x is not successful, we nevertheless obtain information from the dual of RP, which we call DRP, which tells us how to improve the particular π with which we started.
Next, we give the details. The shortest-path problem can be written as

min ∑_{j=1}^{n} cj xj,
s.t. Ãx = (+1, 0, . . . , 0)^T,
     x ≥ 0,    (3.4)

where Ã = −Ā. The purpose of introducing Ã is to make the right-hand side of the constraint Ãx = b nonnegative. Now, the dual problem of (3.4) is
max π1
s.t. πi − πj ≤ cij for all (i, j) ∈ E,
     πm = 0.    (3.5)
For a given feasible π to (3.5), the set of admissible arcs is defined by
J = {arcs (i, j) : πi − πj = cij}.
The corresponding restricted primal problem (RP) is
ξ = min ∑_{i=1}^{m−1} x_i^a,
s.t. Ãx + x^a = (+1, 0, . . . , 0)^T,
     xj ≥ 0 for all j,
     xj = 0, j ∉ J,
     x_i^a ≥ 0, i = 1, . . . , m − 1.    (3.6)
and the dual of the restricted primal (DRP) is

w = max π̄1
s.t. π̄i − π̄j ≤ 0 for all (i, j) ∈ J,
     π̄i ≤ 1 for all i = 1, . . . , m − 1,
     π̄m = 0.    (3.7)
DRP (3.7) is very easy to solve:
Since π̄1 ≤ 1 and we wish to maximize π̄1, we try π̄1 = 1. If there is no path from node 1 to node m, using only arcs in J, then we can propagate the 1 from node 1 to all nodes reachable by a path from node 1 without violating the π̄i − π̄j ≤ 0 constraints, and an optimal solution to the DRP is then

π̄i = 1 for all nodes i reachable by paths from node 1 using arcs in J,
π̄i = 0 for all nodes i from which node m is reachable using arcs in J,
π̄i = 1 for all other nodes i.
(Notice that this π̄ is not unique.)
We can then calculate
θ1 = min { cij − (πi − πj) : arcs (i, j) ∉ J such that π̄i − π̄j > 0 }
to update π and J, and re-solve the DRP.
Figure 3.2: A solution to the restricted dual problem
π := π + θ1 π̄.
If we get to a point where there is a path from node 1 to node m using arcs in J, then π̄1 = 0, and we have found an optimal solution because ξ = w = 0. Any path from node 1 to node m using only arcs in J is optimal.
The primal-dual algorithm reduces the shortest path problem to repeated solution
of the simpler problem of finding the set of nodes reachable from a given node.
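That subproblem is a plain graph search; a depth-first sketch (the representation is ours):

```python
def reachable(n, arcs, source):
    """Nodes reachable from `source` using the given arcs -- the only
    subproblem the primal-dual method needs at each stage.  Nodes are
    0, ..., n-1 and arcs are ordered pairs (i, j)."""
    adj = {v: [] for v in range(n)}
    for i, j in arcs:
        adj[i].append(j)
    seen, stack = {source}, [source]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen
```

Each call costs O(|V| + |A|), which is what makes the overall method efficient.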
Interpretation: Define at any point in the algorithm the set
W = {i : node m is reachable from i by admissible arcs} = {i : π̄i = 0}.
Then the variable πi remains fixed from the time that i enters W to the conclusion of the algorithm, because the corresponding π̄i will always be zero.
Every arc that becomes admissible (enters J) stays admissible throughout the algorithm, because once we have
πi − πj = cij for (i, j) ∈ E,
we always change πi and πj by the same amount.
πi, i ∈ W, is the length of the shortest path from node i to node m, and the algorithm proceeds by adding to W, at each stage, the nodes not in W that are next closest to node m.
There are at most |V| = m stages.
Dijkstra's algorithm is an efficient implementation of the primal-dual algorithm for the shortest path problem.
3.3 Bellman's Equation
Let cij be the length of arc (i, j) (positive arcs if cij > 0; nonnegative if cij ≥ 0).
Let uij be the length of the shortest path from i to j. Define
ui = u1i.
Then Bellman's equations are
u1 = 0,  ui = min_{k≠i} {uk + cki}.
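When no negative cycle exists, Bellman's equations can be solved by repeated relaxation (the Bellman-Ford scheme); the sketch below is one standard way to do it, not the method developed in the following sections:

```python
def bellman(n, arcs, c):
    """Solve Bellman's equations u1 = 0, ui = min_{k != i} (uk + cki) by
    repeated relaxation (the Bellman-Ford scheme).  Nodes are 0, ..., n-1
    with node 0 playing the role of node 1 in the text; c maps each arc
    (k, i) to its length."""
    INF = float("inf")
    u = [INF] * n
    u[0] = 0
    for _ in range(n - 1):        # n-1 passes suffice without negative cycles
        for (k, i) in arcs:
            if u[k] + c[(k, i)] < u[i]:
                u[i] = u[k] + c[(k, i)]
    return u
```

Each pass costs O(|A|), giving O(n|A|) overall; the specializations below (Dijkstra, acyclic graphs) are faster under extra assumptions.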
3.4 Dijkstra's Algorithm
In this section we assume that cij ≥ 0. Denote
P : permanently labeled nodes;
T : temporarily labeled nodes.
Figure 3.3: Bellman's equation
P and T always satisfy
P ∩ T = ∅  and  P ∪ T = V.
The label for node j is [uj, lj], where uj is the length of the (possibly temporary) shortest path from node 1 to j, and lj is the preceding node in the path.
Dijkstras algorithm can be summarized as follows.
Step 0. P = {1}, u1 = 0, l1 = 0, T = V\P. Compute
uj = c1j if (1, j) ∈ E, and uj = ∞ if (1, j) ∉ E;
lj = 1 if (1, j) ∈ E, and lj = 0 if (1, j) ∉ E.
Step 1. Find k ∈ T such that
uk = min_{j∈T} {uj}.
Let P = P ∪ {k} and T = T\{k}. If k = n, stop.
Step 2. For j ∈ T, if uk + ckj < uj, let [uj = uk + ckj, lj = k], and go back to Step 1.
Claim: At any step, uj is the length of the shortest path from 1 to j passing only through nodes in P.
[Suppose not, and let j be the first violation . . . ]
Claim: The total cost is O(n²).
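Steps 0-2 translate into an O(n²) routine directly; a sketch with a hypothetical cost matrix, using float('inf') where there is no arc:

```python
def dijkstra(n, cost, source=0):
    """O(n^2) Dijkstra exactly as in Steps 0-2: cost[i][j] >= 0 is the
    length of arc (i, j), float('inf') if there is no arc.  Returns
    (u, l): shortest distances from `source` and predecessor labels."""
    INF = float("inf")
    u = [cost[source][j] for j in range(n)]              # Step 0
    l = [source if cost[source][j] < INF else None for j in range(n)]
    u[source], l[source] = 0, source
    P = {source}
    while len(P) < n:
        # Step 1: permanently label the closest temporary node
        k = min((j for j in range(n) if j not in P), key=lambda j: u[j])
        P.add(k)
        for j in range(n):                               # Step 2
            if j not in P and u[k] + cost[k][j] < u[j]:
                u[j], l[j] = u[k] + cost[k][j], k

    return u, l

INF = float("inf")
cost = [[0, 1, 4, INF],
        [INF, 0, 2, 6],
        [INF, INF, 0, 1],
        [INF, INF, INF, 0]]
print(dijkstra(4, cost))
```

The n scans of Step 1 and Step 2 give the O(n²) bound claimed above.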
3.5 PERT or CPM Network
A large project is divisible into many unit tasks. Each task requires a certain amount of time for its completion, and the tasks are partially ordered.
This network is sometimes called a PERT (Project Evaluation and Review Technique) or CPM (Critical Path Method) network. A PERT network is necessarily acyclic.
Theorem 3.1 A digraph is acyclic if and only if its nodes can be renumbered in such a way that for every arc (i, j), i < j. [The work of this is O(n²).]
Claim: Any acyclic graph has at least one node of in-degree 0. After the renumbering, we have i < j for every arc (i, j).
Bellman's equations are
u1 = 0,  ui = min_{k≠i} {uk + cki}.
For acyclic graphs, they turn out to be
u1 = 0,  ui = min_{k<i} {uk + cki},  i = 2, . . . , n,
and can be solved in a single pass through the nodes in increasing order.
3.7 Floyd-Warshall Method for Shortest Paths Between All Pairs
Again, we need the assumption that the networks contain no negative cycles in order for the Floyd-Warshall method to work.
Step 0. u_ij^(1) = cij, i, j = 1, . . . , n.
Step k. For k = 1, . . . , n,
u_ij^(k+1) = min{u_ij^(k), u_ik^(k) + u_kj^(k)},  i, j = 1, . . . , n.
Claim: u_ij^(k) is the length of a shortest path from i to j, subject to the condition that the path does not pass through k, k + 1, . . . , n (i and j excepted). [This means u_ij^(n+1) = uij.]
Proof by induction. It is clearly true for Step 0. Suppose it is true for u_ij^(k) for all i and j. Now consider u_ij^(k+1). If a shortest path from node i to node j which does not pass through nodes k + 1, k + 2, . . . , n does not pass through k, then u_ij^(k+1) = u_ij^(k). Otherwise, if it does pass through node k, u_ij^(k+1) = u_ik^(k) + u_kj^(k).
It is easy to see that the complexity of the Floyd-Warshall method is O(n³).
The Floyd-Warshall method requires the storage of an n × n matrix. Initially this is U^(1) = C. Thereafter, U^(k+1) is obtained from U^(k) by using row k and column k to revise the remaining elements. That is, uij is compared with uik + ukj, and if the latter is smaller, uik + ukj is substituted for uij in the matrix.
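The update rule and the storage scheme just described fit in a few lines; a sketch:

```python
def floyd_warshall(c):
    """Floyd-Warshall on an n x n matrix of arc lengths, float('inf')
    where there is no arc; assumes no negative cycles.  Row k and
    column k of the current matrix revise the remaining elements."""
    n = len(c)
    u = [row[:] for row in c]         # U^(1) = C
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u
```

The three nested loops make the O(n³) bound explicit, with only the single n × n matrix stored.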
There are other methods of the above type, e.g., G. B. Dantzig's method.
3.8 Other Cases
1. Sparse graphs: |A| ≪ n².
2. Paths that do not allow repeated arcs, or that do not allow repeated nodes.
3. Paths with time constraints.
4. Paths with fixed charge.
4 The Greedy Algorithm and Computational Complexity
4.1 Matroid
1935: matroid theory founded by H. Whitney; 1965: J. Edmonds pointed out the significance of matroid theory to combinatorial optimization (CO).
Importance: 1) Many CO problems can be formulated as matroid problems, and solved by the same algorithm;
2) we can gain insight into CO problems;
3) it is a special tool for CO.
Definition 4.1 Suppose we have a finite ground set S, |S| < ∞, and a collection, F, of subsets of S. Then H := (S, F) is said to be an independent system if the empty set is in F and F is closed under inclusion; that is,
i) ∅ ∈ F;
ii) X ∈ F, Y ⊆ X ⟹ Y ∈ F.
Elements in F are called independent sets, and subsets of S not in F are called dependent sets.
Example: Matching system. G = (V, E),
F = {all matchings in G}.
[A matching M of a graph G = (V, E) is a subset of the edges with the property that no two edges of M share the same node. A matching M is a pairwise disjoint edge set.]
Figure 4.1: A matching example (edges e1, e2, e3, e4)
In Figure 4.1,
S = {e1, e2, e3, e4},  F = {∅, {e1}, {e2}, {e3}, {e4}, {e2, e3}}.
Definition 4.2 If H = (S, F) is an independent system such that
X, Y ∈ F, |X| = |Y| + 1 ⟹ there exists e ∈ X\Y such that Y + e ∈ F,
then H (or the pair (S, F)) is called a matroid.
Examples: i) Matric matroid: a matrix A = (a1, . . . , an) of size m × n, S = {a1, . . . , an},
X ∈ F ⟺ X = {a_{i1}, . . . , a_{ik}} is linearly independent.
ii) Graphic matroid: G = (V, E), S = E,
X ∈ F ⟺ X ⊆ E and X has no cycle.
ii) is a special case of i) with A = the vertex-edge incidence matrix.
4.2 The Greedy Algorithm
Suppose that H = (S, F) is an independent system and W : S → R₊ is a weight function; the problem is to find an independent set of maximum total weight.
Greedy Algorithm:
Suppose W(e1) ≥ W(e2) ≥ . . . ≥ W(en).
Step 0. Let X = ∅.
Step k. If X + ek ∈ F, let X := X + ek, where k = 1, . . . , n.
Theorem 4.1 (Rado, Edmonds) The above algorithm works if and
only if H is a matroid.
Applications:
1) The Maximal Spanning Tree Problem.
Suppose that there is a television network leasing video links so that
its stations in various places can be formed into a connected network.
Each link (i, j) has a different rental cost cij. The question is how the network can be constructed at minimum cost. Obviously, what is wanted is a minimum cost spanning tree of video links. Replacing cij by M − cij, where M is a large number, we can see that it then turns into a maximum spanning tree (MST) problem. Kruskal has already proposed the following solution: choose the edges one at a time in order of their weights, largest first, rejecting an edge only if it forms a cycle with edges already chosen.
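Kruskal's rule is exactly the greedy algorithm on the graphic matroid; a sketch with a union-find structure to detect cycles (the edge format (weight, i, j) is ours):

```python
def kruskal_max_spanning_tree(n, edges):
    """Greedy/Kruskal on the graphic matroid: edges are (weight, i, j)
    on vertices 0, ..., n-1.  Scan by weight, largest first, rejecting
    an edge iff it closes a cycle (i.e. the set would become dependent)."""
    parent = list(range(n))           # union-find over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    chosen = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                  # independent: no cycle created
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen
```

By Theorem 4.1 the greedy scan is guaranteed to return a maximum-weight spanning forest, because the cycle-free edge sets form a matroid.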
2) A Sequencing Problem.
Suppose that there are a number of jobs which are to be processed
by a single machine. All jobs require the same processing time. Each
job j has assigned to it a deadline dj, and a penalty pj, which must be
paid if the job is not completed by its deadline. What ordering of the
jobs minimizes the total penalty costs? It can be easily seen that there
exists an optimal sequence in which all jobs completed on time appear
at the beginning of the sequence in order of deadlines, earliest deadline
first. The late jobs follow, in arbitrary order. Thus, the problem is to
choose an optimal set of jobs which can be completed on time. The
following procedure can be shown to accomplish that objective.
Choose the jobs one at a time in order of penalties, largest first,
rejecting a job only if its choice would mean that it, or one of the jobs
already chosen, cannot be completed on time. [This requires checking to
see that the total amount of processing to be completed by a particular
deadline does not exceed the deadline in question.]
For example, consider the set of jobs below, where the processing
time of each job is one hour, and the deadlines are expressed in hours
of elapsed time.
Job Deadline Penalty
j dj pj
1 1 10
2 1 9
3 3 7
4 2 6
5 3 4
6 6 2
Job 1 is chosen, but job 2 is discarded, because the two together
require two hours of processing time and the deadline for job 2 is at
the end of the first hour. Jobs 3 and 4 are chosen, job 5 is
discarded, and job 6 is chosen. An optimal sequence is jobs 1, 4, 3, and
6, followed by the late jobs 2 and 5.
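The greedy selection on this example can be sketched as follows (unit processing times; the feasibility check schedules the kept jobs earliest-deadline-first, as described above):

```python
def select_on_time(jobs):
    """Greedy rule from the text: scan jobs by decreasing penalty, keeping a
    job only if all kept jobs can still finish on time (unit processing times).
    `jobs` maps a job id to its (deadline, penalty) pair."""
    chosen = []
    for j in sorted(jobs, key=lambda k: jobs[k][1], reverse=True):
        candidate = sorted(chosen + [j], key=lambda k: jobs[k][0])
        # Earliest-deadline-first is feasible iff the job in slot t (finishing
        # at time t + 1) meets its deadline, for every slot.
        if all(t + 1 <= jobs[k][0] for t, k in enumerate(candidate)):
            chosen = candidate
    return chosen
```

On the table above this keeps jobs 1, 4, 3, 6, matching the optimal sequence found by hand.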
3) A Semimatching Problem.
Let W be an m × n nonnegative matrix. Suppose we wish to choose
a maximum weight subset of elements, subject to the constraint that
no two elements are from the same row of the matrix. In other
words, the problem is to

maximize Σ_{i,j} wij xij
subject to Σ_j xij ≤ 1,  i = 1, . . . , m,
xij ∈ {0, 1}.
This semimatching problem can be solved by choosing the largest el-
ement in each row of W . Or alternatively: choose the elements one
at a time in order of size, largest first, rejecting an element only if an
element in the same row has already been chosen.
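A minimal sketch of the row-maximum rule (ties broken by the first maximum; W is assumed to be given as a list of rows):

```python
def semimatching(W):
    """Row-by-row greedy for the semimatching problem: since rows are
    independent, picking the largest entry of each row is optimal.
    Returns, for each row, the column index of the chosen entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]
```
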
4.3 General Introduction to Computational Complexity
Initiated in large measure by the seminal papers of S. A. Cook (1971)
and R. M. Karp (1972) in the area of discrete optimization.
Definition 4.3 An instance of an optimization problem consists of
a feasible set F and a cost function c : F → ℝ.
Some instances are larger than others, and it is convenient to define
the notion of the size of an instance.
Definition 4.4 The size of an instance is defined as the number of
bits used to describe the instance, according to a prescribed format.
Since arbitrary real numbers cannot be represented with finitely many bits, this
definition is geared towards instances involving integer (or rational)
numbers. Note that any nonnegative integer r less than or equal to U
can be written in binary as follows:

r = ak 2^k + ak−1 2^(k−1) + . . . + a1 2^1 + a0,

where the scalars a0, . . . , ak are 0 or 1. The number k is clearly at
most ⌊log2 U⌋, since r ≤ U. We can then represent r by the binary
vector (a0, a1, . . . , ak). With an extra bit for the sign, we can also represent
negative numbers. In other words, we can represent any integer with
absolute value less than or equal to U using at most ⌊log2 U⌋ + 2 bits.

Consider now an instance of a linear programming problem in standard
form, i.e., an m × n matrix A, an m-vector b, and an n-vector
c, and assume that the magnitude of the largest entry of (A, b, c)
is equal to U. Since there are mn + m + n entries in A, b, and c, the
size of such an instance is at most

(mn + m + n)(⌊log2 U⌋ + 2).
In fact, this count is not exactly correct: more bits will be needed
to encode flags that indicate where one number ends and another
starts. However, our count is right as far as the order of magnitude is
concerned. To avoid details of this kind, we will instead use the
order-of-magnitude notation and simply say that the size of
such an instance is O(mn log U).
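The bit counts above can be sketched directly; `int_bits` and `lp_instance_size` are illustrative names:

```python
import math

def int_bits(r):
    """Bits to represent the integer r: magnitude bits plus one sign bit."""
    magnitude = 1 if r == 0 else math.floor(math.log2(abs(r))) + 1
    return magnitude + 1

def lp_instance_size(m, n, U):
    """Upper bound on the encoding size of a standard-form LP whose entries
    have magnitude at most U: the count (mn + m + n)(floor(log2 U) + 2)."""
    return (m * n + m + n) * (math.floor(math.log2(U)) + 2)
```
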
Optimization problems are solved by algorithms. The running time
of an algorithm will, in general, depend on the instance to which it is
applied. Let T (n) be the worst-case running time of some algorithm
over all instances of size n, under the bit model.
Definition 4.5 An algorithm runs in polynomial time if there exists
an integer k such that T (n) = O(nk).
Fact: Suppose that an algorithm takes polynomial time under the
arithmetic model. Furthermore, suppose that on instances of size n,
any integer produced in the course of execution of the algorithm has
size bounded by a polynomial in n. Then, the algorithm runs in poly-
nomial time under the bit model as well.
The class P: A combinatorial optimization (CO) problem is in P if it admits an algorithm of polynomial complexity.
The class NP: A combinatorial problem is in NP if for all YES instances, there exists a polynomial length certificate that can be
used to verify in polynomial time that the answer is indeed yes.
NP: e.g., verifying the optimality of an LP solution.
Obviously, P ⊆ NP. But,
P = NP?
Definition 4.6 Suppose that there exists an algorithm for some problem
A that consists of a polynomial time computation together with a
polynomial number of subroutine calls to an algorithm for problem B.
We then say that problem A reduces (in polynomial time) to problem
B. For short, A ∝ B.
In the above definition, all references to polynomiality are with re-
spect to the size of an instance of problem A.
Theorem 4.2 If A ∝ B and B ∈ P, then A ∈ P.
The above theorem says that if A ∝ B, then problem A is not
much more difficult than problem B.
For example, let us consider the following scheduling problem: a set
of jobs is to be processed on two machines, where no job requires
more than three operations. A job may require, for example, processing
on machine one first, followed by machine two, and finally back on
machine one. Our objective is to minimize the makespan, i.e., to complete
the set of jobs in minimum time. Let us refer to this problem as (PJ).
Now, take the one-row integer program, or knapsack problem, that
we state in equality form: given integers a1, a2, . . . , an and b, does
there exist a subset S ⊆ {1, 2, . . . , n} such that Σ_{j∈S} aj = b? Calling
the latter problem (PK), our objective is to show that (PK) polynomially
reduces to (PJ).
For a given (PK) we construct an instance of (PJ) wherein the first
n jobs require only one operation, this being on machine one. Each
has processing time aj for j = 1, 2, . . . , n. Job n + 1 possesses three
operations constrained in such a way that the first is on machine two,
the second on machine one, and the last on machine two again. The
first such operation has duration b, the second duration 1, and the
third duration Σ_{j=1}^n aj − b.
Clearly, one lower bound on the completion time of all jobs in this
instance of (PJ) is the sum of the processing times of job n + 1, i.e.,
Σ_{j=1}^n aj + 1. Any feasible schedule for all jobs achieving this
makespan value must be optimal. Suppose a subset S exists such
that the knapsack problem is solvable. For (PJ) we can schedule the jobs
indexed by S first on machine one, followed by the second operation of
job n + 1, and complete with the remaining jobs (those not in S).
The first and last operations of job n + 1 (on machine two) then finish
at times b and Σ_{j=1}^n aj + 1, respectively. Thus, the completion time of
this schedule is Σ_{j=1}^n aj + 1.
If, conversely, there is no subset S ⊆ {1, 2, . . . , n} with Σ_{j∈S} aj = b,
our scheduling instance is forced into a solution in which either job
n + 1 waits before it obtains the needed unit of time on machine one,
or some of jobs 1, 2, . . . , n wait to keep job n + 1 progressing. Either
way the last job will complete after time Σ_{j=1}^n aj + 1.
We can conclude that the question of whether (PK) has a solution
can be reduced to asking whether the corresponding (PJ) has makespan
no greater than Σ_{j=1}^n aj + 1. Since (as is usually the case) the size of the
required (PJ) instance is a simple polynomial (in fact linear) function
of the size of (PK), we have a polynomial reduction. Problem (PK)
indeed reduces polynomially to (PJ).
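The construction of the (PJ) instance from a (PK) instance can be sketched as follows; the encoding of a job as a list of (machine, duration) operations is an assumption made for illustration:

```python
def knapsack_to_scheduling(a, b):
    """Build the (PJ) instance described above from the (PK) instance (a, b).
    Each job is a list of (machine, duration) operations in processing order."""
    jobs = [[(1, aj)] for aj in a]                  # jobs 1..n: one op, machine one
    jobs.append([(2, b), (1, 1), (2, sum(a) - b)])  # job n+1: three constrained ops
    target_makespan = sum(a) + 1                    # (PK) is a YES instance iff
    return jobs, target_makespan                    # this makespan is achievable
```
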
4.4 Three Forms of a CO Problem
A CO problem: F is the feasible solution set and c : F → ℝ is a cost function; consider
min c(f)
s.t. f ∈ F.
The above CO problem has three versions:
a) Optimization version: Find the optimal solution.
b) The evaluation version: Find the optimal value of c(f), f F .
c) The recognition version: Given an integer L, is there a feasible
solution f ∈ F such that c(f) ≤ L?
These three types of problems are closely related in terms of algorithmic
difficulty. In particular, the difficulty of the recognition problem
is usually a very good indicator of the difficulty of the corresponding
evaluation and optimization problems. For this reason, we can focus,
without loss of generality, on recognition problems.
Consider the following combinatorial optimization problem, called
the maximum clique problem:
Given a graph G = (V,E), find the largest subset C ⊆ V such that for all distinct u, v ∈ C, (u, v) ∈ E.
The maximum clique problem is in NP, or in short, Clique ∈ NP.
Assume that we have a procedure cliquesize which, given any graph
G, will evaluate the size of the maximum clique of G. In other words
cliquesize solves the evaluation version of the maximum clique problem.
We can then make efficient use of this routine in order to solve the
optimization version.
Step 0. X = ∅.
Step 1. Find v ∈ V such that cliquesize(G(v)) = cliquesize(G),
where G(v) is the subgraph of G consisting of v and all its adjacent
nodes.
Step 2. X = X + v, G = G(v)\v. If G = ∅, stop; otherwise, go to
Step 1.
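A sketch of this routine, with a brute-force `cliquesize` standing in for the assumed evaluation oracle (exponential, purely for illustration):

```python
from itertools import combinations

def cliquesize(nodes, edges):
    """Assumed evaluation oracle: size of a maximum clique of the subgraph
    induced by `nodes` (brute force here, for illustration only)."""
    nodes = list(nodes)
    for k in range(len(nodes), 0, -1):
        for sub in combinations(nodes, k):
            if all((u, v) in edges or (v, u) in edges
                   for u, v in combinations(sub, 2)):
                return k
    return 0

def max_clique(nodes, edges):
    """Recover a maximum clique using only cliquesize, per Steps 0-2 above."""
    X, nodes = [], set(nodes)
    while nodes:
        target = cliquesize(nodes, edges)
        for v in nodes:
            # G(v): v together with its neighbors in the current graph.
            nbrs = {u for u in nodes
                    if u != v and ((u, v) in edges or (v, u) in edges)}
            if cliquesize(nbrs | {v}, edges) == target:
                X.append(v)
                nodes = nbrs   # continue in G(v) \ v
                break
    return X
```
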
We now discuss the relation between the three variants in general.
Let us assume that the cost c(f) of any feasible f ∈ F can be computed
in polynomial time. It is then clear that a polynomial time algorithm for
the optimization problem leads to a polynomial time algorithm for
the evaluation problem. (Once an optimal solution is found, use
it to evaluate, in polynomial time, the optimal cost.) Similarly, a
polynomial time algorithm for the evaluation problem immediately translates to
a polynomial time algorithm for the recognition problem. For many
interesting problems, the converse is also true: namely, a polynomial
time algorithm for the recognition problem often leads to polynomial
time algorithms for the evaluation and optimization problems.
Suppose that the optimal cost is known to take one of M values. We
can then perform binary search and solve the evaluation problem using
⌈log M⌉ calls to an algorithm for the recognition problem. If log M is
bounded by a polynomial function of the instance size (which is often
the case), and if the recognition algorithm runs in polynomial time, we
obtain a polynomial time algorithm for the evaluation problem.
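The binary search can be sketched as follows, assuming the optimal cost is an integer in a known range and `recognizes(L)` answers the recognition question "is there f ∈ F with c(f) ≤ L?":

```python
def evaluate_by_binary_search(lo, hi, recognizes):
    """Solve the evaluation problem with about log2(hi - lo) recognition
    calls; assumes the optimal cost is an integer in [lo, hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if recognizes(mid):
            hi = mid      # some feasible solution of cost <= mid exists
        else:
            lo = mid + 1  # every feasible solution costs more than mid
    return lo
```
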
We will now give another example to show how a polynomial time
evaluation algorithm can lead to a polynomial time optimization al-
gorithm by using the zero-one integer programming problem (ZOIP).
Given an instance I of ZOIP, let us consider a particular component
of the vector x to be optimized, say x1, and let us form a new instance
I′ by adding the constraint x1 = 0. We run the evaluation algorithm
on instances I and I′. If the outcome is the same for both instances,
we can set x1 to zero without any loss of optimality. If the outcome
is different, we conclude that x1 should be set to 1. In either case, we
have arrived at an instance involving one less variable to be optimized.
Continuing the same way, fixing the value of one variable at a time, we
obtain an optimization algorithm whose running time is roughly equal
to the running time of the evaluation algorithm times the number of
variables.
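This variable-fixing scheme can be sketched generically; `evaluate` is the assumed evaluation oracle, here taking a partial assignment, and `demo_evaluate` is a hypothetical brute-force oracle used only for illustration:

```python
from itertools import product

def recover_solution(n, evaluate):
    """Turn an evaluation oracle for a 0-1 program into an optimizer by fixing
    one variable at a time, as described above. `evaluate(fixed)` returns the
    optimal value subject to the partial assignment `fixed` (dict var -> 0/1)."""
    fixed = {}
    best = evaluate(fixed)
    for i in range(n):
        # If forcing x_i = 0 keeps the optimal value, fix it at 0; else at 1.
        if evaluate({**fixed, i: 0}) == best:
            fixed[i] = 0
        else:
            fixed[i] = 1
    return [fixed[i] for i in range(n)]

# Hypothetical oracle: minimize 3*x0 - x1 + 2*x2 over {0,1}^3 by brute force.
def demo_evaluate(fixed):
    return min(
        3*x[0] - x[1] + 2*x[2]
        for x in product([0, 1], repeat=3)
        if all(x[i] == v for i, v in fixed.items())
    )
```

The running time is roughly n + 1 oracle calls, matching the count given in the text.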
4.5 NPC
The class co-NP: A combinatorial problem is in co-NP if for all
NO instances, there exists a polynomial length certificate that can
be used to verify in polynomial time that the answer is indeed no.
Obviously, P ⊆ co-NP. But,
P = co-NP?
The next definition deals with the simplest type of a reduction,
where an instance of problem A is replaced by an equivalent instance
of problem B. Rather than developing a general definition of equiv-
alence, it is more convenient to focus on the recognition problems,
that is, problems that have a binary answer (e.g., YES or NO).
Figure 4.2: Relationships among P, NP and co-NP
Definition 4.7 Let A and B be two recognition problems. We say that
problem A transforms to problem B (in polynomial time) if there ex-
ists a polynomial time algorithm which given an instance I1 of problem
A, outputs an instance I2 of B, with the property that I1 is a YES
instance of A if and only if I2 is a YES instance of B. [A ∝ B.]
The class NP-hard: A problem A is NP-hard if for any problem B ∈ NP, B ∝ A.
Theorem 4.3 Suppose that a problem C is NP-hard and that C can
be transformed (in polynomial time) to another problem D. Then D is
NP-hard.
Define a set of Boolean variables {x1, x2, . . . , xn} and let the complement
of any variable xi be denoted by x̄i. In the language
of logic, these variables are referred to as literals. To each literal we
assign a label of true or false such that xi is true if and only if x̄i is
false.
Let the symbol ∨ denote or and the symbol ∧ denote and. We can then
write any Boolean expression in what is referred to as conjunctive
normal form, i.e., as a finite conjunction of disjunctions using each
literal at most once. For example, with the set of variables {x1, x2, x3, x4}
one might encounter the following conjunctive normal form expression
(x1 ∨ x̄2 ∨ x4) ∧ (x̄1 ∨ x2 ∨ x3) ∧ (x̄2 ∨ x̄4).
Each disjunctive grouping in parentheses is referred to as a clause. The
satisfiability problem is
Given a set of literals and a conjunction of clauses defined over the
literals, is there an assignment of values to the literals for which the
Boolean expression is true?
If so, then the expression is said to be satisfiable. The Boolean expression
above is satisfiable via the following assignment: x1 = x2 = x3 =
true and x4 = false. Let SAT denote the satisfiability problem and Q
be any member of NP.
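Checking a given assignment against a CNF expression is itself a polynomial time task (this is why SAT ∈ NP); a minimal evaluator, with an assumed integer encoding of literals:

```python
def satisfied(clauses, assignment):
    """Evaluate a CNF expression: a clause is a list of integer literals,
    where +i stands for x_i and -i for its complement; `assignment`
    maps i to True/False. The expression holds iff every clause does."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```
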
Theorem 4.4 (Cook (1971)) Every problem Q ∈ NP polynomially reduces to SAT.
Karp (1972) showed that SAT polynomially reduces to many com-
binatorial problems.
The class NPC: A recognition problem A is in NPC if
i) A ∈ NP and
ii) for any problem B ∈ NP, B ∝ A.
Cook's Theorem shows SAT ∈ NPC, because it can be checked easily
that SAT ∈ NP.
Examples of NPC problems: ILP, ZOIP, Clique, Vertex Packing,
TSP, 3-Index Assignment, Knapsack, etc.
Figure 4.3: Relationships among P, NP, NPC, and NP-hard
NP-hardness is not a definitive proof that no polynomial time algorithm
exists. For all we know, it is possible that ZOIP belongs
to P, and that P = NP. Nevertheless, NP-hardness suggests that we
should stop searching for a polynomial time algorithm, unless we are
willing to tackle the P = NP question.
For a good guide to the theory of NPC, see:
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.
C. H. Papadimitriou, Computational Complexity, 1995.